Homework Assignment #3

  1. [20 points] Record your voice by reading the material provided by the TA. Submit these audio clips to the TA by this Thursday (March 23). The dataset will include two sets: training and testing. Two kinds of testing sets will be provided: the same document or a different document.

    Recorded Audio File:
       
    VoiceRecordings.zip

    If you have a device to record the files yourself, please use the following format:
        Sampling rate: 16 kHz
        Channel: mono
        Bits per sample: 16 bits
        File Format: Waveform (.wav)
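You can check that a recording matches this format before submitting it by reading its header with Python's standard `wave` module. The sketch below is only an illustration. The function name `check_recording_format` is hypothetical and is not part of the assignment materials.

```python
import wave


def check_recording_format(path):
    """Return a list of format problems; an empty list means the file matches the spec."""
    problems = []
    with wave.open(path, "rb") as wav:  # raises wave.Error if the file is not a WAV file
        if wav.getframerate() != 16000:
            problems.append(f"sampling rate is {wav.getframerate()} Hz, expected 16000 Hz")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono (1)")
        if wav.getsampwidth() != 2:  # sample width is reported in bytes
            problems.append(f"{8 * wav.getsampwidth()} bits per sample, expected 16")
    return problems
```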
    Materials:
      [LongTrain] I have a dream that one day this nation will rise up and live out the true meaning of its creed--we hold these truths to be self-evident that all men are created equal.
    I have a dream that one day on the red hills of Georgia the sons of former slaves and the sons of former slave owners will be able to sit down together at the table of brotherhood.
    I have a dream today!
      [ShortTrain] I have a dream today!
      [Test] I have a dream that my four little children will one day live in a nation where they will not be judged by the color of their skin but by the content of their character.
    Let freedom ring from the mighty mountains of New York.
    Please find a quiet place to record the files.
    Read and record [LongTrain] twice. Name the files YourName_LongTrain_#.wav, where # is the take number.
    Read and record [ShortTrain] twice. Name the files YourName_ShortTrain_#.wav.
    Read and record [Test] twice. Name the files YourName_Test_#.wav.
    Package all of these files into one .zip or .rar archive named YourName_voice.zip (or YourName_voice.rar) and send it to the TA.
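The naming and packaging steps can be scripted. This is a minimal sketch. It assumes that # in each file name is the take number (1 or 2) and that all six recordings are in one folder. `expected_files` and `package_recordings` are hypothetical helper names, and the sketch produces only a .zip archive.

```python
import os
import zipfile

PARTS = ("LongTrain", "ShortTrain", "Test")  # the three reading materials


def expected_files(name):
    """The six required file names: two takes (assumed numbered 1 and 2) per part."""
    return [f"{name}_{part}_{take}.wav" for part in PARTS for take in (1, 2)]


def package_recordings(name, folder):
    """Zip the six recordings into <name>_voice.zip; raise if any of them is missing."""
    files = expected_files(name)
    missing = [f for f in files if not os.path.isfile(os.path.join(folder, f))]
    if missing:
        raise FileNotFoundError(f"missing recordings: {missing}")
    archive = os.path.join(folder, f"{name}_voice.zip")
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            zf.write(os.path.join(folder, f), arcname=f)  # store without folder prefix
    return archive
```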

  2. [80 points] Implement speaker model training and speaker verification. Grading depends on the size of your group:

    If you are going to work on this homework by yourself:

      [50 points] Implement an LPC or MFCC coefficient extractor. Apply it to the training and testing data. Train your own speaker model.
      [30 points] Use the Nearest Neighbor Method based on the Hausdorff distance to do speaker verification. Illustrate the system performance. For this purpose, treat all of the feature vectors from the same student as a set, and calculate the distance between it and the other sets. Label the test set with the label of the training set that has the minimum distance to it.
    Detailed instructions (PDF)
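The single-person pipeline can be sketched in Python with NumPy: extract frame-level LPC features, then label the test set by nearest-neighbor Hausdorff distance. The frame length (25 ms), hop (10 ms), Hamming window, and LPC order of 12 are assumptions chosen for 16 kHz audio. The assignment does not specify them, and the function names are hypothetical. The LPC coefficients are computed with the autocorrelation method and the Levinson-Durbin recursion. The Hausdorff distance used here is the symmetric version, which takes the larger of the two directed distances. MFCC, the other allowed option, is not shown.

```python
import numpy as np


def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (400/160 samples = 25/10 ms at 16 kHz)."""
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def lpc(frame, order=12):
    """LPC coefficients a_1..a_p (x[n] ~ sum_k a_k x[n-k]) via Levinson-Durbin."""
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:  # silent or numerically degenerate frame: stop early
            break
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
        a_prev = a.copy()
        a[i] = k
        a[1:i] = a_prev[1:i] - k * a_prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a[1:]


def lpc_features(signal, order=12, frame_len=400, hop=160):
    """One LPC vector per Hamming-windowed frame -> array of shape (n_frames, order)."""
    frames = frame_signal(np.asarray(signal, dtype=float), frame_len, hop)
    win = np.hamming(frame_len)
    return np.array([lpc(f * win, order) for f in frames])


def hausdorff(A, B):
    """Symmetric Hausdorff distance between two sets of row vectors."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    d = np.sqrt(np.maximum(sq, 0.0))  # pairwise Euclidean distances, shape (m, n)
    return max(d.min(axis=1).max(), d.min(axis=0).max())


def label_test_set(test_feats, train_sets):
    """Nearest-neighbor rule: the label of the training set closest to the test set."""
    return min(train_sets, key=lambda name: hausdorff(test_feats, train_sets[name]))
```

Each recording becomes a set of frame vectors, as the task describes. The pairwise distances are computed from squared norms rather than from an (m, n, d) difference array. This keeps memory at O(mn) for clips with a few thousand frames.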

    If two people are working on this homework together:

      [50 points] Implement LPCC and MFCC extractors. Apply them to the training and testing data. Train both team members' models. For each member, there should be a speaker model of your own and an anti-you speaker model.
      [30 points] Use the Nearest Neighbor Method based on the Hausdorff distance to do speaker verification. Illustrate the system performance.
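LPCC features can be derived from LPC coefficients with the standard LPC-to-cepstrum recursion. Verification performance can then be summarized by false-acceptance and false-rejection rates. This sketch assumes the all-pole model 1 / (1 - sum_k a_k z^-k), that is, the predictor convention x[n] ~ sum_k a_k x[n-k]. It also assumes a verification score of d_anti - d_own. Here d_own is the Hausdorff distance from the test set to the claimed member's training set, and d_anti is the distance to the anti-you set (for example, the pooled frames of the other speakers). That scoring rule and the helper names are illustrative assumptions, not part of the assignment.

```python
import numpy as np


def lpc_to_lpcc(a, n_ceps):
    """Cepstral coefficients c_1..c_N from LPC coefficients a_1..a_p.

    Assumes the all-pole model 1 / (1 - sum_k a_k z^-k); the gain term c_0 is omitted.
    """
    p = len(a)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def far_frr(scores, is_target, threshold):
    """False-acceptance and false-rejection rates at one decision threshold.

    scores: one score per trial, higher = more likely the claimed speaker
            (e.g. d_anti - d_own with Hausdorff distances; an assumed rule).
    is_target: True for genuine trials, False for impostor trials.
    """
    scores = np.asarray(scores, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)
    accept = scores > threshold
    far = float(np.mean(accept[~is_target]))  # impostors wrongly accepted
    frr = float(np.mean(~accept[is_target]))  # genuine speakers wrongly rejected
    return far, frr
```

Sweeping `threshold` over the observed scores and plotting FAR against FRR gives a DET-style curve, which is one common way to illustrate system performance. The threshold at which the two rates are equal gives the equal error rate.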
     

    If three people are working on this homework together:

      [40 points] Implement LPCC and MFCC extractors. Apply them to the training and testing data. Train all three team members' models. For each member, there should be a speaker model of your own and an anti-you speaker model.
      [20 points] Use the Nearest Neighbor Method based on the Hausdorff distance to do speaker verification. Illustrate the system performance.
      [20 points] Use Gaussian Mixture Models (GMMs) for training and testing.
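A GMM-based verifier can train one mixture on a member's frames and one on the anti-you frames. It then scores a test set by the average log-likelihood ratio between the two models. The NumPy sketch below fits diagonal-covariance GMMs with plain EM. The component count, iteration count, variance floor, and random initialization are assumptions, and the function names are hypothetical.

```python
import numpy as np


def _log_gauss(X, means, variances):
    """log N(x_i | mu_k, diag(var_k)) for every row i and component k -> shape (n, K)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances[None, :, :], axis=-1)
                   + np.sum(np.log(2.0 * np.pi * variances), axis=-1)[None, :])


def _logsumexp_rows(L):
    """Numerically stable log(sum(exp(L), axis=1))."""
    m = L.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(L - m).sum(axis=1))


def train_gmm(X, n_components=8, n_iter=50, var_floor=1e-3, seed=0):
    """Fit a diagonal-covariance GMM to the rows of X with plain EM."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means = X[rng.choice(n, n_components, replace=False)].astype(float)
    variances = np.tile(X.var(axis=0) + var_floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities gamma[i, k] = P(component k | x_i)
        L = np.log(weights)[None, :] + _log_gauss(X, means, variances)
        gamma = np.exp(L - _logsumexp_rows(L)[:, None])
        # M-step: re-estimate weights, means and floored diagonal variances
        nk = gamma.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (gamma.T @ X) / nk[:, None]
        variances = np.maximum((gamma.T @ X ** 2) / nk[:, None] - means ** 2, var_floor)
    return weights, means, variances


def avg_log_likelihood(X, gmm):
    """Mean per-frame log-likelihood of X under a GMM given as (weights, means, variances)."""
    weights, means, variances = gmm
    L = np.log(weights)[None, :] + _log_gauss(X, means, variances)
    return float(_logsumexp_rows(L).mean())


def llr_score(X, speaker_gmm, anti_gmm):
    """Verification score: average log-likelihood ratio, speaker model vs. anti-you model."""
    return avg_log_likelihood(X, speaker_gmm) - avg_log_likelihood(X, anti_gmm)
```

A score above a tuned threshold accepts the claimed identity. If a library is allowed, scikit-learn's `sklearn.mixture.GaussianMixture` with `covariance_type="diag"` provides a more robust fit.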
     
  3. [Bonus 10 points] Repeat the process and conduct speaker identification experiments.

    Note: For the bonus points, you need to run experiments that determine the identity of other students. For this purpose, train a specific model for each student and apply it to that student's data.
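In identification, each test set is compared with every student's model and assigned to the closest one. Verification, by contrast, accepts or rejects a single identity claim. The sketch below does not depend on a particular model: `distance` can be the Hausdorff distance or any other set distance. `identify` is a hypothetical helper. It returns the predictions, the accuracy, and a count of (true, predicted) pairs that serves as a confusion table.

```python
from collections import Counter


def identify(test_trials, train_sets, distance):
    """Closed-set speaker identification by nearest training set.

    test_trials: list of (true_speaker, feature_set) pairs, e.g. two takes per student.
    train_sets:  dict speaker -> feature_set.
    distance:    function(set_a, set_b) -> float, e.g. a Hausdorff distance.
    """
    predicted = [min(train_sets, key=lambda s: distance(feats, train_sets[s]))
                 for _, feats in test_trials]
    truth = [true for true, _ in test_trials]
    accuracy = sum(p == t for p, t in zip(predicted, truth)) / len(truth)
    confusion = Counter(zip(truth, predicted))  # (true, predicted) -> count
    return predicted, accuracy, confusion
```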
       
  4. [Bonus 10 points] Besides the Hausdorff distance, use the Nearest Neighbor Method with the centroid distance, and compare the performance of these two distance metrics. In this case, first calculate the centroid feature vector of each set, then apply the Euclidean distance to label the set.
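The centroid variant replaces the set-to-set Hausdorff distance with the Euclidean distance between mean vectors. A minimal NumPy sketch follows, and the function names are hypothetical. The centroids can be computed once per set, so each comparison costs O(d) instead of the O(mnd) needed to compare all frame pairs. The trade-off is that the centroid ignores how the frames are spread around the mean.

```python
import numpy as np


def centroid_distance(A, B):
    """Euclidean distance between the centroids (mean feature vectors) of two sets."""
    return float(np.linalg.norm(A.mean(axis=0) - B.mean(axis=0)))


def label_by_centroid(test_feats, train_sets):
    """Return the name of the training set whose centroid is closest to the test centroid."""
    centroids = {name: feats.mean(axis=0) for name, feats in train_sets.items()}
    c = test_feats.mean(axis=0)
    return min(centroids, key=lambda name: np.linalg.norm(c - centroids[name]))
```

To make the requested comparison, run both metrics on the same test trials and compare their accuracies.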

  5. [Bonus 30 points] For one- or two-person groups, apply Gaussian Mixture Models for training and testing; for three-person groups, apply Support Vector Machines for training and testing.
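One way to try support vector machines is a linear SVM trained with the Pegasos stochastic subgradient rule, sketched below in NumPy. This illustration rests on several assumptions. It uses a linear kernel only, labels the target speaker's frames +1 and all other frames -1, and scores a test set by the mean of its frame decision values. The hyperparameters and function names are hypothetical. Library implementations such as scikit-learn's `sklearn.svm.SVC` also support nonlinear kernels.

```python
import numpy as np


def _with_bias(X):
    """Append a constant-1 column so the last weight acts as a bias term."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def train_linear_svm(X, y, lam=0.01, n_iter=5000, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained with the Pegasos SGD rule.

    X: (n, d) frame features; y: labels in {-1, +1} (+1 = target speaker).
    For simplicity the bias is regularized along with the other weights.
    """
    rng = np.random.default_rng(seed)
    Xb = _with_bias(np.asarray(X, dtype=float))
    w = np.zeros(Xb.shape[1])
    for t in range(1, n_iter + 1):
        i = rng.integers(len(Xb))
        eta = 1.0 / (lam * t)
        if y[i] * (w @ Xb[i]) < 1:  # margin violated: hinge-loss subgradient step
            w = (1 - eta * lam) * w + eta * y[i] * Xb[i]
        else:                       # only the regularizer contributes
            w = (1 - eta * lam) * w
    return w


def svm_decision(X, w):
    """Signed decision value per frame; positive means 'target speaker'."""
    return _with_bias(np.asarray(X, dtype=float)) @ w


def score_test_set(feats, w):
    """Assumed set-level score: mean frame decision value (threshold it to decide)."""
    return float(svm_decision(feats, w).mean())
```

SVMs are sensitive to feature scale. It is usually worth standardizing each feature dimension with the training-set mean and variance before training, because LPCC and MFCC dimensions have different ranges.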