These examples each consist of a single word from the 11-word vocabulary "zero" to "nine" plus "oh". Each word is spoken twice by each of 5 female and 5 male speakers. I have indicated a 'training' and a 'test' set, so you could use this to build a very simple speech recognizer: use the first 3 speakers of each gender to set the parameters of your classifier, then test its performance on the remaining two male and/or two female speakers.
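
As one possible illustration of such a recognizer, the sketch below classifies an utterance by finding the closest training template under dynamic time warping (DTW). DTW is just one simple choice, not a method prescribed here. The sketch assumes each utterance has already been converted to a sequence of feature vectors (for example, per-frame spectral features); the function names and the toy data are hypothetical.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames being compared.
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(features, templates):
    """Return the label of the training template nearest to `features`."""
    return min(templates, key=lambda t: dtw_distance(features, t[1]))[0]


def accuracy(test_items, templates):
    """Fraction of (label, features) test items that are classified correctly."""
    correct = sum(classify(f, templates) == label for label, f in test_items)
    return correct / len(test_items)


# Toy one-dimensional "features" standing in for real speech features (assumption).
toy_templates = [("one", [[0.0], [1.0], [2.0]]), ("two", [[2.0], [1.0], [0.0]])]
toy_test = [("one", [[0.0], [0.0], [1.0], [2.0]]), ("two", [[2.0], [1.0], [1.0], [0.0]])]
print(accuracy(toy_test, toy_templates))
```

In practice the templates would come from the training speakers and the test items from the held-out speakers, so the reported accuracy reflects performance on voices the classifier has not seen.
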
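Each row in the following tables corresponds to a particular speaker, with the words arranged in columns. For each word, there is an "A" repetition and a "B" repetition. The first table lists the six training speakers (three male, three female); the second lists the four test speakers (two male, two female).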
Speaker | oh | | zero | | one | | two | | three | | four | | five | | six | | seven | | eight | | nine | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MAE (male) | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B |
MBD (male) | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B |
MCB (male) | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B |
FAC (female) | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B |
FBH (female) | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B |
FCA (female) | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B |
Speaker | oh | | zero | | one | | two | | three | | four | | five | | six | | seven | | eight | | nine | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MDL (male) | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B |
MEH (male) | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B |
FDC (female) | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B |
FEA (female) | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B |
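
The split described above can be written down as a short sketch. It enumerates every (speaker, word, repetition) combination from the two tables, assuming the first table is the training set and the second is the test set. The tuple representation and the names used are illustrative only; no file naming scheme is implied.

```python
# Speaker codes as listed in the two tables (first table assumed to be training).
TRAIN_SPEAKERS = ["MAE", "MBD", "MCB", "FAC", "FBH", "FCA"]
TEST_SPEAKERS = ["MDL", "MEH", "FDC", "FEA"]

# The 11-word vocabulary, in the column order used by the tables.
WORDS = ["oh", "zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]
REPETITIONS = ["A", "B"]


def utterances(speakers):
    """List every (speaker, word, repetition) triple for the given speakers."""
    return [(s, w, r) for s in speakers for w in WORDS for r in REPETITIONS]


train = utterances(TRAIN_SPEAKERS)
test = utterances(TEST_SPEAKERS)

# 6 speakers x 11 words x 2 repetitions, and 4 speakers x 11 words x 2 repetitions.
print(len(train), len(test))  # 132 88
```

Keeping whole speakers out of the training list, rather than holding out individual repetitions, means the test set measures how well the classifier generalizes to new voices.
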
The spoken digits are from the TIDIGITS corpus of several thousand continuous-digit utterances, which also includes isolated digits for each of its 55 male and 55 female training speakers.