This page contains another example of sound recorded from multiple channels at the same time. Such recordings are interesting because the differences in timing and amplitude between the sensors can sometimes let us distinguish between different sound sources.
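As a rough illustration of the timing and amplitude cues, the sketch below estimates the delay between two channels by brute-force cross-correlation (the lag at which the channels line up best) and the level difference as a ratio of RMS values in decibels. This is a minimal sketch under our own assumptions, not a method described on this page: the function name `estimate_itd_ild` and the `max_lag` search limit are hypothetical, and it is O(n × max_lag), so use it on short excerpts only.

```python
import math


def estimate_itd_ild(left, right, max_lag):
    """Return (lag_in_samples, level_difference_dB) for two equal-length channels.

    A positive lag means the right channel is a delayed copy of the left,
    i.e. the sound reached the left microphone first. A positive level
    difference means the left channel is louder.
    """
    n = len(left)
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[i] with right[i + lag] over the overlapping samples.
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    # Level difference: ratio of RMS amplitudes, expressed in decibels.
    rms_l = math.sqrt(sum(x * x for x in left) / n)
    rms_r = math.sqrt(sum(x * x for x in right) / n)
    ild_db = 20 * math.log10(rms_l / rms_r)
    return best_lag, ild_db
```

At a 48 kHz sampling rate one sample is about 21 microseconds, and delays between the two ears of a head are well under a millisecond, so a `max_lag` of around 48 samples should comfortably cover the plausible range.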
The recordings here were made by microphones in the left and right ears of a "dummy head", so they are a good approximation to the two channels of information available to a human listener. They were recorded at ATR in Japan in collaboration with researchers from Sheffield University, hence the name ShATR (Sheffield/ATR). You can find out more from the ShATR homepage, including details of the recordings in this paper.
The session consisted of five people sitting around a table, with the dummy head microphone in between them. Each soundfile is a stereo recording (sampled at 48 kHz) lasting 1 minute; there are five of them, covering the first five minutes of the session. Minute 3 has a good spread of different speakers, so it is a good one to work with.
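To work with one of these stereo files, you first need to split the interleaved samples into left and right channels. The sketch below uses Python's standard `wave` and `struct` modules and assumes the files are 16-bit PCM WAV; the page does not state the file format or sample width, so check yours before relying on it. The function name `read_stereo_wav` is hypothetical.

```python
import struct
import wave


def read_stereo_wav(path):
    """Read a 16-bit stereo WAV file into (rate, left, right) sample lists.

    ASSUMPTION: the file is 16-bit PCM; anything else is rejected.
    `path` may be a filename or an open binary file object.
    """
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 2 or w.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        rate = w.getframerate()
        n_frames = w.getnframes()
        raw = w.readframes(n_frames)
    # Frames are interleaved L0, R0, L1, R1, ... as little-endian int16.
    samples = struct.unpack("<%dh" % (2 * n_frames), raw)
    return rate, list(samples[0::2]), list(samples[1::2])
```

For example, `rate, left, right = read_stereo_wav("minute3.wav")` (a hypothetical filename) should give `rate == 48000`, and for a full minute each channel should hold 60 × 48000 = 2,880,000 samples.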
These data have also been transcribed, with each of the five participants transcribed separately. Each transcription file contains one line per 'sound event': the first part of the line describes the event (e.g. the words spoken), and the line ends with two numbers. The first is the sample index at which the event begins (i.e. the time in seconds multiplied by the sampling rate of 48000 samples/sec); the second is the duration of the event, also in samples (units of 1/48000 sec). For each participant there is one transcription of the words spoken and another of the general class of each sound (speech, "um"s, breath); the class transcriptions also include background sounds common to all participants, such as door slams.
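A transcription line can be turned into sample and time values by taking its last two fields as integers. The sketch below assumes the fields are separated by whitespace and that the description contains no trailing numbers of its own; the page only says the line ends with the two numbers, so the exact file layout is an assumption. The names `Event` and `parse_event` are hypothetical.

```python
from typing import NamedTuple

SAMPLE_RATE = 48000  # samples per second, as stated for these recordings


class Event(NamedTuple):
    description: str
    start: int   # sample index at which the event begins
    length: int  # duration of the event, in samples

    @property
    def start_s(self):
        """Start time in seconds."""
        return self.start / SAMPLE_RATE

    @property
    def end_s(self):
        """End time in seconds."""
        return (self.start + self.length) / SAMPLE_RATE


def parse_event(line):
    """Parse one transcription line, ASSUMING whitespace-separated fields
    whose last two entries are the start sample and the duration."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("line has no sample fields: %r" % line)
    return Event(" ".join(fields[:-2]), int(fields[-2]), int(fields[-1]))
```

Because both numbers are already in samples, the matching audio can be cut out directly with `left[ev.start:ev.start + ev.length]`, where `left` is one channel of the recording; `start_s` and `end_s` are only needed for reporting times in seconds.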