Dan Ellis: Research Projects:

Sound Browser/Visualization


One of the characteristics of research into sound analysis is the large amount of data involved. Sound is in itself a fairly voluminous data stream, and it is the subtlety with which information is represented within it that makes the whole research area so rich. In addition, techniques of statistical classification, and the procedures of empirical evaluation, often call for large collections of sound examples.

Confronted with such large volumes of data, it becomes critical to have efficient and convenient tools for inspecting and investigating the data. Particularly in the early stages of investigating a new research idea, it is valuable to be able to inspect the data (both at the input and output of algorithms) to get a 'feeling' for what is happening, to diagnose unexpected results, and to identify new opportunities for analysis and processing.

We use many different kinds of representation in our work on sound analysis. In addition to the basic waveform, there are time-frequency representations of the spectrogram family, scalar or vector features resulting from analysis algorithms, and discrete labels for particular time ranges generated by classifiers. Ideally, we want to be able to visualize each of these data sets in the most convenient form, and to be able to make direct comparisons between data sets corresponding to the same underlying sound, even when their formats may be very different. Each new question may require a new form or configuration of the display elements, so the tool needs to be very flexible and easy to extend for new datatypes.

Although an effort to enumerate all the possible dimensions of any dataset we may ever wish to work with would be futile, certain aspects seem universal. At the top level, there is the dimension of soundfile within a corpus: the sound visualization should probably be invoked from a kind of database manager that allows browsing among the different examples in a database, and that keeps track of the correspondence between the base waveform files and their analyses in the various other representations (which may occur in separate files, or as records in a single large archive file).
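To make the idea of such a database manager concrete, the following is a minimal sketch in Python of an index that ties each base waveform file to its derived analyses. It is an illustration only, not an existing tool: the class name `CorpusIndex`, its methods, and the file paths are all hypothetical, and the assumption that a sound example can be identified by the stem of its waveform filename is ours.

```python
from collections import defaultdict
from pathlib import PurePosixPath


class CorpusIndex:
    """Hypothetical index linking waveform files to their analyses."""

    def __init__(self):
        self._waveforms = {}                 # example id -> waveform path
        self._analyses = defaultdict(dict)   # example id -> {kind: location}

    def add_waveform(self, path):
        # Assumption: the filename stem uniquely identifies the example.
        example_id = PurePosixPath(path).stem
        self._waveforms[example_id] = path
        return example_id

    def add_analysis(self, example_id, kind, location):
        # 'location' may be a separate file path, or an
        # (archive_path, record_number) pair for a shared archive file.
        if example_id not in self._waveforms:
            raise KeyError(f"unknown sound example: {example_id}")
        self._analyses[example_id][kind] = location

    def lookup(self, example_id):
        """Return (waveform path, {kind: location}) for one example."""
        waveform = self._waveforms[example_id]
        return waveform, dict(self._analyses.get(example_id, {}))

    def ids(self):
        """Sorted example ids, e.g. to drive a browsing list."""
        return sorted(self._waveforms)


if __name__ == "__main__":
    index = CorpusIndex()
    uid = index.add_waveform("corpus/wav/mtg001.wav")
    index.add_analysis(uid, "spectrogram", "corpus/spec/mtg001.spec")
    index.add_analysis(uid, "labels", ("corpus/labels.arc", 17))
    print(index.lookup(uid))
```

Letting a location be either a path or an (archive, record) pair is one simple way to cover both storage layouts mentioned above; a real manager would presumably also need to record the format of each analysis.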

Within the sound display, the main organizing dimension is the time axis, since all sounds have a finite, usually explicit, duration. As in the sketch above, this can be used most successfully when various data displays share a common left-to-right time axis, and are displayed in a stacked, synchronized pattern. Other dimensions that may apply to multiple representations are the channel within a sound (e.g. for stereo, or for the 16-channel meeting recordings) and frequency (e.g. a separate plot of vertical 'slices' through a spectrogram, where frequency becomes the x-axis, with the vertical axis indicating intensity).
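The shared time axis can be sketched in code as follows. This is a minimal illustration in Python, under our own assumptions, of how displays with very different formats might be kept synchronized: each track converts a common time window, in seconds, into its own indexing. The names `SampledTrack`, `LabelTrack` and `synchronized_view` are hypothetical, and the drawing itself is left out.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class SampledTrack:
    """Uniformly sampled data: waveform samples, feature vectors,
    or spectrogram columns (one entry per frame)."""
    name: str
    rate: float        # frames per second
    frames: Sequence
    start: float = 0.0  # time of the first frame, in seconds

    def window(self, t0, t1):
        # Select frames whose start time lies in [t0, t1).
        first = max(0, math.ceil((t0 - self.start) * self.rate))
        last = min(len(self.frames), math.ceil((t1 - self.start) * self.rate))
        return self.frames[first:last]


@dataclass
class LabelTrack:
    """Discrete labels attached to time ranges: (start, end, label)."""
    name: str
    segments: List[Tuple[float, float, str]]

    def window(self, t0, t1):
        # Keep every segment that overlaps the window at all.
        return [s for s in self.segments if s[0] < t1 and s[1] > t0]


def synchronized_view(tracks, t0, t1):
    """Return, for each track, the portion that falls in the window."""
    return {track.name: track.window(t0, t1) for track in tracks}


if __name__ == "__main__":
    tracks = [
        SampledTrack("waveform", rate=8.0, frames=list(range(16))),
        SampledTrack("features", rate=2.0, frames=["a", "b", "c", "d"]),
        LabelTrack("labels", [(0.0, 0.75, "speech"), (0.75, 2.0, "music")]),
    ]
    print(synchronized_view(tracks, 0.5, 1.0))
```

Because every track only has to implement `window`, a new datatype can join the stacked display without changes to the others, which is the kind of extensibility the tool would need.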

The different data formats that it will be necessary to support include:

.. and doubtless many others.

The development of an in-house visualization solution would include:

Some relevant, related ideas include:

 


Last updated: $Date: $
Dan Ellis <dpwe@ee.columbia.edu>