"Deformable Spectrograms"
Manuel Reyes-Gomez           Nebojsa Jojic           Daniel P. W. Ellis
Columbia University          Microsoft Research      Columbia University
 

II.- SPECTRAL DEFORMATION MODEL
Figure 1 shows a narrow-band spectrogram representation of a speech signal, where each
column depicts the energy content across frequency in a short-time window, or time frame.
The value in each cell is the log-magnitude of the short-time Fourier transform.
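
As a minimal sketch of how such a representation can be computed, the NumPy code below builds a log-magnitude short-time Fourier transform. The function name, the Hann window, the frame length, the hop size and the small constant added before the logarithm are all illustrative assumptions; the text does not specify the analysis parameters used for Figure 1.

```python
import numpy as np

def log_spectrogram(signal, frame_len=512, hop=128, eps=1e-10):
    """Return a log-magnitude STFT of shape (frequency bins, time frames).

    frame_len, hop and eps are assumed values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Each column is one windowed short-time frame of the signal.
    frames = np.stack(
        [signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )
    spectrum = np.fft.rfft(frames, axis=0)
    # eps avoids log(0) in bins with no energy.
    return np.log(np.abs(spectrum) + eps)

# Usage: one second of a 440 Hz tone sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
X = log_spectrogram(np.sin(2 * np.pi * 440 * t))
```

A longer frame_len gives finer frequency resolution (a narrow-band spectrogram) at the cost of coarser time resolution.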


                      Figure 1

Using the subscript C to designate current and P to indicate previous, the model predicts
a patch of Nc time-frequency bins centered at the kth frequency bin of frame t as a
"transformation" of a patch of Np bins around the kth bin of frame t-1.

Figure 1 shows an example with Nc = 3 and Np = 5 to illustrate the intuition behind this
approach. The selected patch in frame t can be seen as a close replica of an upward shift
of part of the patch highlighted in frame t-1. This "upward" relationship can be captured by a
transformation matrix such as the one shown in the figure.
The patch in frame t-1 is larger than the patch in frame t to permit both upward and
downward motions.
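
The Nc = 3, Np = 5 example can be sketched in code. We assume here that each transformation matrix is an Nc x Np matrix of zeros and ones that copies Nc consecutive bins of the previous patch, and that a positive shift means upward motion; the function name and the patch values are hypothetical and chosen only for illustration.

```python
import numpy as np

def shift_matrices(nc=3, n_prev=5):
    """Build one (nc x n_prev) selection matrix per allowed shift.

    Assumption: shift > 0 means energy moved upward in frequency, so the
    current patch copies bins that sat lower in the previous frame.
    """
    max_shift = (n_prev - nc) // 2
    mats = {}
    for shift in range(-max_shift, max_shift + 1):
        T = np.zeros((nc, n_prev))
        start = max_shift - shift          # first previous bin that is copied
        T[np.arange(nc), start + np.arange(nc)] = 1.0
        mats[shift] = T
    return mats

mats = shift_matrices()

# Patches are ordered from low to high frequency.
prev_patch = np.array([0.0, 5.0, 1.0, 0.0, 0.0])   # bins k-2 .. k+2, frame t-1
curr_patch = np.array([0.0, 5.0, 1.0])             # bins k-1 .. k+1, frame t

# The peak moved from bin k-1 to bin k, so the upward matrix reproduces it.
predicted = mats[1] @ prev_patch
```

With these sizes there are Np - Nc + 1 = 3 candidate matrices: one downward shift, no shift, and one upward shift.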

The generative graphical model for a single layer is depicted in Figure 2.


                             Figure 2: a) Graphical model; b) Graphical simplification

X nodes correspond to the observations, and T nodes to the transformations at each frequency
bin. At each bin, the local likelihood potentials involve the Nc bins used in the current frame,
the Np bins used in the previous frame, and the set of all possible transformation matrices
defined by T. Please read the paper for complete details.

Inference is performed efficiently via loopy belief propagation. Once the posteriors of the
transformation nodes are estimated, we can compute the "expected" transformation at each bin.
The resulting transformation maps give an appealing description of the harmonics' dynamics,
as can be observed in Figure 3.
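
One way to read the "expected" transformation is as a posterior-weighted average of the candidate shifts. The sketch below assumes the transformations are indexed by integer frequency shifts and that the posteriors are stored as an array of shape (shifts, bins, frames); both the layout and the function name are assumptions made for illustration.

```python
import numpy as np

def expected_shift_map(posteriors, shifts):
    """Posterior-weighted mean shift for every time-frequency bin.

    posteriors: array (n_shifts, n_bins, n_frames), summing to 1 over axis 0.
    shifts:     the shift value attached to each transformation (assumed integers).
    """
    posteriors = np.asarray(posteriors, dtype=float)
    shifts = np.asarray(shifts, dtype=float)
    # Sum over the shift axis: E[shift] = sum_s s * p(s | data).
    return np.tensordot(shifts, posteriors, axes=(0, 0))

# Usage: one frequency bin, two frames, three candidate shifts.
post = np.array([
    [[0.1, 0.5]],   # p(shift = -1)
    [[0.2, 0.5]],   # p(shift =  0)
    [[0.7, 0.0]],   # p(shift = +1)
])
emap = expected_shift_map(post, [-1, 0, 1])
```

Mapping these expected values to colors would give a picture of the kind shown in Figure 3, with sign and magnitude indicating the direction and steepness of the motion.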

In these panels, the links between three specific time-frequency bins and their corresponding
transformations on the map are highlighted. Bin 1 is described by a steep downward
transformation, while bin 3 also has a downward motion but is described by a less steep
transformation, consistent with the dynamics visible in the spectrogram. Bin 2, on the other hand,
is described by a steep upward transformation.


                             Figure 3.- Transformation Map.

DEMO INTRODUCTION

We have built a real-time demo that runs a variety of applications using this model.

The user can change the different parameters of the model on the user interface (Figure 4).
There are several panels and function buttons that we will explain using different applications.
The information displayed on each panel changes with each application.

We will present ten short videos of the demo for each application. Before each video we
will describe the application, the information displayed in each panel and the functionality of
the buttons.

Description of Video 1.

We first present an instance of the demo performing basic estimation of the harmonics
transformation maps, followed by a harmonics tracking application.

Figure 4 shows a typical "screen shot" of the demo for this application. The figure displays
three panels. Panel 1 displays the signal to be processed. Panel 2 shows the most likely
transformation obtained from the local likelihood potential. Here, as in the transformation maps,
the color relates to the motion present in the signal; however, the structure is not as clearly
defined as in the transformation maps. Also notice the total lack of a clear structure in the
silent regions of the signal. Panel 3 shows the transformation maps obtained after each
complete iteration.
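
The kind of map shown in Panel 2 can be approximated by picking, independently at every bin, the shift whose prediction best matches the current patch. The sketch below uses a squared-error match, which corresponds to a Gaussian local likelihood with a fixed variance; that choice, the function name and the sign convention (positive means upward) are assumptions, not details taken from the demo.

```python
import numpy as np

def local_best_shift(X, nc=3, n_prev=5):
    """Per-bin most likely shift using only local evidence (no smoothing).

    X: log-spectrogram of shape (bins, frames). Returns an integer map of the
    same shape; border bins and the first frame are left at 0.
    """
    hc, hp = nc // 2, n_prev // 2
    max_shift = hp - hc
    n_bins, n_frames = X.shape
    best = np.zeros((n_bins, n_frames), dtype=int)
    for t in range(1, n_frames):
        for k in range(hp, n_bins - hp):
            curr = X[k - hc:k + hc + 1, t]
            errors = []
            for s in range(-max_shift, max_shift + 1):
                centre = k - s  # an upward shift s copies bins s steps lower
                pred = X[centre - hc:centre + hc + 1, t - 1]
                errors.append(np.sum((curr - pred) ** 2))
            best[k, t] = int(np.argmin(errors)) - max_shift
    return best

# Usage: a single ridge that rises by one bin per frame.
X = np.zeros((20, 6))
for frame in range(6):
    X[5 + frame, frame] = 10.0
best = local_best_shift(X)
```

In flat regions of X every shift fits equally well, so the argmin there is arbitrary. This is one reason a purely local estimate shows little structure where the signal carries no energy.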

Each complete iteration consists of a full pass of belief propagation messages through all
the vertical chains, followed by a full pass through all the horizontal chains. Each vertical
chain consists of all the coefficients for a given frame; each horizontal chain consists of
all the frames for a given coefficient. The belief propagation rules for these chains can be
implemented using efficient forward/backward and upward/downward recursions; see the extended
paper for details. The strength of the belief propagation in each direction is controlled by the
transition potentials in that direction. The parameters "Ver. Factor" and "Hor. Factor" affect the
probability of switching to a different transform: a higher value of a factor results in "smoother"
transformation patterns in that direction. The video also shows the effect of changing those factors.

Once the transformation maps are estimated, some interesting applications can be performed,
like tracking harmonics. The user "clicks" in a certain region of the spectrogram, and if the
"Track H" button is pushed, the demo shows the history of that particular time-frequency bin.
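
A simple way to obtain such a history is to follow the transformation map backward in time from the selected bin. The sketch below assumes an integer-valued shift map in which a positive value means the energy came from a lower bin in the previous frame; the function name and this convention are hypothetical, and the demo may track harmonics differently.

```python
import numpy as np

def track_history(shift_map, k, t):
    """Trace bin (k, t) back to frame 0 through a (bins, frames) shift map.

    Returns the path as a list of (bin, frame) pairs in chronological order.
    """
    n_bins = shift_map.shape[0]
    path = [(k, t)]
    while t > 0:
        # Undo the shift that carried energy into bin k at frame t.
        k = int(np.clip(k - int(round(shift_map[k, t])), 0, n_bins - 1))
        t -= 1
        path.append((k, t))
    return path[::-1]

# Usage: a map in which every bin moves up by one bin per frame.
shift_map = np.ones((10, 4), dtype=int)
path = track_history(shift_map, k=6, t=3)
```

Rounding lets the same function accept an expected (non-integer) transformation map, at the cost of some drift over long tracks.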

VIDEO 1. - Harmonics transformation maps and harmonics tracking application.
