"Deformable Spectrograms"
Manuel Reyes-Gomez      Nebojsa Jojic         Daniel P. W. Ellis
Columbia University     Microsoft Research    Columbia University

I. INTRODUCTION

In many audio signals, including speech and musical instruments, adjacent frames of the
spectral representation are highly correlated. Our approach exploits this correlation so that
explicit models are required only for those frames that cannot be accurately predicted from
their context.
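The sketch below makes the idea concrete by flagging frames that the previous frame predicts poorly. It is a minimal illustration under our own assumptions, not the paper's method. It uses the crudest possible predictor, X_t ≈ X_{t-1}, with no transformations at all. The function name `unpredictable_frames` and its `threshold` parameter are hypothetical.

```python
import numpy as np

def unpredictable_frames(log_spec: np.ndarray, threshold: float) -> np.ndarray:
    """Return indices of frames that the previous frame predicts poorly.

    log_spec: log-magnitude spectrogram of shape (n_bins, n_frames).
    Assumption: the predictor is the identity, X_t ~ X_{t-1}; `threshold`
    is a hypothetical tuning parameter, not a value from the paper.
    """
    # Mean squared prediction error of each frame given its predecessor.
    residual = np.mean((log_spec[:, 1:] - log_spec[:, :-1]) ** 2, axis=0)
    # residual[i] refers to frame i + 1, hence the offset.
    flagged = np.flatnonzero(residual > threshold) + 1
    # Frame 0 has no context, so it always needs an explicit model.
    return np.concatenate(([0], flagged))

# Usage: a steady spectrum that changes abruptly at frame 3 (e.g. an onset).
spec = np.zeros((4, 6))
spec[:, 3:] = 5.0
flagged = unpredictable_frames(spec, threshold=1.0)
```

In this toy case only frame 0 and the onset frame need explicit models. A better predictor, such as the transformation-based one described below, would leave even fewer frames flagged.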

Our model captures the general properties of such audio sources by modeling the evolution
of their harmonic components. Building on the common source-filter model for such signals,
we devise a layered generative model that describes its two components in separate layers:
one for the excitation harmonics and another for resonances such as vocal tract formants.
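In the log domain, the source-filter model becomes additive. When the linear magnitude spectrum is the product of an excitation and a filter response, its log-spectrum is the sum of their log-spectra, so the two components can occupy separate layers. The toy example below is a hypothetical construction: Gaussian harmonic peaks, Gaussian formant resonances, and an arbitrary 1e-3 floor that avoids log(0). It only demonstrates this identity and is not the paper's generative model.

```python
import numpy as np

def toy_source_filter(freqs, f0, formants, harmonic_width=20.0, formant_width=100.0):
    """Return (excitation, envelope, observed) log-spectra for a toy voiced sound.

    All peak shapes and widths are illustrative assumptions.
    """
    freqs = np.asarray(freqs, dtype=float)
    harmonics = np.arange(f0, freqs[-1] + f0, f0)
    # Excitation layer: a comb of narrow peaks at multiples of the pitch f0.
    excitation = sum(np.exp(-0.5 * ((freqs - h) / harmonic_width) ** 2) for h in harmonics)
    # Filter layer: broad resonances at the given formant frequencies.
    envelope = sum(np.exp(-0.5 * ((freqs - f) / formant_width) ** 2) for f in formants)
    floor = 1e-3  # keeps log() finite between peaks
    excitation, envelope = excitation + floor, envelope + floor
    # The observed spectrum is the product of the two in the linear domain.
    observed = excitation * envelope
    return np.log(excitation), np.log(envelope), np.log(observed)

# Usage: a 200 Hz pitch shaped by three formants.
freqs = np.linspace(0.0, 4000.0, 513)
log_e, log_h, log_x = toy_source_filter(freqs, f0=200.0, formants=[500.0, 1500.0, 2500.0])
```

Here `log_x` equals `log_e + log_h` up to rounding. Changing `f0` moves only the excitation layer, and changing `formants` moves only the envelope layer.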

Our approach explicitly models the self-similarity and dynamics of each layer by fitting the
log-spectrum in frame t with a set of transformations of the log-spectrum in frame t-1. As a
result, we do not require a separate state for every possible spectral configuration, only a
limited set of "sharp" states that can still cover the full spectral variety of a source through
such transformations. The approach is therefore suitable for any time series with high
correlation between adjacent observations.
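A deliberately simplified deformation field illustrates the transformation idea. For every frequency bin of frame t, we pick the local shift of frame t-1 that best explains a small neighborhood around that bin. This is a minimal sketch under our own assumptions: integer shifts in [-max_shift, max_shift], a brute-force squared-error search, and edge padding. The names `deformation_field`, `max_shift` and `half_width` are hypothetical. The paper's transformation set and inference may differ.

```python
import numpy as np

def deformation_field(prev, curr, max_shift=2, half_width=3):
    """Per-bin integer shift mapping frame t-1 (prev) onto frame t (curr).

    For bin k, each shift s predicts curr[j] by prev[j - s] for j in
    [k - half_width, k + half_width]; the shift with the lowest squared
    error wins. Returns the shift field and the resulting prediction.
    """
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    n = len(curr)
    pad = max_shift + half_width
    # Edge padding lets windows near the spectrum boundaries stay full size.
    prev_p = np.pad(prev, pad, mode="edge")
    curr_p = np.pad(curr, pad, mode="edge")
    shifts = np.arange(-max_shift, max_shift + 1)
    field = np.zeros(n, dtype=int)
    pred = np.empty(n)
    for k in range(n):
        kc = k + pad  # position of bin k in the padded arrays
        target = curr_p[kc - half_width: kc + half_width + 1]
        errors = [
            np.sum((prev_p[kc - half_width - s: kc + half_width + 1 - s] - target) ** 2)
            for s in shifts
        ]
        # np.argmin keeps the first (most negative) shift on ties.
        best = shifts[int(np.argmin(errors))]
        field[k] = best
        pred[k] = prev_p[kc - best]
    return field, pred

# Usage: a harmonic peak that rises by one bin between frames.
bins = np.arange(32, dtype=float)
prev = np.exp(-0.5 * ((bins - 10.0) / 1.5) ** 2)
curr = np.exp(-0.5 * ((bins - 11.0) / 1.5) ** 2)
field, pred = deformation_field(prev, curr)
```

Around the peak, the recovered field is +1 and the prediction reproduces `curr` exactly. Far from the peak, where both frames are nearly flat, the chosen shifts are essentially arbitrary. Each bin is searched independently, so nothing encourages neighboring bins to agree, and this sketch does not model the coupling between neighboring transformations.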

We first introduce a model that captures the spectral deformation field of the speech
harmonics and show how it can be used to interpolate missing observations. We then
introduce the two-layer model, which separately models the deformation fields of the
harmonic and formant resonance components. Through missing-data examples with one and
two layers, we show that this separation is necessary to describe speech signals accurately.
Next, we present the complete model, which combines the two deformation fields with the
"sharp" states. With only a few states and both deformation fields, this model can
accurately reconstruct the signal.

Finally, we briefly describe a range of existing applications, including semi-supervised source
separation, and discuss the model's possible application to unsupervised source separation.
