cascade time-frequency linear prediction of sound textures

overview:

This page serves as a proof of concept for the cascade time-frequency linear prediction (CTFLP) algorithm described in the paper:

M. Athineos and D. Ellis (2003). Sound Texture Modelling with Linear Prediction in both Time and Frequency Domains. Proc. ICASSP-03, Hong Kong, April 2003 (to appear). (4pp)

It demonstrates the modeling and time-stretching capabilities of CTFLP on rough noisy textures by contrasting it with existing schemes.

The TDLP column presents the results of standard noise-excited time-domain linear prediction (TDLP) for various stretching factors.
The CTFLP column presents the results of our noise-excited cascade time-frequency linear prediction for various stretching factors.
The PVoc column presents the results of Phase Vocoder stretching (by definition, this method is not noise-excited).
The SR column presents the results of simple stretching by resampling (by definition, this method is not noise-excited either).
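For readers who want to experiment, the noise-excited TDLP baseline can be sketched roughly as follows. This is a minimal, hypothetical sketch in Python/NumPy, not the code used to produce these examples: the helper names (lpc, allpole, tdlp_resynth), the periodic Hann window and the gain normalization are assumptions. Only the 512-sample frame, the 50% overlap and the 50-pole order come from this page.

```python
import numpy as np

def lpc(seq, order):
    """Autocorrelation-method linear prediction via Levinson-Durbin.
    Returns (a, err): a[0] == 1, A(z) = sum_k a[k] z^-k, err = prediction-error energy."""
    n = len(seq)
    r = np.array([np.dot(seq[: n - k], seq[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    if err <= 0:                       # silent frame: flat model, no energy
        return a, 0.0
    for i in range(1, order + 1):
        refl = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err   # reflection coefficient
        a[1:i] = a[1:i] + refl * a[i - 1:0:-1]
        a[i] = refl
        err *= 1.0 - refl * refl
    return a, err

def allpole(x, a, gain):
    """Filter x through gain / A(z): y[n] = gain*x[n] - sum_{k>=1} a[k]*y[n-k]."""
    p = len(a) - 1
    y = np.zeros(len(x) + p)           # p leading zeros hold the initial filter state
    for n in range(len(x)):
        past = y[n:n + p][::-1]        # y[n-1], ..., y[n-p] in output time
        y[n + p] = gain * x[n] - np.dot(a[1:], past)
    return y[p:]

def tdlp_resynth(x, order=50, win_len=512, seed=0):
    """Hypothetical helper: per 50%-overlapped frame, fit an all-pole spectral
    envelope, drive it with white noise, and overlap-add the results."""
    rng = np.random.default_rng(seed)
    hop = win_len // 2
    t = np.arange(win_len)
    win = 0.5 - 0.5 * np.cos(2 * np.pi * t / win_len)   # periodic Hann: sums to 1 at 50% overlap
    out = np.zeros(len(x))
    for start in range(0, len(x) - win_len + 1, hop):
        frame = x[start:start + win_len] * win
        a, err = lpc(frame, order)
        gain = np.sqrt(err / np.sum(win ** 2))           # approximate per-sample residual level
        noise = rng.standard_normal(win_len)
        out[start:start + win_len] += win * allpole(noise, a, gain)
    return out
```

Because the excitation is white noise, everything the model keeps about a frame lives in its 50 coefficients and one gain; any temporal detail shorter than a frame is replaced by stationary noise, which is the limitation CTFLP targets.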

To be fair to TDLP, CTFLP and PVoc, the same analysis window was used for all methods and all sounds: 512 samples long (23 ms at 22050 Hz) with 50% overlap. Moreover, in all examples we used 50 poles per frame for TDLP and 40 time-domain plus 10 frequency-domain poles per frame for CTFLP, so the pole rate is matched as well. Even better resynthesis results can be achieved for CTFLP by fine-tuning the pole allocation, but that is not the purpose of this test.
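The fairness claim can be checked with a little arithmetic: with a 256-sample hop, both schemes spend 50 poles per frame, so their pole rates per second are identical. A short sketch of that calculation (the variable names are illustrative):

```python
FS = 22050              # sample rate in Hz
WIN = 512               # analysis window length in samples
HOP = WIN // 2          # 50% overlap -> 256-sample hop

frame_ms = 1000 * WIN / FS      # window duration in milliseconds
frames_per_s = FS / HOP         # analysis frames per second

tdlp_poles = 50                 # all poles model the spectral envelope
ctflp_poles = 40 + 10           # 40 time-domain + 10 frequency-domain poles

print(f"window: {frame_ms:.1f} ms")                                   # window: 23.2 ms
print(f"frame rate: {frames_per_s:.1f} frames/s")                     # frame rate: 86.1 frames/s
print(f"TDLP pole rate:  {tdlp_poles * frames_per_s:.0f} poles/s")    # 4307 poles/s
print(f"CTFLP pole rate: {ctflp_poles * frames_per_s:.0f} poles/s")   # 4307 poles/s
```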

First, listen to the 1x TDLP resynthesis to hear what a conventional, spectral-envelope-based, noise-excited resynthesis scheme can achieve. Then listen to the 1x CTFLP resynthesis: the improvement in quality is immediately noticeable. One way to compare the two schemes is to think of CTFLP as taking 10 of the 50 poles away from the spectral envelope fit and allocating them to modeling a temporal envelope instead. As a result, the roughness of the textures is preserved.
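The pole reallocation can be made concrete with a rough sketch of one CTFLP frame. It assumes the cascade runs time-domain LP first (40 poles, spectral envelope) and then frequency-domain LP on the DCT of the TDLP residual (10 poles, temporal envelope), relying on the time-frequency duality that an all-pole fit to a DCT sequence traces the signal's squared Hilbert envelope over time. The helper names, the orthonormal DCT-II matrix and the gain handling are assumptions for illustration; the paper cited above is the authority on the actual algorithm.

```python
import numpy as np

def lpc(seq, order):
    """Autocorrelation-method LP via Levinson-Durbin.
    Returns (a, err) with a[0] == 1 and err = prediction-error energy."""
    n = len(seq)
    r = np.array([np.dot(seq[: n - k], seq[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    if err <= 0:
        return a, 0.0
    for i in range(1, order + 1):
        refl = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + refl * a[i - 1:0:-1]
        a[i] = refl
        err *= 1.0 - refl * refl
    return a, err

def allpole(x, a, gain):
    """y[n] = gain * x[n] - sum_{k>=1} a[k] * y[n-k]"""
    p = len(a) - 1
    y = np.zeros(len(x) + p)
    for n in range(len(x)):
        y[n + p] = gain * x[n] - np.dot(a[1:], y[n:n + p][::-1])
    return y[p:]

def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is the inverse transform."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)
    return m

def envelope(a, gain, n):
    """FDLP model read as a temporal envelope: the DCT index plays the role of time,
    so the all-pole 'spectrum' at w = pi*(t + 0.5)/n describes sample t."""
    w = np.pi * (np.arange(n) + 0.5) / n
    A = np.exp(-1j * np.outer(w, np.arange(len(a)))) @ a
    return gain / np.abs(A) ** 2

def ctflp_analyze(frame, t_order=40, f_order=10):
    a_t, _ = lpc(frame, t_order)                               # spectral envelope
    resid = np.convolve(frame, a_t)[: len(frame)]              # whitened TDLP residual
    a_f, err_f = lpc(dct_matrix(len(frame)) @ resid, f_order)  # temporal envelope
    return a_t, a_f, err_f

def ctflp_synthesize(params, n, rng):
    a_t, a_f, err_f = params
    C = dct_matrix(n)
    # white noise -> all-pole filter along the DCT axis -> inverse DCT imposes the temporal envelope
    shaped = C.T @ allpole(C @ rng.standard_normal(n), a_f, np.sqrt(err_f / n))
    # the time-domain all-pole filter then restores the spectral envelope
    # (gain 1.0 because `shaped` already carries the residual's level)
    return allpole(shaped, a_t, 1.0)
```

In this sketch a burst of noise centered near sample 100 of a 512-sample frame yields a temporal envelope that peaks near sample 100, which TDLP alone cannot represent within a frame.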

The second step is to compare the stretched versions, in which the temporal resolution of the resynthesis is, in a sense, "magnified". It is now easy to hear the time-smearing artifacts of spectral-envelope-based techniques such as TDLP and the Phase Vocoder (PVoc). In contrast, the proposed CTFLP method preserves the intelligibility of individual microtransients up to 8x stretches.
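One way to see why the temporal envelope can survive stretching: the 10 frequency-domain coefficients describe the envelope over normalized frame time, so a frame k times longer can read the same envelope on a k-times finer time grid, and each microtransient lands at the same relative position instead of being smeared across the frame. The sketch below assumes stretching works this way; fdlp_fit and stretch_frame are hypothetical helpers, and stretch_frame modulates white noise by the envelope directly (a simplification) rather than filtering in the DCT domain.

```python
import numpy as np

def lpc(seq, order):
    """Autocorrelation-method LP via Levinson-Durbin; returns (a, err), a[0] == 1."""
    n = len(seq)
    r = np.array([np.dot(seq[: n - k], seq[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    if err <= 0:
        return a, 0.0
    for i in range(1, order + 1):
        refl = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + refl * a[i - 1:0:-1]
        a[i] = refl
        err *= 1.0 - refl * refl
    return a, err

def dct_matrix(n):
    """Orthonormal DCT-II matrix."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)
    return m

def fdlp_fit(frame, order=10):
    """All-pole temporal-envelope model: LP on the DCT of the frame."""
    return lpc(dct_matrix(len(frame)) @ frame, order)

def envelope(a, gain, n):
    """Sample the model's temporal envelope at n points across the frame."""
    w = np.pi * (np.arange(n) + 0.5) / n
    A = np.exp(-1j * np.outer(w, np.arange(len(a)))) @ a
    return gain / np.abs(A) ** 2

def stretch_frame(a, gain, n, factor, rng):
    """Noise frame of n*factor samples carrying the same envelope on a finer grid.
    Simplification: multiplies noise by sqrt(envelope); level is not calibrated."""
    env = envelope(a, gain, n * factor)
    return np.sqrt(env) * rng.standard_normal(n * factor)
```

Usage: fit a 512-sample frame once, then call envelope(a, gain, 512) and envelope(a, gain, 8 * 512). The envelope peak moves from sample t to roughly 8t, so the transient keeps its relative position within the 8x-longer frame.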

(Click on the name of each example to download the original sound. The format is .wav, PCM 16-bit, 22050 Hz Mono)

examples:

Applause: A large number of people clapping rhythmically. A rough repetitive noise texture full of microtransients.

Stretch	TDLP	CTFLP	PVoc	SR
1x	applause_tdlp	applause_ctflp
2x	applause_tdlp_2	applause_ctflp_2	applause_pvoc_2	applause_sr_2
4x	applause_tdlp_4	applause_ctflp_4	applause_pvoc_4	applause_sr_4
8x	applause_tdlp_8	applause_ctflp_8	applause_pvoc_8	applause_sr_8

Bottle: Pouring soda out of a bottle. Many microtransients at the beginning that slowly turn into smooth noise.

Stretch	TDLP	CTFLP	PVoc	SR
1x	bottle_tdlp	bottle_ctflp
2x	bottle_tdlp_2	bottle_ctflp_2	bottle_pvoc_2	bottle_sr_2
4x	bottle_tdlp_4	bottle_ctflp_4	bottle_pvoc_4	bottle_sr_4
8x	bottle_tdlp_8	bottle_ctflp_8	bottle_pvoc_8	bottle_sr_8

Fire: Crackling fire texture. A background of fire "noise floor" with occasional wood crackles.

Stretch	TDLP	CTFLP	PVoc	SR
1x	fireplace_tdlp	fireplace_ctflp
2x	fireplace_tdlp_2	fireplace_ctflp_2	fireplace_pvoc_2	fireplace_sr_2
4x	fireplace_tdlp_4	fireplace_ctflp_4	fireplace_pvoc_4	fireplace_sr_4
8x	fireplace_tdlp_8	fireplace_ctflp_8	fireplace_pvoc_8	fireplace_sr_8

Type: The noise of a fast typewriter. Well-defined keystroke transients that are resynthesized and stretched very accurately.

Stretch	TDLP	CTFLP	PVoc	SR
1x	typewriter_tdlp	typewriter_ctflp
2x	typewriter_tdlp_2	typewriter_ctflp_2	typewriter_pvoc_2	typewriter_sr_2
4x	typewriter_tdlp_4	typewriter_ctflp_4	typewriter_pvoc_4	typewriter_sr_4
8x	typewriter_tdlp_8	typewriter_ctflp_8	typewriter_pvoc_8	typewriter_sr_8

Footsteps 1: Footsteps on a creaky floor. Microtransients during and after each footstep. The rug on the floor contributes some smooth noise.

Stretch	TDLP	CTFLP	PVoc	SR
1x	rug_tdlp	rug_ctflp
2x	rug_tdlp_2	rug_ctflp_2	rug_pvoc_2	rug_sr_2
4x	rug_tdlp_4	rug_ctflp_4	rug_pvoc_4	rug_sr_4
8x	rug_tdlp_8	rug_ctflp_8	rug_pvoc_8	rug_sr_8

Footsteps 2: Footsteps in the mud.

Stretch	TDLP	CTFLP	PVoc	SR
1x	mud_tdlp	mud_ctflp
2x	mud_tdlp_2	mud_ctflp_2	mud_pvoc_2	mud_sr_2
4x	mud_tdlp_4	mud_ctflp_4	mud_pvoc_4	mud_sr_4
8x	mud_tdlp_8	mud_ctflp_8	mud_pvoc_8	mud_sr_8

Rain: Rain texture with occasional louder water drops.

Stretch	TDLP	CTFLP	PVoc	SR
1x	rain_tdlp	rain_ctflp
2x	rain_tdlp_2	rain_ctflp_2	rain_pvoc_2	rain_sr_2
4x	rain_tdlp_4	rain_ctflp_4	rain_pvoc_4	rain_sr_4
8x	rain_tdlp_8	rain_ctflp_8	rain_pvoc_8	rain_sr_8
