Audio Time Stretching with an Adaptive Multiresolution Phase Vocoder

Companion page: audio samples and implementation

This page contains audio examples and command-line tools that illustrate the time stretching technique presented in the following paper:

Nicolas Juillerat and Béat Hirsbrunner, "Audio Time Stretching with an Adaptive Multiresolution Phase Vocoder", IEEE International Conference on Acoustics, Speech and Signal Processing, March 2017, New Orleans

Summary of the idea and goal

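Audio time stretching with a phase vocoder suffers from the time-frequency trade-off: a long analysis window gives a high frequency resolution but smears transients in time, while a short window preserves transients but gives a low frequency resolution.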
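The trade-off can be quantified with two numbers per analysis window: the spacing between FFT bins (sample rate divided by window size) and the window duration (window size divided by sample rate). The following hypothetical helper, which is not part of the downloadable tools, computes both; the window sizes 4096 and 512 are illustrative choices, not the ones used in the paper.

```java
import java.util.Locale;

// Hypothetical illustration, not part of Adaptive-ts.jar: the resolution
// trade-off of an STFT analysis window of N samples at sample rate fs.
class ResolutionTradeoff {

    /** Spacing between FFT bins in Hz: fs / N (smaller = finer frequency resolution). */
    static double binSpacingHz(double sampleRate, int windowSize) {
        return sampleRate / windowSize;
    }

    /** Window length in milliseconds: 1000 * N / fs (smaller = finer time resolution). */
    static double windowDurationMs(double sampleRate, int windowSize) {
        return 1000.0 * windowSize / sampleRate;
    }

    public static void main(String[] args) {
        double fs = 44100.0;
        // Example window sizes chosen for illustration only.
        for (int n : new int[] {4096, 512}) {
            System.out.printf(Locale.ROOT, "N=%d: %.2f Hz per bin, %.1f ms per window%n",
                    n, binSpacingHz(fs, n), windowDurationMs(fs, n));
        }
        // Prints:
        // N=4096: 10.77 Hz per bin, 92.9 ms per window
        // N=512: 86.13 Hz per bin, 11.6 ms per window
    }
}
```

At 44.1 kHz, the long window resolves frequencies roughly 11 Hz apart but spreads each attack over about 93 ms, while the short window localizes attacks within about 12 ms but only resolves frequencies roughly 86 Hz apart.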

Illustration (slowing down by a factor of 1.5):

Original	High frequency resolution	Low frequency resolution	Best trade-off
	Good on steady-state sounds, but transients are smeared	Good on transients (percussive sounds), poor on steady-state sounds	Still leaves room for improvement

The idea: combine the two!

	Transients	Steady-state
1. Split transients / steady state
2. Process with the appropriate time-frequency resolution	Low frequency resolution	High frequency resolution
3. Mix!	Proposed technique, final result

These steps are far from straightforward to implement; see the paper for details and the proposed solution.

Steps of the algorithm

Key ideas:

Split the audio into multiple layers corresponding to "degrees of transience"
Use a single (high frequency) resolution for the phase
Use a master-slave phase vocoder to preserve phase coherence between layers
Use multiple, adaptive resolutions for the magnitude
Combine the single-resolution phase with the multiresolution magnitude
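The last three ideas can be illustrated for a single frequency bin. The sketch below is a hypothetical simplification, not the code of Adaptive-ts.jar: it uses the textbook phase vocoder phase update (instantaneous frequency estimated from two consecutive analysis phases) as the "master" phase, assumes the per-layer magnitudes are already available from the transience splitting and magnitude correction steps, and gives every layer that same phase before mixing. Scaled phase locking and the actual multiresolution magnitude estimation are omitted.

```java
// Hypothetical single-bin sketch of "one master phase, several layer magnitudes".
// Not the actual Adaptive-ts.jar implementation; phase locking is omitted.
class MasterPhaseLayers {

    /** Wraps an angle to [-pi, pi]. */
    static double wrap(double phase) {
        return phase - 2.0 * Math.PI * Math.round(phase / (2.0 * Math.PI));
    }

    /**
     * Textbook phase vocoder update for bin k: estimates the instantaneous
     * frequency from two consecutive analysis phases (analysis hop ha) and
     * advances the synthesis phase by the synthesis hop hs (about factor * ha).
     */
    static double propagate(double prevSynthPhase, double prevAnaPhase, double anaPhase,
                            int bin, int fftSize, int ha, int hs) {
        double omega = 2.0 * Math.PI * bin / fftSize;      // bin centre frequency (rad/sample)
        double dev = wrap(anaPhase - prevAnaPhase - omega * ha);
        double instFreq = omega + dev / ha;                 // instantaneous frequency (rad/sample)
        return prevSynthPhase + instFreq * hs;
    }

    /**
     * Gives every layer (e.g. steady, intermediate, transient) the same master
     * phase, so the layers stay phase coherent, then sums them ("mix").
     * Returns {re, im} of the mixed bin.
     */
    static double[] mix(double[] layerMagnitudes, double masterPhase) {
        double re = 0.0;
        double im = 0.0;
        for (double mag : layerMagnitudes) {
            re += mag * Math.cos(masterPhase);
            im += mag * Math.sin(masterPhase);
        }
        return new double[] {re, im};
    }
}
```

Because all layers share one phase per bin, their magnitudes simply add when mixed; with independent per-layer phases, overlapping layers could partially cancel each other.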

The following table contains audio examples (taken from [1]) corresponding to the different steps of the algorithm. A time-stretching factor of 1.5 has been used.

Input signal	x[t]
After transience splitting (TS)	x₀[t]	x₁[t]	x₂[t]
After the master-slave phase vocoder (MS-PV)	v₀[t]	v₁[t]	v₂[t]
After the magnitude correction (MC) steps	y₀[t]	y₁[t]	y₂[t]
Result (proposed technique)	y[t]
Unmodified phase vocoder (for comparison)

One of the most important steps is the magnitude correction of the most transient component, which transforms v₂[t] into y₂[t] (rightmost column) and "fixes" the transient smearing.

Comparisons with other approaches

The following table shows various music excerpts, time stretched using different techniques for comparison. A time stretching factor of 1.5 is used. The techniques are:

Phase Vocoder: An unmodified phase vocoder (without transient handling), using scaled phase locking. This illustrates audio time stretching without any transient processing scheme.
Proposed technique: The proposed technique. The phase vocoder part of the technique also uses scaled phase locking.
Rubberband: An open-source time-stretching tool. It resets the phase at transients and shifts the transients in time without modifying them (by temporarily setting the time stretching factor to 1 during transients). Note that it also uses a different phase locking scheme (both between components and between channels), and may therefore produce a different overall timbre.
Radius: A commercial time-stretching tool by iZotope, considered here as a "state of the art" reference. The version that ships with Adobe Audition CS6 was used, with the default values for the fine-tuning parameters. While the exact algorithm is unknown, it generally exhibits the same properties as algorithms that process transients in time only, such as Rubberband.
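To make the "time only" approach more concrete: if transient frames are copied at a local stretch factor of 1, the remaining frames must be stretched slightly more so that the overall factor is preserved. The following hypothetical sketch computes such a hop schedule from per-frame transient flags. It is a simplified illustration of the idea described for Rubberband, not Rubberband's or Radius's actual code, and it ignores phase resetting.

```java
// Hypothetical sketch of time-only transient handling: transient frames keep a
// local stretch factor of 1, steady frames absorb the rest of the stretch.
class TransientHopSchedule {

    /**
     * Returns one synthesis hop per analysis frame so that the sum of the hops
     * equals ratio * (number of frames) * analysisHop.
     */
    static double[] synthesisHops(boolean[] transientFrame, int analysisHop, double ratio) {
        int n = transientFrame.length;
        int t = 0;
        for (boolean isTransient : transientFrame) {
            if (isTransient) t++;
        }
        if (t == n) {
            throw new IllegalArgumentException("no steady frame left to absorb the stretch");
        }
        // Solve t * 1 + (n - t) * steadyRatio = ratio * n for steadyRatio.
        double steadyRatio = (ratio * n - t) / (n - t);
        if (steadyRatio <= 0.0) {
            throw new IllegalArgumentException("too many transient frames for this ratio");
        }
        double[] hops = new double[n];
        for (int i = 0; i < n; i++) {
            hops[i] = analysisHop * (transientFrame[i] ? 1.0 : steadyRatio);
        }
        return hops;
    }
}
```

For example, with four frames, one of them transient, and a factor of 1.5, the transient frame keeps a hop equal to the analysis hop while the three others get 5/3 of it. Because the decision applies to the whole frame, any steady component sounding at the same moment is also left unstretched for that frame.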

Original excerpt	Phase Vocoder	Proposed technique	Rubberband	Radius
Instrumental [1]
Electronic [2]
Pop [3]			Note 1	Note 1
Classical [4]			Note 2	Note 2
New age [5]

Notes

Note 1: With Rubberband and Radius, the first sung note (a long "I" from the line "I'm gonna swing from the chandelier") sounds "hashed" by transients occurring at the same time, because the transients are detected and processed in time only. The problem also occurs elsewhere, but is less audible. The proposed technique mitigates it by detecting and processing transients in both time and frequency.
Note 2: This excerpt mostly contains "hidden" transients (mixed with steady components). With Rubberband and Radius, long string notes sound slightly "hashed" when the hidden transients occurring during them are treated as full transients. The proposed technique mitigates this problem by using several degrees of transience.
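The difference between frame-level and per-bin detection, and the idea of several degrees of transience, can be sketched as follows. This is a hypothetical illustration only: the dB thresholds, the magnitude-rise criterion and the linear split into layers are assumptions made for the example, not the detector described in the paper.

```java
// Hypothetical sketch: frame-level vs per-bin transient detection, and a soft
// split of each bin into several "degrees of transience". Thresholds are made up.
class TransienceSketch {

    static final double LOW_DB = 3.0;   // rise at or below this: fully steady (assumed)
    static final double HIGH_DB = 12.0; // rise at or above this: fully transient (assumed)
    static final double EPS = 1e-12;    // avoids taking the log of zero

    /** Time-only decision: one flag for the whole frame, from the total energy rise. */
    static boolean frameIsTransient(double[] prevMag, double[] curMag) {
        double prevEnergy = EPS;
        double curEnergy = EPS;
        for (int k = 0; k < curMag.length; k++) {
            prevEnergy += prevMag[k] * prevMag[k];
            curEnergy += curMag[k] * curMag[k];
        }
        return 10.0 * Math.log10(curEnergy / prevEnergy) > LOW_DB;
    }

    /** Time-frequency decision: a degree of transience in [0, 1] for each bin. */
    static double[] binDegrees(double[] prevMag, double[] curMag) {
        double[] degree = new double[curMag.length];
        for (int k = 0; k < curMag.length; k++) {
            double riseDb = 20.0 * Math.log10((curMag[k] + EPS) / (prevMag[k] + EPS));
            double d = (riseDb - LOW_DB) / (HIGH_DB - LOW_DB);
            degree[k] = Math.max(0.0, Math.min(1.0, d));
        }
        return degree;
    }

    /** Splits one bin across numLayers layers (0 = steady, last = most transient); weights sum to 1. */
    static double[] layerWeights(double degree, int numLayers) {
        double[] w = new double[numLayers];
        double pos = degree * (numLayers - 1);
        int lower = (int) Math.floor(pos);
        if (lower >= numLayers - 1) {
            w[numLayers - 1] = 1.0;
            return w;
        }
        double frac = pos - lower;
        w[lower] = 1.0 - frac;
        w[lower + 1] = frac;
        return w;
    }
}
```

With a previous frame of {1, 1} and a current frame of {1, 8}, the frame-level test flags the whole frame, including the unchanged first bin, whereas the per-bin degrees are 0 for the first bin and 1 for the second, so only the second bin would go to the most transient layer. Intermediate degrees are shared between neighbouring layers instead of being forced into "transient" or "steady".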

Different time stretching factors

This table shows the same musical excerpt ("New Age" from the previous table), time stretched using different factors.

Factor	Phase vocoder	Proposed technique	Radius
0.7
1.1
2.1

Implementation

Download the command-line tools (zip archive), written in Java, that time stretch audio files using the proposed technique or, for comparison, an unmodified phase vocoder.

The zip archive contains:

Adaptive-ts.jar: command-line tool to time stretch audio using the proposed technique (adaptive multiresolution phase vocoder)
Vocoder-ts.jar: command-line tool to time stretch audio using the unmodified phase vocoder (for comparison)
excerpts (folder): the original (unprocessed) audio excerpts from the tables above, which can be used to experiment with the two command-line tools

Usage (proposed technique):

java -jar Adaptive-ts.jar <input-file> <time-stretch ratio> <output-file>

Usage (phase vocoder):

java -jar Vocoder-ts.jar <input-file> <time-stretch ratio> <output-file>

Examples:

java -jar Adaptive-ts.jar excerpts/NewAge.mp3 1.5 result-adaptive.wav

java -jar Vocoder-ts.jar excerpts/NewAge.mp3 1.5 result-vocoder.wav
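To produce several stretched versions, such as those in the "Different time stretching factors" table, the documented command line can be scripted. The sketch below is a hypothetical helper, not part of the archive: it only builds and runs the `java -jar Adaptive-ts.jar` command shown above, and the output file names are made up. It assumes the ratio is the output duration divided by the input duration (a factor of 1.5 slows down by 1.5, as in the illustrations above), and it needs Java 8 or later (ProcessBuilder.inheritIO, String.join).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical batch helper around the documented command line.
class BatchStretch {

    /** Builds: java -jar <jar> <input> <ratio> <output> */
    static List<String> command(String jar, String input, double ratio, String output) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jar);
        cmd.add(input);
        cmd.add(Double.toString(ratio)); // locale-independent, e.g. "1.5"
        cmd.add(output);
        return cmd;
    }

    /** Expected output duration, assuming ratio = output duration / input duration. */
    static double expectedDurationSeconds(double inputSeconds, double ratio) {
        return inputSeconds * ratio;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        for (double ratio : new double[] {0.7, 1.1, 2.1}) {
            String output = "result-adaptive-" + ratio + ".wav"; // hypothetical naming
            List<String> cmd = command("Adaptive-ts.jar", "excerpts/NewAge.mp3", ratio, output);
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            int exit = p.waitFor();
            System.out.println(String.join(" ", cmd) + " -> exit code " + exit);
        }
    }
}
```

Under that assumption, a 10-second input would give outputs of about 7, 11 and 21 seconds for the factors 0.7, 1.1 and 2.1.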

Notes:

For the command-line tools to work, you need to download and install a Java Runtime Environment (JRE), version 6 or greater
The tools can only be used from the command line. There is no graphical user interface
The tools can only save the result in .wav, .aiff or .au format
The tools only accept 44.1 kHz or 48 kHz audio
The adaptive version of the tool does not implement the further improvements suggested in Section 6 of the paper, apart from stereo handling.
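Since the tools only accept 44.1 kHz or 48 kHz audio, it can help to check an input file before running them. The following is a hypothetical pre-flight check based on the standard javax.sound.sampled API; it is not part of the tools. Java Sound only reads WAV, AIFF and AU files out of the box, so an MP3 input such as the excerpts above would need an additional decoder for this check to work.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical pre-flight check, not part of Adaptive-ts.jar or Vocoder-ts.jar.
class SampleRateCheck {

    /** The tools only accept 44.1 kHz or 48 kHz audio. */
    static boolean isSupportedRate(float sampleRate) {
        return sampleRate == 44100f || sampleRate == 48000f;
    }

    /** Reads the sample rate from the file header (WAV, AIFF or AU with a default JDK). */
    static float sampleRateOf(File file) throws IOException, UnsupportedAudioFileException {
        return AudioSystem.getAudioFileFormat(file).getFormat().getSampleRate();
    }

    public static void main(String[] args) throws Exception {
        File input = new File(args[0]);
        float rate = sampleRateOf(input);
        if (isSupportedRate(rate)) {
            System.out.println(input + ": " + rate + " Hz, OK");
        } else {
            System.out.println(input + ": " + rate + " Hz, resample to 44.1 or 48 kHz first");
        }
    }
}
```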

Further improvements (not in the paper)

Pyramid Resolutions

Original [6]	Non-adaptive (Phase vocoder)	Adaptive (Proposed)	Adaptive + Pyramid Resolutions

Phasiness Reduction

Based on the following paper:

Nicolas Juillerat, "Audio Time Stretching with Controllable Phase Coherence", 142nd Audio Engineering Society Convention, May 2017, Berlin

Original [7]	Non-adaptive (Phase vocoder)	Adaptive (Proposed)
No Phasiness Reduction
With Phasiness Reduction

Music References

The musical excerpts are used under fair use from the following sources:

[1] Philip Nixon, "Horror Level Soundtrack", Oscar, Flair Software, 1993

[2] Jean-Michel Jarre, "Oxygène, Part 4", Oxygène, Disques Dreyfus, 1977

[3] Sia Furler, "Chandelier", 1000 Forms of Fear, RCA Records, 2014

[4] Ramin Djawadi, "Game of Thrones, Main Title", Game of Thrones: Season 3, WaterTower Music, 2013

[5] David Arkenstone, "Trail of Tears", Return of the Guardians, Narada, 1996

[6] Ethan Winer, "Men at Work", ethanwiner.com/e-tunes.html

[7] John Lennon, "Imagine", Imagine (2010 Remaster), EMI Records Ltd, 2010
