This page contains audio examples and a command-line tool to illustrate the time stretching technique presented in the following paper:
Nicolas Juillerat and Béat Hirsbrunner, "Audio Time Stretching with an Adaptive Multiresolution Phase Vocoder", IEEE International Conference on Acoustics, Speech and Signal Processing, March 2017, New Orleans
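Audio time stretching with a phase vocoder suffers from the time-frequency trade-off: a high frequency resolution handles steady-state sounds well but smears transients, whereas a low frequency resolution preserves transients but handles steady-state sounds poorly.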
Illustration (slowing down by 1.5):
Original | High frequency resolution | Low frequency resolution | Best trade-off |
| Good on steady-state sounds, but smears transients | Good on transients (percussive sounds), but poor on steady-state sounds | Leaves room for improvement |
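The trade-off comes down to simple arithmetic on the length of the analysis window. A minimal sketch, assuming a 44.1 kHz sample rate and two illustrative window sizes (512 and 4096 samples); neither value is taken from the paper, and the function name is hypothetical:

```python
def window_resolution(window_size, sample_rate=44100):
    """Return (frequency bin spacing in Hz, window duration in ms)."""
    bin_spacing_hz = sample_rate / window_size
    duration_ms = 1000.0 * window_size / sample_rate
    return bin_spacing_hz, duration_ms


# Illustrative window sizes (assumptions, not the paper's settings).
for size in (512, 4096):
    hz, ms = window_resolution(size)
    print(f"{size:5d} samples: {hz:6.2f} Hz per bin, {ms:6.2f} ms per window")
```

With these assumed values, the long window spaces its bins about 11 Hz apart but spans roughly 93 ms, long enough to blur a drum hit. The short window spans about 12 ms, but its bins are roughly 86 Hz apart, which is too coarse for closely spaced steady-state tones.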
The idea: combine the two!
1. Split the signal into transients and steady state.
2. Process each part with the appropriate time-frequency resolution: low frequency resolution for the transients, high frequency resolution for the steady state.
These steps are far from straightforward to implement. See the paper for details and the proposed solution.
3. Mix!
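Step 2 relies on running a phase vocoder at a chosen resolution. For reference, here is a minimal sketch of a plain, non-adaptive phase vocoder in Python with NumPy. It is the textbook algorithm, not the adaptive method of the paper, and the function name and the defaults `n_fft` and `hop_s` are illustrative assumptions:

```python
import numpy as np


def phase_vocoder(x, ratio, n_fft=2048, hop_s=512):
    """Time stretch a mono signal x by `ratio` (> 1 slows down)."""
    x = np.asarray(x, dtype=float)
    target_len = int(round(len(x) * ratio))
    hop_a = max(1, int(round(hop_s / ratio)))      # analysis hop
    window = np.hanning(n_fft)
    # Centre frequency of each FFT bin, in radians per sample.
    omega = 2.0 * np.pi * np.arange(n_fft // 2 + 1) / n_fft

    padded = np.concatenate([x, np.zeros(n_fft)])  # room for the last frame
    n_frames = 1 + (len(padded) - n_fft) // hop_a
    out = np.zeros((n_frames - 1) * hop_s + n_fft)
    norm = np.zeros_like(out)

    prev_phase = None
    synth_phase = None
    for i in range(n_frames):
        frame = padded[i * hop_a:i * hop_a + n_fft] * window
        spectrum = np.fft.rfft(frame)
        magnitude, phase = np.abs(spectrum), np.angle(spectrum)
        if i == 0:
            synth_phase = phase.copy()
        else:
            # Deviation from the expected phase advance, wrapped to [-pi, pi).
            deviation = phase - prev_phase - omega * hop_a
            deviation = (deviation + np.pi) % (2.0 * np.pi) - np.pi
            true_freq = omega + deviation / hop_a
            # Advance the synthesis phase by the synthesis hop.
            synth_phase = synth_phase + hop_s * true_freq
        prev_phase = phase
        grain = np.fft.irfft(magnitude * np.exp(1j * synth_phase), n_fft)
        start = i * hop_s
        out[start:start + n_fft] += grain * window
        norm[start:start + n_fft] += window ** 2
    out /= np.maximum(norm, 1e-8)                  # undo window overlap gain
    return out[:target_len]
```

Calling `phase_vocoder(x, 1.5)` on a mono signal returns a signal about 1.5 times as long. Raising `n_fft` moves the result toward the high frequency resolution case and lowering it moves the result toward the low frequency resolution case.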
Proposed technique, final result:
Key ideas:
The following table contains audio examples (taken from [1]) corresponding to the different steps of the algorithm. A time-stretching factor of 1.5 has been used.
Input signal | x[t] |
After transience splitting (TS) | x0[t] | x1[t] | x2[t] |
After the master-slave phase vocoder (MS-PV) | v0[t] | v1[t] | v2[t] |
After the magnitude correction (MC) steps | y0[t] | y1[t] | y2[t] |
Result (proposed technique) | y[t] |
Unmodified phase vocoder (for comparison) |
One of the most important steps is the magnitude correction of the most transient component, which transforms v2[t] into y2[t] (rightmost column) and "fixes" the smearing.
The following table shows various music excerpts, time stretched using different techniques for comparison. A time stretching factor of 1.5 is used. The techniques are:
Original excerpt | Phase vocoder | Proposed technique | Rubberband | Radius |
Instrumental [1] | | | | |
Electronic [2] | | | | |
Pop [3] | | | Note 1 | Note 1 |
Classical [4] | | | Note 2 | Note 2 |
New age [5] | | | | |
This table shows the same musical excerpt ("New age" from the previous table), time stretched using different factors.
Factor | Phase vocoder | Proposed technique | Radius |
0.7 | | | |
1.1 | | | |
2.1 | | | |
Download the command-line tools (as a zip archive). They are written in Java and can time stretch audio files using the proposed technique.
The zip archive contains:
Usage (proposed technique):
java -jar Adaptive-ts.jar <input-file> <time-stretch-ratio> <output-file>
Usage (phase vocoder):
java -jar Vocoder-ts.jar <input-file> <time-stretch-ratio> <output-file>
Examples:
java -jar Adaptive-ts.jar excerpts/NewAge.mp3 1.5 result-adaptive.wav
java -jar Vocoder-ts.jar excerpts/NewAge.mp3 1.5 result-vocoder.wav
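To produce several stretched versions in one go, the commands above can be scripted. A minimal sketch in Python, assuming `java` is on the PATH and the jar files are in the current directory. The helper names and the output naming scheme are hypothetical, and only the three positional arguments shown in the usage are passed:

```python
import subprocess


def build_command(jar, input_file, ratio, output_file):
    """Argument list following the usage shown above."""
    return ["java", "-jar", jar, input_file, str(ratio), output_file]


def stretch_all(input_file, ratios, jar="Adaptive-ts.jar", prefix="result-adaptive"):
    """Run the tool once per ratio and return the output file names."""
    outputs = []
    for ratio in ratios:
        output_file = f"{prefix}-{ratio}.wav"  # hypothetical naming scheme
        # check=True raises CalledProcessError if the tool exits with an error.
        subprocess.run(build_command(jar, input_file, ratio, output_file), check=True)
        outputs.append(output_file)
    return outputs
```

For example, `stretch_all("excerpts/NewAge.mp3", [0.7, 1.1, 2.1])` would run the adaptive tool once for each of the three factors used in the table of stretching factors, and passing `jar="Vocoder-ts.jar"` would do the same with the plain phase vocoder.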
Notes:
Original [6] | Non Adaptive (Phase vocoder) | Adaptive (Proposed) | Adaptive + Pyramid Resolutions |
Based on the following paper:
Nicolas Juillerat, "Audio Time Stretching with Controllable Phase Coherence", 142nd Audio Engineering Society Convention, May 2017, Berlin
Original [7] |
| Non-adaptive (phase vocoder) | Adaptive (proposed) |
No phasiness reduction | | |
With phasiness reduction | | |
The musical excerpts are used under fair use from the following sources:
[1] Philip Nixon, "Horror Level Soundtrack", Oscar, Flair Software, 1993
[2] Jean-Michel Jarre, "Oxygène, Part 4", Oxygène, Disques Dreyfus, 1977
[3] Sia Furler, "Chandelier", 1000 Forms of Fear, RCA Records, 2014
[4] Ramin Djawadi, "Game of Thrones, Main Title", Game of Thrones: Season 3, WaterTower Music, 2013
[5] David Arkenstone, "Trail of Tears", Return of the Guardians, Narada, 1996
[6] Ethan Winer, "Men at Work", ethanwiner.com/e-tunes.html
[7] John Lennon, "Imagine", Imagine (2010 - Remaster), EMI Records LTD, 2010