Audio Time Stretching with an Adaptive Multiresolution Phase Vocoder

Companion page: audio samples and implementation

 

This page contains audio examples and a command-line tool to illustrate the time stretching technique presented in the following paper:

Nicolas Juillerat and Béat Hirsbrunner, "Audio Time Stretching with an Adaptive Multiresolution Phase Vocoder", IEEE International Conference on Acoustics, Speech and Signal Processing, March 2017, New Orleans

Summary of the idea and goal

Audio time stretching (phase vocoder) suffers from the time-frequency trade-off.

Illustration (slowing down by 1.5):

Original
High frequency resolution: good on steady-state sounds, but smears transients
Low frequency resolution: good on transients (percussive), poor on steady-state sounds
Best trade-off: perfectible
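The trade-off can be made concrete with the usual short-time Fourier transform relations. The sample rate and window lengths below are illustrative assumptions, not the settings used in the paper. For an analysis window of N samples at sample rate f_s:

```latex
\begin{align*}
\Delta f &\approx \frac{f_s}{N}, \qquad \Delta t \approx \frac{N}{f_s}, \qquad \Delta f \cdot \Delta t \approx 1 \\
f_s = 44100\ \text{Hz},\ N = 4096 &: \quad \Delta f \approx 10.8\ \text{Hz}, \quad \Delta t \approx 92.9\ \text{ms} \\
f_s = 44100\ \text{Hz},\ N = 512  &: \quad \Delta f \approx 86.1\ \text{Hz}, \quad \Delta t \approx 11.6\ \text{ms}
\end{align*}
```

A long window separates close partials but spreads a percussive attack over tens of milliseconds; a short window localizes the attack but blurs the partials of sustained notes.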

The idea: combine the two!

1. Split the signal into transients and steady state.
2. Process each part with the appropriate time-frequency resolution: low frequency resolution for transients, high frequency resolution for steady state.
3. Mix! This gives the final result of the proposed technique.

These steps are far from being straightforward to implement. See the paper for details and the proposed solution.
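To make the split and mix steps concrete, here is a deliberately naive Java sketch. It is not the paper's method, which detects and processes transients in both time and frequency. It is a toy time-domain splitter based on a frame-energy jump, and the class name, frame size, and threshold are all hypothetical. What it illustrates is the complementary property: every sample goes to exactly one part, so mixing the unprocessed parts gives back the input.

```java
/** Toy sketch: complementary transient / steady-state split in the time domain. */
class SplitAndMixSketch {

    /**
     * Splits x into {transientPart, steadyPart}. A frame counts as transient when
     * its energy exceeds `ratio` times the energy of the previous frame (assumed rule).
     * Every sample is copied to exactly one part, so the two parts sum back to x.
     */
    static double[][] split(double[] x, int frameSize, double ratio) {
        double[] transientPart = new double[x.length];
        double[] steadyPart = new double[x.length];
        double previousEnergy = Double.POSITIVE_INFINITY; // first frame is never flagged
        for (int start = 0; start < x.length; start += frameSize) {
            int end = Math.min(start + frameSize, x.length);
            double energy = 0.0;
            for (int i = start; i < end; i++) {
                energy += x[i] * x[i];
            }
            boolean isTransient = energy > ratio * previousEnergy;
            double[] target = isTransient ? transientPart : steadyPart;
            System.arraycopy(x, start, target, start, end - start);
            previousEnergy = energy;
        }
        return new double[][] {transientPart, steadyPart};
    }

    /** Step 3: sample-wise sum of two equally long signals. */
    static double[] mix(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("signals must have the same length");
        }
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // Quiet sine with a single click at sample 130.
        double[] x = new double[256];
        for (int i = 0; i < x.length; i++) {
            x[i] = 0.1 * Math.sin(2 * Math.PI * i / 32.0);
        }
        x[130] += 1.0;
        double[][] parts = split(x, 64, 2.0);
        double[] y = mix(parts[0], parts[1]);
        System.out.println("click routed to transient part: " + (parts[0][130] != 0.0));
        System.out.println("mix restores sample 130: " + (y[130] == x[130]));
    }
}
```

In a real system, step 2 would time-stretch each part with its own window length before mixing, and that is where the difficulties mentioned above arise.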

Steps of the algorithm

Key ideas:

The following table contains audio examples (taken from [1]) corresponding to the different steps of the algorithm. A time-stretching factor of 1.5 has been used.

Input signal: x[t]
After transience splitting (TS): x0[t], x1[t], x2[t]
After the master-slave phase vocoder (MS-PV): v0[t], v1[t], v2[t]
After the magnitude correction (MC) steps: y0[t], y1[t], y2[t]
Result (proposed technique): y[t]
Unmodified phase vocoder (for comparison)

One of the most relevant steps is the magnitude correction of the most transient components, which transforms v2[t] into y2[t] (rightmost column) and "fixes" the smearing.
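The signal names in the table suggest the following data flow, written here as a notational summary only. Two points are assumptions rather than statements from this page: that each magnitude correction step acts on a single component, and that the final mix is a plain sample-wise sum. The paper gives the exact definitions.

```latex
\begin{align*}
(x_0, x_1, x_2) &= \mathrm{TS}(x) \\
(v_0, v_1, v_2) &= \mathrm{MS\text{-}PV}(x_0, x_1, x_2;\ \alpha), \qquad \alpha = 1.5 \\
y_i &= \mathrm{MC}_i(v_i), \qquad i = 0, 1, 2 \quad \text{(assumed per component)} \\
y &= y_0 + y_1 + y_2 \quad \text{(assumed mix)}
\end{align*}
```

Index 2 denotes the most transient component, as stated above.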

Comparisons with other approaches

The following table shows various music excerpts, time-stretched with different techniques for comparison. A time-stretching factor of 1.5 is used. The techniques are the unmodified phase vocoder, the proposed technique, Rubberband and Radius.

Columns: Original excerpt, Phase Vocoder, Proposed technique, Rubberband, Radius

Instrumental [1]
Electronic [2]
Pop [3] (Rubberband and Radius: see Note 1)
Classical [4] (Rubberband and Radius: see Note 2)
New age [5]

Notes

  1.  The first voice note (a long "I" from the sentence "I'm gonna swing from the chandelier") sounds "hashed" by transients occurring at the same time, because they are detected and processed in time only. The problem is also present in other places but is less audible. The proposed technique mitigates this problem by detecting and processing transients in both time and frequency.
  2.  This excerpt mostly has "hidden" transients (mixed with steady components). Long string notes sound slightly "hashed" when hidden transients occurring during them are treated like transients. The proposed technique mitigates this problem by using several degrees of transience.

Different time stretching factors

This table shows the same musical excerpt ("New Age" from the previous table), time-stretched using different factors.

Factor Phase vocoder Proposed technique Radius
0.7
1.1
2.1
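As a quick sanity check on the results, the expected output length can be estimated from the factor. The sketch below assumes the convention used on this page, where a factor of 1.5 slows the audio down, so a factor above 1 lengthens it and a factor below 1 shortens it. The class name and the 10-second excerpt length are hypothetical.

```java
/** Estimates the length of a time-stretched signal from the stretch factor. */
class StretchLength {

    /** Expected output length in samples, assuming factor > 1 means slower (longer). */
    static long stretchedLength(long inputSamples, double factor) {
        if (factor <= 0.0) {
            throw new IllegalArgumentException("factor must be positive");
        }
        return Math.round(inputSamples * factor);
    }

    public static void main(String[] args) {
        long input = 10L * 44100; // hypothetical 10-second excerpt at 44.1 kHz
        for (double factor : new double[] {0.7, 1.1, 2.1}) {
            System.out.println(factor + " -> " + stretchedLength(input, factor) + " samples");
        }
    }
}
```

Under that assumption, a 10-second excerpt lasts about 7, 11, and 21 seconds at factors 0.7, 1.1, and 2.1 respectively.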

Implementation

Download the command-line tools (as a zip archive) written in Java that can time stretch audio files using the proposed technique.

The zip archive contains:

Usage (proposed technique):

java -jar Adaptive-ts.jar <input-file> <time-stretch ratio> <output-file>

Usage (phase vocoder):

java -jar Vocoder-ts.jar <input-file> <time-stretch ratio> <output-file>

Examples:

java -jar Adaptive-ts.jar excerpts/NewAge.mp3 1.5 result-adaptive.wav
java -jar Vocoder-ts.jar excerpts/NewAge.mp3 1.5 result-vocoder.wav
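To produce several stretch ratios in one go, the tool can be driven from a small Java program. This is a sketch built only on the documented argument order (input file, ratio, output file). The class name and the output file naming are hypothetical, and whether the tool reports failures through a non-zero exit code is an assumption.

```java
import java.io.IOException;
import java.util.List;

/** Runs the time-stretching jar once per ratio using the documented argument order. */
class BatchStretch {

    /** Builds the command line: java -jar <jar> <input file> <ratio> <output file>. */
    static List<String> command(String jar, String input, double ratio, String output) {
        return List.of("java", "-jar", jar, input, Double.toString(ratio), output);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String input = "excerpts/NewAge.mp3";
        for (double ratio : new double[] {0.7, 1.1, 2.1}) {
            String output = "result-adaptive-" + ratio + ".wav"; // hypothetical naming scheme
            Process process = new ProcessBuilder(command("Adaptive-ts.jar", input, ratio, output))
                    .inheritIO() // show the tool's own messages in this console
                    .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) { // assumption: non-zero exit code signals a failure
                System.err.println("ratio " + ratio + " exited with code " + exitCode);
            }
        }
    }
}
```

Run it from the directory that contains the jar files, and swap in Vocoder-ts.jar to produce the plain phase vocoder versions for comparison.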

Notes:


Further improvements (not in the paper)

Pyramid Resolutions

Original [6] Non Adaptive (Phase vocoder) Adaptive (Proposed) Adaptive + Pyramid Resolutions

Phasiness Reduction

Based on the following paper:

Nicolas Juillerat, "Audio Time Stretching with Controllable Phase Coherence", 142nd Audio Engineering Society Convention, May 2017, Berlin (to appear)

Original [7]
Columns: Non-adaptive (Phase vocoder), Adaptive (Proposed)
Rows: No Phasiness Reduction, With Phasiness Reduction

Music References

The musical excerpts are used under fair use from the following sources:

[1] Philip Nixon, "Horror Level Soundtrack", Oscar, Flair Software, 1993
[2] Jean-Michel Jarre, "Oxygène, part. 4", Oxygène, Disques Dreyfus, 1977
[3] Sia Furler, "Chandelier", 1000 Forms of Fear, RCA Records, 2014
[4] Ramin Djawadi, "Game of Thrones, Main Title", Game of Thrones: Season 3, WaterTower Music, 2013
[5] David Arkenstone, "Trail of Tears", Return of the Guardians, Narada, 1996
[6] Ethan Winer, "Men at Work", ethanwiner.com/e-tunes.html
[7] John Lennon, "Imagine", Imagine (2010 - Remaster), EMI Records LTD, 2010


Copyright (C) 2016 - 2017, Nicolas Juillerat, Béat Hirsbrunner