Audio Time Stretching with Controllable Phase Coherence

Companion page

 

This page contains audio examples to illustrate the time stretching technique presented in the following paper:

Nicolas Juillerat, "Audio Time Stretching with Controllable Phase Coherence", 142nd Audio Engineering Society Convention, May 2017, Berlin (to appear)


Overview

Introduction

Audio Time Stretching is a digital audio effect that changes the speed of an audio signal (slowing it down or speeding it up) without affecting its pitch. This is a difficult problem, and all existing techniques produce undesired artifacts to some degree.

The following table shows a musical excerpt and a version slowed down by a factor of 1.5 (i.e. 150% of the original length) using a state-of-the-art audio time stretching algorithm (see below):

Original Signal [3] Time stretched by 1.5


Most audio time stretching techniques fall into one of two categories: time domain techniques (such as WSOLA) and frequency domain techniques based on the phase vocoder.

These two approaches have very different properties, both in their artifacts and in their strengths. The following table illustrates this, notably with transformed versions in which two typical artifacts of each approach have been algorithmically exaggerated:

Original Time Domain Phase Vocoder
Best for (Time Domain): voice, solo, small changes
Best for (Phase Vocoder): polyphonic music, large changes
Time Domain (exaggerated artifacts) Phase Vocoder (exaggerated artifacts)

Typical artifacts (Time Domain): stuttering (doubling of drums); low-frequency warbling (out of tune / sliding frequencies)
Typical artifacts (Phase Vocoder): loss of "presence", smearing of drums, reverberation; modulations on low frequencies, thin sound

Note: Each pair of exaggerated artifacts was produced simply (!) by processing the audio with a frequency resolution that is too high (184 ms window) or too low (46 ms window), instead of the optimal one (92 ms).
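Once a sample rate is fixed, the window durations in the note above correspond to FFT sizes and bin spacings. The page does not state a sample rate, so the sketch below assumes 44.1 kHz, which is a hypothetical choice. Under that assumption, N=4096 (the size used with the phase-locked vocoder further down this page) gives a window of about 92.9 ms with bins about 10.8 Hz apart. Halving or doubling N gives roughly the 46 ms and 184 ms cases.

```python
FS = 44100  # assumed sample rate in Hz; not stated on the page

def window_stats(n_samples, fs=FS):
    """Return (window duration in ms, spacing between FFT bins in Hz)."""
    duration_ms = 1000.0 * n_samples / fs
    bin_spacing_hz = fs / n_samples
    return duration_ms, bin_spacing_hz

# Too low, optimal and too high frequency resolution, as in the note above.
for n in (2048, 4096, 8192):
    ms, df = window_stats(n)
    print(f"N={n}: {ms:.1f} ms window, {df:.2f} Hz bin spacing")
```

A longer window gives finer frequency resolution (smaller bin spacing) at the cost of coarser time resolution. This is the trade-off behind the exaggerated artifacts.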

Goal

The goal of the proposed algorithm is to operate anywhere between time domain techniques and the phase vocoder, by means of a single control parameter M.

This makes it possible to adapt the processing to the characteristics of the audio signal being transformed.

Final Algorithm, Compared

The following table shows the presented technique with various values of M on various audio signals, compared with the phase-locked vocoder and time-domain WSOLA at comparable time-frequency resolutions.

The proposed algorithm with M=683 is generally very close to the phase-locked vocoder. The only difference (less noise attenuation) is barely audible.

  Electronic [1] Pop [3] Classical [4] Latino [5]
Original Signal
Time-domain WSOLA
Proposed algorithm, M=2
Proposed algorithm, M=40
Proposed algorithm, M=100
Proposed algorithm, M=250
Proposed algorithm, M=683
Phase-Locked Vocoder
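The first comparison row uses time-domain WSOLA. The page does not give its WSOLA settings, so the following is only a minimal sketch of the basic technique (waveform similarity overlap-add, after Verhelst and Roelands). Its parameters are hypothetical: a 2048-sample periodic Hann window with 50% overlap and a search tolerance of ±256 samples. Each output frame is copied from the input region near its nominal position that best matches, by cross-correlation, the natural continuation of the previously copied frame.

```python
import numpy as np

def wsola(x, ratio, n=2048, tol=256):
    """Time-stretch mono signal x by `ratio` (>1 = longer) with basic WSOLA.

    Window size n and search tolerance tol are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    hop_s = n // 2                        # synthesis hop: 50% overlap
    hop_a = hop_s / ratio                 # nominal analysis hop
    # Periodic Hann window: copies at 50% overlap sum exactly to 1.
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)
    # Zero-pad so every search region and continuation stays inside the buffer.
    xp = np.concatenate([np.zeros(tol), x, np.zeros(2 * tol + hop_s + n + 1)])
    out_len = int(round(ratio * len(x)))
    n_frames = int(np.ceil(out_len / hop_s))
    out = np.zeros(max(n_frames - 1, 0) * hop_s + n)
    pos = tol                             # frame 0 is copied from the start of x
    for k in range(n_frames):
        if k > 0:
            # Natural continuation of the previously copied segment.
            target = xp[pos + hop_s: pos + hop_s + n]
            nominal = tol + int(round(k * hop_a))
            lo = nominal - tol
            # scores[j] = dot(xp[lo + j: lo + j + n], target), j = 0 .. 2 * tol
            scores = np.correlate(xp[lo: nominal + tol + n], target, mode="valid")
            pos = lo + int(np.argmax(scores))
        out[k * hop_s: k * hop_s + n] += win * xp[pos: pos + n]
    return out[:out_len]
```

For example, `wsola(x, 1.5)` returns a signal 1.5 times as long as `x`. Because segments are copied verbatim and only realigned, the pitch is preserved. Transients that fall inside repeated segments, however, can be doubled (the "stuttering" artifact listed earlier).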

Different time stretching ratios

Obviously, the time stretching ratio is not limited to 1.5:

Original signal 0.7 (70% of length) 1.1 (110% of length) 1.5 (150% of length)
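In phase-vocoder-style methods, a common way to realize a given ratio is to read the input with a fixed analysis hop and write the output with a synthesis hop scaled by the ratio. The sketch below assumes the N=4096, overlap 8 analysis setting that this page uses for the phase-locked vocoder. The paper's own hop scheduling may differ.

```python
def hop_sizes(ratio, n_fft=4096, overlap=8):
    """Return (analysis hop, synthesis hop) in samples for a stretching ratio.

    Assumes a fixed analysis hop n_fft // overlap; the synthesis hop is
    scaled by the ratio and rounded to a whole number of samples.
    """
    hop_a = n_fft // overlap
    hop_s = round(ratio * hop_a)
    return hop_a, hop_s

for ratio in (0.7, 1.1, 1.5):
    hop_a, hop_s = hop_sizes(ratio)
    # Rounding the synthesis hop makes the realized ratio slightly inexact.
    print(f"ratio {ratio}: hops {hop_a} -> {hop_s}, realized ratio {hop_s / hop_a:.4f}")
```

Each frame advances the input by 512 samples and the output by the synthesis hop, so the output ends up about ratio times as long as the input.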

Selected Illustrations

This section illustrates some of the techniques that are described in the paper.

Steps of the algorithm

The following table contains audio examples corresponding to the successive modifications of the phase-locked vocoder. A time stretching ratio of 1.5 and M=100 bands were used.

Original signal [1]
(1) Phase-Locked Vocoder (N=4096, Overlap 8) (Section 3 of paper)
(2) With inter-peak locking (N=4096, Overlap 8) (Section 4 of paper)
(3) ... and multiple overlaps (2 and 8) (Section 5 of paper)
(4) ... and noise handling (Final Result) (Section 6 of paper)
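Step (1) is the baseline phase-locked vocoder. As a point of reference only, here is a minimal sketch of a standard phase vocoder with identity phase locking in the style of Laroche and Dolson, using the N=4096, overlap 8 setting from the table. It does not include the inter-peak locking, multiple overlaps, noise handling or parameter M of steps (2) to (4). The window, peak picking and normalization choices are assumptions, not taken from the paper.

```python
import numpy as np

def phase_locked_vocoder(x, ratio, n_fft=4096, overlap=8):
    """Time-stretch mono x by `ratio` with identity phase locking (sketch)."""
    x = np.asarray(x, dtype=float)
    if len(x) < n_fft:
        raise ValueError("input shorter than one analysis frame")
    hop_a = n_fft // overlap                          # analysis hop
    hop_s = int(round(ratio * hop_a))                 # synthesis hop
    win = np.hanning(n_fft)
    bins = np.arange(n_fft // 2 + 1)
    omega = 2 * np.pi * bins / n_fft                  # bin centre frequencies (rad/sample)
    n_frames = 1 + (len(x) - n_fft) // hop_a
    out = np.zeros((n_frames - 1) * hop_s + n_fft)
    norm = np.zeros_like(out)                         # sum of squared windows, for OLA normalization
    prev_phase = syn_phase = None
    for t in range(n_frames):
        spec = np.fft.rfft(win * x[t * hop_a: t * hop_a + n_fft])
        mag, phase = np.abs(spec), np.angle(spec)
        if prev_phase is None:
            syn_phase = phase.copy()                  # first frame keeps its analysis phases
        else:
            # Peaks: local maxima of the magnitude spectrum.
            peaks = np.flatnonzero((mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:])) + 1
            if peaks.size == 0:
                peaks = np.array([int(np.argmax(mag))])
            # Instantaneous frequency of each peak from its wrapped phase increment.
            dphi = phase[peaks] - prev_phase[peaks] - omega[peaks] * hop_a
            dphi = (dphi + np.pi) % (2 * np.pi) - np.pi
            peak_phase = syn_phase[peaks] + hop_s * (omega[peaks] + dphi / hop_a)
            # Identity phase locking: each bin keeps its analysis phase offset
            # relative to the nearest peak (boundaries at midpoints between peaks).
            region = np.searchsorted((peaks[:-1] + peaks[1:]) // 2, bins, side="right")
            owner = peaks[region]
            syn_phase = peak_phase[region] + (phase - phase[owner])
        prev_phase = phase
        frame = np.fft.irfft(mag * np.exp(1j * syn_phase), n_fft)
        out[t * hop_s: t * hop_s + n_fft] += win * frame
        norm[t * hop_s: t * hop_s + n_fft] += win ** 2
    nz = norm > 1e-8
    out[nz] /= norm[nz]
    return out
```

Locking every bin to its nearest peak preserves the phase relationships inside each spectral peak (vertical coherence) better than a plain phase vocoder, which advances every bin independently.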

Noise Detection And Handling (Sections 6 and 7 of paper)

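The following table illustrates how a signal is split into an extracted tonal part and an extracted noise part, and how noise handling affects the time-stretched result.

A time stretching ratio of 1.2 and M=40 were used.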

Original signal
Pink Noise + Sine

[2]
Extracted tonal part
Extracted noise part
Time stretched without noise handling
Time stretched with noise handling
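The first test signal above combines pink noise and a sine wave. The page does not give its parameters, so the sine frequency, levels and duration below are hypothetical. This is a minimal sketch of one common way to build such a signal: shaping white noise to a 1/f power spectrum in the frequency domain.

```python
import numpy as np

def pink_noise_plus_sine(duration_s=5.0, fs=44100, sine_hz=1000.0,
                         sine_amp=0.3, noise_rms=0.1, seed=0):
    """Pink noise plus a sine; all default parameter values are illustrative."""
    rng = np.random.default_rng(seed)
    n = int(duration_s * fs)
    # Shape white Gaussian noise so its power falls as 1/f (amplitude as 1/sqrt(f)).
    spec = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    scale = np.zeros_like(freqs)
    scale[1:] = 1.0 / np.sqrt(freqs[1:])      # DC is left at zero
    pink = np.fft.irfft(spec * scale, n)
    pink *= noise_rms / np.sqrt(np.mean(pink ** 2))  # set the noise RMS level
    t = np.arange(n) / fs
    return pink + sine_amp * np.sin(2 * np.pi * sine_hz * t)
```

A signal like this makes the ideal separation unambiguous: the sine belongs to the tonal part and the pink noise to the noise part.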


Vertical Phase Coherence Measure (Section 8.2 of paper)

The following table illustrates the zero-phasing transformation used to measure vertical phase coherence, followed by the test signal time stretched with M=10 (rather good vertical phase coherence) and M=250 (rather poor vertical phase coherence).

Original signal [1] (see Figure 2 (a) of paper)
Zero Phased (see Figure 2 (b) of paper)
Time stretched, M=10 (see Figure 2 (c) of paper)
Time stretched, M=250 (see Figure 2 (d) of paper)

Further Improvements (preliminary results)

This table shows preliminary results of incorporating a transient processing scheme (reference [15] of the paper) into the proposed algorithm.

Original Signal [6] Proposed technique, M=40 With transient processing

Related Work

Music References

The musical excerpts are taken from the following sources and used under fair use:

[1] Vangelis, "Main Theme from Chariots of Fire", Odyssey - The Definitive Collection, Hip-O Records, 2003
[2] Melanie G 'MySweetDarkness', "The Morn", Through the Shadow and the Light, Everlasting Dream, 2007
[3] John Lennon, "Imagine", Imagine (2010 - Remaster), EMI Records LTD, 2010
[4] Ramin Djawadi, "Game of Thrones, Main Title", Game of Thrones: Season 3, WaterTower Music, 2013
[5] Inti-Aymará, "Sinfonia N. 40 Em Sol Menor", Corazón Andino, MoviePlay Brasil, 1993
[6] Philip Nixon, "Horror Level Soundtrack", Oscar, Flair Software, 1993


Copyright (C) 2016 - 2017, Nicolas Juillerat