This page contains audio examples to illustrate the time stretching technique presented in the following paper:
Nicolas Juillerat, "Audio Time Stretching with Controllable Phase Coherence", 142nd Audio Engineering Society Convention, May 2017, Berlin
Audio Time Stretching is a digital audio effect that changes the speed (slows down or speeds up) of an audio signal without affecting its pitch. This is a difficult problem and all existing techniques produce undesired artifacts to some degree.
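To make the problem concrete, here is a minimal sketch of the most naive approach: cut the signal into overlapping windowed frames and paste them back with a different spacing. This is not the algorithm of the paper, and the function name `ola_stretch` and its parameters `frame_len` and `synth_hop` are hypothetical names chosen for the illustration. Because nothing aligns the phases of successive frames, even a pure sine comes out with audible amplitude modulation, which is the kind of artifact the techniques discussed on this page try to avoid.

```python
import numpy as np

def ola_stretch(x, ratio, frame_len=2048, synth_hop=512):
    """Naive overlap-add time stretch (illustration only).

    ratio > 1 lengthens the signal, ratio < 1 shortens it.
    """
    window = np.hanning(frame_len)
    # Frames are read every ana_hop samples and written every synth_hop samples.
    ana_hop = synth_hop / ratio
    n_frames = int((len(x) - frame_len) / ana_hop) + 1
    out = np.zeros((n_frames - 1) * synth_hop + frame_len)
    norm = np.zeros_like(out)
    for i in range(n_frames):
        a = int(round(i * ana_hop))   # read position in the input
        s = i * synth_hop             # write position in the output
        out[s:s + frame_len] += x[a:a + frame_len] * window
        norm[s:s + frame_len] += window
    # Divide by the summed windows to keep the overall level constant.
    return out / np.maximum(norm, 1e-8)

# One second of a 440 Hz sine, stretched by 1.5 (sample rate is an assumption).
sr = 44100
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
y = ola_stretch(x, 1.5)
print(len(x), len(y))
```

The output is about 1.5 times as long as the input, and the pitch is unchanged because each frame is copied without resampling. The quality, however, is poor, which is why practical time-domain methods search for a well-aligned read position and why the phase vocoder adjusts phases explicitly.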
The following table shows a musical excerpt, and a version that has been slowed down by a factor of 1.5 (to 150% of its original length) using a state-of-the-art (see below) audio time stretching algorithm:
Original Signal [3] | Time stretched by 1.5 |
Hint: throughout this page, playing any audio excerpt will automatically stop the previous one if not finished (unless JavaScript is disabled)
Most audio time stretching techniques fall into one of two categories: time-domain techniques and the phase vocoder.
These two approaches have very different properties, both in terms of artifacts and of qualities. The following table illustrates the point, notably with transformed versions in which two typical artifacts of each approach have been algorithmically exaggerated:
Original | Time Domain | Phase Vocoder |
Best for: | Voice, solo, small changes | Polyphonic music, large changes |
Time Domain (exaggerated artifacts) | Phase Vocoder (exaggerated artifacts) | |
Typical artifacts: | Stuttering (doubling of drums); low-frequency warbling (out of tune / sliding frequencies) | Loss of "presence", smearing of drums, reverberation; modulations on low frequencies, thin sound |
Note: The pairs of exaggerated artifacts have been simply (!) produced by processing the audio with a frequency resolution that is too high (184 ms) or too low (46 ms) instead of the optimal one (92 ms).
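The durations in this note line up with power-of-two frame sizes. The sketch below works this out under two assumptions that are not stated on this page: a sample rate of 44.1 kHz, and frame sizes of 2048, 4096 and 8192 samples (only N=4096 appears further down). The helper name `frame_resolution` is hypothetical.

```python
def frame_resolution(n, sample_rate=44100):
    """Return (frame duration in ms, spacing between frequency bins in Hz)."""
    duration_ms = 1000.0 * n / sample_rate
    bin_spacing_hz = sample_rate / n
    return duration_ms, bin_spacing_hz

for n in (2048, 4096, 8192):
    duration_ms, bin_spacing_hz = frame_resolution(n)
    print(f"N={n}: {duration_ms:.1f} ms frame, {bin_spacing_hz:.2f} Hz between bins")
```

Under these assumptions the three frames last about 46, 93 and 186 ms, close to the figures quoted in the note. A longer frame separates nearby frequencies better but blurs events in time, and a shorter frame does the opposite, which is consistent with the two opposite kinds of artifact in the table.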
The goal of the proposed algorithm is to be able to sit anywhere between time-domain techniques and the phase vocoder, by means of a single control parameter M.
As such, it is possible to adjust the processing to the characteristics of the audio signal being transformed.
The following table shows the presented technique with various values of M, on various audio signals, compared to the phase-locked vocoder and time-domain WSOLA with comparable time-frequency resolutions.
The proposed algorithm with M=683 is generally very close to the phase-locked vocoder. The only difference (less noise attenuation) is barely audible.
Electronic [1] | Pop [3] | Classical [4] | Latino [5] | |
Original Signal | ||||
Time-domain WSOLA | ||||
Proposed algorithm, M=2 | ||||
Proposed algorithm, M=40 | ||||
Proposed algorithm, M=100 | ||||
Proposed algorithm, M=250 | ||||
Proposed algorithm, M=683 | ||||
Phase-Locked Vocoder |
Obviously, the time stretching ratio is not limited to 1.5:
Original signal | 0.7 (70% of length) | 1.1 (110% of length) | 1.5 (150% of length) |
This section illustrates some of the techniques that are described in the paper.
The following table contains audio examples corresponding to the successive modifications of the phase-locked vocoder. A time-stretching ratio of 1.5 and M=100 bands were used.
Original signal [1] | (1) Phase-Locked Vocoder (N=4096, Overlap 8) (Section 3 of paper) | (2) With inter-peak locking (N=4096, Overlap 8) (Section 4 of paper) | (3) ... and multiple overlaps (2 and 8) (Section 5 of paper) | (4) ... and noise handling (Final Result) (Section 6 of paper) |
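The following table illustrates the separation of a signal into its tonal and noise parts, and the result of time stretching without and with noise handling.
A time stretching ratio of 1.2 was used, and M=40.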
Original signal | Pink Noise + Sine | [2] |
Extracted tonal part | ||
Extracted noise part | ||
Time stretched without noise handling | ||
Time stretched with noise handling |
For some reason these files do not play on Google Chrome. Use another browser, or right click on the player and choose "Save Audio As..."
The following table illustrates the zero-phasing transformation used to measure vertical phase coherence, followed by the test signal time stretched with M=10 (rather good vertical phase coherence) and M=250 (rather poor vertical phase coherence).
Original signal [1] (See Figure 2 (a) of paper) | Zero Phased (See Figure 2 (b) of paper) | Time stretched, M=10 (See Figure 2 (c) of paper) | Time stretched, M=250 (See Figure 2 (d) of paper) |
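As a rough idea of what zero-phasing does, the sketch below keeps the magnitude spectrum of a buffer and sets every phase to zero. This is only an assumption about the general principle: this page does not say whether the paper applies the transformation to the whole signal or frame by frame, and the function name `zero_phase` is hypothetical.

```python
import numpy as np

def zero_phase(x):
    """Keep the magnitude spectrum of x but set every phase to zero."""
    magnitude = np.abs(np.fft.rfft(x))
    y = np.fft.irfft(magnitude, n=len(x))
    # With all phases at zero the energy concentrates around sample 0
    # (circularly); rotate so the peak sits in the middle of the buffer.
    return np.roll(y, len(x) // 2)

# Example on a noise burst (seeded so the result is repeatable).
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
z = zero_phase(x)
print(np.argmax(z))
```

With all components in phase at one instant, the result is a single sharp, symmetric pulse, that is, a signal with perfect vertical phase coherence. One would then expect a time stretching method that preserves vertical phase coherence to keep that pulse compact, and one that does not to spread it out in time.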
This table shows preliminary results of incorporating a transient processing scheme (reference [15] of the paper) into the proposed algorithm.
Original Signal [6] | Proposed technique, M=40 | With transient processing |
The musical excerpts are used under fair use from the following sources:
[1] Vangelis, "Main Theme from Chariots of Fire", Odyssey - The Definitive Collection, Hip-O Records, 2003
[2] Melanie G 'MySweetDarkness', "The Morn", Through the Shadow and the Light, Everlasting Dream, 2007
[3] John Lennon, "Imagine", Imagine (2010 - Remaster), EMI Records LTD, 2010
[4] Ramin Djawadi, "Game of Thrones, Main Title", Game of Thrones: Season 3, WaterTower Music, 2013
[5] Inti-Aymará, "Sinfonia N. 40 Em Sol Menor", Corazón Andino, MoviePlay Brasil, 1993
[6] Philip Nixon, "Horror Level Soundtrack", Oscar, Flair Software, 1993