Audio Samples from "Controllable and Interpretable Singing Voice Decomposition via Assem-VC"
Paper: arXiv:2110.12676
Repository: mindslab-ai/assem-vc @ GitHub
Authors: Kang-wook Kim, Junhyeok Lee @ MINDsLab Inc., SNU
Abstract: We propose a singing decomposition system that encodes time-aligned linguistic content, pitch, and source speaker identity via Assem-VC. With decomposed speaker-independent information and the target speaker's embedding, we could synthesize the singing voice of the target speaker. In conclusion, we made a perfectly synced duet with the user's singing voice and the target singer's converted singing voice.
Contents
0. Reconstruction
Reference 1
Reference 2
Reference 3
Reconstruction 1
Reconstruction 2
Reconstruction 3
1. Controllable Attributes
1.1. Control Lyrics
Text Content: {OW} {D IH R} {W AH T} {K AE N} {DH AH} {M AE T ER} {B IY}.
Reference
1.1.1 Text Editing
{UH} {DH EH R} {M AW S} {K AE N} {N AA} {B AA DH ER} {M IY} .
{AH} {AH AH AH} {AH AH AH} {AH AH AH} {AH AH} {AH AH AH AH} {AH AH} .
{ER} {ER ER ER} {ER ER ER} {ER ER ER} {ER ER} {ER ER ER ER} {ER ER} .
{IY} {IY IY IY} {IY IY IY} {IY IY IY} {IY IY} {IY IY IY IY} {IY IY} .
1.1.2 Text Deletion
Desired phonemes can be deleted by replacing them with blank tokens and setting the corresponding pitches to 0.
{OW} {D IH R} {W AH T} {K AE N} {DH AH} {M AE T ER} {B IY}.
{OW} {D IH R} {W AH T} {K AE N} {DH AH} {M AE T ER} {B IY} .
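The deletion described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: it assumes the phonemes and pitches are stored as frame-aligned lists of equal length, and the names `BLANK` and `delete_phonemes` are hypothetical.

```python
# Hypothetical blank-token symbol; the actual token used by the model may differ.
BLANK = " "

def delete_phonemes(phonemes, pitches, start, end, zero_pitch=True):
    """Blank out frames [start, end) and optionally zero their pitches.

    phonemes: list of per-frame phoneme symbols (assumed representation)
    pitches:  list of per-frame pitch values, 0.0 meaning "no pitch"
    Returns new lists; the inputs are left unchanged.
    """
    if len(phonemes) != len(pitches):
        raise ValueError("phonemes and pitches must be frame-aligned")
    if not 0 <= start <= end <= len(phonemes):
        raise ValueError("invalid frame range")
    new_phonemes = list(phonemes)
    new_pitches = list(pitches)
    for i in range(start, end):
        new_phonemes[i] = BLANK
        if zero_pitch:
            new_pitches[i] = 0.0
    return new_phonemes, new_pitches

# Toy input: six frames covering {OW} {D IH R}.
phonemes = ["OW", "OW", "D", "IH", "R", "R"]
pitches = [220.0, 220.0, 0.0, 247.0, 247.0, 247.0]
edited_phonemes, edited_pitches = delete_phonemes(phonemes, pitches, 2, 6)
```

Passing `zero_pitch=False` blanks the phonemes but keeps the original pitch contour.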
Replacing phonemes with blank tokens while leaving the pitches unchanged produces the following result.
{ } { } { } { } { } { } { } .
1.2. Control Rhythm
Text Content: {IH T S} {F L IY S} {W AA Z} {W AY T} {AE Z} {S N OW}.
Reference
{IH T S} {F L IY S} {W AA Z}
{W AY T} {AE Z} {S N OW}. × 2.5
{IH T S} {F L IY S} {W AA Z}
{W AY T} {AE Z} {S N OW}. × 0.5
{IH T S} {F L IY S} {W AA Z}
{W AY T} {AE Z} {S N OW}. × 5
{IH T S} {F L IY S} {W AA Z}
{W AY T} {AE Z} {S N OW}. × 5
Text Content: {HH IY} {P R AA M IY S T} {T UW} {B R IH NG} {M IY} {AH} {B AH N CH} {AH V} {R EH D} {R OW Z IH Z}.
Reference
ALL × 0.6
ALL × 1.7
{HH IY} {P R AA M IY S T}
{T UW} BLANK {B R IH NG} ... × 10
{HH IY} {P R AA M IY S T} {T UW}
{B R IH NG} {M IY} {AH} ... × 10
1.3. Control Pitch
1.3.1 Pitch Shift
Reference
+1
+2
+3
+4
+5
+6
-1
-2
-3
-4
-5
-6
1.3.2 Pitch Deletion
Replacing pitches with 0 while leaving the phonemes unchanged produces the following result.
Text Content: {OW} {D IH R} {W AH T} {K AE N} {DH AH} {M AE T ER} {B IY}.
Reference
{OW} {D IH R} {W AH T} {K AE N} {DH AH} {M AE T ER} {B IY} .
1.4. Control Speaker Identity
1.4.1 CSD to Female
Text Content: {IH T S} {F L IY S} {W AA Z} {W AY T} {AE Z} {S N OW}.
Reference: CSD
Target: NJAT
Target: PMAR
Converted Result: NJAT
Converted Result: PMAR
1.4.2 CSD to Male
The pitches are multiplied by 1/2 to match the pitch range of the male target speakers.
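Multiplying every pitch by 1/2 lowers the contour by exactly one octave (12 semitones). The sketch below shows that relationship. It assumes the pitch contour is a list of per-frame fundamental frequencies in Hz with 0 marking frames without pitch, and that a "key" means one semitone in 12-tone equal temperament; `shift_pitch` is a hypothetical helper, not a function from the repository.

```python
def shift_pitch(f0_hz, semitones):
    """Scale a pitch contour by a number of equal-tempered semitones.

    Frames with pitch 0 (assumed to mean "no pitch") are left at 0.
    """
    ratio = 2.0 ** (semitones / 12.0)
    return [f * ratio if f > 0 else 0.0 for f in f0_hz]

# Toy contour in Hz; the values are illustrative only.
contour = [440.0, 0.0, 392.0]

# Shifting by -12 semitones is the same as multiplying by 1/2.
one_octave_down = shift_pitch(contour, -12)
halved = [f * 0.5 for f in contour]
```

Under the same assumption, the +1 to +6 and -1 to -6 samples in 1.3.1 would correspond to `semitones` values from 1 to 6 and from -1 to -6.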
Text Content: {HH IY} {P R AA M IY S T} {T UW} {B R IH NG} {M IY} {AH} {B AH N CH} {AH V} {R EH D} {R OW Z IH Z}.
Reference: CSD
Target: JTAN
Target: KENN
Converted Result: JTAN
Converted Result: KENN
2. Demo with the User's Singing Voice
2.4.1 Demo with the User's Singing Voice
Text Content: Till the end of the time
Reference: ADIZ
Target 1: JTAN
Target 2: KENN
Result: JTAN-7 keys
Result: KENN-7 keys
Combined Result: ADIZ & JTAN-7
Combined Result: ADIZ & KENN-7
Text Content: Just a be with you
Reference: JLEE
Target: MCUR
Result: MCUR+5 keys
Result: MCUR+7 keys
Combined Result: Author & MPOL+5
Combined Result: Author & MPOL+7
Text Content: Do you hear the people sing, singing the song of angry men
Reference: Author
Target: CSD
Result: CSD+5 keys
Result: CSD+7 keys
Combined Result: Author & CSD+5
Combined Result: Author & CSD+7
Text Content: City of stars, are you shining just for me
Reference: Author
Target: MPOL
Result: MPOL+5 keys
Result: MPOL+7 keys
Combined Result: Author & MPOL+5
Combined Result: Author & MPOL+7
2.4.2 Further Examples of Duet
Text Content: Lucky I'm in love with my best friend, lucky to have been where I have been.
We mixed +3 and +4 keys to match the chord of the original song.
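The page does not say how the +3 and +4 shifts were combined. One plausible reading is that different passages use different shifts so that the harmony line stays inside the chord. The sketch below illustrates that reading only: it assumes a per-frame pitch contour in Hz (0 meaning no pitch) and a per-frame list of semitone shifts, and `shift_per_frame` is a hypothetical helper.

```python
def shift_per_frame(f0_hz, semitones_per_frame):
    """Apply a separate semitone shift to each frame of a pitch contour.

    Both lists must have the same length; frames with pitch 0 stay at 0.
    """
    if len(f0_hz) != len(semitones_per_frame):
        raise ValueError("one shift value is required per frame")
    return [
        f * 2.0 ** (s / 12.0) if f > 0 else 0.0
        for f, s in zip(f0_hz, semitones_per_frame)
    ]

# Toy melody in Hz and an illustrative choice of +4 or +3 per frame.
melody = [261.63, 293.66, 0.0, 329.63]
shifts = [4, 3, 3, 3]
harmony = shift_per_frame(melody, shifts)
```

Which frames receive +3 and which receive +4 depends on the chord progression of the song; the values above are placeholders.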
Reference: Author
Result: CSD+3,4
Combined: Author & CSD+3,4
Text Content: There's a calm surrender
Reference: JLEE
Result: CSD+7
Combined: JLEE & CSD+7
Text Content: Lift me up to touch the sky
Reference: ADIZ
Result: CSD-7
Combined: ADIZ & CSD-7
Text Content: Show you the best of mine
Reference: ADIZ
Result: KENN-7
Combined: ADIZ & KENN-7
3. Artifacts of HiFi-GAN
We observe audible artifacts in our model’s synthesized results, and they are also visible
in the spectrogram. These noisy artifacts degrade the quality of the synthesized audio. We
also found that similar audible artifacts are generated when the singing voice is reconstructed
by HiFi-GAN alone. We will address this issue in future work.
Reference
Reconstruction of HiFi-GAN
Reconstruction of our model