Using Phonetic Posteriorgram Based Frame Pairing for Segmental Accent Conversion
Guanlong Zhao and Ricardo Gutierrez-Osuna
Department of Computer Science and Engineering, Texas A&M University, USA
Accent Conversion Audio Samples
Systems:
Baseline 1 (AC-SIM): uses acoustic similarity to pair speech frames. This is the accent conversion (AC) baseline system.
Baseline 2 (AC-DTW): uses Dynamic Time Warping to align speech frames. This is the voice conversion (VC) baseline system.
Posteriorgram (AC-PPG): uses phonetic similarity to pair speech frames. This is the proposed AC system.
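To make the difference between the pairing strategies concrete, here is a minimal sketch of posteriorgram-based frame pairing next to a plain DTW alignment. It is an illustration under stated assumptions, not the authors' implementation: this page does not say which distance is used to compare posteriorgram frames, so symmetric KL divergence is assumed here, and the function names (symmetric_kl, pair_frames_by_ppg, dtw_align) and the toy data are hypothetical.

```python
import numpy as np


def symmetric_kl(p, q, eps=1e-10):
    """Symmetric KL divergence between every row of p (N, D) and every row of q (M, D).

    Assumption: symmetric KL is the frame distance; the page does not specify one.
    KL(p||q) + KL(q||p) simplifies to sum((p - q) * log(p / q)).
    """
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    log_ratio = np.log(p)[:, None, :] - np.log(q)[None, :, :]  # shape (N, M, D)
    diff = p[:, None, :] - q[None, :, :]                        # shape (N, M, D)
    return np.sum(diff * log_ratio, axis=-1)                    # shape (N, M)


def pair_frames_by_ppg(ppg_l2, ppg_l1):
    """For each L2 frame, return the index of the L1 frame with the closest posteriorgram."""
    return np.argmin(symmetric_kl(ppg_l2, ppg_l1), axis=1)


def dtw_align(x, y):
    """Classic DTW over Euclidean frame distances; returns the warping path as (i, j) pairs."""
    n, m = len(x), len(y)
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the end of both sequences to recover the path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


# Toy posteriorgrams over 3 hypothetical phonetic classes (each row sums to 1).
ppg_l1 = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
ppg_l2 = np.array([[0.2, 0.1, 0.7], [0.7, 0.2, 0.1]])
pairs = pair_frames_by_ppg(ppg_l2, ppg_l1)  # one L1 frame index per L2 frame

# Toy 1-D "acoustic" features for the DTW contrast.
x = np.array([[0.0], [1.0], [2.0]])
y = np.array([[0.0], [0.0], [1.0], [2.0]])
path = dtw_align(x, y)
```

The posteriorgram pairing ignores time order and matches frames purely by phonetic content, whereas DTW forces a monotonic alignment between two recordings of the same utterance.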
Notes:
The L1 speakers speak with a General American accent
ABA's native language is Arabic
HKK's native language is Korean
TNI's native language is Hindi
The L1 reference audio clips were resynthesized from their mel-cepstral coefficients (MCEPs) to match the acoustic quality of the other audio clips
Dataset (L2-ARCTIC corpus):
https://psi.engr.tamu.edu/l2-arctic-corpus/
L2 speaker | L1 reference speech | L2 speech | Baseline 1 | Baseline 2 | Posteriorgram
ABA        | [audio]             | [audio]   | [audio]    | [audio]    | [audio]
HKK        | [audio]             | [audio]   | [audio]    | [audio]    | [audio]
TNI        | [audio]             | [audio]   | [audio]    | [audio]    | [audio]