Notes
- The L1 speaker has a General American accent
- YKWK's native language is Korean
- TXHC's native language is Chinese
- Dataset (L2-ARCTIC corpus [1]): https://psi.engr.tamu.edu/l2-arctic-corpus/
Experiment 1: Evaluating the reference-based golden speaker (L1-GS)
- L1/L2 reference speech: original unmodified L1/L2 speech recordings
- Senone-PPG: use the senone-PPG as the input
- Mono-PPG: use the monophone PPG as the input
- BNF: use the bottleneck feature vector as the input
L2 speaker | Text | L1 reference speech | L2 reference speech | Senone-PPG | Mono-PPG | BNF |
---|---|---|---|---|---|---|
YKWK | What an excited whispering and conferring took place. | | | | | |
YKWK | I have seen myself that one man contemplated by Pascal's philosophic eye. | | | | | |
YKWK | And wherever I ranged, the way lay along alcohol drenched roads. | | | | | |
TXHC | And as we hurried up town, Joe Goose explained. | | | | | |
TXHC | The scents of strange vegetation blew off the tropic land. | | | | | |
TXHC | The life there was healthful and athletic, but too juvenile. | | | | | |
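
The Senone-PPG and Mono-PPG inputs compared in Experiment 1 are both phonetic posteriorgrams: one posterior distribution over phonetic classes per speech frame. The sketch below is only an illustration of that idea, not the pipeline used in the paper. It assumes that per-frame acoustic-model scores are available as plain lists, and that a monophone posterior can be obtained by summing the posteriors of the senones assigned to that phone. The function names, the toy scores, and the senone-to-phone mapping are all hypothetical. BNF extraction, which reads a hidden layer of the acoustic model instead of its output, is not shown.

```python
import math


def softmax(logits):
    """Turn one frame of acoustic-model scores into a posterior distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def senone_ppg(frame_logits):
    """Stack per-frame posteriors into a posteriorgram (frames x senones)."""
    return [softmax(frame) for frame in frame_logits]


def collapse_to_monophones(ppg, senone_to_phone, num_phones):
    """Sum the posteriors of all senones assigned to the same monophone.

    Assumption: this marginalization is one common way to get a monophone
    posteriorgram; the paper's exact procedure may differ.
    """
    mono = []
    for frame in ppg:
        row = [0.0] * num_phones
        for senone_idx, prob in enumerate(frame):
            row[senone_to_phone[senone_idx]] += prob
        mono.append(row)
    return mono


# Toy data (hypothetical): 2 frames, 4 senones mapped onto 2 monophones.
toy_logits = [[2.0, 1.0, 0.0, -1.0], [0.0, 0.0, 3.0, 3.0]]
toy_mapping = [0, 0, 1, 1]

senones = senone_ppg(toy_logits)
monophones = collapse_to_monophones(senones, toy_mapping, num_phones=2)
```

Each row of `senones` and `monophones` sums to one. The monophone rows are shorter, which is the practical difference between the two PPG inputs: fewer, coarser classes per frame.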
Experiment 2: Evaluating the reference-free golden speaker (L2-GS)
- Input speech: original unmodified L2 speech recordings
- Baseline 1: the system of Zhang et al. [2], serving as a baseline model in this work
- Baseline 2: the reference-free FAC system of Liu et al. [3]; samples provided through the courtesy of S. Liu (CUHK)
- Proposed: the proposed reference-free accent conversion model
L2 speaker | Text | Input speech | Reference-free accent conversion (Baseline 1) | Reference-free accent conversion (Baseline 2) | Reference-free accent conversion (Proposed) |
---|---|---|---|---|---|
YKWK | He had fulfilled his duty and paid properly. | | | | |
YKWK | But already he had composed himself. | | | | |
YKWK | The Russian music player, the Count, was her obedient slave. | | | | |
TXHC | What an excited whispering and conferring took place. | | | | |
TXHC | Thus he turned the tenets and jargon of psychology back on me. | | | | |
TXHC | But already he had composed himself. | | | | |
Citation
If you would like to cite this work, please use the entry below:
@article{zhao2021converting,
title={Converting Foreign Accent Speech Without a Reference},
author={Zhao, Guanlong and Ding, Shaojin and Gutierrez-Osuna, Ricardo},
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
volume={29},
pages={2367--2381},
year={2021},
publisher={IEEE}
}
References
[1] G. Zhao et al., "L2-ARCTIC: A non-native English speech corpus," in Proc. Interspeech, 2018, pp. 2783-2787.
[2] J.-X. Zhang, Z.-H. Ling, Y. Jiang, L.-J. Liu, C. Liang, and L.-R. Dai, "Improving sequence-to-sequence acoustic modeling by adding text-supervision," in Proc. ICASSP, 2019, pp. 6785-6789.
[3] S. Liu et al., "End-to-end accent conversion without using native utterances," in Proc. ICASSP, 2020, pp. 6289-6293.