Notes
- The L1 speaker has a General American accent
- YKWK's native language is Korean
- TXHC's native language is Chinese
- Dataset (L2-ARCTIC corpus [1]): https://psi.engr.tamu.edu/l2-arctic-corpus/
Experiment 1: Evaluating the reference-based golden speaker (L1-GS)
- L1/L2 reference speech: original unmodified L1/L2 speech recordings
- Senone-PPG: L1-GS generated with the senone phonetic posteriorgram (PPG) as the input
- Mono-PPG: L1-GS generated with the monophone PPG as the input
- BNF: L1-GS generated with the bottleneck feature (BNF) vector as the input
| L2 speaker | Text | L1 reference speech | L2 reference speech | Senone-PPG | Mono-PPG | BNF |
|---|---|---|---|---|---|---|
| YKWK | What an excited whispering and conferring took place. | | | | | |
| YKWK | I have seen myself that one man contemplated by Pascal's philosophic eye. | | | | | |
| YKWK | And wherever I ranged, the way lay along alcohol drenched roads. | | | | | |
| TXHC | And as we hurried up town, Joe Goose explained. | | | | | |
| TXHC | The scents of strange vegetation blew off the tropic land. | | | | | |
| TXHC | The life there was healthful and athletic, but too juvenile. | | | | | |
Experiment 2: Evaluating the reference-free golden speaker (L2-GS)
- Input speech: original unmodified L2 speech recordings
- Baseline 1: the system of Zhang et al. [2], serving as a baseline model in this work
- Baseline 2: the reference-free FAC system of Liu et al. [3]; samples provided through the courtesy of S. Liu (CUHK)
- Proposed: the proposed reference-free accent conversion model
| L2 speaker | Text | Input speech | Reference-free accent conversion (Baseline 1) | Reference-free accent conversion (Baseline 2) | Reference-free accent conversion (Proposed) |
|---|---|---|---|---|---|
| YKWK | He had fulfilled his duty and paid properly. | | | | |
| YKWK | But already he had composed himself. | | | | |
| YKWK | The Russian music player, the Count, was her obedient slave. | | | | |
| TXHC | What an excited whispering and conferring took place. | | | | |
| TXHC | Thus he turned the tenets and jargon of psychology back on me. | | | | |
| TXHC | But already he had composed himself. | | | | |
Citation
If you would like to cite this work, please use the following entry:
@article{zhao2021converting,
  title={Converting Foreign Accent Speech Without a Reference},
  author={Zhao, Guanlong and Ding, Shaojin and Gutierrez-Osuna, Ricardo},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  volume={29},
  pages={2367--2381},
  year={2021},
  publisher={IEEE}
}
References
[1] G. Zhao et al., "L2-ARCTIC: A non-native English speech corpus," in Proc. Interspeech, 2018, pp. 2783-2787.
[2] J.-X. Zhang, Z.-H. Ling, Y. Jiang, L.-J. Liu, C. Liang, and L.-R. Dai, "Improving sequence-to-sequence acoustic modeling by adding text-supervision," in Proc. ICASSP, 2019, pp. 6785-6789.
[3] S. Liu et al., "End-to-end accent conversion without using native utterances," in Proc. ICASSP, 2020, pp. 6289-6293.