Converting Foreign Accent Speech Without a Reference (submitted to TASLP)

Guanlong Zhao, Shaojin Ding, and Ricardo Gutierrez-Osuna

Department of Computer Science and Engineering, Texas A&M University, USA

Experiment 1: Evaluating the reference-based golden speaker (L1-GS)

Columns: L2 speaker; Text; L1 reference speech; L2 reference speech; Senone-PPG; Mono-PPG; BNF

Speaker YKWK:
  What an excited whispering and conferring took place.
  I have seen myself that one man contemplated by Pascal's philosophic eye.
  And wherever I ranged, the way lay along alcohol drenched roads.

Speaker TXHC:
  And as we hurried up town, Joe Goose explained.
  The scents of strange vegetation blew off the tropic land.
  The life there was healthful and athletic, but too juvenile.

Experiment 2: Evaluating the reference-free golden speaker (L2-GS)

Columns: L2 speaker; Text; Input speech; Reference-free accent conversion (Baseline 1); Reference-free accent conversion (Baseline 2); Reference-free accent conversion (Proposed)

Speaker YKWK:
  He had fulfilled his duty and paid properly.
  But already he had composed himself.
  The Russian music player, the Count, was her obedient slave.

Speaker TXHC:
  What an excited whispering and conferring took place.
  Thus he turned the tenets and jargon of psychology back on me.
  But already he had composed himself.
