Reference-Free Accent Conversion (AC) Audio Samples

Notes

The L1 speaker speaks with a General American accent
YKWK's native language is Korean
TXHC's native language is Chinese
Dataset (L2-ARCTIC corpus [1]): https://psi.engr.tamu.edu/l2-arctic-corpus/

Experiment 1: Evaluating the reference-based golden speaker (L1-GS)

L1/L2 reference speech: original unmodified L1/L2 speech recordings
Senone-PPG: use the senone phonetic posteriorgram (PPG) as the input
Mono-PPG: use the monophone PPG as the input
BNF: use the bottleneck feature vector as the input

L2 speaker	Text	L1 reference speech	L2 reference speech	Senone-PPG	Mono-PPG	BNF
YKWK	What an excited whispering and conferring took place.
	I have seen myself that one man contemplated by Pascal's philosophic eye.
	And wherever I ranged, the way lay along alcohol drenched roads.
TXHC	And as we hurried up town, Joe Goose explained.
	The scents of strange vegetation blew off the tropic land.
	The life there was healthful and athletic, but too juvenile.

Experiment 2: Evaluating the reference-free golden speaker (L2-GS)

Input speech: original unmodified L2 speech recordings
Baseline 1: the system of Zhang et al. [2], serving as a baseline model in this work
Baseline 2: the reference-free foreign accent conversion (FAC) system of Liu et al. [3]; samples provided courtesy of S. Liu (CUHK)
Proposed: the proposed reference-free accent conversion model

L2 speaker	Text	Input speech	Reference-free accent conversion (Baseline 1)	Reference-free accent conversion (Baseline 2)	Reference-free accent conversion (Proposed)
YKWK	He had fulfilled his duty and paid properly.
	But already he had composed himself.
	The Russian music player, the Count, was her obedient slave.
TXHC	What an excited whispering and conferring took place.
	Thus he turned the tenets and jargon of psychology back on me.
	But already he had composed himself.

Citation

If you would like to cite this work, please refer to the manuscript below:


@article{zhao2021converting,
  title={Converting Foreign Accent Speech Without a Reference},
  author={Zhao, Guanlong and Ding, Shaojin and Gutierrez-Osuna, Ricardo},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  volume={29},
  pages={2367--2381},
  year={2021},
  publisher={IEEE}
}

References

[1] G. Zhao et al., "L2-ARCTIC: A non-native English speech corpus," in Proc. Interspeech, 2018, pp. 2783-2787.

[2] J.-X. Zhang, Z.-H. Ling, Y. Jiang, L.-J. Liu, C. Liang, and L.-R. Dai, "Improving sequence-to-sequence acoustic modeling by adding text-supervision," in Proc. ICASSP, 2019, pp. 6785-6789.

[3] S. Liu et al., "End-to-end accent conversion without using native utterances," in Proc. ICASSP, 2020, pp. 6289-6293.

Converting Foreign Accent Speech Without a Reference

Guanlong Zhao, Shaojin Ding, and Ricardo Gutierrez-Osuna

Department of Computer Science and Engineering, Texas A&M University, USA
