As a speech researcher and engineer, my work encompasses a diverse portfolio of research areas, including voice conversion, accent conversion, speaker change detection, speaker diarization, automatic speech recognition (ASR), keyword spotting, and multimodal large language models (LLMs).
For professional correspondence, you may reach me via email at FirstNameLastName at PersonalEmailServiceByGoogle dot com. How do I pronounce my name? In PinYin, it is written as Guàn-Lóng Zhào; the tones are fourth, second, and fourth. Mapping to American English phonemes, it roughly sounds like Guan-Loan Chao. 🌈 Cheers!
Ph.D. in Computer Science, Texas A&M University
B.S. in Applied Physics (minor in Computer Science), University of Science and Technology of China
Journal Articles
Conference Proceedings
Book Chapter
Preprints
Abstracts
Reviewer for:
Students mentored:
The L2-ARCTIC corpus is a multi-purpose non-native English speech dataset. I took a leading role in this project, designing the data collection schemes and the annotation standards. I also spent a lot of time manually cleaning the raw speech recordings and performing quality control to ensure that the speech data and annotations were consistent and high-quality. The recordings were collected at Iowa State University (ISU) in an effort led by Dr. John Levis and his students in the Department of English. Most of the annotations were done by Dr. Alif Silpachai and Dr. Ivana Lučić Rehman. The project spanned around two years. We released the first version at Interspeech 2018 and have continued to add data to the corpus since then. Its most recent version is almost 2.4x the size of the initial release.
We initially designed the corpus for the accent conversion task, which is why we chose the CMU-ARCTIC prompts in the first place. Along the way, we were also working on projects related to mispronunciation detection (MPD) and realized that open-source resources for MPD were limited. We noticed that many of the CMU-ARCTIC sentences were hard for the participants to speak. On the one hand, this made the recording sessions difficult; on the other hand, it elicited a rich variety of pronunciation errors in the non-native speech productions. As a result, we decided to annotate part of the corpus for phonetic errors. All the sentences we annotated were carefully selected by Dr. Levis to reflect the pronunciation issues likely to occur given the speakers' native languages.
I use this corpus in all my publications on accent conversion. I find it well-suited for the task because it allows me to test the algorithms on speakers with different accents, fluency levels, ages, and genders. I also use this corpus for MPD research. To the best of my knowledge, it was the largest open-source annotated MPD corpus at the time of its release (around 2018). If you are interested in using the corpus for your projects, you can find access guidelines on its official project site. I would be happy to see it used in more projects. If you have any questions about downloading or using the corpus, please feel free to drop me an email.