About
I am a speech researcher and engineer. My research spans voice conversion, accent conversion, speaker change detection, speaker diarization, automatic speech recognition (ASR), keyword spotting, and multimodal large language models (LLMs).
For professional correspondence, you may reach me via email at FirstNameLastName at PersonalEmailServiceByGoogle dot com. How do I pronounce my name? In Pinyin, it is written as Guàn-Lóng Zhào; the tones
are fourth, second, and fourth. Mapped to American English phonemes, it roughly sounds like Guan-Loan Chao. 🌈 Cheers!
Publications
Journal Articles
- S. Ding, G. Zhao, and R. Gutierrez-Osuna, "Accentron: Foreign accent conversion to arbitrary non-native speakers using zero-shot learning,"
Computer Speech & Language, vol. 72, 2022.
pdf
demo
- G. Zhao, S. Ding, and R. Gutierrez-Osuna, "Converting foreign accent speech without a reference," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 2367–2381, 2021.
pdf
demo
- I. Lučić Rehman, A. Silpachai, J. Levis, G. Zhao, and R. Gutierrez-Osuna, "The English pronunciation of Arabic speakers: A data-driven approach to segmental error identification," Language Teaching Research,
2020. pdf summary
- G. Zhao and R. Gutierrez-Osuna, "Using phonetic posteriorgram based frame pairing for segmental accent conversion," IEEE/ACM Transactions on Audio,
Speech, and Language Processing, vol. 27, no. 10, pp. 1649–1660, 2019. pdf code demo
- S. Ding, G. Zhao, C. Liberatore, and R. Gutierrez-Osuna, "Learning structured sparse representations for voice conversion," IEEE/ACM Transactions
on Audio, Speech, and Language Processing, vol. 28, pp. 343–354, 2019. pdf demo
- S. Ding, C. Liberatore, S. Sonsaat, I. Lučić Rehman, A. Silpachai, G. Zhao, E. Chukharev-Hudilainen, J. Levis, and R. Gutierrez-Osuna, "Golden speaker
builder–An interactive tool for pronunciation training," Speech Communication, vol. 115, pp. 51–66, 2019. pdf
code
demo
Conference Proceedings
- B. Labrador, P. Zhu, G. Zhao, A. S. Scarpati, Q. Wang, A. Lozano-Diez, and I. L. Moreno, "Personalizing keyword spotting with speaker information,"
in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025. pdf
- Q. Wang*, Y. Huang*, G. Zhao*, E. Clark, W. Xia, and H. Liao, "DiarizationLM: Speaker diarization post-processing with large language models,"
in Interspeech, 2024, pp. 3754–3758. *Equal contribution. pdf
code
- Y. Huang, W. Wang, G. Zhao, H. Liao, W. Xia, and Q. Wang, "On the success and limitations of auxiliary network based word-level end-to-end neural speaker diarization,"
in Interspeech, 2024, pp. 32–36. pdf
- G. Zhao, Y. Wang, J. Pelecanos, Y. Zhang, H. Liao, Y. Huang, H. Lu, and Q. Wang, "USM-SCD: Multilingual speaker change detection based on large pretrained foundation models,"
in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024, pp. 11801–11805. pdf
poster
- G. Zhao, Q. Wang, H. Lu, Y. Huang, and I. L. Moreno, "Augmenting transformer-transducer based speaker change detection with token-level training loss,"
in IEEE International Conference on Acoustics, Speech, and
Signal Processing (ICASSP), 2023. pdf
poster
resources
- B. Labrador*, G. Zhao*, I. L. Moreno*, A. S. Scarpati, L. Fowl, and Q. Wang, "Exploring sequence-to-sequence transformer-transducer models for keyword spotting,"
in IEEE International Conference on Acoustics, Speech, and
Signal Processing (ICASSP), 2023. *Equal contribution. pdf
- A. Hair, G. Zhao, B. Ahmed, K. Ballard, and R. Gutierrez-Osuna, "Assessing posterior-based mispronunciation detection on field-collected recordings from child speech therapy sessions,"
in Interspeech, 2021, pp. 2936–2940. pdf
- A. Silpachai, I. Lučić Rehman, T. A. Barriuso, J. Levis, E. Chukharev-Hudilainen, G. Zhao, and R. Gutierrez-Osuna, "Effects of voice type and task on L2 learners' awareness of pronunciation errors,"
in Interspeech, 2021, pp. 1952–1956. pdf
- S. Ding, G. Zhao, and R. Gutierrez-Osuna, "Improving the speaker identity of non-parallel many-to-many voice conversion with adversarial speaker recognition," in Interspeech, 2020, pp. 776–780.
pdf code
demo
video
- A. Das, G. Zhao, J. Levis, E. Chukharev-Hudilainen, and R. Gutierrez-Osuna, "Understanding the effect of voice quality and accent on talker similarity," in Interspeech, 2020, pp. 1763–1767.
pdf
video
- G. Zhao, S. Ding, and R. Gutierrez-Osuna, "Foreign accent conversion by synthesizing speech from phonetic posteriorgrams,"
in Interspeech, 2019, pp. 2843–2847. pdf
code
demo slides
- G. Zhao, S. Sonsaat, A. Silpachai, I. Lučić Rehman, E. Chukharev-Hudilainen, J. Levis, and R. Gutierrez-Osuna, "L2-ARCTIC: A non-native English
speech corpus," in Interspeech, 2018, pp. 2783–2787. pdf
data code
slides
- S. Ding, G. Zhao, C. Liberatore, and R. Gutierrez-Osuna, "Improving sparse representations in exemplar-based voice conversion with a
phoneme-selective objective function," in Interspeech, 2018, pp. 476–480. pdf
- C. Liberatore, G. Zhao, and R. Gutierrez-Osuna, "Voice conversion through residual warping in a sparse, anchor-based representation of speech,"
in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018, pp. 5284–5288. pdf
poster
- G. Zhao, S. Sonsaat, J. Levis, E. Chukharev-Hudilainen, and R. Gutierrez-Osuna, "Accent conversion using phonetic posteriorgrams,"
in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018, pp. 5314–5318. pdf
code
demo poster
- G. Angello, A. B. Manam, G. Zhao, and R. Gutierrez-Osuna, "Training behavior of successful tacton-phoneme learners,"
in IEEE Haptics Symposium (WIP), 2018. pdf
- G. Zhao and R. Gutierrez-Osuna, "Exemplar selection methods in voice conversion," in IEEE International Conference on Acoustics, Speech, and
Signal Processing (ICASSP), 2017, pp. 5525–5529. pdf
demo
poster
Book Chapter
- Y. Liu, G. Zhao, B. Gong, Y. Li, R. Raj, N. Goel, S. Kesav, S. Gottimukkala, Z. Wang, W. Ren, and D. Tao, "Chapter 10 — Image dehazing: Improved techniques,"
in Deep Learning through Sparse and Low-Rank Modeling. Elsevier, 2019, pp. 251–262.
link
code
Preprints
- Y. Huang, W. Wang, G. Zhao, H. Liao, W. Xia, and Q. Wang, "Towards word-level end-to-end neural speaker diarization with auxiliary network,"
arXiv preprint arXiv:2309.08489, 2023. pdf
- Q. Wang, Y. Huang, H. Lu, G. Zhao, and I. L. Moreno, "Highly efficient real-time streaming and fully on-device speaker diarization with multi-stage clustering,"
arXiv preprint arXiv:2210.13690, 2022. pdf
code
- Y. Liu, G. Zhao, B. Gong, Y. Li, R. Raj, N. Goel, S. Kesav, S. Gottimukkala, Z. Wang, W. Ren, and D. Tao, "Improved techniques for learning to dehaze and
beyond: A collective study," arXiv preprint arXiv:1807.00202, 2018. pdf
code
- Y. Liu and G. Zhao, "PAD-Net: A perception-aided single image dehazing network," arXiv preprint arXiv:1805.03146, 2018.
pdf
code
- A. Datta, G. Zhao, B. Ramabhadran, and E. Weinstein, "LSTM acoustic models learn to align and pronounce with graphemes," arXiv preprint arXiv:2008.06121, 2020.
(Work done as an intern at Google NYC during summer 2018.)
pdf
Abstracts
- I. Lučić Rehman, A. Silpachai, J. Levis, G. Zhao, and R. Gutierrez-Osuna, "Pronunciation errors — A systematic approach to diagnosis,"
in L2 Pronunciation Research Workshop: Bridging the Gap between Research and Practice, 2019, pp. 23–24. pdf
- S. Sonsaat, E. Chukharev-Hudilainen, I. Lučić Rehman, A. Silpachai, J. Levis, G. Zhao, S. Ding, C. Liberatore, and R. Gutierrez-Osuna, "Golden Speaker Builder, an interactive tool for pronunciation training: User
studies," in 6th International Conference on English Pronunciation: Issues & Practices (EPIP6), 2019, p. 72. pdf
- S. Ding, C. Liberatore, G. Zhao, S. Sonsaat, E. Chukharev-Hudilainen, J. Levis, and R. Gutierrez-Osuna, "Golden Speaker Builder: an interactive online tool for L2 learners to build pronunciation models,"
in Pronunciation in Second Language Learning and Teaching (PSLLT), 2017, pp. 25–26. pdf
Professional Service
Reviewer for:
Students mentored:
- Yu-Neng Chuang, Research Intern @ Google DeepMind, 2025
- Anastasia Kuznetsova, Research Intern @ Google, 2023. First Employment: Applied Research Scientist, Rev
- Beltrán Labrador, Research Intern @ Google, 2022 & 2023. First Employment: Machine Learning Engineer, Google DeepMind
L2-ARCTIC Corpus
The L2-ARCTIC corpus is a comprehensive, multi-purpose dataset of non-native English speech. As a lead researcher on this two-year project, I designed the data collection protocols and annotation standards, and oversaw extensive manual processing and rigorous quality control to ensure high-fidelity recordings and consistent annotations. Data collection was conducted at Iowa State University under the direction of Dr. John Levis and his team in the Department of English, with primary annotations completed by Dr. Alif Silpachai and Dr. Ivana Lučić Rehman. Since our initial release at Interspeech 2018, we have continued to expand the dataset; the current version is nearly 2.4 times its original size.
Initially developed for accent conversion tasks—which motivated our use of the CMU-ARCTIC prompts—the project quickly evolved to address the scarcity of open-source resources for mispronunciation detection (MPD). We observed that the phonetic complexity of the CMU-ARCTIC sentences naturally elicited a rich variety of pronunciation errors from non-native speakers. Consequently, we expanded our scope to include phonetic error annotations. All annotated subsets were carefully curated by Dr. Levis to target anticipated pronunciation challenges based on the speakers' native languages.
This corpus serves as the foundational dataset for my publications on accent conversion, proving highly effective for evaluating algorithms across diverse accents, fluency levels, ages, and genders. It has also been instrumental in my MPD research; at the time of its 2018 release, it was among the largest open-source annotated MPD corpora available. Access guidelines for integrating the dataset into your own research can be found on the official project site. I strongly encourage its broader application within the research community and welcome any inquiries regarding dataset access or utilization via email.