ECE 590SIC: Audio, Speech and Language Processing, Spring 2022

Syllabus

ECE 590SIC is a research seminar for people interested in speech and audio processing. Students may take it on a credit/no credit basis. Credit is awarded if you do at least one of the following: (1) give a research presentation at least once during the semester, (2) participate in a group project and submit a report, co-authored with your group, describing your experimental results, or (3) submit a written report describing the presentations given by visiting speakers. Students may also attend, and may give a presentation, without registering. To schedule a talk, contact the instructor.

Course Schedule, Spring 2022

Wednesday, 1/19, 4pm: Small-group project planning: how should we go about designing small-group projects?

Wednesday, 1/26, 4pm: Small-group project planning: discussion of project proposals that have been submitted.

Wednesday, 2/2, 4pm: Datasets and first steps for three projects: (1) prediction and typology of second-language pronunciation errors, (2) unsupervised TTS, (3) monolingual fine-tuning in non-European languages.

Wednesday, 2/9, 9:00am: Mengzhe Geng, Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition; Zengrui Jin, Adversarial Data Augmentation for Disordered Speech Recognition

Wednesday, 2/16, 4pm: Shane Settle, Acoustic Word Embeddings (Whole-Word Segmental Speech Recognition using Acoustic Word Embeddings, multilingual AWEs, multilingual AWEs for query-by-example search)

Wednesday, 2/23, 4pm

Discussion: current progress of small-group projects.

Monolingual wav2vec2 tuning in six languages
Estimating pronunciations of second-language learners in L1 monophone space, L2 triphone space, and L1+L2 clustered acoustic centroid space
Unsupervised TTS

Thursday, 2/24, 2-5pm: CSL Student Conference

Wednesday, 3/2, 4pm: Probably no meeting (TBD)

Wednesday, 3/9, 4pm: Discussion: current progress of small-group projects

Wednesday, 3/16, 4pm: Spring break

Wednesday, 3/23, 4pm: No meeting

Wednesday, 3/30, 4pm: Discussion: current progress of small-group projects

Wednesday, 4/6, 4pm: No meeting

Wednesday, 4/13, 4pm: Berk Iskender, StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

Wednesday, 4/20, 4pm

Presenters: Rimita Lahiri, Victor Ardulov

Title: Analysing Child-Adult Interactions with Machine Learning and Dynamical Systems

Papers:

Ardulov, V., Martinez, V. R., Somandepalli, K., Zheng, S., Salzman, E., Lord, C., ... & Narayanan, S. (2021). Robust diagnostic classification via Q-learning. Scientific Reports, 11(1), 1-9.

Durante, Z., Ardulov, V., Kumar, M., Gongola, J., Lyon, T., & Narayanan, S. (2022). Causal indicators for assessing the truthfulness of child speech in forensic interviews. Computer Speech & Language, 71, 101263. https://doi.org/10.1016/j.csl.2021.101263

Lahiri, R., Kumar, M., Bishop, S., & Narayanan, S. (2020, May). Learning domain invariant representations for child-adult classification from speech. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6749-6753). IEEE.

Wednesday, 4/27, 4pm: Presentation #1: Peter Du, An LSTM-Based Autonomous Driving Model using Waymo Open Dataset; Presentation #2: Bobi Shi; Presentation #3: Final report, Unsupervised TTS team (Liming Wang, Junrui Ni, Heting Gao)

Wednesday, 5/4, 4pm: Final report: Non-Western Wav2vec (Heting Gao, Mahir Morshed, Junkai Wu)

Wednesday, 5/11, 4pm: Final report: Second-language pronunciation scoring (Shuju Shi, Jialu Li, John Harvill, Charlotte Yoder)