ECE 590SIC, Spring 2022

Syllabus

ECE 590SIC is a research seminar for people interested in speech and audio processing. Students may take it on a credit/no credit basis. Credit is awarded if you do at least one of the following: (1) give a research presentation at least once during the semester; (2) participate in a group project and submit a report, co-authored with your group, describing your experimental results; or (3) submit a written report describing the presentations given by visiting speakers. Students may also attend, and may give a presentation, without registering. To schedule a talk, contact the instructor.

Course Schedule, Spring 2022

Wednesday, 1/19, 4pm
Small-group project planning: how should we go about designing small-group projects?
Wednesday, 1/26, 4pm
Small-group project planning: discussion of project proposals that have been submitted.
Wednesday, 2/2, 4pm
Datasets and first steps for three projects: (1) prediction and typology of second-language pronunciation errors, (2) unsupervised TTS, (3) monolingual fine-tuning in non-European languages.
Wednesday, 2/9, 9am
Mengzhe Geng, Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition
Zengrui Jin, Adversarial Data Augmentation for Disordered Speech Recognition
Wednesday, 2/16, 4pm
Shane Settle, Acoustic Word Embeddings (Whole-Word Segmental Speech Recognition using Acoustic Word Embeddings, multilingual AWEs, multilingual AWEs for query-by-example search)
Wednesday, 2/23, 4pm
Discussion: current progress of small-group projects.
  1. Monolingual wav2vec2 tuning in six languages
  2. Estimating pronunciations of second-language learners in L1 monophone space, L2 triphone space, and L1+L2 clustered acoustic centroid space
  3. Unsupervised TTS
Thursday, 2/24, 2-5pm
CSL Student Conference
Wednesday, 3/2, 4pm
Probably: no meeting (TBD)
Wednesday, 3/9, 4pm
Discussion: current progress of small-group projects
Wednesday, 3/16, 4pm
Spring break
Wednesday, 3/23, 4pm
No meeting
Wednesday, 3/30, 4pm
Discussion: current progress of small-group projects
Wednesday, 4/6, 4pm
No meeting
Wednesday, 4/13, 4pm
Berk Iskender, StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
Wednesday, 4/20, 4pm
Presenters: Rimita Lahiri, Victor Ardulov
Title: Analysing Child-Adult Interactions with Machine Learning and Dynamical Systems
Papers:
Ardulov, V., Martinez, V. R., Somandepalli, K., Zheng, S., Salzman, E., Lord, C., ... & Narayanan, S. (2021). Robust diagnostic classification via Q-learning. Scientific Reports, 11(1), 1-9.
Durante, Z., Ardulov, V., Kumar, M., Gongola, J., Lyon, T., & Narayanan, S. (2022). Causal indicators for assessing the truthfulness of child speech in forensic interviews. Computer Speech & Language, 71, 101263. https://doi.org/10.1016/j.csl.2021.101263
Lahiri, R., Kumar, M., Bishop, S., & Narayanan, S. (2020, May). Learning domain invariant representations for child-adult classification from speech. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6749-6753). IEEE.
Wednesday, 4/27, 4pm
Presentation #1: Peter Du, An LSTM-Based Autonomous Driving Model using Waymo Open Dataset
Presentation #2: Bobi Shi
Presentation #3: Final report, Unsupervised TTS team (Liming Wang, Junrui Ni, Heting Gao)
Wednesday, 5/4, 4pm
Final report: Non-Western Wav2vec (Heting Gao, Mahir Morshed, Junkai Wu)
Wednesday, 5/11, 4pm
Final report: Second-language pronunciation scoring (Shuju Shi, Jialu Li, John Harvill, Charlotte Yoder)