Syllabus
ECE 590SIC is a research seminar for people interested in speech and audio processing. Students may take it on a credit/no credit basis. Credit is offered if you do at least one of the following: (1) give a research presentation at least once during the semester; (2) participate in a group project and submit a report, co-authored with your group, describing your experimental results; or (3) submit a written report describing the presentations given by visiting speakers. Students may also attend, and may give a presentation, without registering. To schedule a talk, contact the instructor.
Course Schedule, Spring 2022
- Wednesday, 1/19, 4pm
- Small-group project planning: how should we go about designing small-group projects?
- Wednesday, 1/26, 4pm
- Small-group project planning: discussion of project proposals that have been submitted.
- Wednesday, 2/2, 4pm
- Datasets and first steps for three projects: (1) prediction and typology of second-language pronunciation errors, (2) unsupervised TTS, (3) monolingual fine-tuning in non-European languages.
- Wednesday, 2/9, 9:00am
- Mengzhe Geng, Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition
- Zengrui Jin, Adversarial Data Augmentation for Disordered Speech Recognition
- Wednesday, 2/16, 4pm
- Shane Settle, Acoustic Word Embeddings (Whole-Word Segmental Speech Recognition using Acoustic Word Embeddings, multilingual AWEs, multilingual AWEs for query-by-example search)
- Wednesday, 2/23, 4pm
- Discussion: current progress of small-group projects.
- Monolingual wav2vec2 tuning in six languages
- Estimating pronunciations of second-language learners in L1 monophone space, L2 triphone space, and L1+L2 clustered acoustic centroid space
- Unsupervised TTS
- Thursday, 2/24, 2-5pm
- CSL Student Conference
- Wednesday, 3/2, 4pm
- Tentatively no meeting (TBD)
- Wednesday, 3/9, 4pm
- Discussion: current progress of small-group projects
- Wednesday, 3/16, 4pm
- Spring break
- Wednesday, 3/23, 4pm
- No meeting
- Wednesday, 3/30, 4pm
- Discussion: current progress of small-group projects
- Wednesday, 4/6, 4pm
- No meeting
- Wednesday, 4/13, 4pm
- Berk Iskender, StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
- Wednesday, 4/20, 4pm
- Presenters: Rimita Lahiri, Victor Ardulov
- Title: Analysing Child-Adult Interactions with Machine Learning and Dynamical Systems
- Papers:
- Ardulov, V., Martinez, V. R., Somandepalli, K., Zheng, S., Salzman, E., Lord, C., ... & Narayanan, S. (2021). Robust diagnostic classification via Q-learning. Scientific Reports, 11(1), 1-9.
- Durante, Z., Ardulov, V., Kumar, M., Gongola, J., Lyon, T., & Narayanan, S. (2022). Causal indicators for assessing the truthfulness of child speech in forensic interviews. Computer Speech & Language, 71, 101263. https://doi.org/10.1016/j.csl.2021.101263
- Lahiri, R., Kumar, M., Bishop, S., & Narayanan, S. (2020, May). Learning domain invariant representations for child-adult classification from speech. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6749-6753). IEEE.
- Wednesday, 4/27, 4pm
- Presentation #1: Peter Du, An LSTM-Based Autonomous Driving Model using Waymo Open Dataset
- Presentation #2: Bobi Shi
- Presentation #3: Final report, Unsupervised TTS team (Liming Wang, Junrui Ni, Heting Gao)
- Wednesday, 5/4, 4pm
- Final report: Non-Western Wav2vec (Heting Gao, Mahir Morshed, Junkai Wu; ppt)
- Wednesday, 5/11, 4pm
- Final report: Second-language pronunciation scoring (Shuju Shi, Jialu Li, John Harvill, Charlotte Yoder)