Project
ECE 537 will include a group final project.
Final project groups can contain one to three people.
Final projects will be presented during the regularly scheduled final exam period for this course, instead of a final exam.
Proposal (20%)
The final project proposal should be submitted on Wednesday, October 16. It should be about three pages, including:
Proposal title
Names of the people in your group
The main article or articles on which you plan to base your project
Key technological innovation that you want to focus on. This may be a new training criterion, or a new signal processing algorithm, or some other cool innovation. It need not be your own innovation; it can be something recently proposed by other authors that you want to test in ECE 537.
Datasets you plan to use
Evaluation criterion(a): include a formula defining each criterion, and define all terms.
Baselines against which your work will be compared. Are any of them code that you can download and test yourself, or will you need to compare against the numbers in somebody else’s paper?
Proposed ablation study(ies)
What open-source code do you plan to use?
Proposed division of labor: what code will be written by each person in your group?
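As a hedged illustration of the level of detail expected for the evaluation criterion item above, suppose (purely as an assumption for this example) that a project evaluates a speech recognizer by word error rate. The course does not prescribe this criterion; it is shown only as a sample of a formula with every term defined.

```latex
\mathrm{WER} = \frac{S + D + I}{N}
```

Here $S$, $D$, and $I$ are the numbers of substituted, deleted, and inserted words in the minimum-edit-distance alignment between the recognizer output and the reference transcript, and $N$ is the number of words in the reference transcript.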
Presentation (30%)
All final project presentations will take place on December 19, at the times and in the rooms specified below. “Online” means the Zoom URL specified for office hours on CampusWire; 3015 is our regular lecture room.
9:00-9:15, online, Mingyue Huo: “Content-Aware Speech Separation and Target Speech Extraction”
10:45-11:00, online, Ian Song: “Speech Diarization”
12:00-12:15, 3015, Yiting He: “Fast emotion detection based on formant frequency modulation”
12:15-13:00, online, Alara Tin, Katya Yegorova, and Priyam Mazumdar: “UnLID: Towards Universal Language Identification in an Unsupervised Setting”
13:00-13:30, 3015, Kevin Zhao and Mojtaba Khaliji: “Detection of Alzheimer’s Disease Through Acoustic Speech Analysis”
13:30-13:45, 3015, Yike Wang: “Extracting the speech from the blood: An improved clutter filter for functional ultrasound imaging”
13:45-14:00, 3015, Steven Guo: “Pitch Analysis by Filtered Autocorrelation”
14:00-14:45, 3015, Haolong Zheng, Jonah Cadena-Perera, and Kevin Hu: “Assessment of ASR Architectures for Children’s Speech: Impact of Pre-training on Adult vs. Children Data”
14:45-15:15, 3015, Chenhao Li and Linjie Tong: “Exploration of Utilizing Deep learning in Automatic Speech Recognition”
15:15-15:45, 3015, Jiajun Ruan and Tianxingjian Ding: “Personalized Speech Enhancement”
15:45-16:00, 3015, James Menezes: “Multi Speaker Speech Separation”
16:00-16:30, 3015, Aahan Thapliyal and Sharvill Garg: “Music Source Separation and Lyric Transcription”
16:30-16:45, online, Zutai Chen: “Enhancing Speech Separation in Noisy Environments Using Dual Input Neural Networks and Spatial Coherence Matrices”
16:45-17:30, 3015, Christopher Kim, Yurii Halychanskyi, and Yutong Wen: “Zero-Shot Accent and Timbre Transfer in Audio Diffusion”
The presentation should include:
Task description: what problem are you trying to solve? Why is it interesting? This part should be general enough to be understood by other students in ECE 537, but need not be more general than that.
Key technological innovation that you want listeners to focus on. This may be a new training criterion, or a new signal processing algorithm, or some other cool innovation. It need not be your own innovation; it can be something recently proposed by other authors that you want to test in ECE 537.
Evaluation condition: dataset(s), evaluation criterion(a), and baseline(s). If your project includes multiple evaluation conditions, your presentation may include all of them, but doesn’t need to; given the time available, include whichever conditions will be most interesting to other students in ECE 537.
Discussion. This part of the presentation may be an ablation study, or a visualization, or some other item of discussion that increases the level of understanding of the audience.
Conclusion. This should be one slide (or at most two slides) summarizing the most important point of your presentation.
Report (50%)
Final project reports are due at the time of your presentation, on Thursday, December 19.
The final project report should be six to twelve pages, including the following:
Project title
Names of the people in your group
Motivation. Describe the problem you’re trying to solve. Describe the current state of the art for this problem, including (if possible) at least three recent papers that have attempted to do something similar. Discuss what worked, and what didn’t work.
Background. What knowledge is necessary to understand your key technological innovation that is not universally known within the speech technology community? Give citations for at least two papers that provide the necessary background knowledge, distinct from the three papers that you listed in the motivation section. Provide full listings for all five (or more) of those papers in a bibliography at the end of your report. For each of the two papers cited in this section, provide a brief summary of its content in the text of this section, sufficient for the reader to understand the background knowledge encapsulated in that paper, and its contribution to support your key technological innovation.
Key technological innovation that you want the reader to focus on. Specify whether this is a new training criterion, a new signal processing algorithm, etc. Describe how it differs from the state of the art before it. Specify whether you are proposing a new innovation that has not been previously proposed (and if so, describe its relationship to the state of the art) or an innovation that you find particularly compelling from some published paper (and if so, cite that paper, and describe the relationship of this innovation to the ideas in papers published before it in the same field).
Datasets. Describe the content of each dataset in terms of the quantity of data, the types of labels available for each datum, and the types of overall metadata available. Give the source of each dataset, and specify its license agreement (open-source or not; if open-source, which license; if not, the key terms).
Evaluation criterion(a). Include a formula defining each criterion, and define all terms.
Baselines against which your work is compared. Specify which baselines you ran yourself, and which ones are numbers published in a paper. Describe (in words, in formulas, or both) the key algorithmic differences between your main algorithm and the baselines.
Numerical results: one or more tables showing evaluation criterion(a) comparing your primary approach to the baselines.
Ablation study(ies): describe what configuration variables or hyperparameters were modified or ablated in each ablation study, and exactly how this was done. Show the results. Specify, in words, exactly what was learned from each ablation study.
Discussion. Describe, in words and/or visualizations, a scientific generalization that is implied by your results and/or your ablation study.
Conclusion. State the most important numerical difference in the evaluation criterion between your primary approach and the baseline(s) and/or ablation condition(s). State the main reason why you believe that difference occurred.
Statement of contributions. Describe, in words, which portion of the code was downloaded (and from what site), and which portion was written by each of your team members. Specify which portion of the final report had its initial draft written by each of your team members. It is assumed that initial drafts are rewritten, revised, and/or checked by all team members.
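When a report defines its evaluation criterion, it helps to state exactly how the number in the results table is computed. The following is a minimal sketch, assuming (for illustration only) that the criterion is a corpus-level word error rate, i.e. total word errors divided by total reference words, pooled over all utterances rather than averaged per utterance. The function names `edit_distance` and `corpus_wer` are hypothetical and are not part of any course-provided code.

```python
from typing import Iterable, List, Tuple


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of word substitutions, deletions, and insertions
    needed to turn the reference word sequence into the hypothesis."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled WER over (reference, hypothesis) transcript pairs."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        errors += edit_distance(ref, hypothesis.split())
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references must contain at least one word")
    return errors / ref_words


if __name__ == "__main__":
    # Toy transcripts invented for this example, not real results.
    toy_pairs = [
        ("the cat sat on the mat", "the cat sit on mat"),
        ("a b", "a x b"),
    ]
    print(f"corpus WER = {corpus_wer(toy_pairs):.3f}")
```

Pooling errors over the whole test set and averaging per-utterance rates generally give different numbers, so the report should say which one its tables use.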