Towards Automatic Understanding of Parent-Child Interaction Patterns from Family Audio to Monitor Child Mental Health

Jialu Li

April 5, 2023, 4:00-5:00pm, 2017 ECEB or online


In the U.S., about 15–17% of children aged 2–8 have at least one diagnosed mental, behavioral, or developmental disorder, and many more go undiagnosed until adulthood. A more fundamental understanding is therefore needed of how daily emotional and behavioral disturbances develop into clinically significant mental health issues. Daily interactions with family members, repeated and reinforced over time, strongly shape children’s emotional well-being, and previous studies have identified parent-child interaction attributes that correlate with children’s later mental health problems. Our research aims to develop advanced machine learning models that automatically detect and analyze these crucial vocal interactions in the home, with the long-term goal of improving child mental health outcomes. The central challenge is that typical home recordings contain few labeled examples of the relevant interactions, making it difficult to train robust ML models. This talk will present how wav2vec 2.0, pretrained in an unsupervised manner on thousands of hours of unlabeled family audio, improves performance on small amounts of labeled home recordings for parent/infant speaker diarization and vocalization classification tasks. This approach effectively mitigates the data sparsity problem and advances automated family audio analysis to the next level.