Direct Exposure of Speech Articulations in Model Architectures for Downstream Processing

Mahir Morshed

March 22, 2023, 4:00-5:00pm or online

Abstract:

Extensive prior work on speech recognizers for lower-resourced languages has used varied transfer learning approaches involving similar systems for higher-resourced languages. Examining those tools' architectures, however, has often revealed little use of specific phonetic principles in a directly explainable fashion, either in model fitting or in evaluation. The purpose of this proposed work is to explore more thoroughly the use and transfer of articulatory and suprasegmental features in multilingual speech processing systems, motivated by the observation that such features frequently appear in similar ways across vastly different languages. It first considers reconciling differences between various conceptions of the articulatory space, whether in dimension (binary versus multi-class, for example) or in partition (among consonants, vowels, and sounds in between). It then considers specialized feature extractors and processors that can reflect those differences in dimension and partition, as well as assemblies of such components for downstream speech recognition tasks. Both considerations proceed with an eye toward making the presence and transfer of phonetic information across systems and across languages more clearly identifiable in terms of linguistic principles.