Nasals are produced with the soft palate open, so that air flows out through the nose. Therefore, their place of articulation must be downstream from the soft palate. The farthest-back place of articulation is uvular (/ɴ/), but it looks like the most common nasals are the four that are used in English and Spanish: labial, coronal, palatal and velar (/m/, /n/, /ɲ/, /ŋ/).
Nasals can be voiced or unvoiced, but in both cases, they are called [+sonorant]. That's because air pressure doesn't rise inside the vocal tract: the mouth is closed, but air escapes freely through the nose, so the airflow itself is not obstructed.
Languages like Hmong and Yupik have unvoiced nasals, such as /m̥/ (in the word "Hmong"), /n̥/ (in the example below), and /ŋ̊/. YouTube will give us an example of the word "Hmong," and Lane Schwartz will give us an example of the unvoiced /n/ from the Yupik language.
First, download http://courses.engr.illinois.edu/ece590sip/sp2018/spectrogram.py again, because I've added some functions that will save us a little bit of typing.
import spectrogram as sg
import soundfile as sf
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# Download three Yupik sample utterances
data1, fs1 = sg.download_audio('http://courses.engr.illinois.edu/ece590sip/sp2018/clip_2_3.26_5.09.wav')
data2, fs2 = sg.download_audio('http://courses.engr.illinois.edu/ece590sip/sp2018/clip_26_93.81_94.68.wav')
data3, fs3 = sg.download_audio('http://courses.engr.illinois.edu/ece590sip/sp2018/clip_36_152.88_155.08.wav')
# Save them locally
sf.write('yupik1.wav',data1,fs1)
sf.write('yupik2.wav',data2,fs2)
sf.write('yupik3.wav',data3,fs3)
(S1,Ext1)=sg.readable_spectrogram(data1,fs1)
plt.figure(figsize=(15,10))
plt.subplot(211)
plt.plot(data1,'k')
plt.title('Waveform')
plt.subplot(212)
im1=plt.imshow(S1,origin='lower',extent=Ext1,aspect='auto')
im1.set_cmap('Greys')
plt.title('Spectrogram')
This sentence is something like /li n̥ɑq su ʍɑ ɬɪk/.
Now let's get an example of an unvoiced /m/, from the name of the Hmong language. There are two videos on YouTube called "How to pronounce the word Hmong." One of them pronounces the name with an /h/ followed by a voiced /m/; the other uses a real unvoiced /m/. I'm not sure if the first one is just an Americanized version of the word, or if this is a real difference between dialects of Hmong. In any case, let's download the second one.
I haven't found any purely Python way to do this (I know it's possible, but it looks really complicated). Instead, make sure you have youtube_dl and ffmpeg installed as command line tools, then try the following.
Actually, the following code fails for me; apparently I have ffmpeg available in bash, but not available in Windows. So in that case, just go to the bash shell and enter these same commands, so that you get the file "hmong.wav".
from subprocess import call
command1 = "youtube-dl https://www.youtube.com/watch?v=_G6AkZn_yJU --id -x"
command2 = "ffmpeg -i _G6AkZn_yJU.m4a hmong.wav"
call(command1.split(),shell=False)
call(command2.split(),shell=False)
(hmong,fs)=sf.read('hmong.wav')
plt.figure(figsize=(15,2))
plt.plot(np.linspace(0,len(hmong)/fs,len(hmong)),hmong)
As you can see, the audio is mostly silent, except that the word "Hmong" is uttered twice: once from 6.0 to 7.0 seconds, once from 11.0 to 12.0 seconds. Let's cut out one of those examples and look at it.
x = hmong[int(fs*5.75):int(fs*7.25)]
S,E=sg.readable_spectrogram(x,fs)
plt.figure(figsize=(15,10))
plt.subplot(211)
plt.plot(x)
plt.title('Waveform of the word Hmong')
plt.subplot(212)
im1=plt.imshow(S,extent=E,origin='lower',aspect='auto')
im1.set_cmap('Greys')
The nasal closure is made by pinching off the vocal tract at some position. Immediately before the nasal closure, the formant frequencies approach the separate resonant frequencies of the front and back cavity.
The resonant frequencies of the front cavity are $F_{f,n} = \frac{c}{4L_f} + (n-1)\frac{c}{2L_f}$, where $L_f$ is the length of the front cavity.
If the tongue constriction has a length of $L_c$, its resonant frequencies are $F_{c,n} = (n-1)\frac{c}{2L_c}$.
The resonant frequencies of the back cavity are $F_{b,n} = (n-1)\frac{c}{2L_b}$, where $L_b$ is the length of the back cavity. Yes, the lowest resonance really does go to $F_1 = 0$.
The total length of the vocal tract varies from $L_{tot} = L_f + L_c + L_b \approx 15$ cm to $L_{tot} = L_f + L_c + L_b \approx 18$ cm, depending on the person. The value $L_{tot} = 17.7$ cm is particularly convenient, because it means that $c/2L_{tot} = 1000$ Hz. That would be a pretty large man, though; most people have $c/2L_{tot}$ a little higher than that.
The formants are created by sorting the resonances of the front cavity, the constriction, and the back cavity in order of increasing frequency.
For an /m/ or /ɱ/, there is no front cavity. The back cavity has a length equal to the total length of the vocal tract. Let's say that's $L_b = 17.7$ cm, and $c = 35400$ cm/s. Then the formant frequencies immediately before closure of the /m/, and immediately after release, are $F_n = (n-1)\frac{c}{2L} = (n-1) \cdot 1000$ Hz. So the formants are $F_1 = 0$, $F_2 = 1000$, $F_3 = 2000$, etc. In other words, all of the vowel formants drop down to their "closed-lip" values: 0, 1000, 2000, and so on.
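The closed-lip values are easy to tabulate for other vocal tract lengths. Here is a minimal sketch, assuming the same speed of sound as above (35400 cm/s); the function name `closed_lip_formants` and the three example lengths are my own choices for illustration, not part of spectrogram.py.

```python
# Closed-lip formants F_n = (n-1) * c / (2L) for a few vocal tract lengths.
C_CM_PER_S = 35400.0  # speed of sound used in the text, in cm/s

def closed_lip_formants(tract_length_cm, count=4, c=C_CM_PER_S):
    """Return the first `count` values of (n-1)*c/(2L), in Hz."""
    spacing = c / (2.0 * tract_length_cm)
    return [(n - 1) * spacing for n in range(1, count + 1)]

# 15 cm and 18 cm are the two ends of the range quoted in the text;
# 17.7 cm is the "convenient" length that gives a 1000 Hz spacing.
for length_cm in (15.0, 17.7, 18.0):
    formants = closed_lip_formants(length_cm)
    print(length_cm, [round(f) for f in formants])
```

A shorter vocal tract gives a wider spacing, which is why most people end up a little above the 1000 Hz grid.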
For an /n/, the front cavity length is about $L_f = 2$ cm, the constriction is a long narrow tongue constriction of about $L_c = 4.5$ cm, and the back cavity length is around $L_b = 11$ cm. So there's only one front-cavity resonance that matters, and it's the same as for an /s/: it is $F_f = \frac{c}{4L_f} \approx 4500$ Hz. The resonances of the constriction and of the back cavity are $F_{c,n} = (n-1)\frac{c}{2L_c} \approx (n-1) \cdot 3900$ Hz and $F_{b,n} = (n-1)\frac{c}{2L_b} \approx (n-1) \cdot 1600$ Hz. So the back cavity resonances are at 0, 1600, 3200, 4800 Hz and so on. The front cavity resonance is at 4500 Hz, and the constriction resonance is at 3900 Hz. Putting them together, we get $F_1 = 0$, $F_2 = 1600$, $F_3 = 3200$, $F_4 = 3900$, $F_5 = 4500$, $F_6 = 4800$.
The F2 at 1600Hz is the most reliable sign that the vocal tract is moving toward an alveolar constriction, for /n/, /s/, /t/, or /d/. It's remarkably consistent, across people and across contexts.
The very high value of F3 (around 3200Hz!!) is the second most reliable sign that the vocal tract is moving toward an alveolar constriction. It's the only place of articulation in English that causes F3 to go UP, instead of going down.
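The alveolar numbers can be reproduced by computing each tube's resonances and sorting them. This is a minimal sketch, assuming c = 35400 cm/s and the lengths quoted above for /n/; the helper names `quarter_wave` and `half_wave` are my own, and I drop the constriction's zero-frequency resonance so that 0 Hz is counted only once, as in the list above.

```python
# Merge front-cavity, constriction, and back-cavity resonances for an /n/.
C_CM_PER_S = 35400.0  # speed of sound used in the text, in cm/s

def quarter_wave(length_cm, n_max, c=C_CM_PER_S):
    """Tube closed at one end, open at the other: c/(4L) + (n-1)*c/(2L)."""
    return [c / (4 * length_cm) + (n - 1) * c / (2 * length_cm)
            for n in range(1, n_max + 1)]

def half_wave(length_cm, n_max, c=C_CM_PER_S):
    """Tube with matching ends: (n-1)*c/(2L), starting at 0 Hz."""
    return [(n - 1) * c / (2 * length_cm) for n in range(1, n_max + 1)]

front = quarter_wave(2.0, 1)            # only the first front resonance matters
constriction = half_wave(4.5, 2)[1:]    # skip the 0 Hz term (assumption, see above)
back = half_wave(11.0, 4)               # 0 Hz plus three higher resonances

formants = sorted(front + constriction + back)
for n, f in enumerate(formants, start=1):
    print('F{} = {:.0f} Hz'.format(n, f))
```

The exact values differ from the rounded ones in the text by less than 100 Hz, which is well inside the uncertainty of the cavity lengths.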
Exactly the same arguments hold for the palatal nasal /ɲ/, except that the back cavity is even shorter, so the value of F2 is much higher -- always above 2000Hz.
For an /ŋ/, the front cavity length is about $L_f = 5$ cm, the back cavity is $L_b = 10$ cm, and the constriction takes up what's left. The unique feature of velar articulations (/ŋ/, /k/, and /g/) is that the back cavity is twice the length of the front cavity, which means that the first front-cavity resonance lands on the first nonzero back-cavity resonance: $F_{f,n} = \frac{c}{4L_f} + (n-1)\frac{c}{2L_f} = 1770 + (n-1) \cdot 3540$ Hz and $F_{b,n} = (n-1)\frac{c}{2L_b} = (n-1) \cdot 1770$ Hz. Putting it together, we get $F_1 = 0$, $F_2 = 1770$, $F_3 = 1770$, $F_4 = 3540$, $F_5 = 5310$.
The "velar pinch", the convergence of F2 and F3 to the same frequency, is the most reliable sign that the vocal tract is moving toward a velar constriction.
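To see why the pinch depends on the 2:1 length ratio, we can compare the first front-cavity resonance, c/(4Lf), with the first nonzero back-cavity resonance, c/(2Lb). This is a minimal sketch, assuming c = 35400 cm/s; the function names are my own, and the second geometry (Lb = 11 cm) is a hypothetical example chosen only to show the two resonances separating.

```python
# Velar pinch: compare c/(4*Lf) with c/(2*Lb) for two geometries.
C_CM_PER_S = 35400.0  # speed of sound used in the text, in cm/s

def front_first_resonance(front_length_cm, c=C_CM_PER_S):
    """First quarter-wave resonance of the front cavity, in Hz."""
    return c / (4.0 * front_length_cm)

def back_first_nonzero_resonance(back_length_cm, c=C_CM_PER_S):
    """First nonzero half-wave resonance of the back cavity, in Hz."""
    return c / (2.0 * back_length_cm)

def pinch_gap(front_length_cm, back_length_cm):
    """Distance in Hz between the two resonances that form F2 and F3."""
    return abs(front_first_resonance(front_length_cm)
               - back_first_nonzero_resonance(back_length_cm))

print(pinch_gap(5.0, 10.0))  # the velar geometry from the text (Lb = 2*Lf)
print(pinch_gap(5.0, 11.0))  # hypothetical geometry with a longer back cavity
```

When Lb is exactly twice Lf, the gap is zero; as the ratio drifts away from 2:1, F2 and F3 pull apart again.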
The uvular nasal is kind of rare in world languages, because it's hard to make. But it's cool from a speech production point of view. Basically, there is no back cavity. The entire pharynx is constricted, so we have $L_c \approx 8.5$ cm and $L_f \approx 8.5$ cm.
When there is no back cavity, that means that the back of the constriction is no longer open, so it no longer acts as a half-wave resonator. Instead, we have a constriction that's closed in back, and open in front, so it acts as a quarter-wave resonator, just like the front cavity. So we have $F_{c,n} = \frac{c}{4L_c} + (n-1)\frac{c}{2L_c} \approx 1000 + (n-1) \cdot 2000$ Hz.
From the point of view of the front cavity, the constriction looks closed. So the front cavity is ALSO closed in back, open in front. So it has $F_{f,n} = \frac{c}{4L_f} + (n-1)\frac{c}{2L_f} \approx 1000 + (n-1) \cdot 2000$ Hz.
Putting it all together, we have $F_1 = 1000$, $F_2 = 1000$, $F_3 = 3000$, $F_4 = 3000$, $F_5 = 5000$, $F_6 = 5000$.
So there's a "uvular pinch", just like the velar pinch, except that this time, it's F1 and F2 that pinch together (at about 1000Hz).
F3, instead of pinching together with F2 as in a velar /g/, goes up to 3000Hz as in an alveolar /d/. So if you weren't noticing F1 and F2, you might think this was an alveolar nasal, an /n/. But F1 and F2 give it away: F1 stays ridiculously high (at around 1000Hz), and F2 stays ridiculously low (at around 1000Hz).
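The uvular pattern follows from treating both the constriction and the front cavity as quarter-wave resonators of the same length, each with resonances at c/(4L) + (n-1)c/(2L). This is a minimal sketch, assuming c = 35400 cm/s and a length of 8.5 cm for both tubes; the helper name `quarter_wave_resonances` is my own.

```python
# Uvular nasal: two equal-length quarter-wave tubes give paired formants.
C_CM_PER_S = 35400.0  # speed of sound used in the text, in cm/s

def quarter_wave_resonances(length_cm, n_max, c=C_CM_PER_S):
    """Tube closed in back, open in front: c/(4L) + (n-1)*c/(2L)."""
    return [c / (4 * length_cm) + (n - 1) * c / (2 * length_cm)
            for n in range(1, n_max + 1)]

constriction = quarter_wave_resonances(8.5, 3)  # constricted pharynx
front = quarter_wave_resonances(8.5, 3)         # front cavity, same length

# Sorting the merged list puts each resonance next to its twin.
formants = sorted(constriction + front)
pairs = list(zip(formants[0::2], formants[1::2]))
for low, high in pairs:
    print('{:.0f} Hz and {:.0f} Hz'.format(low, high))
```

With exactly 8.5 cm the pairs sit slightly above the rounded 1000, 3000, and 5000 Hz values; what matters is that every resonance appears twice, so the formants pinch in pairs.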
OK, let's look at some examples from Wikipedia.
import soundfile as sf
import io
import urllib.request as request
consonant_pathnames = {
'm' : 'a/a9/Bilabial_nasal',
'ɱ' : '1/18/Labiodental_nasal',
'n' : '2/29/Alveolar_nasal',
'ɲ' : '4/46/Palatal_nasal',
'ŋ' : '3/39/Velar_nasal',
'ɴ' : '3/3e/Uvular_nasal'
}
consonant_waves = {}
# Try to download each recording; keep only the ones that succeed
for c_ipa,c_pathname in consonant_pathnames.items():
    c_url = 'https://upload.wikimedia.org/wikipedia/commons/{}.ogg'.format(c_pathname)
    try:
        req = request.urlopen(c_url)
    except request.HTTPError:
        print('Unable to download {}'.format(c_url))
    else:
        # Decode the ogg data in memory, then save a local wav copy
        c_wav,c_fs = sf.read(io.BytesIO(req.read()))
        c_filename = c_ipa + '.wav'
        sf.write(c_filename,c_wav,c_fs)
        consonant_waves[c_ipa] = c_wav
print('Downloaded these phones: {}'.format(consonant_waves.keys()))
use_this_consonant = 'ɴ'
(x_sgram,x_extent)=sg.readable_spectrogram(consonant_waves[use_this_consonant], c_fs)
plt.figure(figsize=(14,4))
plt.imshow(x_sgram,origin='lower',extent=x_extent,aspect='auto')