MP6: IIR Notch Filtering¶

Audio recordings sometimes have 60Hz hum noise, caused by leaving the microphone cable to close to a power line. This MP will explore methods for getting rid of the hum noise, especially the low-frequency harmonics. This MP was originally designed by Prof. Doug Jones at the University of Illinois.

In [55]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.figure
import wave
%matplotlib inline
In [56]:
import importlib
import submitted
importlib.reload(submitted)
Out[56]:
<module 'submitted' from '/Users/jhasegaw/Dropbox/mark/teaching/ece401/ece401labs/22fall/mp6/src/submitted.py'>

Part 0. Exploring the Data¶

This MP will explore one audio recording in detail, so that you can see where the low-frequency peaks are. The main peak, at 120Hz, could easily be filtered out from a large number of audio files, all at once.

In [57]:
with wave.open('input.wav','rb') as w:
    fs = w.getframerate()
    nsamps = w.getnframes()
    x = np.frombuffer(w.readframes(nsamps),dtype=np.int16).astype('float32')
In [58]:
t = np.linspace(0,nsamps/fs,nsamps)
fig=plt.figure(figsize=(14,4))
ax=fig.subplots()
ax.plot(t[(0.5<t)&(t<1.1)],x[(0.5<t)&(t<1.1)])
Out[58]:
[<matplotlib.lines.Line2D at 0x7fd164d57e50>]

Notice, in the waveform above, there is a periodic component in the "silence" before and after the speech begins. There seem to be five or six periods in 0.05sec, so maybe the period is around 0.01sec, or around 100Hz. We can listen to it using the following code:

In [59]:
import IPython.display as ipd
ipd.Audio(x, rate=fs)
Out[59]:
Your browser does not support the audio element.

How to debug¶

Rather than printing every plot using separate blocks for both the distributed solution and your solution, instead, each of the following blocks will include lines to plot either one or the other. In order to switch between your solution and the distributed solution, you'll need to comment out the other one.

In [60]:
import h5py
solutions = h5py.File('solutions.hdf5','r')

Part 1. Searching for the noise peaks¶

First, let's plot a spectrum, and try to find the frequencies of the noise components.

In [61]:
importlib.reload(submitted)
help(submitted.todo_spectrum)
Help on function todo_spectrum in module submitted:

todo_spectrum(x)
    Input:
    x (N) - a waveform
    Output:
    spec (N) - magnitude spectrum at N frequencies between 0 and fs, not including fs.

In [62]:
importlib.reload(submitted)
spec=submitted.todo_spectrum(x)
#spec = solutions['spec'][:]
f = np.linspace(0,fs,nsamps,endpoint=False)
fig=plt.figure(figsize=(14,4))
ax=fig.subplots()
ax.plot(f[f<500],spec[f<500])
Out[62]:
[<matplotlib.lines.Line2D at 0x7fd163f64580>]
  • It looks like there's a huge power-line noise component at about 120Hz.
  • There doesn't seem to be any 60Hz component; the recording device seems to have filtered out all energy below about 80Hz.
  • There is also energy at about 180Hz, which might be the third harmonic of the power line noise, but it might also be the pitch of a female talker. Let's leave that energy in the signal.
  • Instead, let's eliminate three of the other small peaks that occur between 90Hz and 120Hz.
  • We'll use the "todo_findpeak" function to find the frequencies of those peaks (if you've written it).
In [63]:
importlib.reload(submitted)
help(submitted.todo_findpeak)
Help on function todo_findpeak in module submitted:

todo_findpeak(spec, fs, flo, fhi)
    Input:
    spec (N) - magnitude spectrum at N frequencies between 0 and fs, not including fs.
    fs (scalar)  - sampling frequency
    flo (scalar) -  low end of the frequency range to search
    fhi (scalar) - high end of the frequency range to search
    Output:
    fpeak (scalar) - frequency of the highest-magnitude spectral sample, in Hertz

In [64]:
importlib.reload(submitted)
noise1 = submitted.todo_findpeak(spec, fs, 0, 100)
noise2 = submitted.todo_findpeak(spec, fs, noise1+1, 100)
noise3 = submitted.todo_findpeak(spec, fs, 100, 110)
noise4 = submitted.todo_findpeak(spec, fs, 100, 150)
freqs = np.array([noise1,noise2,noise3,noise4])
#freqs = solutions['freqs'][:]
print(freqs)
[ 90.28444206  96.42624084 103.18221949 119.7650762 ]

Part 2. Creating the notch filters¶

We'll create four notch filters: one for each of the frequencies we want to eliminate. For each notch filter, we need to create zeros and poles for the Z transform, then convert them to filter coefficients.

First, we'll find the zeros. Remember that the zeros of the filter are at:

$$r_{0,k} = e^{j2\pi f_k/F_s},~~~r_{1,k}=r_{0,k}^*$$
In [65]:
importlib.reload(submitted)
help(submitted.todo_zeros)
Help on function todo_zeros in module submitted:

todo_zeros(freqs, fs)
    Input:
    freqs (nfreqs) - an array of frequencies you want zeroed out, in Hertz
    fs (scalar) - sampling frequency, in Hertz
    Output:
    r (2,nfreqs) - an array of complex zeros on the unit circle, in complex conjugate pairs

In [66]:
importlib.reload(submitted)
r = submitted.todo_zeros(freqs,fs)
#z = solutions['z'][:]
print(r)
[[0.99937155+0.03544719j 0.99928315+0.03785745j 0.99917919+0.04050848j
  0.99889422+0.0470143j ]
 [0.99937155-0.03544719j 0.99928315-0.03785745j 0.99917919-0.04050848j
  0.99889422-0.0470143j ]]

Now let's find the poles. Remember that, for a notch filter, the poles should be at the same frequencies as the zeros, but shifted to the inside of the unit circle a little:

$$p_{0,k} = ar_{0,k},~~~p_{1,k}=p_{0,k}^*,$$

where $a$ is some number such that the resulting frequency response has a bandwidth of $BW$ Hertz.

In [67]:
importlib.reload(submitted)
help(submitted.todo_poles)
Help on function todo_poles in module submitted:

todo_poles(r, BW, fs)
    Input: 
    r (2,nfreqs) - an array of complex zeros on the unit circle, in complex conjugate pairs
    BW (scalar) - desired bandwidth, in Hertz
    fs (scalar) - sampling frequency, in Hertz
    Output:
    p (2,nfreqs) - an array of complex poles with bandwidth BW, in complex conjugate pairs

In [68]:
importlib.reload(submitted)
p = submitted.todo_poles(r,20,fs)
#p = solutions['p'][:]
np.log(np.abs(p))
Out[68]:
array([[-0.00392699, -0.00392699, -0.00392699, -0.00392699],
       [-0.00392699, -0.00392699, -0.00392699, -0.00392699]])
In [69]:
def pole_zero_plot(r,p):
    fig=plt.figure(figsize=(10,8))
    ax=fig.subplots()
    ax.plot([0,1e-6],[-2,2],'k-',[-2,2],[0,0],'k-')  # plot the axes
    omega = np.linspace(0,2*np.pi,200)
    ax.plot(np.cos(omega),np.sin(omega),'k-') # plot the unit circle
    ax.set_aspect('equal') # make it look like a circle
    zeromarker = matplotlib.markers.MarkerStyle(marker='o',fillstyle='none')
    polemarker = matplotlib.markers.MarkerStyle(marker='x',fillstyle='none')
    ax.scatter(x=np.real(r),y=np.imag(r),s=40,marker=zeromarker)
    ax.scatter(x=np.real(p),y=np.imag(p),s=40,marker=polemarker)
    return(ax)
ax=pole_zero_plot(r,p)

At that scale, the poles and zeros are all right on top of each other! Let's try to zoom in.

In [70]:
ax=pole_zero_plot(r,p)
ax.set_xlim([0.95,1.05])
ax.set_ylim([-0.05,0.05])
Out[70]:
(-0.05, 0.05)

Finally, let's create the filter coefficients. There are four filters, each of which has feedforward coefficients (b[:,k]) and feedback coefficients (a[:,k]):

$$H(z) = \prod_{k=1}^4 H_k(z)$$$$H_k(z) = \frac{B_k(z)}{A_k(z)}$$

WARNING: The definition of $A(z)$ in the MP are a little different from what we did in lecture. In lecture, and in this MP, the numerator polynomial is defined as:

$$B_k(z) = (1-r_{0,k}z^{-1})(1-r_{1,k}z^{-1}) = b_{0,k} + b_{1,k}z^{-1} + b_{2,k} z^{-2}$$

For the MP, we will define $A(z)$ with the same coefficient signs as $B(z)$, i.e., $A(z)$ is defined as:

$$A_k(z) = (1-p_{0,k}z^{-1})(1-p_{1,k}z^{-1}) = a_{0,k} + a_{1,k}z^{-1} + a_{2,k} z^{-2}$$

(In lecture we defined $A(z)$ with coefficients $a_1$ and $a_2$ that had the opposite sign from the signs shown above. Can you see why?)

Because the MP is designing $A(z)$ and $B(z)$ symmetrically, as shown above, it is possible to use the same function to calculate either set of coefficients. Let's call that function todo_coefficients:

In [71]:
importlib.reload(submitted)
help(submitted.todo_coefficients)
Help on function todo_coefficients in module submitted:

todo_coefficients(r)
    Input: 
    r (2,nfreqs) - an array of complex roots, in complex conjugate pairs
    Output:
    b (3,nfreqs) - an array of second-order polynomial coefficients, one per complex root pair

In [72]:
importlib.reload(submitted)
a = submitted.todo_coefficients(p)
#a = solutions['a'][:]
b = submitted.todo_coefficients(z)
#b = solutions['b'][:]
for whichfreq in range(4):
    print('For frequency',whichfreq,'numerator coefficients are',b[:,whichfreq])
    print('   and denominator coefficients are',a[:,whichfreq])
For frequency 0 numerator coefficients are [ 1.        -1.9987431  1.       ]
   and denominator coefficients are [ 1.         -1.99090945  0.99217678]
For frequency 1 numerator coefficients are [ 1.        -1.9985663  1.       ]
   and denominator coefficients are [ 1.         -1.99073334  0.99217678]
For frequency 2 numerator coefficients are [ 1.         -1.99835839  1.        ]
   and denominator coefficients are [ 1.         -1.99052624  0.99217678]
For frequency 3 numerator coefficients are [ 1.         -1.99778843  1.        ]
   and denominator coefficients are [ 1.         -1.98995852  0.99217678]
In [74]:
fig = plt.figure(figsize=(10,8))
axs=fig.subplots(len(freqs),2)
for row in range(len(freqs)):
    axs[row,0].stem(b[:,row],use_line_collection=True)
    axs[row,0].set_title('Feedforward coefficients, filter %d'%(row))
    axs[row,1].stem(a[:,row],use_line_collection=True)
    axs[row,1].set_title('Feedback coefficients, filter %d'%(row))
fig.tight_layout()

Part 3. Filtering out the noise¶

Let's plot the frequency response that we would get by running all four filters, one after the other.

First, let's plot the frequency responses that we get from each of the second-order notch filters separately. Remember that these are:

$$H_k(\omega)=\left. H_k(z)\right|_{z=e^{j\omega}}=\frac{B(e^{j\omega})}{A(e^{j\omega})}$$

Let's plot this for the following frequencies: $\omega\in\left\{0,\frac{2\pi}{N},\frac{4\pi}{N},\ldots,\frac{2\pi(N-1)}{N}\right\}$.

In [75]:
importlib.reload(submitted)
help(submitted.todo_freqresponse)
Help on function todo_freqresponse in module submitted:

todo_freqresponse(a, b, N, version='right')
    Input: 
    a (3) - feedback coefficients.  You may assume a[0]=1.
    b (3) - feedforward coefficients.  You may assume b[0]=1.
    N (scalar) - number of samples of the frequency response to compute
    Output: 
    omega (N) - frequencies linearly spaced between 0 and 2pi, not including 2pi.
    H (N) - B(e^{jw})/A(e^{jw}) evaluated at the frequencies in the vector omega.

We can calculate the total frequency response by multiplying all four component frequency responses together: $$H(\omega) = \prod_{k=1}^4 H_k(\omega)$$

In [77]:
importlib.reload(submitted)
H = np.ones(nsamps,dtype='complex')
for k in range(len(freqs)):
    omega, Hnew = submitted.todo_freqresponse(a[:,k],b[:,k],nsamps)
    H *= Hnew
#H = solutions['H'][:]
fig=plt.figure(figsize=(14,4))
ax=fig.subplots()
plt.plot(f[f<400],np.abs(H[f<400]))
Out[77]:
[<matplotlib.lines.Line2D at 0x7fd1635d0d60>]

OK, now let's actually run all four filters, one after the other, and take a look at the resulting spectrum and signal, to see if the noise has been removed.

To do this, we'll do a series connection of all four filters. Since we're allowed to assume that $a_{0}=1$ and $b_0=1$, we can implement each filter like this:

$$y[n] = x[n]+b_{1}x[n-1]+b_{2}x[n-2]-a_{1}y[n-1]-a_{2}y[n-2]$$
In [79]:
importlib.reload(submitted)
help(submitted.todo_filter)
Help on function todo_filter in module submitted:

todo_filter(x, a, b)
    Input: 
    a (3) - feedback coefficients.  You may assume a[0]=1.
    b (3) - feedforward coefficients.  You may assume b[0]=1.
    x (N) - input waveform
    Output: 
    y (N) - output after being filtered using B(z)/A(z)
      Assume that x[n]==0 for n<0.
      Do not generate samples of y[n] for n >= N.

Then, in order to implement the series connection, we just feed the output of each filter back into the input of the next filter, like this:

$$x[n]\rightarrow H_1(z)\rightarrow H_2(z)\rightarrow H_3(z)\rightarrow H_4(z)\rightarrow y[n]$$
In [81]:
importlib.reload(submitted)
x_tmp = x
for k in range(len(freqs)):
    y_tmp = submitted.todo_filter(x_tmp,a[:,k],b[:,k])
    x_tmp = y_tmp
    
y = y_tmp
    
#y = solutions['y']
print('Last five samples of y  are',y[-6:-1])
Last five samples of y  are [-373.09300398 -357.83001544 -342.58330979 -323.41297782 -304.5079235 ]

Let's plot the spectrum of $y[n]$, to see how much of the noise we were able to remove.

In [83]:
importlib.reload(submitted)
yspec=submitted.todo_spectrum(y)
fig=plt.figure(figsize=(14,4))
ax=fig.subplots()
ax.plot(f[f<500],yspec[f<500])
Out[83]:
[<matplotlib.lines.Line2D at 0x7fd16373e6a0>]
In [84]:
t = np.linspace(0,nsamps/fs,nsamps)
fig=plt.figure(figsize=(14,4))
ax=fig.subplots()
ax.plot(t[(0.5<t)&(t<1.1)],y[(0.5<t)&(t<1.1)])
Out[84]:
[<matplotlib.lines.Line2D at 0x7fd163776820>]

Part 4: Resonance¶

Bells, chimes, and similar musical instruments are created from bits of metal or wood that are each carved so that when a musician hits the bell with a stick, it rings at all of the resonant frequencies, all at once. Usually, all of the resonant frequencies have different decay times, so that the spectral composition of the bell changes, subtly, as it decays.

The sound of a stick hitting something a hard object is pretty well described by $x[n]=\delta[n]$. The sound of a bell being rung is therefore pretty well described as $h[n]$, its impulse response.

The sum of the different resonances can be described as a parallel system combination, thus

$$H(z) = \sum_k H_k(z)$$

where

$$H_k(z) = \frac{1}{1-a_{1,k}z^{-1}-a_{2,k}z^{-2}}$$

The last part of this MP is to create the sound of a bell:

In [86]:
importlib.reload(submitted)
help(submitted.todo_bell)
Help on function todo_bell in module submitted:

todo_bell(sample_rate, frequencies, decay_times, duration)
    x = todo_bell(sample_rate, frequencies, decay_times, duration)
    Generate the sound of a bell ringing at a particular frequency.
    
    Input:
    sample_rate (integer) - sampling rate, in samples/second
    frequencies (nfreqs) - frequencies of the bell, in Hertz
    decay_times (nfreqs) - times after onset (in seconds) at which each tone decays to exp(-1)
    duration (float) - duration of x[n], in seconds
    
    Output:
    x (int(sample_rate*duration)) - the sound of the bell, given by
      x[n] = sum over k of (1/sin(omega[k])) exp(-sigma[k]*n) sin(omega[k]*(n+1))
    for 0 <= n <= int(sample_rate*duration), 
    and for omega[k] and sigma[k] determined from sample_rate, frequencies[k], and decay_times[k].

Note that you can implement todo_bell almost entirely by calling functions that you have already written, or alternatively, you can cut and paste the appropriate bits of code.

The only important new concept, here, is the idea of decay time. Remember that the impulse response of a resonator is: $$h_k[n] = \frac{1}{\sin(\omega_k)} e^{-\sigma_k n}\sin\left(\omega_k (n+1)\right) u[n],$$ where $$\omega_k = \frac{2\pi f_k}{F_s},~~~\sigma_k=\frac{\pi B_k}{F_s}$$

The decay_time is defined as the time, in seconds, such that $e^{-\sigma_k n}=e^{-1}$. So, depending on whether you are implementing todo_bell by calling todo_zeros, or by directly calculating p[0,k]=exp(-sigma[k]+1j*omega[k]), you will either need to convert the decay_times into BW or directly into sigma.

In [89]:
importlib.reload(submitted)
bell = submitted.todo_bell(fs, [220,660], [0.5,1.25], 5.0)
In [90]:
import IPython.display as ipd
ipd.Audio(bell, rate=fs)
Out[90]:
Your browser does not support the audio element.

Extra Credit: Formant Synthesis of Speech¶

The extra credit assignment, for this MP, is the formant synthesis of a vowel.

A vowel is created by passing a periodic excitation signal (modeling the opening and closing of the vocal folds) through a series of formant resonators (modeling the resonant frequencies of the pharynx and mouth). The simplest effective excitation signal is an impulse train with a period of N0=int(sample_rate/pitch), thus

$$e[n] = \sum_m \delta[n-m N_0]$$

The vowel is synthesized by passing $e[n]$ through a series of formant resonators,

$$e[n]\rightarrow\frac{1}{G(z)}\rightarrow\frac{1}{A_1(z)}\rightarrow \frac{1}{A_2(z)}\rightarrow \frac{1}{A_3(z)}\rightarrow s[n]$$

Glottal Smoother¶

The vocal folds don't open and close instantaneously; they have some mass. The glottal smoother represents this delay with the filter $$G(z) = 1-gz^{-1}$$ where the glottal pole is $$g=e^{-\frac{\pi B_g}{F_s}}$$ The glottal bandwidth, $B_g$, is specified by the parameter glottalbw.

Formant Resonators¶

Most of the information in the vowel is provided by formant resonators. The $k^{\textrm{th}}$ formant resonator has the following denominator: $$A_k(z) = (1-p_kz^{-1})(1-p_k^*z^{-1}) = 1+a_{1,k}z^{-1}+a_{2,k}z^{-2}$$

The poles are given in terms of the formant frequencies and bandwidths as

$$p_k = e^{-\sigma_k+j\omega_k}$$

where

$$\sigma_k = \frac{\pi B_k}{F_s},~~~\omega_k=\frac{2\pi F_k}{F_s}$$
In [151]:
import extra
importlib.reload(extra)
help(extra.todo_speech)
Help on function todo_speech in module extra:

todo_speech(sample_rate, pitch, glottalbw, formants, bandwidths, duration)
    speech = todo_speech(sample_rate, pitch, formants, bandwidths, duration)
    Synthesize a vowel with specified pitch frequency, formants, and formant bandwidths.
    
    Input:
    sample_rate (integer) - sampling rate, in samples/second
    pitch (integer) - pitch frequency, in Hertz
    glottalbw (integer) - bandwidth of the glottal smoother, in Hertz
    glottal (integer) - frequency of the glottal pole, in Hertz
    formants (nfreqs) - formant frequencies, in Hertz
    bandwidhts (nfreqs) - formant bandwidths, in Hertz
    duration (float) - duration of x[n], in seconds
    
    Output:
    speech (int(sample_rate*duration)) - the synthetic vowel, created by generating an impulse
      train with a period of int(sample_rate/pitch), then filtering it through 
      the series connection of the formant resonators.

In [153]:
importlib.reload(extra)
sample_rate = 16000
duration = 0.5
glottal = 100
formants = [900, 1100, 2700, 3500]
bandwidths = [100, 200, 300, 700]
pitch = 125
speech = extra.todo_speech(sample_rate,pitch, glottal, formants, bandwidths,duration)
print('Speech signal has length',len(speech),'and its first samples are',speech[:10])
Speech signal has length 8000 and its first samples are [  1.           5.82821455  17.63004223  36.73932313  59.39404009
  79.97981055  94.37249811 100.98142356  99.24151263  88.20880221]
In [154]:
t = np.linspace(0,duration,int(duration*sample_rate))
print(len(t))
fig=plt.figure(figsize=(14,4))
ax=fig.subplots()
ax.plot(t, speech)
8000
Out[154]:
[<matplotlib.lines.Line2D at 0x7fd16338e4c0>]
In [159]:
import IPython.display as ipd
ipd.Audio(speech, rate=sample_rate)
Out[159]:
Your browser does not support the audio element.

If you have a little extra time, consider testing the different formant frequencies listed here: https://en.wikipedia.org/wiki/Formant

In [ ]: