In this file, we're going to use the inverse DFT to create images from MRI spectral measurements. Then we'll use the DFT to filter out noise in an image.
import os, h5py
import numpy as np
import matplotlib.figure
import matplotlib.pyplot as plt
%matplotlib inline
import importlib
import submitted
The data are provided for you in the file data.hdf5. Let's load it, and see what it contains.
with h5py.File('data.hdf5','r') as f:
print('data.hdf5 has the keys',f.keys())
mri_dft = f['mri_dft'][:]
image = f['image'][:]
noisy_image = f['noisy_image'][:]
print('mri_dft is an array of shape',mri_dft.shape,'and dtype',mri_dft.dtype)
print('image is an array of shape',image.shape,'and dtype',image.dtype)
print('noisy_image is an array of shape',noisy_image.shape,'and dtype',noisy_image.dtype)
data.hdf5 has the keys <KeysViewHDF5 ['image', 'mri_dft', 'noisy_image']> mri_dft is an array of shape (1114, 962) and dtype complex128 image is an array of shape (213, 320, 3) and dtype float64 noisy_image is an array of shape (213, 320, 3) and dtype float64
MRI (magnetic resonance imaging) scans are acquired in the frequency domain: the scanner measures samples of the image's 2-D spatial frequency spectrum, and the image itself is recovered by taking an inverse DFT.
The image you are provided, mri_dft, is a set of complex-valued frequency samples from a real MRI image (a public domain image from Wikipedia, donated by Johns Hopkins University). Let's try taking the inverse DFT, to see if we can see the image.
fig,ax = plt.subplots(figsize=(10,8))
ax.imshow(np.maximum(0,np.real(np.fft.ifft2(mri_dft))), cmap='gray')
<matplotlib.image.AxesImage at 0x7fa75d9b7400>
The aliasing we see here is caused by capturing only $N_1\times N_2$ valid frequency samples, but computing the inverse DFT over a spatial range $M_1\times M_2$ such that $M_1>N_1$ and/or $M_2>N_2$. By doing that, we accidentally discover that the DFT is periodic in space, not just in frequency:
$$x[n+N] = \frac{1}{N}\sum_k X[k]e^{j\frac{2\pi k(n+N)}{N}} = \frac{1}{N}\sum_k X[k]e^{j\frac{2\pi kn}{N}} = x[n]$$
In order to solve this problem, we need to figure out how many of the frequency samples in mri_dft are actually nonzero-valued. To get an idea, let's plot the first 200 samples of one of the rows:
fig, ax = plt.subplots(figsize=(14,4))
ax.stem(np.abs(mri_dft[4,0:200]))
<StemContainer object of 3 artists>
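As an aside (not part of the assignment), here is a toy 1-D sketch of why an oversampled DFT with zeros interleaved between the valid samples produces repeated, scaled copies of the signal in space; the variable names here are hypothetical.
toy_x = np.array([1.0, 2.0, 3.0, 4.0])
toy_X = np.fft.fft(toy_x)                      # 4 valid frequency samples
toy_X_interleaved = np.zeros(8, dtype=complex)
toy_X_interleaved[::2] = toy_X                 # zeros in the odd-numbered bins
print(np.round(np.real(np.fft.ifft(toy_X_interleaved)), 6))
# -> [0.5 1.  1.5 2.  0.5 1.  1.5 2. ], i.e. two half-amplitude copies of toy_x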
It looks like maybe the odd-numbered spectral samples are zero, and only the even-numbered spectral samples are valid. Let's try taking the inverse FFT of only the even-numbered samples.
fig,ax = plt.subplots(figsize=(10,8))
downsampling_factor = 2
ax.imshow(np.real(np.fft.ifft2(mri_dft[::downsampling_factor,::downsampling_factor])), cmap='gray')
<matplotlib.image.AxesImage at 0x7fa7437752e0>
That's the right size, but now the whole image has been circularly shifted by about half of its height, and about half of its width! One way to solve this is by multiplying the DFT by a shift operator before we take the inverse DFT:
$$X_{\mbox{corrected}}[k_1,k_2] = \begin{cases} X[k_1,k_2] e^{-j\frac{2\pi k_1d_1}{N_1}}e^{-j\frac{2\pi k_2d_2}{N_2}} & 0\le k_1<\frac{N_1}{2},~0\le k_2<\frac{N_2}{2}\\ X[k_1,k_2] e^{-j\frac{2\pi (k_1-N_1)d_1}{N_1}}e^{-j\frac{2\pi (k_2-N_2)d_2}{N_2}} &\frac{N_1}{2}\le k_1<N_1,~\frac{N_2}{2}\le k_2<N_2 \end{cases}$$
where $d_1$ is the number of rows by which we want to shift, and $d_2$ is the number of columns by which we want to shift. Notice that, when $d_1$ and $d_2$ are integers, the second line is equal to the first line (because $e^{-j2\pi d_1}=e^{-j2\pi d_2}=1$); writing it in the way shown above, however, keeps the phase factor conjugate-symmetric even when the shifts are fractional (as they will be below), so the inverse DFT stays essentially real and free of artifacts. Notice that the formula above only lists two of the four cases; I think you can figure out what the other two cases should be.
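Here is a quick 1-D sanity check of the shift property we are using (a sketch, not part of the assignment, with hypothetical variable names): multiplying $X[k]$ by $e^{-j2\pi kd/N}$, with the upper half of the bins written as $k-N$, circularly shifts $x[n]$ by $d$ samples.
demo_x = np.arange(8.0)
N_demo, d_demo = len(demo_x), 3
k_demo = np.arange(N_demo)
k_signed = np.where(k_demo < N_demo/2, k_demo, k_demo - N_demo)   # k-N for the upper half of the bins
demo_X_shifted = np.fft.fft(demo_x) * np.exp(-2j*np.pi*k_signed*d_demo/N_demo)
print(np.allclose(np.real(np.fft.ifft(demo_X_shifted)), np.roll(demo_x, d_demo)))   # True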
importlib.reload(submitted)
help(submitted.downsample_and_shift_dft2)
Help on function downsample_and_shift_dft2 in module submitted:
downsample_and_shift_dft2(oversampled_dft, downsampling_factor, row_shift, col_shift)
Input:
oversampled_dft [M1,M2] - a 2d array containing the oversampled DFT of a grayscale image
downsampling_factor (scalar) - the factor by which the DFT image is oversampled
row_shift (scalar) - the number of rows that the image should be shifted
col_shift (scalar) - the number of columns that the image should be shifted
Output:
image [M1/downsampling_factor, M2/downsampling_factor] - the real part of the inverse DFT
of the valid frequency samples, shifted by the specified numbers of rows and columns.
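For reference, here is a rough sketch of one way such a function might be implemented (an illustration under the docstring's assumptions, not necessarily the intended solution; downsample_and_shift_dft2_sketch is a hypothetical name).
def downsample_and_shift_dft2_sketch(oversampled_dft, downsampling_factor, row_shift, col_shift):
    X = oversampled_dft[::downsampling_factor, ::downsampling_factor]   # keep only the valid bins
    N1, N2 = X.shape
    k1 = np.arange(N1)
    k1 = np.where(k1 < N1/2, k1, k1 - N1)          # signed row-frequency indices
    k2 = np.arange(N2)
    k2 = np.where(k2 < N2/2, k2, k2 - N2)          # signed column-frequency indices
    phase = np.exp(-2j*np.pi*(k1[:, None]*row_shift/N1 + k2[None, :]*col_shift/N2))
    return np.real(np.fft.ifft2(X * phase))        # real part of the shifted inverse DFT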
importlib.reload(submitted)
N1, N2 = mri_dft.shape
print('The original MRI_DFT shape is',N1,N2)
image = submitted.downsample_and_shift_dft2(mri_dft,2,N1/4,N2/4)
print('The downsampled image has shape', image.shape)
fig, ax = plt.subplots(figsize=(10,8))
ax.imshow(np.maximum(0,np.minimum(255,image)),cmap='gray')
The original MRI_DFT shape is 1114 962 The downsampled image has shape (557, 481)
<matplotlib.image.AxesImage at 0x7fa743735f10>
Filtering out noise is easier to do in the time domain if the noise is wideband (meaning that there is noise energy in a wide range of frequencies). Sometimes, though, the noise is confined to a narrow range of frequencies. For example, consider this image (this is the image "Clifton Beach 5, South Arm, Tasmania, Australia" by JJ Harrison, distributed on Wikimedia under the GNU Free Documentation License):
with h5py.File('data.hdf5','r') as f:
image = f['image'][:]
noisy_image = f['noisy_image'][:]
fig, ax = plt.subplots(1,2,figsize=(14,8))
ax[0].imshow(image)
ax[1].imshow(noisy_image)
<matplotlib.image.AxesImage at 0x7fa7437bfac0>
You can see that the noisy image has been corrupted with some type of random stripes. Let's see what the DFT looks like.
Normally, it might be hard to figure out where the noise is. Since we have the original image, though, we can calculate the signal-to-noise ratio in each FFT bin.
fig, axs = plt.subplots(4,1,figsize=(14,8))
image_dft = np.fft.fft2(image,axes=(0,1))
noisy_dft = np.fft.fft2(noisy_image, axes=(0,1))
noise = noisy_dft - image_dft
M,N,K = noisy_dft.shape
#axs[0].plot(np.abs(image_dft[:,0,1]/noise[:,0,1]))
axs[0].plot(20*np.log10(np.abs(image_dft[:,0,1]/noise[:,0,1])))
axs[0].plot(np.arange(M),np.zeros(M),'k--')
axs[0].set_title('SNR as a function of bin, Green Channel, across rows')
#axs[1].plot(np.abs(image_dft[0,:,1]/noise[0,:,1]))
axs[1].plot(20*np.log10(np.abs(image_dft[0,:,1]/noise[0,:,1])))
axs[1].plot(np.arange(N),np.zeros(N),'k--')
axs[1].set_title('SNR as a function of bin, Green Channel, across columns')
#axs[2].stem(np.abs(image_dft[:20,0,1]/noise[:20,0,1]))
axs[2].stem(20*np.log10(np.abs(image_dft[:20,0,1]/noise[:20,0,1])))
axs[2].set_title('SNR as a function of bin, Green Channel, across rows, zoomed in')
#axs[3].stem(np.abs(image_dft[0,:40,1]/noise[0,:40,1]))
axs[3].stem(20*np.log10(np.abs(image_dft[0,:40,1]/noise[0,:40,1])))
axs[3].set_title('SNR as a function of bin, Green Channel, across columns, zoomed in')
fig.tight_layout()
fig, axs = plt.subplots(figsize=(8,6))
print(noise.shape)
axs.imshow(np.abs(noise).astype('int'))
axs.set_xlabel('$k_2$')
axs.set_ylabel('$k_1$')
axs.set_title('Spatial Magnitude DFT of the Noise (noisy_image - image)')
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
(213, 320, 3)
Text(0.5, 1.0, 'Spatial Magnitude DFT of the Noise (noisy_image - image)')
It looks like the signal-to-noise ratio is lowest in the bands $16\le k_1< 19$, $N_1-19<k_1\le N_1-16$, $29\le k_2<32$, and $N_2-32<k_2\le N_2-29$.
We could get rid of this noise by using an ideal bandstop filter, as in MP4. Instead, this time, let's try to get rid of it using the DFT.
Let's try to get rid of the noise by just zeroing out those frequency bands. Implement the dft_filter function, with the following description:
$$X_{\mbox{corrected}}[k_1,k_2]=\begin{cases} 0 & \mbox{min}_1\le (k_1~\mbox{or}~N_1-k_1)<\mbox{max}_1\\ 0 & \mbox{min}_2\le (k_2~\mbox{or}~N_2-k_2)<\mbox{max}_2\\ X[k_1,k_2] & \mbox{otherwise} \end{cases}$$
where $\mbox{min}_1$, $\mbox{max}_1$, $\mbox{min}_2$, and $\mbox{max}_2$ are band edges.
This is the same as finding $Y[k]=H[k]X[k]$ where
$$H[k_1,k_2]=\begin{cases} 0 & \mbox{min}_1\le (k_1~\mbox{or}~N_1-k_1)<\mbox{max}_1\\ 0 & \mbox{min}_2\le (k_2~\mbox{or}~N_2-k_2)<\mbox{max}_2\\ 1 & \mbox{otherwise} \end{cases}$$
importlib.reload(submitted)
help(submitted.dft_filter)
Help on function dft_filter in module submitted:
dft_filter(noisy_image, min1, max1, min2, max2)
Input:
noisy_image [N1,N2] - an image with narrowband noises
min1, max1 (scalars) - zero out all rows of the DFT min1 <= k1 < max1, likewise for N1-k1
min2, max2 (scalars) - zero out all cols of the DFT min2 <= k2 < max2, likewise for N2-k2
Output:
cleaned_image [N1,N2] - image with the corrupted bands removed.
Be sure to take the real part of the inverse DFT, and then truncate
so that 0 <= cleaned_image[n1,n2,color] <= 1 for all n1,n2,color.
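As a rough illustration (not necessarily the intended solution), a mask-based version might look something like this; dft_filter_sketch is a hypothetical name, and it assumes the same bands are zeroed in all three color channels.
def dft_filter_sketch(noisy_image, min1, max1, min2, max2):
    N1, N2 = noisy_image.shape[0], noisy_image.shape[1]
    X = np.fft.fft2(noisy_image, axes=(0, 1))
    k1, k2 = np.arange(N1), np.arange(N2)
    bad_rows = ((k1 >= min1) & (k1 < max1)) | ((N1 - k1 >= min1) & (N1 - k1 < max1))
    bad_cols = ((k2 >= min2) & (k2 < max2)) | ((N2 - k2 >= min2) & (N2 - k2 < max2))
    X[bad_rows, :, :] = 0                          # zero the corrupted row bands
    X[:, bad_cols, :] = 0                          # zero the corrupted column bands
    return np.clip(np.real(np.fft.ifft2(X, axes=(0, 1))), 0, 1)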
importlib.reload(submitted)
cleaned_image = submitted.dft_filter(noisy_image,16,19,29,32)
fig, ax = plt.subplots(1,2,figsize=(14,8))
ax[0].imshow(image)
ax[1].imshow(cleaned_image)
<matplotlib.image.AxesImage at 0x7fa7482fe850>
As shown above, zeroing out the corrupted bands reduces the strong stripe effects on the right-hand side, but it seems to strengthen the ripple artifacts throughout the rest of the image.
Remember that, when we were using ideal filters, similar ripple artifacts were caused by truncating the impulse response using a rectangular window. The way we avoided ripple artifacts then was by using a Hamming window instead.
One important difference between a Hamming window and a rectangular window is that the Hamming window has a wider transition band. So let's introduce a transition band into our filter:
$$X_{\mbox{corrected}}[k_1,k_2]=\begin{cases} 0 & \mbox{min}_1\le (k_1~\mbox{or}~N_1-k_1)<\mbox{max}_1\\ 0 & \mbox{min}_2\le (k_2~\mbox{or}~N_2-k_2)<\mbox{max}_2\\ \frac{1}{2}X[k_1,k_2] & k_1\in\left\{\mbox{min}_1-1,\mbox{max}_1\right\}\\ \frac{1}{2}X[k_1,k_2] & k_2\in\left\{\mbox{min}_2-1,\mbox{max}_2\right\}\\ \frac{1}{2}X[k_1,k_2] & N_1-k_1\in\left\{\mbox{min}_1-1,\mbox{max}_1\right\}\\ \frac{1}{2}X[k_1,k_2] & N_2-k_2\in\left\{\mbox{min}_2-1,\mbox{max}_2\right\}\\ X[k_1,k_2] & \mbox{otherwise} \end{cases}$$
This is the same as finding $Y[k]=H[k]X[k]$ where
$$H[k_1,k_2]=\begin{cases} 0 & \mbox{min}_1\le (k_1~\mbox{or}~N_1-k_1)<\mbox{max}_1\\ 0 & \mbox{min}_2\le (k_2~\mbox{or}~N_2-k_2)<\mbox{max}_2\\ \frac{1}{2} & k_1\in\left\{\mbox{min}_1-1,\mbox{max}_1\right\}\\ \frac{1}{2} & k_2\in\left\{\mbox{min}_2-1,\mbox{max}_2\right\}\\ \frac{1}{2} & N_1-k_1\in\left\{\mbox{min}_1-1,\mbox{max}_1\right\}\\ \frac{1}{2} & N_2-k_2\in\left\{\mbox{min}_2-1,\mbox{max}_2\right\}\\ 1 & \mbox{otherwise} \end{cases}$$
importlib.reload(submitted)
help(submitted.transitioned_filter)
Help on function transitioned_filter in module submitted:
transitioned_filter(noisy_image, min1, max1, min2, max2)
Input:
noisy_image [N1,N2] - an image with narrowband noises
min1, max1 (scalars) - zero out all rows of the DFT min1 <= k1 < max1, likewise for N1-k1
min2, max2 (scalars) - zero out all cols of the DFT min2 <= k2 < max2, likewise for N2-k2
Output:
cleaned_image [N1,N2] - image with the corrupted bands removed.
Be sure to take the real part of the inverse DFT, and then truncate
so that 0 <= cleaned_image[n1,n2,color] <= 1 for all n1,n2,color.
Transition band:
the bands k1=min1-1, k1=max1, k2=min2-1, and k2=max2 should be set to half of their
original values, 0.5*X[k1,k2].
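For illustration only (a sketch with hypothetical names, assuming the same band conventions as above), the transitioned mask $H[k_1,k_2]$ could be built like this; multiplying it into the DFT of the noisy image as H[:, :, None] broadcasts it across the three color channels.
def transition_mask_sketch(N1, N2, min1, max1, min2, max2):
    H = np.ones((N1, N2))
    k1, k2 = np.arange(N1), np.arange(N2)
    edge_rows = np.isin(k1, [min1-1, max1]) | np.isin(N1 - k1, [min1-1, max1])
    edge_cols = np.isin(k2, [min2-1, max2]) | np.isin(N2 - k2, [min2-1, max2])
    H[edge_rows, :] = 0.5                          # transition bands
    H[:, edge_cols] = 0.5
    stop_rows = ((k1 >= min1) & (k1 < max1)) | ((N1 - k1 >= min1) & (N1 - k1 < max1))
    stop_cols = ((k2 >= min2) & (k2 < max2)) | ((N2 - k2 >= min2) & (N2 - k2 < max2))
    H[stop_rows, :] = 0                            # stopbands take priority over transition bands
    H[:, stop_cols] = 0
    return H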
importlib.reload(submitted)
transitioned_image = submitted.transitioned_filter(noisy_image,16,19,29,32)
fig, ax = plt.subplots(1,2,figsize=(14,10))
ax[0].imshow(cleaned_image)
ax[1].imshow(transitioned_image)
<matplotlib.image.AxesImage at 0x7fa7482f5760>
To be honest, it's a little hard to tell which cleaned-up image is better. Let's calculate their signal-to-noise ratios, in decibels.
error1 = cleaned_image - image
SNR1 = 20*np.log10(np.sum(np.square(image))/np.sum(np.square(error1)))
error2 = transitioned_image - image
SNR2 = 20*np.log10(np.sum(np.square(image))/np.sum(np.square(error2)))
print('w/o Transition band we get SNR=%.1f dB, w/Transition band, %.1f dB'%(SNR1,SNR2))
w/o Transition band we get SNR=42.5 dB, w/Transition band, 43.1 dB
It looks like the filter with the transition band has very slightly higher SNR.
Now let's try using overlap-add to filter a very long audio signal.
First, we need to zero-pad the input and the filter. Suppose you have an impulse response $h[n]$ of length $L$, and a signal $x[n]$ of length $M$. Then the function zero_pad should return $h[n]$ and $x[n]$, both zero-padded to $N=L+M-1$ samples.
importlib.reload(submitted)
help(submitted.zero_pad)
Help on function zero_pad in module submitted:
zero_pad(h, x)
(hp,xp) = zero_pad(h,x)
Input:
h [L] - a length-L impulse response array
x [M] - a length-M signal array
Return:
hp [N] - the same h, but zero-padded to a length of N=L+M-1
xp [N] - the same x, but zero-padded to a length of N=L+M-1
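A minimal sketch of what zero_pad needs to do (zero_pad_sketch is a hypothetical name, assuming 1-D arrays):
def zero_pad_sketch(h, x):
    N = len(h) + len(x) - 1
    hp = np.concatenate((h, np.zeros(N - len(h))))   # pad h with trailing zeros to length N
    xp = np.concatenate((x, np.zeros(N - len(x))))   # pad x with trailing zeros to length N
    return hp, xp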
Let's create a lowpass filter $h[n]$ with a cutoff of $0.075\pi$ radians/sample and, let's say, a length of 256 samples. For $x[n]$, let's just add two sine waves, one with a frequency of $0.05\pi$ radians/sample and one with a frequency of $0.1\pi$.
h = 0.075*np.sinc(0.075*(np.arange(256)-127.5)) * np.hamming(256)
x = np.sin(0.05*np.pi*np.arange(500))+np.sin(0.1*np.pi*np.arange(500))
hp, xp = submitted.zero_pad(h,x)
fig, ax = plt.subplots(2,2,figsize=(14,10))
ax[0,0].plot(h)
ax[0,1].plot(hp)
ax[1,0].plot(x)
ax[1,1].plot(xp)
[<matplotlib.lines.Line2D at 0x7fa74be33580>]
Now let's try multiplying their FFTs. The resulting $y[n]$ should still contain the sine wave at $0.05\pi$, but the sine wave at $0.1\pi$ should be mostly filtered out.
X = np.fft.fft(xp)
H = np.fft.fft(hp)
Y = X*H
y = np.real(np.fft.ifft(Y))
fig, ax = plt.subplots(1,2,figsize=(14,5))
ax[0].plot(y)
ax[0].set_title('Result of ifft(X[k]H[k])')
ax[1].plot(np.convolve(x,h))
ax[1].set_title('Result of x[n]*h[n]')
Text(0.5, 1.0, 'Result of x[n]*h[n]')
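As an optional sanity check (using the h, x, and y arrays defined in the cells above), the DFT-domain product should match linear convolution up to floating-point error, since we zero-padded both signals to $N=L+M-1$:
print(np.allclose(y, np.convolve(x, h)))   # expect True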
Let's download an audio file, add some noise to it, and then try to filter the worst of the noise out.
import urllib.request
example_url = "https://catalog.ldc.upenn.edu/desc/addenda/LDC93S1.wav"
webdata = urllib.request.urlopen(example_url).read()
f1 = open("webdata.wav", "wb")
f1.write(webdata)
f1.close()
Go ahead and listen to the file webdata.wav if you want to. Now you should go to a terminal and type
pip install soundfile
to install soundfile, then use it to read the audio data back in from the .wav file:
import soundfile as sf
speech_wave, speech_rate = sf.read("webdata.wav")
import IPython
IPython.display.Audio(data=speech_wave, rate=speech_rate)
Let's add noise, at an SNR of 0dB (noise is the same power as the speech):
import numpy as np
speech_power = np.average(np.square(speech_wave))
noisy_speech = speech_wave + np.sqrt(speech_power)*np.random.randn(len(speech_wave))
IPython.display.Audio(data=noisy_speech,rate=speech_rate)
Let's plot the SNR as a function of frequency, to see which frequencies we should keep.
import matplotlib.pyplot as plt
noise = noisy_speech - speech_wave
noise_dft = np.fft.fft(noise)
speech_dft = np.fft.fft(speech_wave)
SNR = 20*np.log10(np.abs(speech_dft)/np.abs(noise_dft))
frequency_axis = np.linspace(0,2*np.pi,len(speech_dft))
fig, ax = plt.subplots(1,2,figsize=(14,5))
ax[0].plot(frequency_axis, SNR)
ax[0].set_title('SNR vs. omega')
N = len(speech_dft)
ax[1].plot(frequency_axis[0:int(N/2)]/np.pi, SNR[0:int(N/2)])
ax[1].set_title('SNR vs. omega/pi')
Text(0.5, 1.0, 'SNR vs. omega/pi')
It looks like the SNR drops to about 0 dB somewhere near $0.6\pi$ radians/sample, so let's create an LPF with that cutoff frequency, and try filtering the speech using np.convolve. Since the speech waveform is quite long, that might take a long time, so we'll use time.time() to time the operations.
h = 0.6*np.sinc(0.6*(np.arange(256)-127.5)) * np.hamming(256)
import time
start_time = time.time()
y_using_convolution = np.convolve(noisy_speech,h)
end_time = time.time()
print('That took', end_time-start_time, 'seconds')
IPython.display.Audio(data=y_using_convolution,rate=speech_rate)
That took 0.0051119327545166016 seconds
Now let's try using overlap-add:
import importlib
importlib.reload(submitted)
help(submitted.overlap_add)
Help on function overlap_add in module submitted:
overlap_add(h, x, M)
y = overlap_add(h, x, M)
Input:
h [L] - a length-L impulse response array
x - a very long array containing the signal x
M (scalar) - the length of frames into which x should be chopped
Output:
y - the result of filtering x by h using overlap-add
First, zero-pad h to a length of N samples, where N=L+M-1.
Second, compute its DFT, H.
Third, chop x into frames of length M.
There should be NF of these; the last one might have fewer than M samples.
Fourth, prepare the output array, y, of length equal to NF*M + L - 1.
Fifth, for each frame of M samples in x:
- zero-pad to a length of N
- compute its DFT
- multiply its DFT by H
- inverse DFT
- add the inverse DFT to the correct place in the y array
Return y
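Here's a rough sketch of the algorithm described above (an illustration, not necessarily the intended solution; overlap_add_sketch is a hypothetical name):
def overlap_add_sketch(h, x, M):
    L = len(h)
    N = L + M - 1
    H = np.fft.fft(h, n=N)                         # zero-pad h to length N, then DFT
    NF = int(np.ceil(len(x) / M))                  # number of frames
    y = np.zeros(NF * M + L - 1)
    for f in range(NF):
        frame = x[f*M:(f+1)*M]                     # the last frame may be shorter than M
        yf = np.real(np.fft.ifft(np.fft.fft(frame, n=N) * H))
        y[f*M:f*M + N] += yf                       # overlap and add
    return y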
In order to make the FFT as efficient as possible, let's set M so that L+M-1 is a power of 2.
importlib.reload(submitted)
M = 1024 + 1 - len(h)
start_time = time.time()
y_overlap_add = submitted.overlap_add(h, noisy_speech, M)
end_time = time.time()
print('That took',end_time-start_time,'seconds')
IPython.display.Audio(data=y_overlap_add,rate=speech_rate)
That took 0.005012989044189453 seconds
If you have done this correctly, the two versions of y should be identical, except that y_overlap_add may be longer:
print('The average error is',np.average(np.abs(y_overlap_add[:len(y_using_convolution)]-y_using_convolution)))
The average error is 2.1766554318501152e-18