Prelab 4 - Autocorrelation
In this prelab, you will become familiar with two common tasks in speech signal analysis: voicing determination and autocorrelation.
Refer to the Submission Instruction page for more information.
Part 1 - Voiced/Unvoiced Detector
Voiced/unvoiced signal classification is an incredibly well-studied field with a number of vetted solutions, such as Rabiner's pattern recognition approach or Bachu's zero-crossing rate approach. Pitch shifting (next lab) does not require highly accurate voiced/unvoiced detection, however, so we will use a much simpler technique.
The energy of a signal can be a useful surrogate for voiced/unvoiced classification. Put simply, if a signal has enough energy, we assume it is voiced and continue our pitch analysis. The energy of a discrete-time signal $x[n]$ is given as follows:

$$E = \sum_{n} |x[n]|^2$$
Using the given test speech signal and the test code given below, determine a useful energy threshold and classify frames as voiced (return 1) or unvoiced (return 0). The test code will plot the results for you.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.io.wavfile import read, write

FRAME_SIZE = 2048

def ece420ProcessFrame(frame):
    isVoiced = 0

    #### YOUR CODE HERE ####

    return isVoiced

################# GIVEN CODE BELOW #####################
Fs, data = read('test_vector.wav')

numFrames = int(len(data) / FRAME_SIZE)
framesVoiced = np.zeros(numFrames)

for i in range(numFrames):
    frame = data[i * FRAME_SIZE : (i + 1) * FRAME_SIZE]
    framesVoiced[i] = ece420ProcessFrame(frame.astype(float))

plt.figure()
plt.stem(framesVoiced)
plt.show()
```
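As a sketch of the energy-based approach described above, the body of `ece420ProcessFrame` could look something like the following. Note that the threshold value here is an arbitrary placeholder, not a recommended value; part of the assignment is tuning it against the test vector.

```python
import numpy as np

# Hypothetical threshold -- tune this against the test vector.
ENERGY_THRESHOLD = 1e7

def frameEnergy(frame):
    # E = sum over n of |x[n]|^2
    frame = np.asarray(frame, dtype=float)
    return np.sum(frame ** 2)

def isVoiced(frame, threshold=ENERGY_THRESHOLD):
    # Return 1 if the frame has enough energy to be considered voiced.
    return 1 if frameEnergy(frame) > threshold else 0
```
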
Part 2 - Autocorrelation
Autocorrelation is the process of convolving a signal with a time-reversed copy of itself. That is, for a real signal, the discrete autocorrelation is given as:

$$r[\ell] = (x[n] * x^*[-n])[\ell] = \sum_{n} x[n]\, x[n + \ell]$$

where $x^*[-n]$ is the complex conjugate of the time reversal of $x[n]$. The output measures how self-similar a signal is when shifted by some lag $\ell$. If normalized to 1 at zero lag, this can be written equivalently as:

$$r[\ell] = \frac{\sum_{n} x[n]\, x[n + \ell]}{\sum_{n} x[n]^2}$$
For a periodic signal, the lag $\ell$ that maximizes $r[\ell]$ (excluding the trivial peak at $\ell = 0$) indicates the period of the signal. In other words, the signal takes $\ell$ samples before repeating itself, so its fundamental frequency is $f_s / \ell$. This algorithm, combined with some additional modifications to prevent harmonics from being detected, is among the best-known frequency estimators for speech and music.
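As a sketch of how the normalized autocorrelation can be computed directly from its definition with an explicit loop (function name and `maxLag` parameter are our own, not part of the assignment):

```python
import numpy as np

def autocorrelate(x, maxLag):
    # Normalized autocorrelation: r[lag] = sum_n x[n] x[n+lag] / sum_n x[n]^2,
    # so r[0] == 1 by construction.
    x = np.asarray(x, dtype=float)
    energy = np.sum(x * x)
    r = np.zeros(maxLag)
    for lag in range(maxLag):
        r[lag] = np.sum(x[:len(x) - lag] * x[lag:]) / energy
    return r
```

For example, a sinusoid with a period of 100 samples produces a secondary peak in `r` at lag 100.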
Calculate and plot the autocorrelation of the test signal `tune` using the test code below. You may not use `np.correlate()` or other such functions.
Indicate the value of the lag that maximizes the autocorrelation. What is the signal frequency that corresponds to this lag?
Python test code
```python
import numpy as np
import matplotlib.pyplot as plt

fs = 8000       # Sampling rate is 8000 Hz
duration = 1    # 1 sec
t = np.linspace(0, duration, duration * fs)

freq = 10       # Tune frequency is 10 Hz
tune = np.sin(2 * np.pi * freq * t)

# Add some Gaussian noise
tune += np.random.normal(0, 0.5, duration * fs)

plt.figure()
plt.plot(t, tune)

# Start a new figure for your autocorrelation plot
plt.figure()

# Your code here

# Only call plt.show() at the very end of the script
plt.show()
```
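For the short answer question, recall that a signal repeating every `lag` samples at sampling rate `fs` has fundamental frequency `fs / lag`. A minimal sketch of this conversion (the function name is our own):

```python
def lagToFrequency(peakLag, fs):
    # A period of peakLag samples at fs samples/sec corresponds
    # to a fundamental frequency of fs / peakLag in Hz.
    return fs / peakLag
```
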
Prelab 4 will be graded as follows:
Assignment 1 [1 point]
A plot of the voiced/unvoiced detector [1 point]
Assignment 2 [1 point]
A plot of the autocorrelation result [0.5 points]
Short answer question [0.5 points]