Project

#: 80
Title: Edge-AI based audio classifier
Team Members: Ahaan Joishy, Kavin Manivasagam, Om Dhingra
TA: Gayatri Chandran
Documents: design_document1.pdf, final_paper1.pdf, proposal1.pdf, proposal2.pdf, video
Problem Overview
Most audio-based embedded systems today collect large amounts of raw sensor data but classify it with simple threshold-based logic. These methods are very sensitive to noise and do not perform accurately across varying conditions, so they end up relying on external computation or cloud services. There is therefore a need for a method that can convert raw captured signals into meaningful classifications locally, under tight power and memory constraints.

Solution Overview
The proposed project is an Edge-AI embedded system that classifies audio signals (e.g., a clap, laugh, snap, stomp, or speech) in real time using a simple neural net. The system will use a single sensor (a MEMS I2S digital microphone) to collect audio data. The classification result will be shown to the user through an LED-based output. The system thus eliminates the need for cloud computation and demonstrates that machine learning can be effective even under tight constraints.

Solution Components
Sensor Subsystem:
A MEMS microphone with an I2S interface will be used to collect raw audio signals (e.g., clap, speech, snap).
Audio will be sampled at a target rate of 16 kHz, which is widely used in voice assistants and speech recognition and is sufficient for speech and common environmental sounds.
We use a digital microphone because it removes the need for an analog amplifier.
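As a rough illustration of what a 16 kHz sample rate implies for buffering, the sketch below computes the samples per analysis frame and the RAM needed to hold two such frames. The 32 ms frame length, the 16-bit sample width, and the two-frame (ping-pong) buffer are assumptions chosen for illustration, not values from the design.

```python
SAMPLE_RATE_HZ = 16_000   # target rate stated for the sensor subsystem
FRAME_MS = 32             # assumed analysis frame length (hypothetical)
BYTES_PER_SAMPLE = 2      # assumed 16-bit PCM samples (hypothetical)


def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Number of audio samples captured during one frame."""
    return sample_rate_hz * frame_ms // 1000


def double_buffer_bytes(sample_rate_hz: int, frame_ms: int, bytes_per_sample: int) -> int:
    """RAM for two frame-sized halves of a ping-pong buffer."""
    return 2 * samples_per_frame(sample_rate_hz, frame_ms) * bytes_per_sample


n = samples_per_frame(SAMPLE_RATE_HZ, FRAME_MS)
ram = double_buffer_bytes(SAMPLE_RATE_HZ, FRAME_MS, BYTES_PER_SAMPLE)
print(f"{n} samples per frame, {ram} bytes for a double buffer")
```

Under these assumptions the audio buffer is a small fraction of the available RAM. A 16 kHz rate also captures content up to 8 kHz (the Nyquist limit).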
Processing Subsystem:
We will use an STM32F411 microcontroller for this project. We chose it because it provides 512 kB of Flash and 128 kB of RAM, which is important for storing and running a neural net. Furthermore, it has built-in DSP instructions that help convert raw audio signals into spectral features (MFCCs) in real time. Since we are using a neural net, we also want a chip with a floating-point unit (FPU), which this microcontroller has.
The signal chain is as follows:
The microphone captures audio and sends it digitally over I2S to the microcontroller, which uses DMA to quickly store the data in memory.
The audio frames are converted into MFCC features.
These features are then fed into our neural-net model.
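The frame-to-MFCC step in the signal chain above can be sketched as follows. This is a minimal pure-Python illustration, not the firmware implementation: the 256-sample frame, the 20 mel filters, the 10 output coefficients, and all function names are assumptions, and the naive DFT stands in for the optimized FFT that would be used on the microcontroller.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Apply a Hamming window, then return |DFT|^2 for bins 0..N/2 (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spectrum.append(re * re + im * im)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    top_mel = hz_to_mel(sample_rate / 2)
    mel_points = [top_mel * i / (n_filters + 1) for i in range(n_filters + 2)]
    # Fractional bin positions of each filter edge and centre.
    edges = [mel_to_hz(m) * n_fft / sample_rate for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate=16000, n_filters=20, n_coeffs=10):
    """Log mel-filterbank energies followed by a DCT-II, truncated to n_coeffs."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_coeffs)]


# Synthetic 1 kHz test tone, one 256-sample frame at 16 kHz.
tone = [math.sin(2 * math.pi * 1000 * i / 16000) for i in range(256)]
features = mfcc(tone)
print(len(features))
```

Each call turns one frame into a short feature vector; stacking the vectors from consecutive frames gives the input that the classifier consumes.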
The ML pipeline would be as follows:
The obtained MFCC features are fed into our small, dense neural net for classification into predefined types.
TensorFlow Lite Micro will be used to deploy the model on the microcontroller without an OS or an internet connection. (We may also try ExecuTorch if time permits.)
Model size will be kept under 20 kB to ensure real-time performance.
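To make the 20 kB target concrete, the sketch below counts the parameters of a small dense network and runs a plain forward pass. The layer shape (49 frames x 10 MFCCs into 32 hidden units into 3 classes), the ReLU and softmax activations, and the function names are all hypothetical; the actual architecture is not specified in this proposal.

```python
import math

BUDGET_BYTES = 20_000  # the 20 kB model-size target


def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected net with the given layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def model_bytes(layer_sizes, bytes_per_param):
    """Parameter storage only; runtime code and scratch memory are extra."""
    return dense_param_count(layer_sizes) * bytes_per_param


def dense_forward(x, layers):
    """ReLU on hidden layers, softmax on the last layer.

    layers is a list of (weights, biases); weights[j] is the row of input
    weights feeding output unit j.
    """
    for idx, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if idx < len(layers) - 1:
            x = [max(0.0, v) for v in z]
        else:
            peak = max(z)  # subtract the max for numerical stability
            exps = [math.exp(v - peak) for v in z]
            total = sum(exps)
            x = [e / total for e in exps]
    return x


LAYERS = [490, 32, 3]  # hypothetical shape, see lead-in
print(dense_param_count(LAYERS))
print(model_bytes(LAYERS, 1) <= BUDGET_BYTES)  # 8-bit quantized weights
print(model_bytes(LAYERS, 4) <= BUDGET_BYTES)  # 32-bit float weights
```

With this assumed shape, the parameters fit the budget only when stored as 8-bit values, which suggests that quantization or a smaller input window may be needed to meet the target.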
Power Subsystem:
We will use a 5 V USB input to power the board. This will be stepped down to 3.3 V by an on-board voltage regulator.
Decoupling capacitors and filtering components will be used to reduce electrical noise that could interfere with stable operation.
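As a quick sanity check on the 5 V to 3.3 V conversion, the sketch below estimates the heat dissipated in the regulator. It assumes a linear regulator (the regulator type is not stated in this proposal), ignores quiescent current, and uses the 60 mA current target from the success criteria as the load.

```python
def regulator_dissipation_w(v_in, v_out, load_current_a):
    """Heat in a linear regulator: dropped voltage times load current."""
    return (v_in - v_out) * load_current_a


def linear_efficiency(v_in, v_out):
    """Best-case efficiency of a linear regulator (quiescent current ignored)."""
    return v_out / v_in


loss = regulator_dissipation_w(5.0, 3.3, 0.060)
print(f"{loss:.3f} W dissipated, {linear_efficiency(5.0, 3.3):.0%} efficient")
```

Under these assumptions the regulator dissipates roughly a tenth of a watt, which a small linear regulator can usually handle without a heatsink.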
Criterion for Success
Our device can classify at least 3 different sound types correctly with more than 85% accuracy on the recorded test set.
The target end-to-end latency (from sound to LED output) is less than 100 ms.
Current draw should be under 60 mA.
Test Protocol description: Our test set will consist of around 50 samples per class, gathered from a variety of noisy and quiet environments. (We aim for our model to correctly classify 3 different sound types, but this will be extended to 5 types if time permits.)
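The accuracy criterion could be checked with a small script like the one below. The function names are hypothetical, and the labels and predictions are made-up placeholders that only exercise the code; they are not measured results.

```python
from collections import Counter


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of test samples whose prediction matches the true label."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def per_class_accuracy(true_labels, predicted_labels):
    """Accuracy computed separately for each true class."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


def meets_target(true_labels, predicted_labels, threshold=0.85):
    """True when overall accuracy is strictly above the threshold."""
    return overall_accuracy(true_labels, predicted_labels) > threshold


# Placeholder data only: 50 samples per class with arbitrary mistakes.
truth = ["clap"] * 50 + ["snap"] * 50 + ["speech"] * 50
preds = (["clap"] * 45 + ["snap"] * 5
         + ["snap"] * 44 + ["clap"] * 6
         + ["speech"] * 48 + ["clap"] * 2)

print(per_class_accuracy(truth, preds))
print(meets_target(truth, preds))
```

With 3 classes of 50 samples each, exceeding 85% means at least 128 of the 150 samples must be classified correctly. Reporting per-class accuracy alongside the overall figure helps reveal whether one sound type is dragging the result down.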
Alternatives
Many existing sound classification systems use cloud-based processing or rely on high-power computing platforms such as smartphones and computers. The cloud-based approaches require a continuous internet connection. Many other methods use threshold-based audio detection, but these cannot work accurately for different types of sounds in varying environments. Our solution differs by performing audio classification on a low-power embedded device, using a simple neural net, without external computing or complex hardware.

VoxBox Robo-Drummer

Craig Bost, Nicholas Dulin, Drake Proffitt

Featured Project

Our group proposes to create a robot drummer that responds to human voice "beatboxing" input, captured via a conventional dynamic microphone, and translates the input into the corresponding drum hit performance. For example, if the human user issues a bass-kick voice sound, the robot will recognize it and strike the bass drum, and likewise for the hi-hat/snare and clap. Our design will cover at minimum 3 different drum hit types (bass hit, snare hit, clap hit) and respond with minimal latency.

This would involve amplifying the analog signal (as dynamic mics produce fairly low-level signals), which would be sampled by a dsPIC33F DSP/MCU (or comparable chipset) and processed for trigger event recognition. This entails applying Short-Time Fourier Transform analysis to provide spectral content data to our event detection algorithm (i.e., recognizing the "control" signal from the human user). The MCU functionality of the dsPIC33F would be used for relaying the trigger commands to the actuator circuits controlling the robot.

The robot in question would be small, about the size of a ventriloquist dummy. The "drum set" would be scaled accordingly (think pots and pans, like a child would play with). Actuators would likely be based on solenoids, as opposed to motors.

Beyond these minimal capabilities, we would add analog prefiltering of the input audio signal, and amplification of the drum hits, as bonus features if the development and implementation process goes better than expected.
