# HelpMeRecall

Team Members:
- Sravya Davuluri (sravyad2)
- William Li (wli202)
- Michael Jiang (mbjiang2)

# Problem

Many individuals have difficulty recalling their recent activities and remembering whether they have completed routine tasks such as eating or taking medication.

# Solution

A standalone assistive device that supports activity recall through sensor-gated voice interaction. It allows users to verbally log activities they have completed and to later query whether a specific activity has been performed. An onboard microphone and on-device audio processing on a microcontroller perform keyword detection.

The device is always on, with an LED to verify its status, but voice input is accepted only when the device is worn (detected by a capacitive touch sensor) and a word from a limited vocabulary is spoken, which guards against accidental logging. To mitigate missed detections of supported keywords, each activity maps to several keywords; for taking medicine, these might be “medicine,” “medication,” “pill,” “drug,” and “prescription.” Restricting the vocabulary in this way also simplifies the detection problem and reduces low-confidence results. To validate a completed action, an entry is logged only if the accelerometer detects physical movement around the same time, which reduces false logging. When a voice log is accepted, the device provides haptic feedback. Each activity is timestamped and stored in local memory, and when the user queries an activity that has been completed, the device confirms it, including the timestamp, through an integrated speaker.

The logs reset automatically at midnight, since the tracked activities repeat daily. A hard-reset button clears all logs manually, and a separate button deletes the most recent log in case the user makes a logging mistake.

# Solution Components

## Subsystem 1: Microcontroller Unit and Controls

Acts as the central logic unit: it manages the sensor inputs and executes a finite state machine (FSM). The FSM states are start, idle, listening, logging, and replying (see the sketch below).

Components: ESP32-S3-WROOM-1
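
A minimal sketch of the FSM core in C, using the states listed above; the event names and transitions are our assumptions about how the sensors and keyword detector would drive it:

```c
/* FSM states from the description above. */
typedef enum { START, IDLE, LISTENING, LOGGING, REPLYING } State;

/* Hypothetical events; real firmware would derive these from the touch
 * sensor, accelerometer, keyword detector, and timers. */
typedef enum { EV_BOOT_DONE, EV_WORN_AND_MOVING, EV_KEYWORD_LOG,
               EV_KEYWORD_QUERY, EV_DONE, EV_TIMEOUT } Event;

State step(State s, Event e) {
    switch (s) {
        case START:     return (e == EV_BOOT_DONE)       ? IDLE      : s;
        case IDLE:      return (e == EV_WORN_AND_MOVING) ? LISTENING : s;
        case LISTENING:
            if (e == EV_KEYWORD_LOG)   return LOGGING;   /* store an entry */
            if (e == EV_KEYWORD_QUERY) return REPLYING;  /* speak an answer */
            if (e == EV_TIMEOUT)       return IDLE;
            return s;
        case LOGGING:
        case REPLYING:  return (e == EV_DONE) ? IDLE : s;
    }
    return s;
}
```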

## Subsystem 2: Audio input processing unit

Captures the user’s voice input and performs keyword detection on a limited vocabulary, where each action maps to multiple keywords to improve detection; an example mapping is sketched below.

Components: Digital MEMS microphone (INMP441), ESP32-S3-WROOM-1
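
A minimal sketch of the keyword-to-activity lookup in C; the medication synonyms come from the proposal, while the meal row is an illustrative assumption:

```c
#include <string.h>

typedef struct { const char *keyword; const char *activity; } KeywordMap;

static const KeywordMap kMap[] = {
    {"medicine", "take_medication"}, {"medication", "take_medication"},
    {"pill", "take_medication"},     {"drug", "take_medication"},
    {"prescription", "take_medication"},
    /* Hypothetical second activity, just to show the pattern. */
    {"eat", "meal"}, {"food", "meal"}, {"lunch", "meal"},
};

/* Returns the activity for a detected keyword, or NULL if unsupported. */
const char *activity_for(const char *kw) {
    for (size_t i = 0; i < sizeof kMap / sizeof kMap[0]; i++)
        if (strcmp(kMap[i].keyword, kw) == 0) return kMap[i].activity;
    return NULL;
}
```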

## Subsystem 3: Sensor gating and activity validation

Uses a capacitive touch sensor and an accelerometer so that voice input is received and accepted only when the device is worn and recent movement has been detected, rather than running continuous voice recognition. A cooldown period is also enforced: if several listening windows in a row detect motion but produce no log, the microphone is disabled for 10 seconds to help conserve battery. The gating rule is sketched below.

Components: Capacitive touch sensor (AT42QT1010), Accelerometer (MPU-6050)
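
A sketch of the gating rule in C; the 10-second cooldown is from the proposal, while the motion window and empty-window count are assumed placeholders:

```c
#include <stdbool.h>
#include <stdint.h>

#define MOTION_WINDOW_MS   5000  /* assumed: motion must be this recent */
#define COOLDOWN_MS       10000  /* 10 s mic cooldown, per the proposal */
#define MAX_EMPTY_WINDOWS     3  /* assumed: "no log" windows in a row */

static uint32_t last_motion_ms, cooldown_until_ms;
static int empty_windows;

/* Voice input is accepted only when worn, recently moving, and not cooling down. */
bool mic_enabled(uint32_t now_ms, bool worn) {
    if (now_ms < cooldown_until_ms) return false;
    return worn && (now_ms - last_motion_ms) <= MOTION_WINDOW_MS;
}

/* Call at the end of each listening window; repeated empty windows
 * trigger the cooldown to conserve battery. */
void window_finished(uint32_t now_ms, bool logged) {
    empty_windows = logged ? 0 : empty_windows + 1;
    if (empty_windows >= MAX_EMPTY_WINDOWS) {
        cooldown_until_ms = now_ms + COOLDOWN_MS;
        empty_windows = 0;
    }
}
```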

## Subsystem 4: Feedback and Output

Uses a speaker for audio feedback in response to the user’s query. This subsystem also provides haptic feedback to indicate an accepted voice log. The RGB LED indicates device status: green when the device is on, yellow when it is listening, and red when power is low, as sketched below.

Components: Speaker (8 ohm speaker), amplifier (MAX98357A), coin vibration motor, transistor (2N3904), RGB LED
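
A minimal sketch of the LED policy described above; the low-battery threshold is an assumed placeholder:

```c
#include <stdbool.h>

typedef enum { LED_GREEN, LED_YELLOW, LED_RED } LedColor;

/* Priority: low battery wins, then listening, then powered-on idle. */
LedColor led_for(bool listening, int battery_pct) {
    if (battery_pct < 15) return LED_RED;    /* 15% threshold is assumed */
    if (listening)        return LED_YELLOW;
    return LED_GREEN;
}
```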

## Subsystem 5: Time logging and local storage

Stores the activity voice logs along with their timestamps and resets them automatically at midnight to support daily repetitive tasks. Timekeeping uses the ESP32’s internal RTC. The storage and reset logic is sketched below.

Components: ESP32-S3-WROOM-1
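
A sketch of the log store, assuming a fixed-size array and timestamps kept as seconds since local midnight from the ESP32 RTC; capacity and field sizes are placeholders:

```c
#include <stdint.h>
#include <string.h>

#define MAX_LOGS 64  /* assumed capacity */

typedef struct {
    char     activity[24];        /* e.g. "take_medication" */
    uint32_t secs_since_midnight; /* from the ESP32's internal RTC */
} LogEntry;

static LogEntry logs[MAX_LOGS];
static int log_count;

void log_activity(const char *activity, uint32_t now_secs) {
    if (log_count >= MAX_LOGS) return;
    strncpy(logs[log_count].activity, activity, sizeof logs[0].activity - 1);
    logs[log_count].activity[sizeof logs[0].activity - 1] = '\0';
    logs[log_count].secs_since_midnight = now_secs;
    log_count++;
}

/* The daily reset: a backwards jump in seconds-since-midnight means the
 * RTC rolled past 24:00, so the day's logs are cleared. */
void maybe_midnight_reset(uint32_t prev_secs, uint32_t now_secs) {
    if (now_secs < prev_secs) log_count = 0;
}

/* Wired to the hard-reset and undo buttons, respectively. */
void clear_all_logs(void)    { log_count = 0; }
void delete_latest_log(void) { if (log_count > 0) log_count--; }
```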

## Subsystem 6: Power

Supplies power to the device.

Components: Battery (Li-Po battery)

# Criterion For Success
- Correctly detects supported keywords with an accuracy of at least 80% in a quiet environment
- Device logs only after verifying physical movement and hearing a supported keyword from the user within a 5-second window
- Upon successful logging, the speaker outputs an audible confirmation and the user feels a 2-second haptic vibration
- While logs are being queried, the speaker outputs the result and the LED stays solid
- Logs will be automatically cleared at midnight and can be manually reset with the reset button
- Latest log will be deleted upon pushing a separate button
- LED stays solid while device is powered
- False log rate < 1 per hour in normal conversation when worn.

---

# Smart Glasses for the Blind

# Team Members

- Ahmed Nahas (anahas2)

- Siraj Khogeer (khogeer2)

- Abdulrahman Maaieh (amaaieh2)

# Problem:

The underlying motive behind this project is the heart-wrenching fact that, with all the developments in science and technology, the visually impaired have been left with nothing but a simple white cane; a stick among today’s scientific novelties. Our overarching goal is to create a wearable assistive device that gives the visually impaired an alternative way of “seeing” through sound. The idea revolves around a glasses/headset form factor that allows the user to walk independently by detecting obstacles and notifying the user, creating a sense of vision through spatial awareness.

# Solution:

Our objective is to create smart glasses/headset that allow the visually impaired to ‘see’ through sound. The general idea is to map the user’s surroundings through depth maps and a normal camera, then map both to audio that allows the user to perceive their surroundings.

We’ll use two low-power I2C ToF imagers to build a depth map of the user’s surroundings, as well as an SPI camera for ML features such as object recognition. These cameras/imagers will be connected to our ESP32-S3 WROOM, which downsamples some of the input and offloads it to our phone app/webpage for heavier processing (object recognition, as well as the depth-map-to-sound algorithm, which will be quite complex and builds on research papers we’ve found).
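
To make the idea concrete, here is a toy sketch (our illustration, not the cited algorithm) of mapping one depth-map cell to stereo gains: the cell’s column sets the left/right pan and its proximity sets the loudness, using an equal-power pan law:

```c
#include <math.h>
#include <stdio.h>

/* Toy depth-to-stereo mapping for one cell of an 8x8 depth frame.
 * col: 0..7 across the FOV; depth_m: measured distance; max_m: range cap. */
void cell_to_stereo(int col, float depth_m, float max_m,
                    float *gain_l, float *gain_r) {
    const float HALF_PI = 1.5707963f;
    float pan  = col / 7.0f;                          /* 0 = left, 1 = right */
    float loud = 1.0f - fminf(depth_m / max_m, 1.0f); /* nearer = louder */
    /* Equal-power panning keeps perceived loudness constant across the field. */
    *gain_l = loud * cosf(pan * HALF_PI);
    *gain_r = loud * sinf(pan * HALF_PI);
}

int main(void) {
    float l, r;
    cell_to_stereo(2, 1.0f, 4.0f, &l, &r); /* obstacle 1 m away, left of center */
    printf("L=%.2f R=%.2f\n", l, r);
    return 0;
}
```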

---

# Subsystems:

## Subsystem 1: Microcontroller Unit

We will use an ESP32 as the MCU, mainly for its Wi-Fi capabilities as well as its sufficient processing power, which make it suitable for connecting to all of our sensors and offloading data to the app.

- ESP32-S3 WROOM : https://www.digikey.com/en/products/detail/espressif-systems/ESP32-S3-WROOM-1-N8/15200089

## Subsystem 2: ToF Depth Imagers/Cameras Subsystem

This subsystem is the main sensor subsystem for getting the depth map data. This data will be transformed into audio signals to allow a visually impaired person to perceive obstacles around them.

There will be two ToF sensors to provide a wide FOV, connected to the ESP32 MCU through two I2C connections. Each sensor provides an 8x8 pixel depth array over a 63-degree FOV; a sketch of stitching the two frames together follows the part listing.

- x2 SparkFun Qwiic Mini ToF Imager - VL53L5CX: https://www.sparkfun.com/products/19013
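
A minimal sketch of combining the two imagers’ frames before offloading, assuming the sensors are angled to cover adjacent fields of view; overlap handling between the two 63-degree FOVs is omitted here and would need calibration:

```c
#include <stdint.h>

/* Each VL53L5CX frame is an 8x8 grid of distances in millimeters.
 * Stitch the left and right frames side by side into a 16-wide composite. */
void stitch_frames(const uint16_t left[8][8], const uint16_t right[8][8],
                   uint16_t out[8][16]) {
    for (int r = 0; r < 8; r++) {
        for (int c = 0; c < 8; c++) {
            out[r][c]     = left[r][c];
            out[r][c + 8] = right[r][c];
        }
    }
}
```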

## Subsystem 3: SPI Camera Subsystem

This subsystem will allow us to capture a colored image of the user’s surroundings. A captured image will allow us to implement egocentric computer vision, processed on the app. We will implement one ML feature as a baseline for this project (one of scene description, object recognition, etc.). Feedback is only given when prompted by a button on the PCB: when the user presses the button on the glasses/headset, they will hear a description of their surroundings. Hence we don’t need real-time object recognition, so a frame rate as low as 1 fps suffices, in contrast with the depth maps, which need lower latency. A sketch of this one-shot trigger follows the part listing. This is exciting because having such an input will allow for other ML features/integrations that can be scaled drastically beyond this course.

- x1 Mega 3MP SPI Camera Module: https://www.arducam.com/product/presale-mega-3mp-color-rolling-shutter-camera-module-with-solid-camera-case-for-any-microcontroller/
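
A sketch of the button-gated, one-shot capture described above; the trigger is edge-detected, and the debounce interval and function names are our assumptions:

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBOUNCE_MS 200  /* assumed debounce interval */

static bool     prev_down;
static uint32_t last_press_ms;

/* Returns true exactly once per button press; the caller would then fire
 * one SPI camera capture and upload it to the app for ML processing. */
bool should_capture(bool button_down, uint32_t now_ms) {
    bool pressed = button_down && !prev_down &&
                   (now_ms - last_press_ms) > DEBOUNCE_MS;
    prev_down = button_down;
    if (pressed) last_press_ms = now_ms;
    return pressed;
}
```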

## Subsystem 4: Stereo Audio Circuit

This subsystem is in charge of converting the digital audio from the ESP32 and app into stereo output for earphones or speakers. This includes digital-to-analog conversion and voltage clamping/regulation. We may also add adjustable volume through a potentiometer.

- DAC circuit
  - 2x op-amp for stereo output, TLC27L1ACP: https://www.ti.com/product/TLC27L1A/part-details/TLC27L1ACP
  - SJ1-3554NG (AUX) connection to speakers/earphones: https://www.digikey.com/en/products/detail/cui-devices/SJ1-3554NG/738709
- Bone conduction transducer (optional, to be tested)
  - Would allow for bone-conduction audio output, easily integrated around the ear in place of earphones; to be tested for effectiveness and replaced with earphones otherwise. https://www.adafruit.com/product/1674

## Subsystem 5: App Subsystem

- React Native App/webpage, connects directly to ESP

- Does the heavy processing for the spatial awareness algorithm as well as object recognition or scene description algorithms (using libraries such as yolo, opencv, tflite)

- Sends audio output back to ESP to be outputted to stereo audio circuit

## Subsystem 6: Battery and Power Management

This subsystem is in charge of power delivery, voltage regulation, and battery management for the rest of the circuit and devices. It takes in the unregulated battery voltage and steps it up or down according to each component’s needs.

- Main power supply
  - Lithium-ion battery pack
- Voltage regulators
  - Linear, buck, and boost regulators for the MCU, sensors, and DAC
- Enclosure and routing
  - Plastic enclosure for the battery pack

---

# Criterion for Success

**Obstacle Detection:**

- Be able to identify the difference between an obstacle that is 1 meter away vs an obstacle that is 3 meters away.

- Be able to differentiate between obstacles on the right vs the left side of the user

- Be able to perceive an object moving from left to right or right to left in front of the user

**MCU:**

- Offload data from the sensor subsystems onto the application through a Wi-Fi connection.

- Control and receive data from sensors (ToF imagers and SPI camera) using SPI and I2C

- Receive audio from application and pass onto DAC for stereo out.

**App/Webpage:**

- Successfully connects to ESP through WIFI or BLE

- Processes data (ML and depth map algorithms)

- Process image using ML for object recognition

- Transforms depth map into spatial audio

- Sends audio back to ESP for audio output

**Audio:**

- Have working stereo output on the PCB for use in wired earphones or built in speakers

- Have bluetooth working on the app if a user wants to use wireless audio

- Potentially add hardware volume control

**Power:**

- Be able to operate the device using battery power. Safe voltage levels and regulation are needed.

- 5.5V Max
