
# Project 46: Voice-Controlled Robotic Study Assistant

Team Members: Jiaxuan He, Qi Jin, Shuohan Fang, Yicheng Chen
TA: Hua Chen
Documents: proposal1.pdf
# Problem

According to the latest data from the World Health Organization (WHO), about 295 million people live with moderate to severe vision impairment, and 43 million of them are blind. Estimates of the number of people with hand disabilities range from tens of millions to hundreds of millions. Assistive technologies are steadily improving quality of life for these groups, but an all-in-one solution is still lacking. Existing page-turning devices offer limited interaction: features such as keyword search and voice-commanded page turning are usually absent. Based on this observation, we aim to develop an automatic page-turning machine with voice control and text recognition, providing real aid to the millions of people with disabilities who are currently marginalized by traditional print media.

# Solution Overview

Our solution is a voice-controlled, autonomous reading machine that combines mechanical page manipulation with computer vision. A central microcontroller manages the physical navigation of the book. To turn a page, the system coordinates three steps: two actuated paperweights secure the book, a robotic arm with a vacuum suction cup vertically lifts the top sheet, and a motorized swing arm sweeps the page across the binding. This design allows for hands-free, bidirectional page turning.
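The three-step turn sequence can be sketched as a short coordination routine. This is a minimal illustration with placeholder actuator objects; the real drivers, action names, and timing depend on the final hardware.

```python
import time

class Actuator:
    """Placeholder for a motor/valve driver; records commands for illustration."""
    def __init__(self, name):
        self.name = name
        self.log = []

    def move(self, action):
        self.log.append(action)

def turn_page_forward(weights, suction_arm, swing_arm, settle_s=0.0):
    """Coordinate the three-step forward page turn described above."""
    weights.move("press")                # 1. paperweights secure the book
    suction_arm.move("lift_top_sheet")   # 2. vacuum cup lifts the top sheet
    swing_arm.move("sweep_forward")      # 3. swing arm carries it across the binding
    suction_arm.move("release")
    weights.move("release")
    time.sleep(settle_s)                 # let the page settle before imaging
```

A backward turn would run the mirrored sequence with a reverse sweep.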

To ensure mechanical reliability, we pair this physical setup with a fixed overhead camera and a computer vision pipeline. Because thick textbooks naturally curve near the binding, our software first applies a geometric dewarping algorithm to digitally flatten the captured images. This improves the accuracy of the Optical Character Recognition (OCR) engine. The camera also forms a closed-loop control system by reading page numbers after every turn. If the system detects that multiple pages were accidentally grabbed, the microcontroller automatically triggers a reverse-sweep recovery sequence to fix the error.
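The closed-loop check can be sketched as a retry loop. For simplicity, this sketch assumes each turn or reverse sweep moves exactly one printed page number (in reality a sheet spans two page numbers); `read_page_number`, `reverse_sweep`, and `turn_forward` are hypothetical callbacks standing in for the OCR and actuator routines.

```python
def verify_and_recover(expected_page, read_page_number,
                       reverse_sweep, turn_forward, max_attempts=3):
    """After a turn, read the printed page number and recover if the
    wrong page (e.g. a multi-page grab) is detected."""
    for _ in range(max_attempts):
        actual = read_page_number()
        if actual == expected_page:
            return True
        if actual > expected_page:
            # Overshot: extra sheets were grabbed; sweep them back.
            for _ in range(actual - expected_page):
                reverse_sweep()
        else:
            # Undershot: keep turning forward.
            for _ in range(expected_page - actual):
                turn_forward()
    return read_page_number() == expected_page
```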

Users control the entire system using natural language voice commands via a microphone. They can request absolute or relative page navigation, such as "turn to page 45" or "go back one page." Once the system verifies it has reached the correct page, it waits for further instructions. If the user commands it to read, the system uses the extracted OCR text to provide text-to-speech (TTS) playback through an integrated speaker. Additionally, the software features a voice-activated keyword search. Users can ask the machine to locate specific terms on the open spread, and the system will verbally identify the exact paragraph or read the relevant context, providing a fully interactive study experience.
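The absolute/relative navigation commands above can be mapped to actions with a small parser. The phrasings below are illustrative examples of the grammar, not its final form, and `parse_command` is a hypothetical helper:

```python
import re

def parse_command(utterance, current_page):
    """Map a recognized utterance to an (action, argument) pair."""
    text = utterance.lower().strip()
    m = re.search(r"turn to page (\d+)", text)
    if m:
        return ("goto", int(m.group(1)))          # absolute navigation
    if "next page" in text:
        return ("goto", current_page + 1)         # relative navigation
    if "previous page" in text or "go back one page" in text:
        return ("goto", current_page - 1)
    if text.startswith("read"):
        return ("read", current_page)             # trigger TTS playback
    m = re.search(r"(?:find|search for) (.+)", text)
    if m:
        return ("search", m.group(1))             # keyword search
    return ("unknown", None)
```

An unknown command falls through to `("unknown", None)` so the system can ask the user to repeat it.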

# Solution Components

The proposed system consists of several integrated components that enable voice-controlled interaction with physical reading materials. These components work together to assist users, particularly individuals with limited upper-limb mobility, in accessing and navigating printed documents independently.

**1. Voice Command Interface**
This module allows users to control the system using predefined voice commands. The system recognizes commands such as “next page,” “previous page,” “read page,” “book one,” and “book two.” These commands enable the user to navigate through physical reading materials without using their hands. The recognized speech input is processed and translated into control signals that trigger the corresponding system actions.

**2. Robotic Page-Turning Mechanism**
This component performs the physical manipulation of paper pages. A robotic mechanism, consisting of actuators and a page-lifting structure, is designed to lift and flip individual pages of a book or document. The mechanism must operate carefully to avoid tearing or damaging the paper while ensuring that only a single page is turned at a time. The system is designed to handle common book and document sizes within a specified range.

**3. Dual Document Workstations**
The system includes two predefined workstations where different physical reading materials can be placed before operation begins. These workstations allow the user to switch between two separate documents using voice commands. For example, one workstation may contain a bound textbook while the other contains stapled lecture notes. This feature allows users to interact with multiple learning materials without manual assistance.

**4. Vision-Based Page Monitoring**
A camera system continuously monitors the document during operation. This module captures images of the current page and uses computer vision techniques to detect whether a page-turning action has been successfully completed. It can also identify possible errors such as incomplete page flips, page misalignment, or paper sticking. The visual feedback helps improve system reliability and provides useful information for system control and debugging.

**5. Text Recognition and Audio Reading Module**
Using Optical Character Recognition (OCR), this module extracts textual content from the captured page images. The recognized text is then processed by a text-to-speech (TTS) system that reads the content aloud to the user. This function allows visually impaired users or users who prefer auditory feedback to access the information on the current page without needing to read the physical text directly.

**6. System Control and Integration**
This module serves as the central controller of the system. It coordinates all components, including voice input processing, robotic page-turning actions, vision feedback, and audio output. The control module ensures that commands are executed in the correct sequence and that feedback from sensors and the vision system is used to verify successful operations. This integration allows the system to function reliably as a unified assistive reading platform.
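The command sequencing this module performs can be sketched as a small dispatcher that walks toward a target page and confirms each step with vision feedback. The component interfaces (`turner`, `vision`, `tts`) are placeholders for the real drivers:

```python
class ReadingController:
    """Central coordinator: executes parsed commands in order and uses
    vision feedback to confirm each navigation step (simplified sketch)."""
    def __init__(self, turner, vision, tts):
        self.turner = turner    # callable: turn one page (+1 forward, -1 back)
        self.vision = vision    # callable: return page number read by camera
        self.tts = tts          # callable: read the given page aloud
        self.current_page = 1

    def execute(self, action, arg):
        if action == "goto":
            step = 1 if arg > self.current_page else -1
            while self.current_page != arg:
                self.turner(step)                  # mechanical turn
                self.current_page = self.vision()  # vision verifies the page
            return f"on page {self.current_page}"
        if action == "read":
            return self.tts(self.current_page)     # TTS playback
        return "command not recognized"
```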

# Criterion for Success

1. The system shall correctly recognize and execute predefined voice commands such as “next page,” “previous page,” “turn to page X,” “read page,” “book one,” and “book two” in at least 8 out of 10 trials, with each action beginning within 3 seconds in a quiet indoor environment.
2. The system shall autonomously switch between two predefined document workstations, one containing a bound textbook and one containing stapled lecture notes, without requiring manual document replacement during operation, in at least 8 out of 10 trials.
3. For each of the two document types, the system shall successfully turn a single page forward and backward in at least 8 out of 10 trials, without damaging the paper, unintentionally turning multiple pages, or causing visible permanent damage.
4. The vision system shall correctly detect whether a page-turning action has succeeded or failed and identify common errors such as incomplete flips, page misalignment, or multiple-page pickup in at least 8 out of 10 trials; when an error is detected, the control system shall initiate a corrective action.
5. For a predefined set of printed textbook and lecture-note pages, the OCR and audio module shall extract the main textual content and read it aloud with at least 90% text accuracy, and the keyword search function shall correctly identify whether a requested keyword appears on the current page in at least 8 out of 10 trials.

Prosthetic Control Board

Featured Project

Psyonic is a local start-up that has been working on a prosthetic arm with an impressive set of features while remaining affordable. The current iteration of the main hand board is functional but limited in computational power and scalability. In light of this, Psyonic wishes to switch to a production-ready chip that improves on the current microcontroller by using a more modern architecture. During this change, a few new features would be added to improve safety, allow for easier debugging, and fix some issues present in the current implementation. The board is also slated to communicate with several other boards found in the hand. Additionally, we are looking at the possibility of improving the longevity of the product with methods such as conformal coating and potting.

Core Functionality:

Replace the microcontroller, change connectors, and write software to send control signals to the motor drivers.

Tier 1 functions:

Add additional communication interfaces (I2C) and add a temperature sensor.

Tier 2 functions:

Set up a framework for communication with the other boards, and improve board longevity.

Overview of proposed changes by affected area:

Microcontroller/Architecture Change:

Teensy -> Production-ready chip (most likely ARM based, i.e. STM32 family of processors)

Board:

Support the new microcontroller, add additional communication interfaces (I2C), and change to a more robust connector. (PCBs will need to be designed for both the main control board and the finger sensors.)

Sensor:

Addition of a temperature sensor to provide temperature feedback to the microcontroller.

Software:

Change from the Arduino IDE to a new toolchain. (ARM has various base libraries, such as mbed, and can be configured for use with Eclipse as an IDE.) Lay out a framework to allow communication with boards found in other parts of the arm.