Project

| # | Title | Team Members | TA | Documents | Sponsor |
|---|---|---|---|---|---|
| 27 | An Intelligent Assistant Using Sign Language (Best Integrated) | Haina Lou, Howie Liu, Qianzhong Chen, Yike Zhou | Xiaoyue Li | design_document1.pdf, final_paper1.pdf, proposal2.pdf | Liangjing Yang |

# TEAM MEMBERS
Qianzhong Chen (qc19)
Hanwen Liu (hanwenl4)
Haina Lou (hainal2)
Yike Zhou (yikez3)

# TITLE OF THE PROJECT
An Intelligent Assistant Using Sign Language

# PROBLEM & SOLUTION OVERVIEW
Smart home accessories are becoming increasingly common in people's homes. Controlling them usually requires a central hub, typically a speaker with a voice user interface. However, a voice-driven speaker is far from ideal for people who have difficulty speaking or hearing. Therefore, we aim to develop an intelligent assistant that understands sign language, interacts with people, and acts as a real assistant.

# SOLUTION COMPONENTS
## Subsystem1: 12-Degree-of-Freedom Bionic Hand System
- Two movable joints per finger, each driven by a 5 V servo motor
- The main parts of the hand are manufactured by 3D printing
- The bionic hand is mounted on an electrically driven 2-DOF platform
- All servo motors are controlled by PWM signals generated by an STM32 microcontroller
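
As a rough worked example of the PWM arithmetic, the sketch below maps a joint angle to the timer values an STM32 timer would need. It assumes typical hobby-servo timing (a 50 Hz frame with a 0.5 to 2.5 ms pulse spanning 0 to 180 degrees) and a timer prescaled to a 1 MHz tick; the real servo datasheet and clock tree may differ, and all constant and function names are hypothetical.

```python
# Hypothetical helper: convert a servo angle to STM32 timer register values.
# Assumed values (check the servo datasheet and the board's clock tree):
TIMER_TICK_HZ = 1_000_000   # timer clock after the prescaler, e.g. 72 MHz / 72
PWM_FREQ_HZ = 50            # 20 ms frame, common for hobby servos
MIN_PULSE_US = 500          # pulse width at 0 degrees
MAX_PULSE_US = 2500         # pulse width at 180 degrees
MAX_ANGLE_DEG = 180.0


def auto_reload_value() -> int:
    """ARR value so that one timer period equals one PWM frame."""
    return TIMER_TICK_HZ // PWM_FREQ_HZ - 1


def angle_to_compare(angle_deg: float) -> int:
    """Capture/compare (CCR) value, i.e. the pulse width in timer ticks."""
    # Clamp so a bad command cannot drive the servo past its mechanical range.
    angle = min(max(angle_deg, 0.0), MAX_ANGLE_DEG)
    pulse_us = MIN_PULSE_US + (MAX_PULSE_US - MIN_PULSE_US) * angle / MAX_ANGLE_DEG
    ticks_per_us = TIMER_TICK_HZ / 1_000_000
    return round(pulse_us * ticks_per_us)


if __name__ == "__main__":
    print(auto_reload_value())          # prints 19999
    for a in (0, 90, 180):
        print(a, angle_to_compare(a))   # prints 0 500, 90 1500, 180 2500
```

On the microcontroller the same two numbers would be written to the timer's auto-reload register and to one compare register per servo channel, for example with the STM32 HAL macro `__HAL_TIM_SET_COMPARE`.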


## Subsystem2: The Control System
- The control system consists of embedded modules: the microcontroller; a high-performance edge computing platform that runs the dynamic gesture recognition model; and more than 20 motors that control the fine movements of the bionic hand. It also requires a high-precision camera to capture the user's hand gestures.
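
The edge platform and the microcontroller have to agree on a command format for the serial link. The proposal does not define one, so the sketch below assumes a hypothetical fixed-length frame: a two-byte start marker, one byte per DOF holding the target angle (assuming one command per DOF), and a one-byte additive checksum so the receiver can reject corrupted frames. The marker, layout, and function names are illustrative only.

```python
HEADER = b"\xAA\x55"   # hypothetical start-of-frame marker
NUM_JOINTS = 12        # assumed: 10 finger joints + 2 platform axes


def encode_frame(angles):
    """Pack 12 joint angles (0-180 degrees) into one serial frame."""
    if len(angles) != NUM_JOINTS:
        raise ValueError(f"expected {NUM_JOINTS} angles, got {len(angles)}")
    # One unsigned byte per joint; clamp to the servo's range first.
    payload = bytes(max(0, min(180, int(a))) for a in angles)
    checksum = sum(payload) & 0xFF   # low byte of the payload sum
    return HEADER + payload + bytes([checksum])


def decode_frame(frame):
    """Validate a received frame and return the list of angles."""
    if len(frame) != len(HEADER) + NUM_JOINTS + 1 or not frame.startswith(HEADER):
        raise ValueError("malformed frame")
    payload = frame[len(HEADER):-1]
    if sum(payload) & 0xFF != frame[-1]:
        raise ValueError("checksum mismatch")
    return list(payload)
```

On the edge platform the encoded bytes could be sent with pySerial's `Serial.write`; the microcontroller would run the equivalent of `decode_frame` before updating its PWM outputs.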


## Subsystem3: Dynamic Gesture Recognition System
- An external camera that captures the shape, appearance, and motion of the user's hands
- A pre-trained model that helps the other subsystems work out the meaning of the signs. Specifically, for object detection we intend to adopt the YOLO algorithm together with MediaPipe, a machine learning framework developed by Google, to recognize different signs efficiently. Because dynamic gestures have both spatial and temporal structure, we also plan to build our models with 3D-CNNs and RNNs to better capture these spatio-temporal features.
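
For the RNN path, a dynamic sign becomes a sequence of per-frame hand features. The sketch below assumes the features are the 21 hand landmarks that MediaPipe Hands reports per frame, normalizes them relative to the wrist (landmark 0), and keeps a sliding window of recent frames as the model input. The 30-frame window and the class name are hypothetical choices, and a 3D-CNN would instead consume stacked image frames.

```python
from collections import deque

NUM_LANDMARKS = 21   # MediaPipe Hands reports 21 landmarks per hand
WINDOW = 30          # assumed clip length in frames (about 1 s at 30 fps)


def normalize(landmarks):
    """Express each (x, y, z) landmark relative to the wrist (landmark 0),
    so the feature does not depend on where the hand sits in the image."""
    wx, wy, wz = landmarks[0]
    feats = []
    for x, y, z in landmarks:
        feats.extend((x - wx, y - wy, z - wz))
    return feats   # 63 floats per frame


class GestureWindow:
    """Keeps the most recent frames for a spatio-temporal model."""

    def __init__(self, size=WINDOW):
        self.frames = deque(maxlen=size)   # oldest frame drops out automatically

    def push(self, landmarks):
        if len(landmarks) != NUM_LANDMARKS:
            raise ValueError("expected 21 landmarks")
        self.frames.append(normalize(landmarks))

    def ready(self):
        return len(self.frames) == self.frames.maxlen

    def clip(self):
        """Return a (window x 63) nested list, the shape an RNN would consume."""
        return [list(f) for f in self.frames]
```

In a live loop, each hand in MediaPipe's `multi_hand_landmarks` result would be converted to (x, y, z) tuples and pushed; once `ready()` is true, `clip()` would be passed to the sequence model.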

# CRITERION OF SUCCESS

- The bionic hand moves freely and fluently as designed, with all 12 DOFs realized. The movement of a single finger joint neither interferes with nor is interfered with by other movements. The bionic hand is durable and reliable.
- The control system is reliable and outputs stable PWM signals to the motors. The chosen edge computing platform runs the dynamic gesture recognition model with high performance.
- Our machine recognizes different signs immediately and responds with the corresponding gestures without noticeable delay.
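
"Without noticeable delay" is easier to verify once it has a number attached. One hedged way is to time each call of a pipeline stage (or the whole path from camera frame to servo command) and report the median and 95th-percentile latency; the function names are hypothetical and no pass threshold is implied.

```python
import statistics
import time


def measure_latency(stage, inputs):
    """Time one call of `stage` per input; return latencies in milliseconds."""
    latencies = []
    for item in inputs:
        start = time.perf_counter()
        stage(item)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies_ms):
    """Median and 95th-percentile latency (needs at least two samples)."""
    p95 = statistics.quantiles(latencies_ms, n=20)[18]   # 19th of 19 cut points
    return {"median_ms": statistics.median(latencies_ms), "p95_ms": p95}
```

Reporting the tail percentile alongside the median matters because occasional slow frames are what a user perceives as lag.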


# DISTRIBUTION OF WORK
- Qianzhong Chen (ME): Design and manufacture the bionic hand; tune the linkage between the motors and the mechanical parts; work with Haina on programming the STM32 to generate PWM signals and drive the motors.
- Hanwen Liu (CompE): Record gesture clips to collect enough data; test camera modules; draft reports; make schedules.
- Haina Lou (EE): Implement the embedded control system; program the microcontroller and the AI edge computing module; implement serial communication.
- Yike Zhou (EE): Implement the object detection subsystem; build and train the machine learning models.

A Wearable Device Outputting Scene Text For Blind People

Hangtao Jin, Youchuan Liu, Xiaomeng Yang, Changyu Zhu

Featured Project

# Revised

We discussed the problem with our mentor, Prof. Gaoang Wang, and arrived at a solution.

## TEAM MEMBERS (NETID)

Xiaomeng Yang (xy20), Youchuan Liu (yl38), Changyu Zhu (changyu4), Hangtao Jin (hangtao2)

## INSTRUCTOR

Prof. Gaoang Wang

## LINK

This idea was pitched on the Web Board by Xiaomeng Yang.

https://courses.grainger.illinois.edu/ece445zjui/pace/view-topic.asp?id=64684

## PROBLEM DESCRIPTION

There are about 12 million visually impaired people in China, yet we rarely see blind people on the street. One reason is that when blind people travel to unfamiliar places, it is difficult for them to figure out where they are. Blind travelers usually rely on navigation equipment, but its accuracy is limited, so once they arrive near the destination it is hard for them to find its exact position. Therefore, we'd like to make a device that reads the scene text around the destination so that blind people can reach the exact place.

## SOLUTION OVERVIEW

We'd like to make a device with a micro camera and an earphone. When the user presses a button, the camera takes a picture and the communication subsystem sends it to a remote server for processing. The server extracts and recognizes the text in the picture using a neural network and converts it to speech with the Google text-to-speech API. The speech is then sent back and played to the user through the earphone. The device can be attached to the glasses that blind people wear.

Blind people use navigation equipment that tells them the location and direction of their destination, but they still need detailed directions once they are close, and our wearable device helps solve this problem. The camera is fixed to the head, like our eyes, so when the user turns his head, the camera captures the scene text in that direction. Our target scenario is identifying the names of stores along the street. These store signs are generally not high, about two stories above the ground. The user can tilt his head up and down to let the camera capture the whole storefront, so the store name can be recognized wherever it appears.

For example, suppose a blind person wants to go to a book store. When he is near the destination, the navigation app tells him that he has arrived and that the store is on his right. However, there are several stores on his right. He can face right, take a photo in that direction, and find out whether the book store is there. If not, he can turn his head slightly and take another photo in the new direction.

![figure1](https://courses.grainger.illinois.edu/ece445zjui/pace/getfile/18612)

![figure2](https://courses.grainger.illinois.edu/ece445zjui/pace/getfile/18614)

## SOLUTION COMPONENTS

### Interactive Subsystem

The interactive subsystem interacts with the blind and the environment.

- A 3D-printed frame that attaches to the glasses through a snap-fit structure and holds all the components in place

- A micro camera that takes pictures

- An earphone that outputs the speech

### Communication Subsystem

The communication subsystem is used to connect the interactive subsystem with the software processing subsystem.

- A Raspberry Pi (RPi) receives the images taken by the camera and sends them to the remote server through a WiFi module. After the server has processed them, the RPi receives the speech information (an .mp3 file).
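
The proposal does not fix the wire protocol between the RPi and the server. One simple option over a TCP connection is length-prefixed messages: a 4-byte big-endian length followed by the payload, used for the JPEG going up and the .mp3 coming back. The sketch below assumes that format; the function names are hypothetical, and an HTTP upload would be an equally reasonable choice.

```python
import socket
import struct

# Hypothetical wire format: 4-byte big-endian length, then the payload bytes
# (a JPEG from the RPi, or an .mp3 from the server).
_LEN = struct.Struct(">I")


def send_message(sock: socket.socket, payload: bytes) -> None:
    sock.sendall(_LEN.pack(len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; TCP may deliver one message in several pieces."""
    chunks = []
    while n > 0:
        chunk = sock.recv(min(n, 65536))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket) -> bytes:
    (length,) = _LEN.unpack(_recv_exact(sock, _LEN.size))
    return _recv_exact(sock, length)
```

The explicit length lets the receiver know where one image or audio file ends, which a raw TCP stream does not provide on its own.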

### Software Processing Subsystem

The software processing subsystem processes the images and outputs speech. It has two parts: text recognition and text-to-speech.

- An OCR neural network that extracts and recognizes Chinese text from the environmental images delivered by the communication subsystem.

- The Google text-to-speech API, which converts the recognized text to speech.
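
Between the two parts, the server has to turn the OCR network's individual detections into one string to speak. The sketch below assumes a hypothetical output format in which each detection carries its text, a confidence, and the top-left corner of its box; it drops low-confidence boxes and reads the rest top-to-bottom, left-to-right. The 0.5 threshold and 20-pixel row tolerance are illustrative guesses, not tuned values.

```python
from typing import List, NamedTuple


class TextBox(NamedTuple):
    """One detection in a hypothetical OCR output format."""
    text: str
    confidence: float
    x: float   # left edge of the box, pixels
    y: float   # top edge of the box, pixels


def to_utterance(boxes: List[TextBox], min_conf: float = 0.5,
                 row_tol: float = 20.0) -> str:
    """Drop low-confidence boxes and read the rest top-to-bottom, left-to-right."""
    kept = sorted((b for b in boxes if b.confidence >= min_conf), key=lambda b: b.y)
    rows: List[List[TextBox]] = []
    for box in kept:
        # Boxes whose tops are within row_tol pixels count as one line of a sign.
        if rows and abs(box.y - rows[-1][0].y) <= row_tol:
            rows[-1].append(box)
        else:
            rows.append([box])
    # Chinese text needs no spaces inside a line; a full-width comma between
    # lines gives the speech engine a natural pause.
    lines = ["".join(b.text for b in sorted(row, key=lambda b: b.x)) for row in rows]
    return "，".join(lines)
```

The resulting string would then be sent to the text-to-speech service; with the `google-cloud-texttospeech` client, for instance, `synthesize_speech` can return MP3 audio when `AudioEncoding.MP3` is requested.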

## CRITERION FOR SUCCESS

- Recognize Chinese scene text successfully with a neural network.

- Convert the recognized text to speech with the Google text-to-speech API.

- The device sends pictures or video of the surroundings to the server and correctly receives the speech information.

- Blind people can use the speech information to locate their position.
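
"Recognize Chinese scene text successfully" becomes measurable once recognition output is scored against hand-labelled photos. Character error rate (CER), the edit distance between recognized and reference text divided by the reference length, is a standard OCR metric and suits Chinese because it works per character rather than per word. A minimal sketch follows; the test set and any pass threshold would be the team's own choice.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions), row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # delete r
                           cur[j - 1] + 1,            # insert h
                           prev[j - 1] + (r != h)))   # substitute, free if equal
        prev = cur
    return prev[-1]


def character_error_rate(refs, hyps) -> float:
    """Total edits divided by total reference characters over a test set."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return edits / chars if chars else 0.0
```

Summing edits over the whole set before dividing keeps long signs from being under-weighted relative to short ones.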