Project

| # | Title | Team Members | TA | Documents | Sponsor |
|---|---|---|---|---|---|
| 29 | Interactive Projection System on Arbitrary Surfaces | Jie Xu, Jing Weng, Yuqi Tang, Zibo Dai | | design_document1.pdf, design_document2.pdf, final_paper1.pdf, final_paper2.pdf, other1.pdf | Liangjing Yang |

# Problem

Most current smart devices rely on fixed-size screens for human-computer interaction, which limits display area, temporary collaboration, and natural input. Projection technology can extend interfaces into the physical environment, but conventional projectors usually provide visual output only and cannot support stable direct touch interaction across surfaces with different shapes, sizes, and materials.

Our project aims to develop a system that projects an interactive user interface onto arbitrary physical surfaces and supports direct touch input on the projected area. This is a meaningful and technically challenging problem because the system must address not only projection, but also surface detection, projector-sensor calibration, touch localization, and real-time interaction feedback. We will begin by validating the first prototype on a normal wall, and then extend the design toward more general surfaces such as desks, paper, and other physical objects.


# Solution Overview

We propose to build an interactive projection system that integrates projection hardware, vision-based sensing, and embedded control. A projection module will display a graphical user interface on the target surface, while a camera or depth-based sensing module will monitor the surface and detect the position of a user’s finger during interaction.

The sensed position will then be mapped into the projected interface coordinate system so that the system can recognize basic actions such as clicking and dragging, forming a complete display-sensing-recognition-feedback loop. The first implementation will be validated on a flat and stable wall surface; however, the overall architecture will be designed for extension to arbitrary surfaces, with attention to surface size variation, pose variation, and adaptive interface placement.

Prior research shows that the key technical problems of arbitrary-surface interactive projection include surface segmentation and tracking, projector-camera calibration, and interaction area definition, which directly motivates our design.


# Solution Components

The proposed system consists of the following major components:

## Projection Display Module
Projects a graphical user interface onto the target surface and adjusts the displayed area according to surface size, position, and orientation.
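
As a minimal sketch of how the displayed area could be adapted to surface size, the helper below computes the largest centered rectangle of a given aspect ratio that fits inside a detected surface. The function name `fit_interface`, the `margin` parameter, and the assumption that the surface has already been reduced to an axis-aligned width and height are illustrative choices, not part of the proposed design.

```python
def fit_interface(surface_w, surface_h, aspect=16 / 9, margin=0.05):
    """Return (x, y, w, h) of the largest centered rectangle with the given
    aspect ratio (width / height) that fits inside the surface.

    `margin` is a relative border kept free on every side (assumed value).
    All quantities share the same unit, e.g. projector pixels.
    """
    if surface_w <= 0 or surface_h <= 0:
        raise ValueError("surface dimensions must be positive")
    usable_w = surface_w * (1 - 2 * margin)
    usable_h = surface_h * (1 - 2 * margin)
    # The interface is limited by whichever dimension runs out first.
    w = min(usable_w, usable_h * aspect)
    h = w / aspect
    # Center the rectangle on the surface.
    x = (surface_w - w) / 2
    y = (surface_h - h) / 2
    return x, y, w, h
```

Handling surface orientation would additionally require a rotation or perspective warp, which this sketch leaves out.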

## Surface Sensing Module
Uses a camera or depth/vision sensor to capture image or depth information from the target surface, detect surface geometry, and identify the available interactive area.

## Touch Detection and Interaction Recognition Module
Detects whether the user’s finger is touching the projected surface and recognizes basic interaction events such as tapping and dragging.
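
One possible way to separate taps from drags, once contact has been detected, is to look at how far and how long the finger moved while touching. The sketch below assumes the sensing module already delivers timestamped contact positions. The function name `classify_gesture` and both thresholds are hypothetical values that would need tuning on the real prototype.

```python
import math


def classify_gesture(samples, move_thresh=15.0, tap_max_s=0.4):
    """Classify one contact episode as 'tap', 'drag', or None.

    samples: list of (t_seconds, x, y) recorded from touch-down to touch-up,
             in interface coordinates.
    move_thresh: displacement (same unit as x, y) above which the contact
                 counts as a drag (assumed value).
    tap_max_s: longest contact duration still treated as a tap (assumed value).
    """
    if not samples:
        return None
    t0, x0, y0 = samples[0]
    duration = samples[-1][0] - t0
    # Largest distance from the touch-down point during the contact.
    max_disp = max(math.hypot(x - x0, y - y0) for _, x, y in samples)
    if max_disp >= move_thresh:
        return "drag"
    if duration <= tap_max_s:
        return "tap"
    # Long, stationary contact: neither event in this simple scheme.
    return None
```

A short, nearly stationary contact is reported as a tap, and any contact that moves beyond the threshold is reported as a drag regardless of duration.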

## Coordinate Calibration and Mapping Module
Establishes the spatial relationship between the sensing system and the projector so that detected touch points can be accurately mapped to interface locations.
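
For a flat surface such as the wall used in the first prototype, this mapping can be modeled as a planar homography estimated from four corresponding points, for example the corners of a projected calibration pattern as seen by the camera. The following is a minimal pure-Python sketch under that assumption. The function names `solve_linear`, `fit_homography`, and `map_point` are illustrative, and non-planar surfaces would need a different model.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_homography(cam_pts, proj_pts):
    """Estimate the 3x3 homography taking camera points to projector points.

    Both arguments are lists of exactly four (x, y) pairs in matching order.
    The bottom-right entry of the matrix is fixed to 1.
    """
    if len(cam_pts) != 4 or len(proj_pts) != 4:
        raise ValueError("exactly four correspondences are required")
    a, b = [], []
    for (x, y), (u, v) in zip(cam_pts, proj_pts):
        # Two linear equations per correspondence.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def map_point(H, x, y):
    """Map a camera-space point into interface (projector) coordinates."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return u, v
```

In use, `fit_homography` would run once during calibration and `map_point` would run for every detected fingertip position. A practical system would more likely rely on a vision library (OpenCV, for instance, provides `cv2.findHomography`) and use more than four points to average out detection noise.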

## Embedded Control and System Integration Module
Executes control logic, coordinates sensing and projection data flow, and manages communication and power across the system.

## Mechanical Support Structure
Provides stable mounting for the projector, sensors, and control hardware so that the relative geometry remains fixed and repeatable during calibration and testing.


# Criteria of Success

The project will be considered successful based on the following criteria.

1. The system must project a stable and visible interactive interface onto at least one physical surface and maintain usable operation during demonstration.
2. It must detect direct touch input within the projected area and correctly trigger at least one basic interaction event, such as a click.
3. The touch localization accuracy must be sufficient for users to complete simple interface tasks such as button selection or menu navigation.
4. The system must demonstrate extensibility toward arbitrary surfaces by supporting interaction on at least one additional surface beyond a wall.
5. The complete prototype must support a demonstrable application scenario, such as a numeric keypad, simple control panel, or menu-based interface, showing that the full interaction loop has been implemented.

These success criteria match the course expectation that requirements should be clear and verifiable, and they are also consistent with prior evaluation methods for click detection and drag interaction in projected interactive systems.
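
To illustrate the numeric keypad scenario from criterion 5, the sketch below shows one way a mapped touch point could be resolved to a key. The 3-by-4 layout, the key size, and the function name `hit_test` are assumptions made for this example only.

```python
# Hypothetical keypad layout: four rows of three keys each.
KEYS = ["123", "456", "789", "*0#"]


def hit_test(x, y, left=0.0, top=0.0, key_w=100.0, key_h=100.0):
    """Return the key label under the touch point, or None if outside.

    (x, y) is the touch position in interface coordinates; (left, top) is
    the keypad's upper-left corner; key_w and key_h are the key dimensions.
    """
    col = int((x - left) // key_w)
    row = int((y - top) // key_h)
    if 0 <= row < len(KEYS) and 0 <= col < len(KEYS[0]):
        return KEYS[row][col]
    return None
```

Because each key occupies a `key_w` by `key_h` cell, this also gives a rough reading of criterion 3: a localization error smaller than about half the key size, measured from the key center, would still select the intended key.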

A Wearable Device Outputting Scene Text For Blind People

Hangtao Jin, Youchuan Liu, Xiaomeng Yang, Changyu Zhu

Featured Project

# Revised

We discussed the idea with our mentor, Prof. Gaoang Wang, and arrived at a solution to the problem.

## TEAM MEMBERS (NETID)

Xiaomeng Yang (xy20), Youchuan Liu (yl38), Changyu Zhu (changyu4), Hangtao Jin (hangtao2)

## INSTRUCTOR

Prof. Gaoang Wang

## LINK

This idea was pitched on Web Board by Xiaomeng Yang.

https://courses.grainger.illinois.edu/ece445zjui/pace/view-topic.asp?id=64684

## PROBLEM DESCRIPTION

There are currently about 12 million visually impaired people in China, yet blind people are rarely seen on the street. One reason is that, in unfamiliar places, it is difficult for them to work out exactly where they are. Blind travelers usually carry navigation equipment, but its accuracy is limited, so once they are near the destination it is still hard for them to find its exact position. We would therefore like to build a device that reads out the scene text around the destination and helps blind people reach the exact place.

## SOLUTION OVERVIEW

We would like to build a device with a micro camera and an earphone. When the user presses a button, the camera takes a picture and the communication subsystem sends it to a remote server for processing. On the server, a neural network extracts and recognizes the text in the picture, and the Google text-to-speech API converts the text into speech. The speech is then sent back and played to the user through the earphone. The device can be attached to the glasses that blind people wear.

Navigation equipment can tell blind users the location and general direction of their destination, but they still need more detailed guidance once they are close to it. Our wearable device addresses this gap. The camera is fixed to the head, much like the eyes, so when the user turns their head the camera captures scene text in different directions. Our target scenario is identifying the names of stores along a street. Store signs are generally not very high, at about two stories. The user can tilt their head up and down so that the camera covers the whole storefront, which means the store name can be recognized wherever it is placed.

For example, suppose a blind person wants to go to a bookstore. When he is near the destination, the navigation app tells him that he has arrived and that the store is on his right. However, there are several stores on his right. He can then face right, take a photo in that direction, and find out whether the store is there. If not, he can turn his head slightly and take another photo in the new direction.

![figure1](https://courses.grainger.illinois.edu/ece445zjui/pace/getfile/18612)

![figure2](https://courses.grainger.illinois.edu/ece445zjui/pace/getfile/18614)

## SOLUTION COMPONENTS

### Interactive Subsystem

The interactive subsystem interacts with the blind user and the environment.

- 3-D printed frame that attaches to the glasses through a snap-fit structure and holds all the accessories in place

- Micro camera that can take pictures

- Earphone that can output the speech

### Communication Subsystem

The communication subsystem connects the interactive subsystem with the software processing subsystem.

- A Raspberry Pi (RPi) receives the images taken by the camera and sends them to the remote server through a WiFi module. After the server has processed an image, the RPi receives the resulting speech as an .mp3 file.
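
The exchange between the RPi and the server could look roughly like the sketch below, which uses only the Python standard library. The proposal specifies WiFi but not a protocol, so the use of HTTP POST, the `SERVER_URL` endpoint, the headers, and the function names `build_upload_request` and `fetch_speech` are all assumptions made for illustration.

```python
import urllib.request

# Hypothetical endpoint; the real server address is not given in the proposal.
SERVER_URL = "http://example.invalid/ocr"


def build_upload_request(jpeg_bytes, url=SERVER_URL):
    """Build an HTTP POST request carrying one JPEG image (assumed protocol)."""
    return urllib.request.Request(
        url,
        data=jpeg_bytes,
        method="POST",
        headers={"Content-Type": "image/jpeg", "Accept": "audio/mpeg"},
    )


def fetch_speech(jpeg_bytes, url=SERVER_URL, timeout=10.0):
    """Send the image and return the MP3 bytes from the response body.

    Raises urllib.error.URLError (or its subclass HTTPError) when the server
    is unreachable or answers with an error status.
    """
    request = build_upload_request(jpeg_bytes, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

On the device, the bytes returned by `fetch_speech` would be written to an .mp3 file or passed directly to an audio player driving the earphone.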

### Software Processing Subsystem

The software processing subsystem processes the images and outputs speech. It consists of two parts: text recognition and text-to-speech.

- An OCR neural network that extracts and recognizes Chinese text from the environmental images delivered by the communication subsystem.

- The Google text-to-speech API, which converts the recognized text to speech.

## CRITERIA FOR SUCCESS

- The neural network successfully recognizes Chinese scene text.

- The Google text-to-speech API converts the recognized text to speech.

- The device transmits pictures or video of the environment to the server and correctly receives the speech information.

- Blind people can use the speech information to locate their position.