
# 28: Extended Reality Based Robotic Desktop Assistant

Team Members: Cheng Zheng, Yuxuan Wu, Zhewei Zhang, Ziyang Jin
Documents: proposal1.pdf
Sponsor: Liangjing Yang

# People
Cheng Zheng: cz77
Yuxuan Wu: yuxuan59
Ziyang Jin: ziyang3
Zhewei Zhang: zheweiz3

# Problem
Portable robotic assistants have strong potential for everyday use, yet their compact form factor severely limits on-board user interfaces. As a result, users often cannot quickly understand what the robot can do, what it is currently doing, and how to interact with it efficiently.

Most palm-size robots rely on a mobile app or a few physical buttons, which leads to a narrow and less intuitive interaction style. Users frequently need to switch between checking the phone, issuing commands, and observing the robot’s response, increasing both the learning curve and operational friction. Some existing solutions enhance interaction by adding external displays or extra devices, but this increases system bulk and setup complexity, undermining portability and “grab-and-go” usability.

By turning any flat surface (e.g., a desk or a wall) into an interactive projection-based interface and integrating gesture recognition with dynamic visual feedback, the robot can provide a more natural and direct human–robot interaction experience without requiring an additional screen. This approach improves usability by making robot status and functions easier to perceive and operate, while also enabling an “interface anywhere” form factor that better fits real-world daily-assistance scenarios and enhances user engagement.

# Solution Overview
Core functions:
## Dynamic Projection Interface
The robot projects an interactive user interface onto any flat surface (e.g., a desk or a wall), converting the surrounding physical space into an operable interaction area without adding an external display.

## Gesture-Based Interaction Control
Users interact with the projected interface using hand gestures. The system performs real-time gesture detection and recognition, maps gestures to commands, and triggers corresponding robot responses, enabling a natural and intuitive interaction flow.
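As one possible realization, gesture detection could operate on MediaPipe-style hand landmarks (21 points per hand, index 0 = wrist, 4 = thumb tip, in normalized image coordinates where y grows downward). The rule-based classifier below is an illustrative sketch of how thumbs-up and thumbs-down could be separated from landmark geometry; the thresholds and landmark layout are assumptions, not the project's final recognition pipeline.

```python
# Rule-based sketch over 21 hand landmarks in MediaPipe's layout
# (0 = wrist, 4 = thumb tip, 8/12/16/20 = fingertips, 6/10/14/18 = PIP joints).
# Normalized image coordinates: y grows downward.

WRIST, THUMB_TIP = 0, 4
FINGER_TIPS = (8, 12, 16, 20)
FINGER_PIPS = (6, 10, 14, 18)

def classify_gesture(lm):
    """lm: list of 21 (x, y) tuples. Returns 'thumbs_up', 'thumbs_down', or 'none'."""
    # All four fingers folded: each fingertip lies below its PIP joint.
    folded = all(lm[t][1] > lm[p][1] for t, p in zip(FINGER_TIPS, FINGER_PIPS))
    if not folded:
        return "none"            # open hand: no affective gesture
    if lm[THUMB_TIP][1] < lm[WRIST][1] - 0.1:
        return "thumbs_up"       # thumb clearly above the wrist
    if lm[THUMB_TIP][1] > lm[WRIST][1] + 0.1:
        return "thumbs_down"     # thumb clearly below the wrist
    return "none"
```

A real deployment would smooth decisions over several frames before triggering a command, which also helps meet the false-trigger target in the success criteria.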

## Interface Navigation
The main projected interface provides basic feature entries (e.g., Weather, Clock, Exit) and supports page switching and function invocation through a “point-and-click” interaction style.

## Information Query Functions
Selecting the Weather icon switches the interface to display current weather information. Selecting the Clock icon switches the interface to display the current time. Each sub-page includes an Exit icon that returns the user to the main interface, ensuring a consistent and easy-to-learn navigation logic.
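The weather page's fields could be shaped from a network API response as sketched below. The payload keys mimic an OpenWeatherMap-style schema and are assumptions; the actual data source is whatever the data acquisition module queries.

```python
# Shape a raw weather-API response into the fields the weather page shows
# (city, temperature, condition). The key layout below is an assumed,
# OpenWeatherMap-style schema; the real endpoint and keys may differ.

def weather_fields(payload):
    return {
        "city": payload["name"],
        "temperature": f'{payload["main"]["temp"]:.0f}\N{DEGREE SIGN}C',
        "condition": payload["weather"][0]["main"],
    }

sample = {
    "name": "Haining",
    "main": {"temp": 21.4},
    "weather": [{"main": "Clouds"}],
}
# weather_fields(sample) -> {'city': 'Haining', 'temperature': '21°C', 'condition': 'Clouds'}
```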

## Affective (Emotional) Interaction
Simple gesture-triggered feedback is included to improve engagement and user friendliness.
A thumbs-up gesture triggers a 👍 animation with a cheerful sound;
a thumbs-down gesture triggers a 👎 animation with a sad sound;
and a heart gesture triggers a ❤️ animation with a warm, gentle sound.
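These three mappings can live in a small dispatch table that the interaction logic consults on each recognized gesture; the asset file names below are placeholders.

```python
# Gesture -> (animation, sound) dispatch for the affective feedback above.
# Asset file names are placeholders; real assets live with the UI renderer.

FEEDBACK = {
    "thumbs_up":   ("thumbs_up.gif",   "cheerful.wav"),
    "thumbs_down": ("thumbs_down.gif", "sad.wav"),
    "heart":       ("heart.gif",       "warm.wav"),
}

def affective_feedback(gesture):
    """Return (animation, sound) for a recognized gesture, or None to ignore it."""
    return FEEDBACK.get(gesture)
```

Returning None for unmapped gestures keeps unrecognized motions from triggering any feedback, which supports the robustness criterion later in this document.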

# Components
1. Mechanical Module
The mechanical module is designed to meet the overall goal of a palm-size, mobile robot with a projection-based interactive interface. A compact and lightweight structure is adopted to ensure stable desktop mobility, provide attitude adjustment, and support proper mounting and viewing angles for the projector–camera system. The module consists of three main parts:
(1) Omni-wheel Mobile Base: a three-omni-wheel chassis is used, with each wheel diameter no larger than 5 cm, enabling agile planar motion and maneuverability on desktop surfaces;
(2) Mini Gimbal: a 2-DoF gimbal provides orientation adjustment with a pitch range of approximately ±30°, allowing the system to align the projection and vision direction under different usage conditions and improving projection/recognition robustness;
(3) Lightweight Enclosure: the enclosure will be fabricated via 3D printing to support rapid iteration and assembly optimization. The overall robot size is constrained within 15 cm × 15 cm × 15 cm to maintain portability and desktop friendliness.
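For the three-omni-wheel base, the mapping from a commanded body velocity to individual wheel rim speeds (inverse kinematics) could look like the sketch below. The 0°/120°/240° mount angles and the chassis radius are illustrative assumptions, not final design values.

```python
import math

# Inverse kinematics for a three-omni-wheel ("kiwi") base: body velocity
# (vx, vy, omega) -> rim speed of each wheel. Mount angles of 0/120/240
# degrees and chassis radius R are assumptions for illustration.

MOUNT_ANGLES = (0.0, 2 * math.pi / 3, 4 * math.pi / 3)

def wheel_speeds(vx, vy, omega, R=0.06):
    """vx, vy in m/s, omega in rad/s, R = wheel-to-center distance in m."""
    return [
        -math.sin(a) * vx + math.cos(a) * vy + R * omega
        for a in MOUNT_ANGLES
    ]
```

Two quick sanity checks follow from the geometry: pure rotation drives all three wheels at the same speed, and the wheel speeds of any pure translation sum to zero.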

2. Electronic Module
The electronic module is selected and integrated to support a low-power, portable, and fully self-contained system. It provides computation and control, vision sensing, projection display, audio feedback, wireless connectivity, and power management to ensure reliable standalone operation in desktop scenarios. The main components include:
(1) Main Controller: Raspberry Pi Zero 2 W (512MB RAM) serves as the core computing and control unit, capable of running basic vision and interaction logic (with OpenCV support);
(2) Micro Projector: a DLP2000-based module (native 640×360 resolution) with short-throw projection, used to project the interactive UI onto flat surfaces (desk/wall) to form an interaction area;
(3) Camera: an OV5640 camera (5 MP, autofocus supported) captures gestures, objects, and environmental cues, enabling gesture recognition, interface registration, and task execution;
(4) Speaker: a compact speaker module with PWM audio output provides sound cues and affective feedback;
(5) Power System: two 18650 Li-ion cells (≥2000 mAh each) target a battery life of at least 1 hour for demos and mobile usage;
(6) Communication: Wi-Fi 2.4 GHz and Bluetooth 4.2 enable connection with a phone or external devices for control, debugging, and data transfer.
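A rough sanity check of the ≥1 h battery target is sketched below; every power figure is an assumed placeholder for illustration, not a measured or datasheet value.

```python
# Back-of-the-envelope runtime check for the >= 1 h battery target.
# All power figures are assumptions for illustration, not measurements.

CELL_CAPACITY_WH = 3.6 * 2.0      # one 18650: ~3.6 V nominal x 2.0 Ah
PACK_WH = 2 * CELL_CAPACITY_WH    # two cells, ~14.4 Wh total

LOAD_W = {
    "projector": 5.0,   # assumed DLP module draw
    "pi_zero_2w": 2.0,  # assumed Pi + camera under vision load
    "motors": 3.0,      # assumed average drive/gimbal draw
    "audio_misc": 0.5,
}

def runtime_hours(efficiency=0.85):
    """Estimated runtime, derated by assumed regulator/converter efficiency."""
    return PACK_WH * efficiency / sum(LOAD_W.values())
```

Under these assumptions the estimate lands a little above one hour, so the target is plausible but leaves limited margin; measured draws should replace the placeholders early in development.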

3. Software Modules
The software modules integrate projection display, visual perception, interaction comprehension, and motion execution into a closed-loop operational system. The design is modular to allow parallel development and future expansion, and comprises the following submodules:
(1) Gesture Recognition Module: Implements gesture detection and recognition using MediaPipe/OpenCV, supporting inputs such as tap, thumbs-up, thumbs-down, and heart gestures for interaction commands.
(2) Interface Rendering Module: Dynamically generates and renders main and sub-interface content (e.g., weather, clock), outputting corresponding graphical interfaces to the projection display.
(3) Interaction Logic Engine: Maps gestures to commands and triggers events, manages interface state machines and interaction flows (main/sub-interface switching, exit/return), ensuring consistent and maintainable interaction logic.
(4) Image Correction Module: Performs geometric correction and alignment on projected images to enhance stability at varying angles and distances. Integrates with cameras to implement auto-focus/alignment strategies, ensuring clearer and more reliable interface display.
(5) Sound Effect Generation Module: Plays corresponding audio cues (e.g., thumbs-up, thumbs-down, heart feedback sounds) based on interaction events, providing clearer feedback.
(6) Data Acquisition Module: Retrieves real-time weather, time, and other information via network APIs, updates projected interfaces, and enables information lookup functionality.
(7) Motion Control Module: Manages chassis movement control and task execution, including fundamental speed/attitude control interfaces and higher-level behaviors like line-following navigation and moving to designated zones. This module integrates with the interaction logic engine, allowing users to trigger motion-related tasks via projected interfaces or gestures.
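The interaction logic engine's page navigation (main page to weather/clock and back via exit) reduces to a small state machine. A minimal sketch, with hypothetical state and icon names:

```python
# Minimal page state machine for the interaction logic engine: a "tap" event
# carries the icon that was hit; unmapped events leave the state unchanged.

TRANSITIONS = {
    ("MAIN", "weather_icon"): "WEATHER",
    ("MAIN", "clock_icon"): "CLOCK",
    ("WEATHER", "exit_icon"): "MAIN",
    ("CLOCK", "exit_icon"): "MAIN",
}

class UiStateMachine:
    def __init__(self):
        self.state = "MAIN"

    def on_tap(self, icon):
        # Ignore taps that have no meaning in the current state.
        self.state = TRANSITIONS.get((self.state, icon), self.state)
        return self.state
```

Keeping transitions in one table makes the exit/return behavior consistent across sub-pages and makes new pages a one-line addition.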

# Criteria for Success
## F1: Main UI Projection Clarity & Icon Size
• Success Criteria: The main interface projects two clearly visible icons (Weather and Clock). Each icon has a visible size of at least 3 cm × 3 cm.
• Verification Method: Visually check projection clarity and measure the icon size using a ruler.
## F2: Weather Page Switching Latency
• Success Criteria: After the user completes a click on the Weather icon, the UI switches to the weather page within 2 s.
• Verification Method: Time the interval from click completion to page switch completion.
## F3: Weather Information Field Completeness
• Success Criteria: The weather page displays, at minimum, the following fields: city, temperature, and weather condition.
• Verification Method: Visually verify the presence of these fields on the projected page.
## F4: Clock Page Switching Latency
• Success Criteria: After the user completes a click on the Clock icon, the UI switches to the time page within 2 s.
• Verification Method: Time the interval from click completion to page switch completion.
## F5: Time Display Format & Refresh
• Success Criteria: The time page displays the current time in “HH:MM:SS” format and updates continuously.
• Verification Method: Visually check the format and observe continuous time updates.
## F6: Return/Exit Entry Consistency on Sub-Pages
• Success Criteria: Sub-pages (Weather/Time) always show an Exit/Back icon (or an equivalent return entry) with a consistent, recognizable placement.
• Verification Method: Visually check that the return entry remains present and consistent across pages.
## F7: Return-to-Main Page Latency
• Success Criteria: After clicking the Exit/Back icon, the UI returns to the main page within 1.5 s.
• Verification Method: Time the interval from click completion to main page display completion.
## F8: Thumbs-Up Gesture Response & Feedback
• Success Criteria: Upon a thumbs-up gesture, the system displays a 👍 animation and plays a cheerful sound cue, with a total response time under 2 s.
• Verification Method: Record a video and measure the latency frame-by-frame from gesture completion to animation/audio onset.
## F9: Thumbs-Down Gesture Response & Feedback
• Success Criteria: Upon a thumbs-down gesture, the system displays a 👎 animation and plays a sad sound cue, with a total response time under 2 s.
• Verification Method: Record a video and measure the latency frame-by-frame from gesture completion to animation/audio onset.
## F10: Heart Gesture Response & Feedback
• Success Criteria: Upon a heart gesture, the system displays a ❤️ animation and plays a warm sound cue, with a total response time under 2 s.
• Verification Method: Record a video and measure the latency frame-by-frame from gesture completion to animation/audio onset.
## F11: Gesture Recognition Accuracy
• Success Criteria: Gesture recognition accuracy is at least 85% (at least 20 trials per gesture type).
• Verification Method: Log the number of correct recognitions and total trials per gesture, then compute accuracy.
## F12: False-Trigger Rate (Robustness)
• Success Criteria: False-trigger rate is no more than 10% (no response should be triggered by non-target gestures or no-interaction motions).
• Verification Method: During a fixed-duration or fixed-count negative test (non-gesture/disturbance motions), record false triggers and compute the rate.
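The F11 and F12 numbers can be computed directly from trial logs; a minimal sketch, assuming positive trials are logged as (expected, recognized) pairs and negative trials as a triggered/not-triggered flag:

```python
# Compute the F11/F12 metrics from trial logs: positive trials are
# (expected_gesture, recognized_gesture) pairs; each negative trial records
# whether a non-gesture motion falsely triggered a response.

def gesture_accuracy(trials):
    """Fraction of positive trials where the recognized gesture matched."""
    return sum(exp == rec for exp, rec in trials) / len(trials)

def false_trigger_rate(negative_trials):
    """Fraction of non-gesture motions that triggered any response."""
    return sum(negative_trials) / len(negative_trials)
```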
