Project

# 37: Visual chatting and Real-time acting Robot

Team Members: Haozhe Chi, Jiatong Li, Minghua Yang, Zonghai Jing

Sponsor: Gaoang Wang

Documents: design_document1.pdf, design_document2.pdf, proposal1.pdf, proposal2.pdf, proposal3.pdf
Group members:
Haozhe Chi, haozhe4
Minghua Yang, minghua3
Zonghai Jing, zonghai2
Jiatong Li, jl180
Problem:
With the rise of large language models (LLMs), large visual language models (LVLMs) have achieved great success in recent AI development. However, configuring an LVLM system on a robot and making all of the hardware work well around that system remains a major challenge. We aim to design an LVLM-based robot that can react to multimodal inputs.
Solution overview:
We aim to deliver an LVLM system (software), a robot arm for actions such as grabbing objects (hardware), a mobility platform for moving through the environment (hardware), a camera for real-time visual input (hardware), a laser tracker for indicating target objects (hardware), and audio equipment for audio input and output (hardware).
Solution components:
LVLM system:
We will deploy a BLIP-2-based AI model for visual-language processing. We will incorporate the strengths of several recent visual-language models, including LLaVA, VideoChat, and Video-LLaMA, to design a better real-time visual-language processing system. This system should realize real-time visual chatting with less object hallucination.
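As a rough illustration of how the software side might be organized, here is a minimal behavioral sketch of the real-time visual-chat loop. The `Frame`, `answer`, and `chat_turn` names are our own placeholders, and the model call is a stub standing in for the actual BLIP-2-style inference:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """A single camera frame; in the real system this would wrap image data."""
    image_id: int

def answer(frame: Frame, question: str) -> str:
    # Stub standing in for a BLIP-2 style model call
    # (processor + generate in the real pipeline).
    return f"[frame {frame.image_id}] response to: {question}"

def chat_turn(frame: Frame, question: str) -> str:
    """One chat turn: pair the latest camera frame with the user's utterance."""
    if not question.strip():
        return "Could you repeat that?"
    return answer(frame, question)
```

In the real system `answer` would block on GPU inference, so the loop would likely run it in a worker thread while the camera keeps refreshing the latest frame.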
Robot arm and wheels:
We will use the ROS environment to control robot movements. We will apply to use the robot arms in the ZJUI ECE 470 labs and buy wheels for locomotion; we may use a four-wheel design or a tracked design.
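For the wheeled base, the core control step is turning a body velocity command (as carried by a ROS Twist message) into per-wheel speeds. The sketch below assumes a differential-drive layout and a placeholder 0.3 m track width; the real value depends on the wheels we end up buying:

```python
def wheel_speeds(v: float, omega: float, wheel_base: float = 0.3) -> tuple[float, float]:
    """Convert a body velocity command (v in m/s forward, omega in rad/s
    counter-clockwise) into (left, right) wheel speeds for a
    differential-drive base. wheel_base is the track width in metres
    (0.3 m is a placeholder, not a measured value)."""
    left = v - omega * wheel_base / 2.0
    right = v + omega * wheel_base / 2.0
    return left, right
```

Driving straight gives equal speeds; turning in place gives opposite ones. The same math applies to a tracked design, which also steers differentially.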
Camera:
We will configure cameras for real-time image input. 3D reconstruction may be needed, depending on our LVLM system design.
If multi-view input is needed, we will design a better camera configuration.
Audio processing:
We will use two audio processing systems: voice recognition and text-to-speech generation, responsible for audio input and output respectively. We will use speaker components to let the robot talk.
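The intended audio flow is recognize, respond, then synthesize. A minimal stub of that round trip (the ASR and TTS stages here are placeholders that just pass text through, not real engines):

```python
def transcribe(audio: bytes) -> str:
    # Stub for the voice-recognition stage; a real system would call
    # an ASR engine here instead of decoding bytes directly.
    return audio.decode("utf-8")

def synthesize(text: str) -> bytes:
    # Stub for the text-to-speech stage that feeds the speaker.
    return text.encode("utf-8")

def audio_round_trip(audio_in: bytes, respond) -> bytes:
    """Recognize speech, produce a reply via `respond`, synthesize it."""
    return synthesize(respond(transcribe(audio_in)))
```

The `respond` callback is where the LVLM system plugs in, so the audio path stays decoupled from the model.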
Criterion for success:
The robot should provide voice recognition, laser tracking, real-time visual chatting, multimodal processing, identification of a given object, moving toward and grabbing it, and a multi-view camera configuration. All of the hardware should cooperate well in the final demo: every component must not only function well on its own but also combine into more advanced behaviors. For instance, the robot should be able to move toward a target object while chatting with a human.

BusPlan

Featured Project

# People

Scott Liu - sliu125

Connor Lake - crlake2

Aashish Kapur - askapur2

# Problem

Buses are scheduled inefficiently. Traditionally, buses are scheduled at 10-30 minute intervals with no regard for the actual number of people waiting at any given stop at any given time. This results in some buses being packed and others empty.

# Solution Overview

Introducing the _BusPlan_: a network of smart detectors that actively survey the number of people waiting at a bus stop to determine the ideal number of buses at any given time and location.

To achieve this technically, the device will use a wifi chip to listen for probe requests from nearby wifi devices, whose count we assume to be closely correlated with the number of people. It will use a radio chip to form a mesh network with devices at other nearby bus stops. For power, the device will use a solar cell and a Li-Ion battery.
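The counting step could look roughly like the sketch below, which deduplicates MAC addresses seen within a recent time window. `estimate_people` is our own illustrative name, and MAC randomization on modern phones would make the real estimate noisier than this:

```python
def estimate_people(probes, window_s: float, now: float) -> int:
    """Estimate people at a stop from (mac, timestamp) probe records:
    keep only probes seen within the last window_s seconds and count
    unique MAC addresses (one device approximates one person, per the
    assumption above)."""
    recent = {mac for mac, t in probes if now - t <= window_s}
    return len(recent)
```

Counting a set rather than raw probes matters because a single phone can emit many probe requests per minute.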

With the mesh network in place, we are also considering hosting wifi at each deployed location. This might include media, advertisements, localized wifi (restricted to bus stops), weather forecasts, and much more.

# Solution Components

## Wifi Chip

- esp8266 to wake periodically and listen for wifi probe requests.

## Radio chip

- NRF24L01 chip to connect to nearby devices and send/receive data.

## Microcontroller

- Microcontroller (Atmel atmega328) to control the RF chip and the wifi chip; it also manages the caching and sending of data. After further research we may not need this microcontroller: we will first attempt to use just the esp8266 chip, and if we cannot successfully use its SPI interface, we will use the atmega as a middleman.
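The cache-and-send behavior the microcontroller (or the esp8266 alone) must implement can be modeled in a few lines. This Python sketch is a behavioral model only, with a bounded buffer that drops the oldest reading when full and flushes oldest-first while the radio link accepts data:

```python
from collections import deque

class ReadingCache:
    """Behavioral model of cache-and-forward: buffer readings while the
    radio link is down, flush oldest-first when it comes back. Capacity
    is bounded so the buffer cannot grow without limit on a small MCU."""
    def __init__(self, capacity: int = 32):
        self.buf = deque(maxlen=capacity)

    def record(self, reading):
        self.buf.append(reading)  # oldest reading is dropped when full

    def flush(self, send) -> int:
        """Try to send everything, stopping at the first failed send.
        `send` returns True on success; returns the number sent."""
        sent = 0
        while self.buf:
            if not send(self.buf[0]):
                break
            self.buf.popleft()
            sent += 1
        return sent
```

Keeping the unsent reading at the head (rather than popping before sending) means a failed radio transfer loses no data.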

## Power Subsystem

- Solar panel that will convert solar power to electrical power

- Power regulator chip in charge of taking the power from the solar panel and charging a small battery with it

- Small Li-Ion battery to act as a buffer for shady moments and rainy days

## Software and Server

- Backend api to receive and store data in mongodb or mysql database

- Data visualization frontend

- Machine learning predictions (using LSTM model)
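Whatever forecasting model we settle on, an LSTM trains on (input window, next value) pairs built from the per-interval people counts. A small helper sketch (the name is ours) for producing that supervised form:

```python
def make_windows(counts, lookback: int):
    """Turn a per-interval count series into (window, next value) pairs,
    the supervised form an LSTM-style forecaster trains on: each window
    of `lookback` past counts predicts the count that follows it."""
    pairs = []
    for i in range(len(counts) - lookback):
        pairs.append((counts[i:i + lookback], counts[i + lookback]))
    return pairs
```

The `lookback` length is a tuning choice: long enough to capture daily ridership patterns, short enough to keep training data plentiful.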

# Criteria for Success

- Successfully collect accurate measurements of the number of people at bus stops

- Use data to determine optimized bus deployment schedules.

- Use data to provide useful visualizations.
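One simple way to turn predicted loads into a deployment schedule is to split a fixed fleet across routes in proportion to predicted demand. The sketch below (an illustration, not our final optimization) uses largest-remainder rounding so the allocation always sums to the fleet size:

```python
def allocate_buses(predicted_loads, fleet_size):
    """Split fleet_size buses across routes in proportion to predicted
    passenger load, with largest-remainder rounding so the integer
    counts always sum to fleet_size."""
    total = sum(predicted_loads)
    if total == 0:
        total = 1  # degenerate case: no predicted demand anywhere
    shares = [fleet_size * load / total for load in predicted_loads]
    buses = [int(s) for s in shares]
    # hand leftover buses to routes with the largest fractional remainders
    order = sorted(range(len(buses)), key=lambda i: shares[i] - buses[i], reverse=True)
    for k in range(fleet_size - sum(buses)):
        buses[order[k % len(order)]] += 1
    return buses
```

A real schedule would also respect per-route minimums and driver constraints, but proportional allocation is a reasonable baseline to measure against.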

# Ethics and Safety

It is important to take user privacy into consideration when collecting unique device tokens. We will make sure to follow the existing ethics guidelines established by the IEEE and the ACM.

There are several potential issues that might arise under specific conditions: high temperatures and harsh environmental factors may cause the Li-Ion battery to explode, and rainy or moist environments may lead to short-circuiting of the device.

We plan to address all of these issues in our project proposal.

# Competitors

https://www.accuware.com/products/locate-wifi-devices/

Accuware currently has a device that helps locate wifi devices. However, our devices will be tailored for bus stops, and the data will be formatted in the most useful way for bus companies.