Project
# | Title | Team Members | TA | Documents | Sponsor |
---|---|---|---|---|---|
25 | A.I.dan: ChatGPT Integrated Virtual Assistant | Andrew Scott, Brahmteg Minhas, Leonardo Garcia | Hanyin Shao | design_document1.pdf, final_paper1.pdf, photo1.jpg, photo2.png, presentation1.pptx, proposal1.pdf | |
Team Members:
- Andrew Scott (ajscott5)
- Leonardo Garcia (lgarci91)
- Brahmteg Minhas (bminhas2)

# Problem

Current virtual assistants (Amazon's Alexa, Apple's Siri, etc.) all use Google as their primary mechanism for answering questions posed to them. While they may have other functionality, such as integration with Amazon.com or Spotify, their primary function is to serve as assistants that answer questions based on audio I/O. With the advent of ChatGPT (GPT-3), Google search is now an outdated information-gathering mechanism and needs to be replaced within the virtual assistant space.

# Solution

Our solution combines the convenience of a virtual assistant with the power of ChatGPT to create a more powerful and useful home assistant for answering questions. We will use a speech-to-text module to convert the user's voice input to text. This interaction, capturing the user's speech and responding, will be triggered by a "cue word" such as "Hey A.I.dan". To ask a question, the user says the cue word and then asks their question. Once the user has stopped speaking, A.I.dan sends the message to ChatGPT and, once it gets back ChatGPT's response, uses text-to-speech (TTS) to relay it to the user as well as display it on the screen.

## Control Unit

The control unit uses an ESP32 microcontroller together with a Raspberry Pi RP2040. Software on the microcontroller interfaces with the audio I/O, the screen, and, over Wi-Fi, with a PC that handles the ChatGPT API as well as the speech-to-text and text-to-speech modules. The microcontroller will also receive from the PC the information to be output to the screen and speaker.

## Audio I/O

Users interact with our device through their voice. To facilitate this, both a speaker and a microphone will be added to our PCB. Any post-processing we want to do to clean up the audio and increase accuracy will also be done onboard. All audio input to the microphone goes to the RP2040 for wake-word detection.
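As a rough illustration of how streamed audio might be framed for the Wi-Fi link, here is a minimal Python sketch that packs 16-bit PCM sample chunks with a sequence number and unpacks them on the PC side. The frame layout and all names here are illustrative assumptions, not the project's actual protocol.

```python
import struct

# Hypothetical frame format for streaming 16-bit PCM audio from the ESP32 to
# the PC: a little-endian 32-bit sequence number followed by the raw samples.
HEADER = struct.Struct("<I")

def pack_frame(seq: int, samples: list[int]) -> bytes:
    """Prefix a chunk of 16-bit signed PCM samples with a sequence number."""
    return HEADER.pack(seq) + struct.pack(f"<{len(samples)}h", *samples)

def unpack_frame(frame: bytes) -> tuple[int, list[int]]:
    """Recover the sequence number and the samples on the PC side."""
    seq = HEADER.unpack_from(frame)[0]
    n = (len(frame) - HEADER.size) // 2  # two bytes per 16-bit sample
    samples = list(struct.unpack_from(f"<{n}h", frame, HEADER.size))
    return seq, samples
```

The sequence number lets the PC detect dropped or reordered frames, which matters if the stream is carried over UDP rather than TCP.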
Once a wake word is detected, the microcontroller streams audio to the PC over Wi-Fi. Once the PC returns the ChatGPT output, after it has been passed through the text-to-speech module, the audio is played through the speaker.

## Screen

Many of ChatGPT's outputs are not easily understood through an audio description alone. The best example is code segments, which are always formatted as Markdown. To provide this functionality, a screen will be added externally to our assistant, connected to the PCB over SPI.

# Criterion For Success

To consider the project fully successful, at least 75% of attempted basic interactions should be successful. Basic interactions are questions composed entirely of words included in our pre-trained speech-to-text model. Code (Markdown) as well as traditional text answers must display/speak properly given a successful question. This can be tested by asking the same question to ChatGPT on a separate device and comparing the responses.

# Resources

[Example of ESP32 to PC Audio Streaming](https://github.com/MinePro120/ESP32-Audio-Streamer)

[Example of PC to ESP32 Audio Streaming](https://www.hackster.io/julianfschroeter/stream-your-audio-on-the-esp32-2e4661)
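To make the end-to-end flow concrete, here is a minimal sketch of the PC-side pipeline described in the Solution section, with the speech-to-text, ChatGPT, and text-to-speech stages stubbed out. Every function name is a hypothetical placeholder, not part of our actual implementation; a real build would wire in an STT engine, the OpenAI chat completions API, and a TTS engine.

```python
# PC-side pipeline sketch: transcribe the streamed audio, query ChatGPT,
# then produce both speaker audio and screen text for the microcontroller.
# All three stage functions below are stubbed placeholders.

def speech_to_text(audio: bytes) -> str:
    # Placeholder: a real STT module would transcribe the PCM audio here.
    return "what is the capital of France"

def ask_chatgpt(question: str) -> str:
    # Placeholder: a real implementation would POST the question to the
    # ChatGPT API and return the model's reply text.
    return f"Answer to: {question}"

def text_to_speech(text: str) -> bytes:
    # Placeholder: a real TTS module would synthesize speech audio here.
    return text.encode("utf-8")

def handle_interaction(audio: bytes) -> tuple[bytes, str]:
    """Run one wake-word-triggered interaction.

    Returns (audio for the speaker, text for the screen).
    """
    question = speech_to_text(audio)
    answer = ask_chatgpt(question)
    return text_to_speech(answer), answer
```

Returning the answer text alongside the synthesized audio mirrors the design above: the microcontroller plays the audio through the speaker while the same response is shown on the SPI screen.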