David Schneck, Chris Shannon, and Ruthvik Reddy Kadiri
Voice to Text Keyboard
Chris Shannon, David Schneck, and Ruthvik Reddy Kadiri
Introduction
Our project, aptly named “Voice to Text Keyboard,” was originally meant to act as a normal keyboard with built-in voice to text: at the push of a button, the user could start speaking and the keyboard would type whatever was said. The goal was to give the user an easy, accessible voice to text option that removes the need for an external device or software. Our original design built this speech to text functionality into a keyboard, but due to a few issues, our final product is a standalone voice to text Bluetooth device that transcribes what is said and types it out on a computer, without any attached keyboard.
Voice recognition software and hardware are already in use, and the technology in this area is expanding rapidly. Most notably, there are standalone “smart” home devices such as Amazon’s Alexa/Echo and Google Home, Cortana for Windows, and speech to text for texting on many smartphones. There are also products such as voice to text headsets for typing. Our idea was to take voice to text and build it into something that is already used for typing: a keyboard. Although it would not be a “smart” keyboard (yet), it would let the user speak and have the spoken words typed, without the need for a device like a headset, and would be integrated and easy to use. Unfortunately, our final product is not integrated into a keyboard due to time constraints, but a future version could be. Either way, the product is useful to people who would rather speak than type, as a faster, more comfortable way of entering text.
Design
Our project has four main components, centered around the Raspberry Pi Zero W, which transmits data to the computer over Bluetooth. Directly attached to the Pi is a USB hub with four standard USB ports, since the Pi does not have any full-size ports of its own. The hub connects directly through some of the Pi’s onboard connections. A USB microphone is plugged into one of the hub’s ports. The button sits on a breadboard and is connected by two wires to the Raspberry Pi’s GPIO pins. Power comes to the Pi via a micro USB cable.
Our design is built around the Raspberry Pi, which has every sensor and the power supply connected to it. The button is wired on a breadboard with two wires, one to a ground pin and one to a GPIO data pin (18). Python code on the Raspberry Pi detects the button press and, once triggered, runs more Python code that starts recording from the USB microphone with the help of a number of Python libraries. Once the audio is recorded, the Pi, which is connected to the internet through its onboard Wi-Fi, sends the data to Google’s speech recognition API, which computes the string of text and sends it back to the Pi. Once the Pi receives the string, it sends it to the target computer over its onboard Bluetooth. In this case the Pi is emulating a Bluetooth keyboard, so it must already be connected to the computer. The button can be pushed as many times as needed; currently, the Pi records for five seconds after each push.
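The flow described above (button press, five-second recording, transcription, typing) can be sketched roughly as follows. This is only an illustration and not our exact code: the function names are made up for this example, BCM pin numbering is assumed for pin 18, and the hardware loop assumes the RPi.GPIO and SpeechRecognition libraries (with PyAudio for the microphone). The Bluetooth sending step is passed in as a function because it depends on the keyboard emulation setup.

```python
from typing import Any, Callable

BUTTON_PIN = 18      # data pin named in the report; BCM numbering is an assumption
RECORD_SECONDS = 5   # recording window named in the report


def handle_press(record: Callable[[int], Any],
                 transcribe: Callable[[Any], str],
                 send_keys: Callable[[str], None],
                 seconds: int = RECORD_SECONDS) -> str:
    """Run one button press: record, transcribe, then type the result."""
    audio = record(seconds)
    text = transcribe(audio)
    if text:  # nothing is typed when no speech was recognized
        send_keys(text)
    return text


def run_on_pi(send_keys: Callable[[str], None]) -> None:
    """Hardware loop (assumed libraries); imports are deferred so the
    rest of the file can be loaded on a machine without them."""
    import RPi.GPIO as GPIO
    import speech_recognition as sr

    GPIO.setmode(GPIO.BCM)
    # The button connects the pin to ground, so use the internal pull-up.
    GPIO.setup(BUTTON_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)
    recognizer = sr.Recognizer()

    def record(seconds: int):
        with sr.Microphone() as source:
            return recognizer.record(source, duration=seconds)

    def transcribe(audio) -> str:
        try:
            return recognizer.recognize_google(audio)
        except sr.UnknownValueError:
            return ""  # speech could not be understood

    try:
        while True:
            GPIO.wait_for_edge(BUTTON_PIN, GPIO.FALLING)  # blocks until a press
            handle_press(record, transcribe, send_keys)
    finally:
        GPIO.cleanup()
```

Keeping `handle_press` free of hardware calls means the record, transcribe, and send steps can each be swapped or tested with stand-in functions on an ordinary computer.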
Results
By the end, our project did almost everything we initially planned. The assembled device can successfully record speech at the push of a button and convert that audio into written text. It can also successfully emulate a Bluetooth keyboard and send the resulting string of text to the target computer. In our final tests it recorded and transcribed speech very accurately, and we saw no major problems with transcription accuracy. However, the delay between speaking and the text appearing on the computer is somewhat long, approximately five seconds, because of the relative slowness of the internet connection and the Pi’s very limited computing power. Additionally, while the device can connect to a computer and act as a keyboard, it currently must be connected manually through the Pi’s operating system and GUI, using the command line, which greatly reduces the ease of use. In the end, although we were unable to add everything we wanted, the project functions well, and given some more time it would be relatively easy to complete.
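To make the keyboard emulation mentioned above more concrete, the sketch below shows one way a string of text could be turned into key presses. It assumes the standard 8-byte HID boot keyboard report (modifier byte, reserved byte, six key slots); the function names and the small character table are made up for this example, and a Bluetooth transport would typically add its own header bytes in front of each report, which is not shown here.

```python
from typing import List, Tuple

LEFT_SHIFT = 0x02  # Left Shift bit in the HID modifier byte

# Small illustrative table; a full keyboard layout would cover more symbols.
EXTRA_KEYS = {"0": 0x27, " ": 0x2C, "-": 0x2D, "'": 0x34, ",": 0x36, ".": 0x37}


def key_for_char(ch: str) -> Tuple[int, int]:
    """Return (modifier, usage_id) for one character, or (0, 0) if unmapped."""
    if "a" <= ch <= "z":
        return 0, 0x04 + ord(ch) - ord("a")
    if "A" <= ch <= "Z":
        return LEFT_SHIFT, 0x04 + ord(ch) - ord("A")
    if "1" <= ch <= "9":
        return 0, 0x1E + ord(ch) - ord("1")
    return 0, EXTRA_KEYS.get(ch, 0)


def reports_for_text(text: str) -> List[bytes]:
    """Build a press report followed by a release report for each character."""
    release = bytes(8)  # all zeros means no keys are held down
    reports = []
    for ch in text:
        modifier, usage = key_for_char(ch)
        if usage == 0:
            continue  # skip characters this small table does not cover
        reports.append(bytes([modifier, 0, usage, 0, 0, 0, 0, 0]))
        reports.append(release)
    return reports
```

Sending a release report after every press keeps repeated letters (as in "hello") from being read as one long key hold by the target computer.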
Problems and Challenges
There were many problems in the development process of our project, the first of which was receiving our components. Because of back orders and some parts going out of stock after we ordered, we didn’t receive all the parts needed to start building and testing until very late, with only a couple of sessions left. Several problems also led to the final design being different from the initial design. At first, we had planned to use the Pi as a wired keyboard, attached alongside a normal keyboard and connected through wired USB, and to use offline speech to text software. In the end, it was not connected to a keyboard at all and used Bluetooth instead. For starters, after much trial and error, we were unable to get the Pi to connect through wired USB. For unknown reasons, the computer would continuously fail to recognize the device, so we eventually changed the design and code to connect only through Bluetooth (which also meant there was no longer a reason to attach it to a keyboard). Additionally, although we had an open source speech to text recognition package, Pocketsphinx, we were unable to implement the offline design due to time constraints; using Google’s API was easier than installing and setting up that software. Unfortunately, that means the speech to text keyboard we developed needs to be connected to the internet, which is an unwanted aspect of the design. Furthermore, being in the 120 Honors lab, we were supposed to include logic elements built from gates. Again, due to a number of factors, we did not have time to implement the gate logic component in our final product, although we had planned to add different colored LEDs to display which state the Pi was in: recording, computing, waiting, and so on.
Ultimately we finished the design and it worked well, but the work was challenging because of how little time we had, combined with the many software problems we ran into at every level of implementation.
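The state LEDs were never built, but the idea can be sketched in a few lines. Everything here is hypothetical: the pin numbers are placeholders, the LEDs are driven directly from GPIO pins rather than through logic gates, and the hardware function assumes the RPi.GPIO library with the pins already configured as outputs.

```python
from enum import Enum
from typing import Dict


class State(Enum):
    WAITING = "waiting"
    RECORDING = "recording"
    COMPUTING = "computing"


# Hypothetical BCM pin numbers, one LED color per state.
LED_PINS = {State.WAITING: 17, State.RECORDING: 27, State.COMPUTING: 22}


def led_levels(state: State) -> Dict[int, bool]:
    """Map each LED pin to on/off so only the LED for `state` is lit."""
    return {pin: (owner is state) for owner, pin in LED_PINS.items()}


def show_state(state: State) -> None:
    """Drive the LEDs (assumes GPIO.setmode and GPIO.setup were already called)."""
    import RPi.GPIO as GPIO  # deferred so led_levels works without hardware

    for pin, on in led_levels(state).items():
        GPIO.output(pin, GPIO.HIGH if on else GPIO.LOW)
```

Calling `show_state(State.RECORDING)` right before recording and `show_state(State.COMPUTING)` right after it would tell the user when to speak and when to wait.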
Future Plans
As has been stated a few times, our final product was not the same as our initial plan. It uses Bluetooth rather than a wired connection, is not integrated into a keyboard, does not have indicator lights, and has to be connected to the internet to compute the text. Future work on this project would fix all of these issues. One of the most important fixes would be adding offline functionality, since computing through Google’s speech recognition API makes the device quite slow. Switching to the offline Pocketsphinx would hopefully speed up the computation and create a better product. Additionally, a next step would be to fully integrate this into a physical keyboard, which would also work better over a wired connection than a wireless one because of connectivity issues. As a fully integrated keyboard, our project would be more desirable as a product and easier to use than a standalone device. We would also add indicator lights to let the user know when the device is recording, computing, and powered on. One last possibility is to turn this into a sort of “smart” device; however, that would be a project well above our skill level.
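One possible way to move toward offline recognition without giving up the working online path is to try recognizers in a preferred order. The sketch below is only an idea for future work: the function names are made up, and the Pi-specific part assumes the SpeechRecognition library with Pocketsphinx installed, which we did not get working.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Engine = Callable[[Any], str]


def build_transcriber(engines: Dict[str, Engine],
                      order: List[str]) -> Callable[[Any], Tuple[Optional[str], str]]:
    """Return a function that tries each named engine in order until one yields text."""
    def transcribe(audio: Any) -> Tuple[Optional[str], str]:
        for name in order:
            try:
                text = engines[name](audio)
            except Exception:
                continue  # engine unavailable or could not decode; try the next
            if text:
                return name, text
        return None, ""  # no engine produced any text
    return transcribe


def pi_engines() -> Dict[str, Engine]:
    """Assumed bindings to the SpeechRecognition library (deferred import)."""
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    return {
        "sphinx": recognizer.recognize_sphinx,  # offline, needs pocketsphinx
        "google": recognizer.recognize_google,  # online, needs internet
    }
```

With this arrangement, `build_transcriber(pi_engines(), ["sphinx", "google"])` would prefer the offline recognizer and only use the internet when the offline one fails.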
References
“Connecting a Push Switch.” Razzpisampler, razzpisampler.oreilly.com/ch07.html.
Keef. “Emulating a Bluetooth Keyboard with a Raspberry Pi and Python (Raspbian Jessie/Bluez 5 Version).” Yet Another Pointless Tech Blog, yetanotherpointlesstechblog.blogspot.com/2016/04/emulating-bluetooth-keyboard-with.html.
Shmyrev, Nickolay. “CMUSphinx Open Source Speech Recognition.” CMUSphinx Open Source Speech Recognition, cmusphinx.github.io/.
“Speech API - Speech Recognition | Google Cloud Platform.” Google, cloud.google.com/speech/.