Alan Qu

VoicePilot

A project utilizing face and voice to interact with the computer in a way never seen before.


Hey Jarvis, What's the Weather?

Okay, maybe it's not a fully intelligent and sentient assistant like Jarvis, but it can do some pretty cool things. VoicePilot would respond to this command by opening the weather app automatically. Born at a 12-hour hackathon and still in active development, VoicePilot aims to be an assistive program that lets users control their computers by talking, with no manual mouse or keyboard input.

We take in the user's voice input and feed it through a speech recognition algorithm. We then pass the raw transcript to a large language model, Gemini, which converts it into commands. These commands are executed through the Win32 API to control the computer. Because Gemini interprets the request, speech doesn't have to be robotic, allowing for more robust use cases and interpretations.
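As a rough illustration of the transcript-to-command step described above, here is a minimal sketch. It is not VoicePilot's actual code: the JSON command shape, the action names, and the functions `interpret`, `execute`, and `stub_model` are all assumptions made for this example. A canned reply stands in for Gemini, and the handlers only record what they would do instead of touching the operating system.

```python
import json
from typing import Callable, Dict, List

# Hypothetical prompt and command schema, invented for this sketch.
PROMPT = (
    "Convert the user's request into JSON with the keys 'action' and 'target'. "
    "Allowed actions: open_app, type_text. Request: "
)


def interpret(transcript: str, ask_model: Callable[[str], str]) -> dict:
    """Ask the language model for a command and validate its JSON reply."""
    reply = ask_model(PROMPT + transcript)
    command = json.loads(reply)  # raises ValueError on non-JSON replies
    if (
        not isinstance(command, dict)
        or "action" not in command
        or "target" not in command
    ):
        raise ValueError(f"malformed command: {reply!r}")
    return command


def execute(command: dict, handlers: Dict[str, Callable[[str], None]]) -> None:
    """Run the handler registered for the command's action."""
    handler = handlers.get(command["action"])
    if handler is None:
        # Refuse anything the model invents outside the allowed actions.
        raise ValueError(f"unsupported action: {command['action']!r}")
    handler(command["target"])


# Demo handlers: they log the action rather than controlling the computer.
performed: List[str] = []
handlers = {
    "open_app": lambda target: performed.append(f"open:{target}"),
    "type_text": lambda target: performed.append(f"type:{target}"),
}


def stub_model(prompt: str) -> str:
    """Canned reply standing in for a real Gemini call (assumption)."""
    return '{"action": "open_app", "target": "weather"}'


command = interpret("Hey Jarvis, what's the weather?", stub_model)
execute(command, handlers)
print(performed)
```

Passing the model in as a plain callable keeps the sketch runnable offline. In a real setup, `ask_model` would wrap the Gemini client and the handlers would call into the Win32 API or PyAutoGUI. Checking the action against a fixed table also means an unexpected model reply is rejected rather than executed.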

Timeline
April 2024 - Present

Skills
Python - Tkinter, PyAutoGUI
Large Language Models - Gemini

Utilizing Gemini and PyAutoGUI to interact with the computer smoothly.

For more, check out the GitHub repository.

"What all of us have to do is to make sure we are using AI in a way that is for the benefit of humanity, not to the detriment of humanity." -Tim Cook