
Sunday, October 29, 2023

Siri 2.0: Making a GPT voice assistant

Once, while dabbling with AI again, I thought to myself, "Why can't I talk to AI?"
I mean literally talk to it: not type on a keyboard and read a response on the screen, but use my voice and hear the AI speak. It may sound a little weird, but we already have AIs like that, such as Siri and Alexa. And I found out that ChatGPT and other new AI models are now a lot more advanced than those voice assistants. Siri's IQ is a measly 23.9 (src), about half that of a 6-year-old, while ChatGPT's is 155 (src) (nearly as smart as Einstein?).

Although these measurements may not be the most accurate (intelligence is very hard to define and measure), it's definitely time for a new, updated virtual assistant.

First, I needed to figure out how to do this and what tools to use. The plan was to combine speech-to-text, a large language model, and text-to-speech. I decided to use Python for the job: even though it's not the fastest language, it makes a prototype easy to build and performs adequately here. Next, I chose HuggingChat as the LLM. It is built on LLaMA, free, and very easy to use (see this post I made if you want to learn how to use it yourself). Then comes the SpeechRecognition module, using Google Speech Recognition. It's a pretty simple module, and Google Speech Recognition has worked very well every time I've used it. Finally, there is pyttsx3, a text-to-speech module that will give HuggingChat its voice.
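
As a rough sketch of that plan, one turn of the conversation is just three steps chained together. The names below (run_turn, transcribe, ask, speak) are placeholders made up for illustration and are not part of any of the libraries; in the real program the three callables would be backed by SpeechRecognition, HuggingChat, and pyttsx3.

```python
from typing import Callable


def run_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    ask: Callable[[str], str],
    speak: Callable[[str], None],
) -> str:
    """Run one voice-assistant turn: speech -> text -> LLM reply -> speech."""
    question = transcribe(audio)  # voice to text
    answer = ask(question)        # large language model
    speak(answer)                 # text to speech
    return answer


# Stand-ins so the sketch runs without a microphone or network access.
spoken = []
reply = run_turn(
    b"fake audio",
    transcribe=lambda audio: "what is two plus two",
    ask=lambda question: "You asked: " + question,
    speak=spoken.append,
)
print(reply)
```

Keeping the three stages as separate functions should also make it easy to swap any one of them out later, for example a different LLM or a different voice.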
 
The first thing to do is to plug in the modules.

import pyttsx3 # Text to speech
from hugchat import hugchat
from hugchat.login import Login
import speech_recognition as sr

The only thing this will do is import the modules we will be using, so next we should implement some functionality.

Next is to set up the AI. We will need to sign in, save the cookies, and start a new chatbot conversation. In the snippet below, email and password stand for your own HuggingFace login details. If you don't already have a HuggingFace account, you can make one at https://huggingface.co/ (it is completely free).

sign = Login(email, password)
cookies = sign.login()
cookie_path_dir = "./cookies_snapshot"
sign.saveCookiesToDir(cookie_path_dir)
chatbot = hugchat.ChatBot(cookies=cookies.get_dict())
isd = chatbot.new_conversation()
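
The snippet above expects email and password to exist already. One possible way to provide them without typing them into the script is to read them from environment variables. This is only a sketch: the helper load_credentials and the variable names HUGGINGFACE_EMAIL and HUGGINGFACE_PASSWORD are my own assumptions, not something hugchat defines or requires.

```python
import os


def load_credentials(env=os.environ):
    """Return (email, password) read from environment variables.

    HUGGINGFACE_EMAIL and HUGGINGFACE_PASSWORD are hypothetical names
    chosen for this sketch; nothing in hugchat requires them.
    """
    email = env.get("HUGGINGFACE_EMAIL")
    password = env.get("HUGGINGFACE_PASSWORD")
    if not email or not password:
        raise RuntimeError(
            "Set HUGGINGFACE_EMAIL and HUGGINGFACE_PASSWORD before running."
        )
    return email, password
```

With that in place, a line such as email, password = load_credentials() before the Login call would supply both values.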


Now let's get into the juicy part, detecting speech. We will set up the microphone and speech detection software, then listen for speech. The following code snippets should be in a while True loop, in order to continuously listen for speech and have the AI answer questions.

while True:
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something!")
        # Listen for speech
        audio = r.listen(source) 

Next, we will process the speech into text using recognize_google. Then, the following lines will query the chatbot and use the pyttsx3 text to speech to translate the chatbot's words into sound.

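    try:
        text = r.recognize_google(audio)  # Transcribe once and reuse the result
        print("(Detected) You said: " + text)
        pyttsx3.speak("Got it. Let me think for a second.")
        response = chatbot.query(text)  # Query chatbot
        print(response)
        pyttsx3.speak(str(response))  # str() in case the reply is not a plain string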


Finally, we must decide what will happen in the event of an error. Usually this will happen when the speech recognition can't process the speech or no words are spoken (UnknownValueError); less often, the Google service itself can't be reached (RequestError). A simple message via text to speech and the console will do.

    except sr.UnknownValueError:
        # If there are errors with Speech Recog.
        pyttsx3.speak("Sorry, I couldn't understand you. Do you mind saying that again?")
        print("Google Speech Recognition could not understand audio / No words were spoken")
    except sr.RequestError as e:
        pyttsx3.speak("Sorry, I couldn't reach the speech recognition service. Please try again in a moment.")
        print("Could not request results from Google Speech Recognition service; {0}".format(e))
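
To make the two failure cases easier to see in isolation, here is a small sketch of the same idea as a standalone function. The two exception classes are stand-ins I defined so the example runs without SpeechRecognition installed (in the real program they would be sr.UnknownValueError and sr.RequestError), and safe_transcribe is a hypothetical helper, not part of any library.

```python
class UnknownValueError(Exception):
    """Stand-in for sr.UnknownValueError (speech was unintelligible)."""


class RequestError(Exception):
    """Stand-in for sr.RequestError (the service could not be reached)."""


def safe_transcribe(transcribe, audio, speak):
    """Return the transcript, or None after telling the user what went wrong."""
    try:
        return transcribe(audio)
    except UnknownValueError:
        # Nothing intelligible was heard, so ask the user to repeat.
        speak("Sorry, I couldn't understand you. Do you mind saying that again?")
    except RequestError as e:
        # The recognition service failed, which is not the user's fault.
        speak("Sorry, I couldn't reach the speech service.")
        print("Could not request results; {0}".format(e))
    return None
```

Returning None on failure lets the main loop simply skip the chatbot query and go back to listening.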


We are now done! Surprisingly, this program only takes 34 lines of code. It's truly incredible how easy it is to make complex programs like this.

Now, run the program, turn up your volume, and talk into your microphone. You are now talking to AI!

Full code:

import pyttsx3 # Text to speech
import speech_recognition as sr # Speech to text
from hugchat import hugchat
from hugchat.login import Login
email, password = "you@example.com", "your-password" # Placeholders: replace with your HuggingFace login
# Set up the ChatBot
sign = Login(email, password)
cookies = sign.login()
cookie_path_dir = "./cookies_snapshot"
sign.saveCookiesToDir(cookie_path_dir)
chatbot = hugchat.ChatBot(cookies=cookies.get_dict())
isd = chatbot.new_conversation()
pyttsx3.speak("Alright, I'm ready for your questions.")
while True:
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something!")
        # Listen for speech
        audio = r.listen(source)

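    try:
        text = r.recognize_google(audio)  # Transcribe once and reuse the result
        print("(Detected) You said: " + text)
        pyttsx3.speak("Got it. Let me think for a second.")
        response = chatbot.query(text)  # Query chatbot
        print(response)
        pyttsx3.speak(str(response))  # str() in case the reply is not a plain string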
    except sr.UnknownValueError:
        # If there are errors with Speech Recog.
        pyttsx3.speak("Sorry, I couldn't understand you. Do you mind saying that again?")
        print("Google Speech Recognition could not understand audio / No words were spoken")
    except sr.RequestError as e:
        pyttsx3.speak("Sorry, I couldn't reach the speech recognition service. Please try again in a moment.")
        print("Could not request results from Google Speech Recognition service; {0}".format(e))

