ChatGPT Learns to Listen, Speak, Recognize Images, and Search Online
The chatbot ChatGPT has received a significant update. It now incorporates a neural network that understands voice commands, responds with synthesized speech, and can recognize the content of images. According to an OpenAI press release, the new features offer a more intuitive way to interact, letting users hold voice conversations with ChatGPT or show it what they are talking about.
"For instance, while traveling, you can take a photo of a landmark and discuss its interesting aspects. When you're at home, you can photograph the contents of your refrigerator to plan dinner (and ask additional questions for recipes). After dinner, you can help your child solve a math problem: take a photo, annotate it, and ask for hints," OpenAI says of the new ways to interact with ChatGPT.
Additionally, ChatGPT developers have enabled it to access the internet and provide links to the sources from which it derives information, allowing users to fact-check the chatbot's responses. This feature is currently available only through a paid subscription.
Back in March, the company announced that GPT-4 would be built on multimodal models. This means the model has a multimodal vocabulary in which some tokens represent text, while others represent images, audio, and more.
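The idea of a single token vocabulary shared across modalities can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the ID ranges, the names `TEXT_RANGE`, `IMAGE_RANGE`, `AUDIO_RANGE`, `modality_of`, and `describe` are hypothetical, and nothing here reflects how OpenAI actually builds or partitions its vocabulary.

```python
# Toy illustration of a shared multimodal vocabulary.
# The ID ranges are hypothetical; real models use far larger, learned vocabularies.
TEXT_RANGE = range(0, 1000)      # token IDs reserved for text pieces
IMAGE_RANGE = range(1000, 2000)  # token IDs reserved for image patches
AUDIO_RANGE = range(2000, 3000)  # token IDs reserved for audio frames


def modality_of(token_id: int) -> str:
    """Return which modality a token ID belongs to in this toy vocabulary."""
    if token_id in TEXT_RANGE:
        return "text"
    if token_id in IMAGE_RANGE:
        return "image"
    if token_id in AUDIO_RANGE:
        return "audio"
    raise ValueError(f"token id {token_id} is outside the vocabulary")


def describe(sequence: list[int]) -> list[str]:
    """Label every token in a mixed sequence with its modality."""
    return [modality_of(token_id) for token_id in sequence]


if __name__ == "__main__":
    # One input sequence can interleave tokens from several modalities.
    mixed = [12, 845, 1003, 1999, 2500]
    print(describe(mixed))  # ['text', 'text', 'image', 'image', 'audio']
```

The point of the sketch is only that one model can consume a single sequence in which text, image, and audio tokens sit side by side, because they all come from the same vocabulary.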
The voice capabilities were also made possible by a new model, which needs only a short audio sample to generate human-sounding speech. In addition, OpenAI uses its Whisper algorithm to transcribe spoken words into text.
OpenAI acknowledges that these new capabilities come with potential risks. For example, a system that can create voices could be exploited by malicious actors for impersonation and fraud. For that reason, the company currently limits the technology to voice chats.
Additionally, testing teams have scrutinized how the new algorithm interacts with images, paying special attention to photos that may contain misinformation or extremist messages.
You can see the functionality and capabilities of the updated ChatGPT in a video by Business Today.