GPT-4o Processes Text, Audio, or Images for Instant Chat Responses

OpenAI's ChatGPT platform has taken a significant leap forward with the introduction of GPT-4o. This premier model can analyze audio, visual, and text inputs, delivering responses through a real-time conversation with an AI agent that sounds remarkably human.
GPT-4o helps solve a handwritten algebra equation during the demo (Image: OpenAI)

Unveiled by OpenAI CTO Mira Murati during an online launch event on Monday, May 13th, GPT-4o is presented as a step toward far more natural human-computer interaction. The "o" in its name stands for "omni."

GPT-4o’s Performance and Cost-Effectiveness

Aimed at improving the experience for free-tier users, GPT-4o is claimed to match the performance of the paid GPT-4 Turbo model on text and code, while also being faster and 50% cheaper in terms of API usage. This makes it easier to integrate into third-party applications at a reduced cost.
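For developers weighing that API pricing, a request to GPT-4o looks like any other Chat Completions call, with the model name switched to "gpt-4o". The sketch below only assembles the JSON request body using the standard library; the endpoint URL follows OpenAI's public API, and the prompt is a placeholder, not something from the article. Actually sending the request would additionally require an API key in an Authorization header.

```python
import json

# Public Chat Completions endpoint; sending a real request needs an
# "Authorization: Bearer <API key>" header, omitted in this sketch.
API_URL = "https://api.openai.com/v1/chat/completions"

def build_request(prompt: str, model: str = "gpt-4o") -> dict:
    """Assemble the JSON body for a text-only chat completion."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Placeholder prompt for illustration only.
payload = build_request("Summarize this paragraph in one sentence.")
print(json.dumps(payload, indent=2))
```

Because the body is identical in shape to a GPT-4 Turbo request, switching an existing integration to the cheaper model is largely a one-line change.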

To initiate interaction, users simply utter “Hey, ChatGPT,” eliciting a lively spoken response from the agent.

They can then pose their query in natural language, adding text, audio, and/or visual inputs as needed. Visual inputs can include images, a live camera feed from their device, or virtually any other visual data the agent can interpret.
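Mixing those input types in a single request is done with the content-parts message format from OpenAI's API documentation, where an image travels alongside the text as a base64 data URL. The helper below is a hedged sketch of that shape only; the question and image bytes are placeholders, and nothing is sent over the network.

```python
import base64
import json

def build_image_message(question: str, image_bytes: bytes) -> dict:
    """Build one user message pairing a text question with a PNG image,
    encoded as a base64 data URL per the content-parts format."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }

# Placeholder bytes stand in for a real image file read from disk.
msg = build_image_message("What does this handwritten equation say?", b"\x89PNG")
print(json.dumps(msg)[:120])
```

A message like this would be placed in the same "messages" list as plain text turns, which is how a photo of a handwritten equation can be discussed mid-conversation.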

Comparable Response Times and Multilingual Capabilities

In terms of audio inputs, the AI exhibits an average response time of 320 milliseconds, a figure comparable to human conversational response times, according to the company. Moreover, the system is currently proficient in more than 50 languages.

During today’s announcement and demonstration, there were no noticeable delays in the agent’s responses, which were notably infused with human-like emotion – far from resembling HAL 9000. Furthermore, users could interrupt the agent’s responses without disrupting the flow of conversation.

GPT-4o’s Multifaceted Capabilities

In the demonstration, GPT-4o served various roles, such as interpreting an Italian-English conversation between two individuals, assisting in solving a handwritten algebra equation, analyzing specific sections of programming code, and even improvising a bedtime story featuring a robot.

GPT-4o is rolling out now for general use, with additional features slated to arrive in the coming weeks. You can see its capabilities firsthand in the video below.

Rock, Paper, Scissors with GPT-4o

Read the original article on: New Atlas

Read more: ChatGPT and The Dark Web, Yet, A Hushed Talk in The Tech World
