American artificial intelligence (AI) company OpenAI on Monday unveiled its new model, GPT-4o, which is much faster than its previous models.
GPT-4o, in which "o" stands for "omni," is a step towards more natural human-computer interaction, as it accepts any combination of text, audio, and image as input and generates any combination of text, audio, and image as output, the company said in a statement.
"It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation," it added.
In addition, GPT-4o is better at vision and audio understanding compared to existing models, while it can reason across audio, vision, and text in real time, according to the company.
GPT-4 loses a lot of information because it cannot directly observe tone, multiple speakers, or background noises, and it cannot output laughter or singing, or express emotion. In GPT-4o, by contrast, all inputs and outputs are processed by the same neural network.
Microsoft-backed OpenAI said GPT-4o has also undergone extensive external red teaming with more than 70 experts in domains such as social psychology, bias and fairness, and misinformation to identify risks introduced by the newly added modalities.