SeamlessM4T is a neural network that processes both text and audio, which lets it perform text-to-speech, speech-to-text, speech-to-speech, and text-to-text translation for "up to 100 languages."
Meta, the creator of this artificial intelligence, says that its goal in developing SeamlessM4T is to assist people speaking different languages in communicating more effectively with each other.
In its announcement of SeamlessM4T, Meta referenced the Babel Fish from Douglas Adams' classic science fiction series "The Hitchhiker's Guide to the Galaxy." The Babel Fish is a fictional creature that, when placed in one's ear, instantly translates any spoken language.
Similar to the Babel Fish, SeamlessM4T aims to eliminate the language barrier between people.
Creating a universal language translator like the Babel Fish is challenging because existing speech-to-speech and speech-to-text systems cover only a small portion of the world's languages.
Meta is not the first artificial intelligence company to offer machine learning translation tools.
Google Translate has been using machine learning techniques since 2006, and large language models like GPT-4 are known for their cross-lingual translation capabilities.
However, activity in the field of speech processing has heated up recently.
In September 2022, OpenAI released its own open-source speech-to-text translation model called Whisper, which can recognize speech in audio and accurately translate it into text.
SeamlessM4T aims to build on this trend by extending multimodal translation to a larger number of languages.