Meta AI: New ‘Voicebox’ Text-to-Speech Translation Tool

Meta has showcased a new ‘Voicebox’ text-to-speech tool that enables improved text-to-audio translation. The company has published an overview of the ‘Voicebox’ AI system, which lets users convert text to audio in a range of styles and voices.

Introducing Voicebox, a new breakthrough generative speech system based on Flow Matching, a new method proposed by Meta AI. It can synthesize speech across six languages, perform noise removal, edit content, transfer audio style & more.

More details on this work & examples ⬇️
— Meta AI (@MetaAI) June 16, 2023

Meta claims Voicebox is the first AI that can generalize to text-to-speech tasks it wasn’t trained to accomplish, and describes it as a “breakthrough.” It also claims Voicebox produces results up to 20 times faster than state-of-the-art artificial intelligence models with comparable performance. The system eschews traditional TTS architecture in favor of a model more akin to OpenAI’s ChatGPT or Google’s Bard.

The main difference between Voicebox and similar TTS models, such as ElevenLabs Prime Voice AI, is that Voicebox can generalize through in-context learning. Much like ChatGPT and other transformer models, Voicebox uses large-scale training data sets. Previous efforts to use massive troves of audio data have resulted in severely degraded audio outputs, which is why most TTS systems use small, highly curated, labeled data sets. Meta overcomes this limitation through a novel training scheme that ditches labels and curation for an architecture capable of “in-filling” audio information.
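As a rough illustration of the “in-filling” idea, the Python sketch below shows how a single training example could be built: one contiguous span of audio frames is hidden, and a model would be asked to reconstruct it from the surrounding frames (plus the transcript, which is omitted here). This is not Meta’s code. The function name `make_infill_example`, the `mask_fraction` parameter, and the list-of-floats frame format are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical representation: one audio feature frame is a list of floats.
Frame = List[float]


def make_infill_example(
    frames: List[Frame],
    mask_fraction: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Optional[Frame]], List[bool], List[Frame]]:
    """Hide one contiguous span of frames for an in-filling task.

    Returns (context, mask, targets):
      context: the input frames, with hidden frames replaced by None
      mask:    True where a frame was hidden
      targets: the hidden frames, in order, for the model to reconstruct
    """
    if not frames:
        raise ValueError("frames must not be empty")
    if not 0.0 < mask_fraction <= 1.0:
        raise ValueError("mask_fraction must be in (0, 1]")

    rng = rng or random.Random()
    n = len(frames)
    span = max(1, round(n * mask_fraction))  # number of frames to hide
    start = rng.randint(0, n - span)         # randint is inclusive on both ends

    mask = [start <= i < start + span for i in range(n)]
    context = [None if hidden else frame for frame, hidden in zip(frames, mask)]
    targets = [frame for frame, hidden in zip(frames, mask) if hidden]
    return context, mask, targets


# Usage: ten toy one-dimensional frames, hiding roughly 30% of them.
frames = [[float(i)] for i in range(10)]
context, mask, targets = make_infill_example(frames, 0.3, random.Random(0))
```

Because the hidden span is cut from the recording itself, an example like this needs no hand-made labels, which matches the article’s description of a scheme that does without labels and curation.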

According to a post made by Meta AI on June 16, the new Voicebox is the “first model that can generalize to speech-generation tasks it was not specifically trained to accomplish with state-of-the-art performance.” This makes it possible for Voicebox to translate text to speech, remove unwanted noise by synthesizing replacement speech and even apply a speaker’s voice to different language outputs. According to an accompanying research paper published by Meta, its pre-trained Voicebox system can accomplish all of this using only the desired output text and a three-second audio clip.

As explained by Meta:

“Voicebox can produce high quality audio clips and edit pre-recorded audio – like removing car horns or a dog barking – all while preserving the content and style of the audio. The model is also multilingual and can produce speech in six languages. In the future, multipurpose generative AI models like Voicebox could give natural-sounding voices to virtual assistants and non-player characters in the metaverse. They could allow visually impaired people to hear written messages from friends read by AI in their voices, give creators new tools to easily create and edit audio tracks for videos, and much more.”

As Meta notes, Voicebox also lets you use voice samples for translation, so you can use an audio clip of another person to make your text-to-speech output sound like that person is speaking, based on just seconds of audio input.

You can learn more about Meta’s Voicebox project here: https://about.meta.com/

Source: Social Media Today
