21/06/2023
Meta showcases a new ‘Voicebox’ text-to-speech translation tool
The system enables improved text-to-audio translation. Meta has published an overview of its new ‘Voicebox’ AI system, which lets users translate text to audio in a variety of styles and voices.
“Introducing Voicebox, a new breakthrough generative speech system based on Flow Matching, a new method proposed by Meta AI. It can synthesize speech across six languages, perform noise removal, edit content, transfer audio style & more.
More details on this work & examples ⬇️”
Meta AI (@MetaAI), June 16, 2023
Meta claims Voicebox is the first AI model that can generalize to text-to-speech (TTS) tasks it wasn’t trained to accomplish, and describes it as a “breakthrough.” It also claims that Voicebox produces results up to 20 times faster than state-of-the-art AI models with comparable performance. Voicebox eschews traditional TTS architecture in favor of a model more akin to OpenAI’s ChatGPT or Google’s Bard.
The main difference between Voicebox and similar TTS models, such as ElevenLabs Prime Voice AI, is that Voicebox can generalize through in-context learning. Like ChatGPT and other transformer models, Voicebox uses large-scale training data sets. Previous efforts to use massive troves of audio data have resulted in severely degraded audio output, which is why most TTS systems use small, highly curated, labeled data sets. Meta overcomes this limitation through a novel training scheme that ditches labels and curation in favor of an architecture capable of “in-filling” audio information.
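To make the “in-filling” idea concrete, here is a toy sketch of the general masking setup: hide a span of audio frames and keep the surrounding frames as context. It is an illustration only and not Meta’s implementation; the function name mask_span, the list-of-floats stand-in for audio features, and the span length are all assumptions made for the example.

```python
import random


def mask_span(frames, span_len, rng):
    """Hide one contiguous span of frames; return (context, target, start)."""
    if not 0 < span_len <= len(frames):
        raise ValueError("span_len must be between 1 and len(frames)")
    start = rng.randrange(len(frames) - span_len + 1)
    end = start + span_len
    # Context keeps the visible frames and marks the hidden ones with None.
    context = frames[:start] + [None] * span_len + frames[end:]
    # Target is the span an in-filling model would be asked to reconstruct.
    target = frames[start:end]
    return context, target, start


# Hypothetical stand-in for audio features: ten frames, one float each.
frames = [round(0.1 * i, 1) for i in range(10)]
context, target, start = mask_span(frames, span_len=3, rng=random.Random(0))
```

In a real system, a model would see the context (plus the matching text) and learn to predict the target span. This is what makes unlabeled audio usable for training, since the hidden frames themselves serve as the answer.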
According to a post made by Meta AI on June 16, the new Voicebox is the “first model that can generalize to speech-generation tasks it was not specifically trained to accomplish with state-of-the-art performance.” This makes it possible for Voicebox to translate text to speech, remove unwanted noise by synthesizing replacement speech and even apply a speaker’s voice to different language outputs. According to an accompanying research paper published by Meta, its pre-trained Voicebox system can accomplish all of this using only the desired output text and a three-second audio clip.
As explained by Meta:
“Voicebox can produce high quality audio clips and edit pre-recorded audio – like removing car horns or a dog barking – all while preserving the content and style of the audio. The model is also multilingual and can produce speech in six languages. In the future, multipurpose generative AI models like Voicebox could give natural-sounding voices to virtual assistants and non-player characters in the metaverse. They could allow visually impaired people to hear written messages from friends read by AI in their voices, give creators new tools to easily create and edit audio tracks for videos, and much more.”
As Meta notes, Voicebox also lets you use voice styles for translation: you can supply an audio clip of another person to make your text-to-speech output sound as if that person is speaking, using just seconds of audio input.
You can learn more about Meta’s Voicebox project here: https://about.meta.com/
Source: Social Media Today