A new ‘Voice Engine’ from OpenAI needs only 15 seconds to clone speech.




OpenAI, the company behind the leading generative AI tool ChatGPT, has unveiled a new voice cloning technology it calls the “Voice Engine.” This audio model can reproduce a person's voice, intonation, and other unique human speech patterns based on a relatively small sample of original audio.

“It's remarkable that a small model with a 15-second sample can create emotive and realistic sounds,” the company said in a blog post on Friday.

For comparison, AI audio platform ElevenLabs offers an instant voice cloning tool that requires at least one minute of sample audio. Its professional service tier requires roughly 10 minutes of continuous speech for best results.

The company has shown various examples of what this technology can do. In one example, the voice of a young patient who had lost the ability to speak fluently due to a vascular brain tumor was recreated from an old recording she made for a school project. That restored voice is what listeners hear today, according to OpenAI.


OpenAI worked with Lifespan, a nonprofit health system affiliated with Brown University's medical school, and with the creators of Livox, an “alternative communication app” for people with disabilities. The team was able to work from a recording of the young woman's school presentation.

OpenAI's Voice Engine provides fast text-to-speech capabilities that allow the patient to speak effectively in her own voice.

OpenAI also demonstrated how HeyGen can use the technology to turn audio uploaded in one language into a natural-sounding translation in another.

The company says Voice Engine was first developed in late 2022 and is already being used to power the preset voices available through OpenAI's text-to-speech API, as well as ChatGPT's voice and read-aloud features. With recent developments, the company says it is taking precautions before a wider release.

Acknowledging the problem of deepfakes, OpenAI wrote, “we hope to start a conversation about the responsible deployment of synthetic voices and how society can adapt to these new capabilities.” The voices of celebrities, government officials, and increasingly private citizens are being imitated for nefarious purposes ranging from political campaigns and false advertising to outright criminal activity.

In fact, Meta announced last summer that it was holding back its own AI voice tool specifically because of “potential abuse risks.”

“Due to our approach and voluntary commitment to AI safety, we are choosing to preview this technology but not release it widely at this time,” OpenAI explained.

Even before an official release, OpenAI is putting restrictions on Voice Engine, including a list of celebrities whose voices it will not imitate.

“We believe that any widespread deployment of artificial voice technology should include voice authentication experiences that ensure that the original speaker is knowingly adding their voice to the service, and a no-go voice list that prevents the creation of voices that are too similar to celebrities,” OpenAI wrote.

Partners testing Voice Engine today agree to OpenAI's usage policies, which prohibit impersonating another individual or organization without permission. In addition, the company requires clear and informed consent from the original speaker, and does not allow developers to build ways for individual users to create their own voices.

“Based on these discussions and the results of these small-scale tests, we will make more informed decisions about whether and how to deploy this technology at scale,” the blog post reads.

In addition to Voice Engine, OpenAI is working on several projects in parallel. CEO Sam Altman revealed that the company is working on releasing GPT-5 this year. The company has also shown off Sora, its generative video tool. The company says Sora will be the most advanced video generator on the market, surpassing models like Pika, Stable Video Diffusion, and Runway ML.

Sora is currently available only to “red teams” registered with OpenAI, which test the tool to ensure that it cannot be misused.

Voice Engine could well outperform other voice cloning tools, including those from Meta, ElevenLabs, and WellSaid Labs, as well as open-source models like RVC.

OpenAI is also working on a secret project called Q*, about which little more than the name has been disclosed. Sam Altman declined to provide any specifics, but said the research team is focused on finding techniques and approaches that can better support AI reasoning.

Edited by Ryan Ozawa.
