OpenAI’s Voice Engine on backburner as elections grip half of the world’s population

Published 18:42 IST, April 2nd 2024

Due to the political sensitivity of the current times, OpenAI has outlined several safety measures that it believes should be in place before widely deploying synthetic voice technology.

Tech
5 min read

OpenAI’s Voice Engine: OpenAI, the leading artificial intelligence research lab that developed ChatGPT and DALL·E and recently teased Sora, has announced another AI model named Voice Engine. The model takes text input and a single 15-second audio sample and produces close-to-natural-sounding speech in the sampled voice.

The preliminary results from a small-scale preview of Voice Engine, shared in OpenAI’s announcement blog post, are impressive in how closely they replicate the original voice. Despite the modest sample requirement, the model can generate emotive and realistic voices.

No public release amid deepfake concerns

OpenAI is holding back the release of Voice Engine over concerns about misuse, especially during elections. The AI research lab is concerned that the technology could be used to create deepfakes or other misleading content. The official announcement gives no indication of when, or whether, Voice Engine will be released more widely.

This year is a historic one, with at least 64 countries, plus the European Union, heading to the polls. These elections will affect nearly half the world's population and shape the years ahead. Governments around the world are already concerned about deepfakes and are looking for regulatory and identification mechanisms to tackle the challenge.

Given the political sensitivity of the current times, OpenAI has outlined several safety measures that it believes should be in place before synthetic voice technology is widely deployed. These include voice authentication to verify the identity of the speaker and a "no-go" list to prevent the creation of voices that are too similar to those of prominent figures.

The company is also calling for broader steps: phasing out voice-based authentication as a security measure, developing policies to protect the use of individuals' voices, and educating the public about the capabilities and limitations of AI.

OpenAI believes it is important for people to be aware of the potential risks of synthetic voice technology, even if the company ultimately decides not to release Voice Engine widely. It hopes to continue discussing the challenges and opportunities of this technology with policymakers, researchers, and others.

Ongoing testing and varied use cases

Voice Engine was first developed in late 2022, and OpenAI already uses it in its text-to-speech API as well as in the ChatGPT Voice and Read Aloud features. However, according to the AI research lab, a broader release is likely to be held back out of caution, because the new voice generator can make it difficult to tell real audio clips from AI-generated ones.

OpenAI says it will use an inherent watermark to distinguish Voice Engine's output from original recordings. Beyond that, the company plans to initiate a dialogue on the responsible deployment of synthetic voices and explore how “society can adapt to these transformative capabilities”.

Currently, Voice Engine is being tested and used by a small group of “trusted partners” who are developing applications of the technology and helping OpenAI refine its approach, its safeguards, and its thinking on how Voice Engine can be used for the greater good across various industries. Here are some of the use cases that OpenAI shared:

Assisting reading and education

Voice Engine can provide reading assistance to non-readers and children by generating natural-sounding, emotive voices. According to OpenAI’s announcement, Age of Learning, an education technology company, has used Voice Engine to create pre-scripted voice-over content and personalised responses for student interaction.

The emotive voices can foster engagement and comprehension while also democratising knowledge by making it accessible across multiple languages.

Content translation

Voice Engine facilitates the translation of content such as videos and podcasts, enabling creators and businesses to reach a global audience fluently and in their own voices. HeyGen, an AI visual storytelling platform, used Voice Engine for video translation, preserving the native accent of the original speaker.

In the samples shared by HeyGen, a 15-second sample was enough for Voice Engine to generate speech in five different languages. The output did not sound robotic and retained the speaker's actual voice, making it sound hyper-realistic.

Health Services and support for non-verbal individuals

Dimagi, a for-profit social enterprise that delivers open-source software for underserved communities, used Voice Engine to improve essential service delivery in remote settings, particularly for community health workers providing counselling and support in various languages, including Swahili and Sheng.

Livox, an AI alternative communication app, empowers non-verbal individuals with unique and non-robotic voices across multiple languages, improving communication for those with speech-related disabilities.

Speech impairment recovery

The Norman Prince Neurosciences Institute at Lifespan explores the use of Voice Engine in clinical contexts, offering hope to individuals with speech impairments, such as those caused by oncologic or neurologic conditions.

According to OpenAI, the potential applications of Voice Engine extend far beyond these initial examples, promising transformative advancements in communication, education, healthcare, and beyond.

16:13 IST, April 2nd 2024