OpenAI continues to push the boundaries of AI tech. First, it released a tool that can conjure digital images with just a description. Then, it revealed Sora, a technology that generates Hollywood-quality motion videos. And now, it is stepping into the realm of voice recreation.
The latest from OpenAI is a feature that reads text aloud in a remarkably human-like voice. This breakthrough in artificial intelligence marks a significant leap forward, but it also raises concerns about the potential for deepfake manipulation (via Bloomberg).
The company has unveiled early results from testing this feature, along with audio demos. Dubbed Voice Engine, the text-to-speech model is currently in a limited trial phase with about 10 developers. OpenAI has opted for a cautious approach rather than a widespread release.
Following feedback from stakeholders like policymakers and educators, OpenAI has decided to scale back its initial rollout. The company acknowledges the serious risks of generating human-like speech, especially during sensitive times like an election year.
The company wrote in a blog post:
We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year. We are engaging with US and international partners from across government, media, entertainment, education, civil society, and beyond to ensure we are incorporating their feedback as we build.
Unlike previous audio projects, Voice Engine stands out for its ability to mimic individual voices with remarkable accuracy, capturing nuances in cadence and intonation. It needs only a 15-second audio sample to replicate a person’s voice.
Among OpenAI’s partners is the Norman Prince Neurosciences Institute at Lifespan, where the technology is used to help patients in voice rehabilitation. For instance, it was used to restore the speech of a young patient who had difficulty speaking clearly due to a brain tumor. The model learned the patient’s voice from earlier recordings made for a school project.
In addition to its applications in healthcare, the custom speech model has caught the attention of companies like Spotify, which sees potential in translating audio content, such as podcasts, into multiple languages. However, OpenAI emphasizes ethical guidelines for using the technology, including obtaining consent from original speakers and disclosing AI-generated content to listeners.
Also, before considering a wider release, OpenAI is soliciting feedback and urging public awareness of the challenges posed by advanced AI tech. This includes advocating for the phasing out of voice authentication in sensitive areas like banking.
OpenAI added in its blog post:
It’s important that people around the world understand where this technology is headed, whether we ultimately deploy it widely ourselves or not.
The company also says it hopes this preview sparks a conversation about addressing the risks associated with AI advancements and promoting societal resilience.