On Monday, Spotify rolled out a limited pilot program that uses AI to automatically translate podcasts into various languages, using voice synthesis technology from OpenAI to preserve the original speaker’s voice. The feature aims to offer a more authentic listening experience compared to traditional dubbing. It could also introduce language errors that are difficult for non-native speakers to detect, since machine translation is far from a perfect technology.
In its press release announcing the program, Spotify describes itself as a platform that allows creators to share their work around the world. Then it asks a question: “With recent advancements, we’ve been wondering: Are there more ways we can bridge the language gap so that these voices can be heard worldwide?”
Spotify’s answer is Voice Translation, which can reportedly translate English voices into Spanish, French, and German while retaining the distinctive vocal characteristics of the speaker. The feature is currently being used with only select podcasters, such as Dax Shepard, Monica Padman, Lex Fridman, Bill Simmons, and Steven Bartlett.
“We believe that a thoughtful approach to AI can help build deeper connections between listeners and creators, a key component of Spotify’s mission to unlock the potential of human creativity,” said Ziad Sultan, Spotify’s VP of personalization, in the announcement.
Spotify says the translated episodes will be available worldwide to both Premium and Free users. Users can access the translations via Spotify’s “Now Playing View” for supported episodes or a dedicated Voice Translations Hub that will continue to add more translated content.
On X, Lex Fridman posted a sample of his voice cloned and translated into Spanish, writing, “This is me speaking Spanish, thanks to amazing work by Spotify AI engineers. The translation & voice-cloning are fully done by AI. Language can create barriers of understanding & thus fuel division. I can’t wait for AI to break down this barrier & reveal our common humanity.”
Lost in translation
But not all podcasters are excited about the potential for automated AI translations. Reacting to the news on Bluesky, Retronauts co-creator and co-host Jeremy Parish posted, “Another reason to roll my eyes when people ask why we don’t make the podcast available on Spotify.”
In the past, we’ve seen voice-cloning technology from both Microsoft and Meta analyze samples of source audio, then augment that audio with a large training data set of voices to synthesize a new, similar voice. That technology can potentially fail when a person’s vocal style isn’t represented well in the data set of training samples, especially with certain accents.
Here, Spotify is adding another layer of complexity, hoping to seamlessly translate meaning between languages without making errors, something Meta has also attempted with SeamlessM4T. Over the past decade, AI-driven translation has made big strides, but it hasn’t knocked human translators completely out of the game. Industry experts point out that these systems still get tripped up on nuance and don’t understand cultural context, affecting the quality of the translated material.
Tech-savvy users likely expect translation errors when the source is properly framed as a machine translation, but when the errors come in the podcaster’s own voice, it could add a new dimension of trouble, especially if the translated audio is taken out of context and later presumed to be original. Additionally, if the original speaker doesn’t know the translated language, they can’t check whether the translation accurately reflects their original intentions. That’s putting a lot of trust, and personal reputation, in the hands of unproven automation technology.
For now, it appears that Spotify’s program is operating on a limited, opt-in basis among select podcasters only, so issues of consent over cloning podcast guest voices don’t appear to be at play unless this rolls out more widely. Going forward, Spotify says it hopes to gather feedback from creators and listeners to refine the voice translation feature. However, with over 100 million regular podcast listeners on the platform, that’s 100 million ways this experiment could go poorly if the translation technology makes embarrassing errors.