Speaking to friends and relatives from beyond the grave is an impossible task. But Amazon’s virtual assistant, Alexa, might make something close to that happen soon. Whether you find that creepy or comforting is entirely up to you.
At Amazon’s re:MARS (Machine Learning, Automation, Robotics and Space) conference, Rohit Prasad, Senior Vice President for Alexa, demonstrated a rather startling feature that Alexa might have one day: the ability to mimic voices. What’s equally, or perhaps more, startling is that this feature (or skill, as Amazon calls it) would also enable Alexa to mimic the voices of people who have died.
Amazon demoed this functionality at the event in a recorded video. In the video, a child says, “Alexa, can Grandma finish reading me The Wizard of Oz?” Alexa acknowledges the request in her usual voice, then begins reading the story in a voice very similar to that of the child’s late grandmother, CNBC reported.
While this might seem borderline creepy, Prasad pitched the functionality at the company’s annual event as a way to preserve memories. Amazon says that while Alexa’s ability to mimic people’s voices ‘cannot eliminate the pain of loss’, it can definitely ‘make memories last’.
How does this feature work?
As for how the feature works, Amazon told Engadget that Alexa’s new skill can create a synthetic voiceprint of an individual after being trained on as little as a minute of audio of that person’s voice. Powering it are the advancements the company has made in text-to-speech technology. Amazon also recently shared a white paper detailing these developments, in which it said that a ‘Voice Filter’ can use as little as one minute of speech for Alexa to replicate a voice.
“State-of-the-art text-to-speech (TTS) systems require several hours of recorded speech data to generate high-quality synthetic speech…In this paper, we propose a novel extremely low-resource TTS method called Voice Filter that uses as little as one minute of speech from a target speaker. It uses voice conversion (VC) as a post-processing module appended to a pre-existing high-quality TTS system and marks a conceptual shift in the existing TTS paradigm, framing the few-shot TTS problem as a VC task,” the company wrote in the white paper.
But there are concerns
While it all seems quite noble, things aren’t as simple as they seem. Experts have long been concerned about the tools used to replicate voices in deepfake videos. Although this skill is still in development and it remains unclear whether Amazon will release it to users globally, it does raise concerns about the technology being misused by scammers and cyber criminals.