Alexa. Cortana. Google Assistant. Bixby. Siri. Hundreds of millions of people use voice assistants developed by Amazon, Microsoft, Google, Samsung, and Apple every day, and that number is growing all the time. According to a recent survey conducted by tech publication Voicebot, 90.1 million U.S. adults use voice assistants on their smartphones at least monthly, while 77 million use them in their cars, and 45.7 million use them on smart speakers. Juniper Research predicts that voice assistant use will triple, from 2.5 billion assistants in 2018 to 8 billion by 2023.
What most users don’t realize is that recordings of their voice requests aren’t deleted right away. Instead, they may be stored for years, and in some cases they’re analyzed by human reviewers for quality assurance and feature development. We asked the major players in the voice assistant space how they handle data collection and review, and we parsed their privacy policies for more clues.
Amazon says that it annotates an “extremely small sample” of Alexa voice recordings in order to improve the customer experience, for example to train speech recognition and natural language understanding systems “so [that] Alexa can better understand … requests.” It employs third-party contractors to review those recordings, but says it has “strict technical and operational safeguards” in place to prevent abuse and that these reviewers don’t have direct access to identifying information, only to account numbers, first names, and device serial numbers.
“All information is treated with high confidentiality and we use multi-factor authentication to restrict access, service encryption and audits of our control environment to protect it,” an Amazon spokesperson said in a statement.
In web and app settings pages, Amazon gives users the option of opting out of having their voice recordings used for feature development. Users who opt out, it says, might still have their recordings analyzed manually in the regular course of the review process.
Apple discusses its review process for audio recorded by Siri in a white paper on its privacy page. There, it explains that human “graders” review and label a small subset of Siri data for development and quality assurance purposes, and that each reviewer classifies the quality of responses and indicates the correct actions. These labels feed recognition systems that “continually” enhance Siri’s quality, it says.
Apple adds that utterances reserved for review are encrypted and anonymized and aren’t associated with users’ names or identities. It also says that human reviewers don’t receive users’ random identifiers (which refresh every 15 minutes). Apple stores these voice recordings for six months, during which they’re analyzed by Siri’s recognition systems to “better understand” users’ voices. After six months, copies are saved (without identifiers) for up to two years for use in improving and developing Siri.
Apple allows users to opt out of Siri altogether or use the “Type to Siri” tool solely for local on-device typed or verbalized searches. But it says a “small subset” of identifier-free recordings, transcripts, and associated data may continue to be used for ongoing improvement and quality assurance of Siri beyond two years.
A Google spokesperson told VentureBeat that the company conducts “a very limited fraction of audio transcription to improve speech recognition systems,” but that it applies “a wide range of techniques to protect user privacy.” Specifically, she said that the audio snippets it reviews aren’t associated with any personally identifiable information, and that transcription is largely automated and isn’t handled by Google employees. Furthermore, in cases where Google does use a third-party service to review data, she said it “generally” provides the text, but not the audio.
Google also says that it’s moving toward techniques that don’t require human labeling, and it has published research toward that end. In the text-to-speech (TTS) realm, for instance, its Tacotron 2 system synthesizes speech by predicting spectrograms from text, while its WaveNet system generates raw audio waveforms.
When we reached out for comment, a Microsoft representative pointed us to a support page outlining its privacy practices regarding Cortana. The page says that Microsoft collects voice data to “[enhance] Cortana’s understanding” of individual users’ speech patterns and to “keep improving” Cortana’s recognition and responses, as well as to “improve” other products and services that employ speech recognition and intent understanding.
It’s unclear from the page whether Microsoft employees or third-party contractors conduct manual reviews of that data, or how the data is anonymized. The company does say that when the always-listening “Hey Cortana” feature is enabled on compatible laptops and PCs, Cortana collects voice input only after it hears its prompt.
Microsoft allows users to opt out of voice data collection, personalization, and speech recognition by visiting an online dashboard or a search page in Windows 10. Predictably, disabling voice recognition prevents Cortana from responding to utterances. But like Google Assistant, Cortana recognizes typed commands.
Samsung didn’t immediately respond to a request for comment, but the FAQ page on its Bixby support website outlines the ways it collects and uses voice data. Samsung says it uses voice commands and conversations (along with information about OS versions, device configurations and settings, IP addresses, device identifiers, and other unique identifiers) to “improve” and customize various product experiences, and that it draws on past conversation histories to help Bixby better understand distinct pronunciations and speech patterns.
You can delete Bixby conversations and recordings through the Bixby Home app on Samsung Galaxy devices.