Bloomberg recently reported that Apple is working to drop the “Hey” requirement in front of Siri, in a well-written and very interesting article by Mark Gurman.
Mark got a lot of things right in discussing the complexity and difficulty of doing this, but a few things could be explained in more detail. By the way, Sensory has already created “Siri”-only wake words, so we fully understand this complexity, especially across accents and noise environments.
Mark says: “If successful, the shift from ‘Hey Siri’ to ‘Siri’ would match Amazon.com Inc.’s Alexa, which requires users to simply say ‘Alexa’ rather than ‘Hey Alexa.’” This is kind of true, but it’s more complex than that. Alexa is 3 syllables and has a wider range of sounds. Siri is only 2 syllables with fewer sounds. As a rule of thumb for wake word decisions, Sensory often recommends more syllables, because a longer word with more distinct sounds is harder to confuse with ordinary speech.
Privacy isn’t really discussed but is a key issue here, especially since Apple has marketed privacy as a key differentiator. More false accepts will occur with “Siri” than with “Hey Siri” (a false accept is when the device starts listening when it shouldn’t be). Even if Apple can make a perfect algorithm, they will still need to deal with words like “serious” or people discussing “Siri” in a sentence. With more false accepts occurring, the only way to deal with them is to analyze the speech around the perceived wake word, and this means they will be listening in when they shouldn’t be.
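To make the “serious” problem concrete, here is a deliberately simplified sketch. It is a toy and not how Sensory’s or Apple’s detectors work: real wake word engines score audio with acoustic models, while this one just looks for the wake word’s sounds inside an utterance. The phoneme spellings are hand-written approximations, and the names `PHONEMES`, `to_phonemes` and `would_fire` are made up for this illustration.

```python
# Toy illustration only. Real wake word engines score acoustic models;
# they do not match phoneme strings. The transcriptions below are
# hand-written approximations, not taken from any pronunciation dictionary.
PHONEMES = {
    "hey": ["HH", "EY"],
    "siri": ["S", "IH", "R", "IY"],
    "serious": ["S", "IH", "R", "IY", "AH", "S"],
    "are": ["AA", "R"],
    "you": ["Y", "UW"],
}


def to_phonemes(text):
    """Flatten a phrase into one phoneme sequence (words must be in PHONEMES)."""
    sounds = []
    for word in text.lower().split():
        sounds.extend(PHONEMES[word])
    return sounds


def would_fire(wake_phrase, utterance):
    """Return True if the wake phrase's sounds appear contiguously in the utterance."""
    needle = to_phonemes(wake_phrase)
    haystack = to_phonemes(utterance)
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


if __name__ == "__main__":
    for wake in ("siri", "hey siri"):
        print(wake, "->", would_fire(wake, "are you serious"))
```

Under these assumptions, the sounds of “Siri” sit entirely inside “serious”, so the shorter wake word matches, while “Hey Siri” does not because nothing in the utterance supplies the “Hey”. That is the intuition behind expecting more false accepts once the extra word is dropped.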
By the way, this “listening in” CAN be done on-device to keep it private, but I suspect that Google and Amazon send the data to the cloud to analyze and use for building better models, and I have no idea what Apple’s plans are.
Final thought… Have you noticed that Amazon, Apple, Microsoft and others use a made-up brand name (e.g. Alexa, Siri, Cortana) for their assistant while Google uses “Google”? Do you think Google considered that they would get MORE false fires and collect MORE private and unintended conversations about Google?