One of the great things about Sensory is the traction we have had over the years. Not just traction that produces revenues and profits, but traction that gives us insights into what hundreds of multibillion-dollar companies want in their speech solutions. Ever since Sensory introduced the first commercially successful voice triggers (aka wake words) that called up a voice assistant (e.g. on the Samsung Galaxy S2 and MotoX), we have been getting the same request:
“How do I get rid of the wake word?”
And we have been giving the same answer: “It’s not easy, but the way to do it is by combining other sensors (e.g. cameras, touch, IR, etc.).”
Of course, technically it’s easy! Sensory has great command-based engines like TrulyHandsfree that can listen for a large set of “disguised” wake-up commands like “turn the lights on” and “what time is it”. We have even larger engines like TrulyNatural that could be always on, private by design and listening for hundreds of thousands of commands. But the larger that listening set gets, the more difficult it is to eliminate the false accepts and false rejects…and that’s the issue.
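To put a rough shape on that tradeoff, here is a back-of-the-envelope sketch. It is not how TrulyHandsfree or TrulyNatural are tuned; it simply assumes every phrase in the listening set has the same small, independent chance of falsely triggering in a given period, which real recognizers will not match exactly. The function name and the rates are hypothetical and chosen only for illustration.

```python
def chance_of_any_false_accept(per_phrase_rate: float, num_phrases: int) -> float:
    """Probability that at least one phrase in the set falsely triggers.

    Simplifying assumption (not from any real engine): each phrase
    false-triggers independently with the same probability
    `per_phrase_rate` over the period of interest.
    """
    if not 0.0 <= per_phrase_rate <= 1.0:
        raise ValueError("per_phrase_rate must be a probability")
    if num_phrases < 0:
        raise ValueError("num_phrases must be non-negative")
    # P(no phrase fires) = (1 - p)^N, so P(at least one fires) = 1 - (1 - p)^N.
    return 1.0 - (1.0 - per_phrase_rate) ** num_phrases


if __name__ == "__main__":
    # Hypothetical per-phrase rate; watch the combined chance climb with set size.
    for size in (1, 10, 100, 1000):
        print(size, round(chance_of_any_false_accept(0.001, size), 4))
```

Under these assumptions the combined chance only ever goes up as phrases are added, and tightening thresholds to pull it back down is exactly what pushes false rejects up.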
Here’s what made me think of writing this blog…on three days over the past week, my kitchen conversations with my wife have been interrupted by Alexa when we weren’t talking to her. Now for clarity, I love Alexa, and I think Amazon has GREAT technology for both wake words and cloud-based revalidation (I believe Sensory is better overall, but that doesn’t change my point here), but if it’s not good enough with wake words, it’s only going to get worse if we remove the wake words.
Now in some situations, it makes sense to have LIMITED open listening windows. We used to recommend this scheme to wearable companies that didn’t want to wear down the battery by being “always on”. The idea is to be “sometimes on”: for example, a wearable (or, for that matter, a smart speaker) could have a listening window that lasts for 5 seconds after any activity or voice communication. Some smart speaker makers are playing with this kind of approach in dialog-based situations.
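The “sometimes on” idea can be sketched in a few lines. This is a minimal illustration, not Sensory’s implementation: the class and function names are made up for the example, timestamps are plain seconds passed in by the caller, and the only number taken from the text is the 5-second window.

```python
from typing import Optional


class ListeningWindow:
    """Tracks a short open-listening window that follows any activity."""

    def __init__(self, duration_s: float = 5.0) -> None:
        self.duration_s = duration_s
        self._last_activity: Optional[float] = None

    def note_activity(self, now: float) -> None:
        # Call on a button press, a motion event, or the end of a voice exchange.
        self._last_activity = now

    def is_open(self, now: float) -> bool:
        # The window is open only for duration_s seconds after the last activity.
        if self._last_activity is None:
            return False
        return 0.0 <= now - self._last_activity <= self.duration_s


def should_process_speech(window: ListeningWindow, now: float,
                          heard_wake_word: bool) -> bool:
    # Outside the window, the wake word is still required.
    return heard_wake_word or window.is_open(now)


if __name__ == "__main__":
    window = ListeningWindow()
    window.note_activity(now=10.0)
    print(should_process_speech(window, now=12.0, heard_wake_word=False))
    print(should_process_speech(window, now=20.0, heard_wake_word=False))
```

Every command handled inside the window would call note_activity again, which is what lets a back-and-forth dialog continue without repeating the wake word, while the device falls back to wake-word-only listening once the conversation goes quiet.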
I saw this article a few years ago saying Google might drop the wake word. Yeah, right. I had to turn off the wake word on my Android phone because half the time I mentioned Google it thought I said “Hey Google” or “OK Google” and started listening/spying on me. Even if it doesn’t interrupt me and start talking, I think it is totally unacceptable for it to listen to me when I talk about their brand.
So I get it…people want wake words to go away…but they also want the assistant to keep working once the wake words are gone. Not to interrupt accidentally, not to spy by quietly analyzing everything spoken, and not to completely ignore them. I think without the use of more sophisticated sensors and user-trained behaviors we are still a long way from getting rid of wake words for mainstream voice assistant users.
The examples I always hear are things like “If I’m in the room with someone, I don’t need to start every conversation with ‘Hey XXX’; they just know I’m talking to them.” Well, people have all these great sensors that let them know you are talking to them, and until we get vision and more into our assistants, we are going to have to use wake words!