Revisiting Wake Word Accuracy and Privacy
June 11, 2019
I used to blog a lot about wake words and voice triggers. Sensory pioneered this technology for voice assistants, and we evangelized the importance of not hitting buttons to speak to a voice recognizer. Then everybody caught on and the technology went into mainstream use (think Alexa, OK Google, Hey Siri, etc.), and I stopped blogging about it. But I want to reopen the conversation…partly to talk about how important a GREAT wake word is to the consumer experience, and partly to congratulate my team on a recent comparison test that shows how Sensory continues to have the most accurate embedded wake word solutions.
Competitive Test Results. The comparison test was done by Vocalize.ai. Vocalize is an independent test house for voice-enabled products. For a while, Sensory would contract out to them for independent testing of our latest technology updates. We have always tested in-house but found that our in-house simulations didn’t always sync up with our customers’ experience. Working with Vocalize allowed us to move from our in-house simulations to more real-world product testing. We liked Vocalize so much that we acquired them. So now we “contract in” to them, but we keep their data, testing methodology and reporting uninfluenced by Sensory.
Vocalize compared two Sensory TrulyHandsfree wake word models (1MB size and 250KB size) with two external wake words (Amazon and Kitt.ai’s Snowboy), all using “Alexa” as the trigger. The results are replicable and show that Sensory’s TrulyHandsfree remains the superior solution on the market. TrulyHandsfree had lower error rates on BOTH false accepts AND false rejects. And in many cases our technology was better by a long shot! If you would like to see the full report and more details on the evaluation methods, please send an email request to either Vocalize (email@example.com) or Sensory (firstname.lastname@example.org).
It’s Not Easy. There are over 20 companies today that offer on-device wake words. Probably half of these have no experience in a commercially shipping product and they never will; there are a lot of companies that just won’t be taken seriously. The other half can talk a good talk, and in the right environment they can even give a working demo. But this technology is complex: it is really easy to do badly and really hard to do well. Some demos are carefully planned with the right noise in the right environment with the right person talking. Sensory has been focused on low-power embedded speech for 25 years, and we have 65 of the brightest minds working on the toughest challenges in embedded AI. There’s a reason that companies like Amazon, Google, Microsoft and Samsung have turned to Sensory for our TrulyHandsfree technology. Our stuff works, and they understand how difficult it is to make this kind of technology work on-device! We are happy to provide APKs so you can do your own testing and judge for yourself! OK, enough of the sales pitch…some interesting stuff lies ahead…
It’s Really Important. Getting a wake word to work well is more important than most people realize. It’s like the front door to your house. It might be a small part of your house, but if you aren’t letting the homeowners in then that’s horrible, and if you are letting strangers in by accident that’s even worse. The name a company gives its wake word is usually the company brand name. Imagine the sentiment that comes across when I say a brand name and it doesn’t work. Recently I was at a tradeshow that had a Mercedes booth. There were big signs that said “Hey Mercedes”…I walked up to the demo area and I said “Hey Mercedes” but nothing happened…the woman working there informed me that they couldn’t demo it on the show floor because it was too noisy. I quickly pulled out my mobile phone and showed her that I could use dozens of wake words and command sets without an error in that same environment. Mercedes has spent over 100 years building up one of the best quality brand reputations in the car industry. I wonder what will happen to that reputation if their wake word doesn’t respond in noise? Even worse is when devices accidentally go off. If you have family members who listen to music above volume 7 then you already know the shock that a false alarm causes!
It’s about Privacy. Amazon, Google and a few others seem to have pretty good wake words, but if you go into your Alexa settings you can see all of the voice data that’s been collected, and a lot of it is being collected when you weren’t intentionally talking to Alexa! You can see the underlying performance issue in the Vocalize test report. Start with false rejects, which is when a person tries to speak to Alexa and she doesn’t respond. Sensory substantially outperformed Amazon here. The difference is most apparent in babble noise, where Sensory falsely rejected 3% and Amazon falsely rejected 10% in comparably sized models (250KB). However, the false accept difference, meaning how often the device wakes up when nobody said the wake word, is nothing short of AMAZING. Amazon false accepted 13 times in 24 hours of random noise. In this same time period Sensory false accepted ZERO times (on comparably sized 250KB models). How is this possible, you may be wondering? Amazon “fixes” its mistakes in the cloud. Even though the device falsely accepts quite frequently, their (larger and more sophisticated) models in the cloud collect the error. Was that a Freudian slip? They correct the error…AND they COLLECT the error. In effect, they are disregarding privacy to save device cost and collect more data.
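For readers who want to see how the two error measures are typically expressed, here is a minimal Python sketch. It is not Vocalize’s methodology or code; the function names are made up for illustration. The false accept counts (13 and 0 in 24 hours) come from the figures quoted above, but the report’s trial counts are not given in this post, so the 100 attempts used for the false reject calculation is a purely hypothetical number chosen to reproduce the quoted 3% and 10%.

```python
def false_reject_rate(missed: int, attempts: int) -> float:
    """Fraction of genuine wake word attempts the detector ignored."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return missed / attempts


def false_accepts_per_hour(false_accepts: int, hours: float) -> float:
    """How often the detector fired per hour of audio with no wake word in it."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return false_accepts / hours


# False accept counts quoted in the post: 24 hours of random noise, 250KB models.
amazon_fa_rate = false_accepts_per_hour(13, 24)
sensory_fa_rate = false_accepts_per_hour(0, 24)

# ASSUMPTION: 100 attempts is hypothetical; the post only gives percentages.
HYPOTHETICAL_ATTEMPTS = 100
sensory_frr = false_reject_rate(3, HYPOTHETICAL_ATTEMPTS)
amazon_frr = false_reject_rate(10, HYPOTHETICAL_ATTEMPTS)

print(f"False accepts per hour: Amazon {amazon_fa_rate:.2f}, Sensory {sensory_fa_rate:.2f}")
print(f"False reject rate in babble: Amazon {amazon_frr:.0%}, Sensory {sensory_frr:.0%}")
```

The two numbers pull in opposite directions: making a detector more eager to fire lowers false rejects but raises false accepts, which is why a fair comparison has to report both.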
As the voice revolution continues to grow, you can bet that privacy will continue to be a hot topic. What you now understand is that wake word quality has a direct impact on both the user experience and PRIVACY! While most developers and product engineers in the CE industry are aware of wake words and the difficulty in making them work well on-device, they don’t often consider that competing wake word technologies aren’t created equal. The test results from Vocalize prove it! Sensory is more accurate AND allows more privacy!