Speech Blog
HEAR ME - Speech Blog



Posts Tagged ‘low power audio DSP’s’

KitKat’s Listening!

November 15, 2013

Android introduced the new KitKat OS for the Nexus 5, and Sensory has gotten lots of questions about the new “always listening” feature that allows a user to say “OK Google” followed by a Google Now search. Here are some of the common questions:

  1. Is it Sensory’s? Did it come from LG (like the hardware)? Is it Google’s in-house technology? I believe it was developed within the speech team at Android. LG does use Sensory’s technology in the G2, but this does not appear to be an implementation of Sensory’s technology. Google has one of the smartest and most capable speech recognition groups in the industry, and one of the larger ones, and they certainly have the chops to build a keyword spotting technology. Actually, developing a voice-activated trigger is not very hard. There are several dozen companies that can do this today (including Qualcomm!). However, making it usable in an “always on” mode, where accuracy is really important, is very difficult.
  2. The KitKat trigger is just like the one on the Moto X, right? Ugh, definitely not. The Moto X really has “always on” capabilities, which require low power operation. The Android approach consumes too much power to be left “always on”. Also, the Moto X approach adds speaker verification so the “wrong” users can’t just take over the phone with their voice. Motorola is a Sensory licensee; Android isn’t.
  3. How is Sensory’s trigger word technology different from others?
    • First of all, Sensory’s approach is ultra low power. We have IC partners like Cirrus Logic, DSPG, Realtek, and Wolfson that are measuring current consumption in the 1.5-2 mA range. My guess is that the KitKat implementation consumes 10-100 times more power than this. This is for two reasons: 1) we have implemented a “deeply embedded” approach on these tiny DSPs, and 2) Sensory’s approach requires as little as 5 MIPS, whereas most other recognizers need 10 to 100 times more processing power and must run on the power-hungry Android processor!
    • Second, Sensory’s approach requires minimal memory. The small DSPs that run at ultra low power offer less RAM and more limited memory access. The traditional approach to speech recognition is to collect tons of data and build huge models that take a lot of memory, which makes it very difficult to move this approach onto low power silicon.
    • Third, being left always on really pushes accuracy, and Sensory is unique in the accuracy of its triggers. Accuracy is usually measured by looking at the two types of errors – “false accepts”, when the trigger fires unintentionally, and “false rejects”, when it doesn’t let a person in even though they said the right phrase. When there’s a short listening window, “false accepts” aren’t too much of an issue, and the KitKat implementation has very intentionally allowed a “loose” setting, which I suspect would produce too many false accepts if it were left “always on”. For example, I found this YouTube video that shows “OK Google” works great, but so do “OK Barry” and “OK Jarvis”.
    • Finally, Sensory has layered other technologies on top of the trigger, like speaker verification and speaker identification. Sensory has also implemented a “user defined trigger” capability that allows the end customer to define their own trigger, so the phone can respond accurately and at ultra low power to the user’s personalized commands!
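
To make the “false accept” versus “false reject” trade-off concrete, here is a minimal Python sketch. It is not Sensory’s or Google’s implementation. The function name `error_rates`, the score lists, and the two thresholds are all made up for illustration; the only assumption is that a trigger fires whenever a confidence score reaches a threshold.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """Return (false_accept_rate, false_reject_rate) for one threshold.

    The (hypothetical) trigger fires when a score is >= threshold.
    """
    # Impostor utterances that fire the trigger are false accepts.
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    # Genuine trigger phrases that fail to fire are false rejects.
    false_rejects = sum(1 for s in target_scores if s < threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(target_scores))


# Invented scores, for illustration only.
target_scores = [0.91, 0.84, 0.77, 0.95, 0.62]    # user really said the phrase
impostor_scores = [0.10, 0.35, 0.58, 0.71, 0.22]  # other speech, e.g. "OK Barry"

loose = error_rates(target_scores, impostor_scores, threshold=0.5)
strict = error_rates(target_scores, impostor_scores, threshold=0.8)

print("loose  (FA rate, FR rate):", loose)
print("strict (FA rate, FR rate):", strict)
```

With these invented numbers, the loose threshold never rejects the real phrase but fires on two of the five impostor utterances, while the strict threshold does the reverse. A short listening window can live with the loose setting; an always-on trigger hears far more impostor speech, so it needs both rates to be low at the same time.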

Follow the Leader in Mobile

October 2, 2012

I really enjoyed reading this article interviewing Vlad Sejnoha, Nuance’s CTO. Most people would consider Nuance the leader in speech recognition today, and Vlad is certainly a very smart, thoughtful, and articulate man.

I enjoyed it for a few different reasons. The first and main reason I liked the article is that it helps to push the idea Sensory has been championing for the past several years: devices don’t have to be touched to enable voice commands, and you should be able to just start talking to things like we talk to each other. That’s what Sensory calls TrulyHandsfree, and it’s the technology that showed up in the first Bluetooth carkit that requires no touching (by BlueAnt) AND the first mobile phones that responded to voice without touch (Samsung’s Galaxy SII, SIII, and Note). Even hit toys like Mattel’s award-winning Fijit Friends and Hallmark’s Interactive Books use this unique technology that just works when you talk to it. In fact, it really was the TrulyHandsfree feature that made Vlingo so popular, as this Vlingo video nicely states in its comparison between Vlingo and Siri. (Nuance bought Vlingo earlier this year, but Sensory’s TrulyHandsfree didn’t come with it!)

The article says “Sejnoha believes that within a year or two you’ll be able to talk to your smartphone even as it lies idle on a desk, asking it questions such as, ‘When’s my next appointment?’ The phone will be able to detect that you are speaking, wake itself up, and accomplish the task at hand.” Check out this Sensory video…this is definitely what Vlad is talking about! Yeah, we can do it today, and it’s REALLY FAST and really accurate.

But is it low power? Well, that’s ABSOLUTELY KEY. That’s why Sensory partnered with Tensilica. Tensilica is a leader in low power audio DSPs for mobile phones. Sensory already has TrulyHandsfree running on chips that draw under 5 mW for a COMPLETE audio system. And that’s without having to wake up to understand the task at hand. We can drop another 1-2 mW by not being always on, but turning the recognizer off doesn’t do much. That’s because even if the full recognizer is shut down, you still need to run a mic and preamp, which drive a lot of the current consumption when you have a low power recognizer like TrulyHandsfree (it can run on as little as 7 MIPS!). This means it’s REALLY critical to have a low power recognizer as well, and that’s Sensory’s forte. We are expecting that by next year we will have systems running at 1-3 mW!
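
To put those milliwatt figures in perspective, here is a back-of-the-envelope Python sketch of how much of a phone battery a constant always-listening load would use per day. The battery size (7400 mWh, roughly a 2000 mAh cell at 3.7 V) is an assumption chosen for illustration, not a measurement of any particular phone, and `daily_battery_share` is a hypothetical helper name.

```python
ASSUMED_BATTERY_MWH = 7400  # assumption: roughly 2000 mAh at 3.7 V nominal


def daily_battery_share(power_mw, battery_mwh=ASSUMED_BATTERY_MWH):
    """Fraction of a full battery used by a constant load over 24 hours."""
    energy_per_day_mwh = power_mw * 24  # mW times hours gives mWh
    return energy_per_day_mwh / battery_mwh


# 5 mW is the complete-audio-system figure quoted above;
# 3 mW and 1 mW span the range projected for next year.
for power_mw in (5, 3, 1):
    share = daily_battery_share(power_mw)
    print(f"{power_mw} mW always on: {share:.2%} of the battery per day")
```

Under that assumed battery, even the 5 mW system costs only a percent or two of the battery per day, which is the kind of budget that makes leaving the trigger on all the time practical.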

The article mentions “persistent” listening, but even though I’ve always preached this “always on” concept, I think what will really explode is “intelligent automatic listening”. That is, the device figures out when it needs to listen, and for what, and turns on to listen for it. So it doesn’t always have to be on…it will just seem that way because the devices are so intelligent. For example, a certain traveling speed could make a phone listen for car commands or car wake-up words. An incoming call could cause the recognizer to wake up and listen for Answer/Ignore. For these to work, the device needs to run not only at very low power but also with VERY high accuracy. You don’t want a background conversation triggering the phone to hang up a call! Accuracy is another Sensory forte! The combination of accuracy with low power consumption is a difficult mix to conquer! Sensory’s accuracy holds up not only in noise but also from a distance: when a recognizer works well at a poor S/N ratio, the signal can be lower (as it is from a distance) and/or the noise can be higher.
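
The “intelligent automatic listening” idea can be sketched in a few lines of Python. This is only an illustration of the concept, not a Sensory API. The `DeviceContext` fields, the 20 km/h driving threshold, and the phrase strings are all hypothetical.

```python
from dataclasses import dataclass

ASSUMED_DRIVING_SPEED_KMH = 20.0  # hypothetical threshold for "in a car"


@dataclass
class DeviceContext:
    """Hypothetical snapshot of what the phone knows about its situation."""
    speed_kmh: float = 0.0
    incoming_call: bool = False


def phrases_to_listen_for(ctx):
    """Pick the phrases worth listening for; an empty set means stay off."""
    if ctx.incoming_call:
        # A ringing phone only needs two commands.
        return {"answer", "ignore"}
    if ctx.speed_kmh >= ASSUMED_DRIVING_SPEED_KMH:
        # Moving at driving speed: listen for the car wake-up word.
        return {"car wake-up word"}
    # Nothing suggests the user wants to talk, so keep the recognizer off.
    return set()


print(phrases_to_listen_for(DeviceContext(incoming_call=True)))
print(phrases_to_listen_for(DeviceContext(speed_kmh=65.0)))
print(phrases_to_listen_for(DeviceContext()))
```

The point of the design is that the recognizer only runs with a small, situation-specific vocabulary, so the device can seem always on while spending most of its time not listening at all.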

So it’s really cool that Nuance is getting on the bandwagon behind Sensory’s innovations like TrulyHandsfree at low power. In fact, after Samsung’s release of the Galaxy SII with Sensory, Nuance did come out with an “always on and listening” mobile device; for fun we quickly ported our technology onto the same phone to compare…check out this video.

Something interesting we noticed was that after Sensory announced its speaker verification and speaker ID for mobile devices at CTIA this year, Nuance came out with its own announcement shortly thereafter, but there were no demos available, so we couldn’t do a comparison video.