HEAR ME - Speech Blog



Posts Tagged ‘speaker identification’

Sensory’s CEO, Todd Mozer, interviewed on FutureTalk

October 1, 2015

Todd Mozer’s interview with Martin Wasserman on FutureTalk

KitKat’s Listening!

November 15, 2013

Android introduced the new KitKat OS for the Nexus 5, and Sensory has gotten lots of questions about the new “always listening” feature that allows a user to say “OK Google” followed by a Google Now search. Here are some of the common questions:

  1. Is it Sensory’s? Did it come from LG (like the hardware)? Is it Google’s in-house technology? I believe it was developed within the speech team at Android. LG does use Sensory’s technology in the G2, but this does not appear to be a Sensory implementation. Google has one of the smartest, most capable, and larger speech recognition groups in the industry, and they certainly have the chops to build key word spotting technology. Actually, developing a voice-activated trigger is not very hard; dozens of companies can do it today (including Qualcomm!). However, making it usable in an “always on” mode, where accuracy really matters, is very difficult.
  2. The KitKat trigger is just like the one on the Moto X, right? Ugh, definitely not. The Moto X really has “always on” capabilities, which require low-power operation; the Android approach consumes too much power to be left “always on”. Also, the Moto X approach combines speaker verification so the “wrong” users can’t just take over the phone with their voice. Motorola is a Sensory licensee; Android isn’t.
  3. How is Sensory’s trigger word technology different than others?
    • First of all, Sensory’s approach is ultra low power. We have IC partners like Cirrus Logic, DSPG, Realtek, and Wolfson that are measuring current consumption in the 1.5-2mA range. My guess is that the KitKat implementation consumes 10-100 times more power than this, for two reasons: 1) we have implemented a “deeply embedded” approach on these tiny DSPs, and 2) Sensory’s approach requires as little as 5 MIPS, whereas most other recognizers need 10 to 100 times more processing power and must run on the power-hungry Android processor!
    • Second, Sensory’s approach requires minimal memory. The small DSPs that run at ultra low power have less RAM and more limited memory access. The traditional approach to speech recognition is to collect tons of data and build huge models that take a lot of memory; it’s very difficult to move that approach onto low-power silicon.
    • Third, being left always on really pushes accuracy, and Sensory is VERY unique in the accuracy of its triggers. Accuracy is usually measured by looking at two types of errors: “false accepts,” when the trigger fires unintentionally, and “false rejects,” when it doesn’t let a person in even though they say the right phrase. With a short listening window, “false accepts” aren’t too much of an issue, and the KitKat implementation has very intentionally allowed a “loose” setting, which I suspect would produce too many false accepts if it were left “always on”. For example, I found a YouTube video showing that “OK Google” works great, but so do “OK Barry” and “OK Jarvis”.
    • Finally, Sensory has layered other technologies on top of the trigger, like speaker verification and speaker identification. Sensory has also implemented a “user defined trigger” capability that allows the end customer to define their own trigger, so the phone can accurately, and at ultra low power, respond to the user’s personalized commands!
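The false-accept / false-reject trade-off described above can be illustrated with a toy threshold sweep. This is just a sketch with made-up detector scores, not Sensory's (or Google's) actual scoring method:

```python
# Toy illustration of the false-accept / false-reject trade-off for a
# wake-word detector. All scores and thresholds here are invented for
# illustration; a real system would use acoustic-model scores.

def error_rates(genuine_scores, impostor_scores, threshold):
    """Count both error types at a given detection threshold.

    genuine_scores: detector scores for true wake-phrase utterances
    impostor_scores: detector scores for other speech ("OK Barry", ...)
    Returns (false_reject_rate, false_accept_rate).
    """
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    return (false_rejects / len(genuine_scores),
            false_accepts / len(impostor_scores))

# Hypothetical scores: genuine phrases mostly score high, impostors lower.
genuine = [0.9, 0.8, 0.85, 0.6, 0.95]
impostor = [0.2, 0.7, 0.4, 0.1, 0.65]

# A "loose" threshold rarely rejects genuine users but fires on impostors...
print(error_rates(genuine, impostor, 0.5))   # → (0.0, 0.4)
# ...while a stricter threshold does the opposite.
print(error_rates(genuine, impostor, 0.75))  # → (0.2, 0.0)
```

Tightening the threshold trades false accepts for false rejects, which is why a setting that feels fine in a short listening window can misfire constantly when left always on.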

Mobile Users Get it!

May 30, 2012

Sensory’s had a lot of press lately. We made 3 big announcements all pretty much together:

1) Announcing speaker verification

2) Announcing speaker identification

3) Saying Sensory is in the Samsung Galaxy S3

Sensory announced these just before CTIA in New Orleans. We had a small booth at the show, and gave demos at several events (on the CTIA stage and floor, at the Mobility Awards dinner, and at the excellent Pepcom Mobile Focus event).

We got a lot of nice press from this. I was thrilled that the Speech Technology email newsletter put our verification release as the featured and lead story. One of the articles I like best, though, just came out last week by Pete Pachal at Mashable http://mashable.com/2012/05/29/sensory-galaxy-s-iii/

This article is great for several key reasons. One is that Pete gets it. He didn’t just reprint our press release, but he added his commentary and wrapped it up in a nice story that hits some of the key issues.

However, what’s best is what the readers wrote in. I LOVE their insights and comments. Here are a few of the dialogs with my commentary attached:

JB: Seriously??? You still need to push a button to use Siri? I’ve had the “wake with voice” option on my crusty old HTC Incredible, via VLingo inCar, for about 2 years now. Hard to believe Apple is that far behind.

My response: EXACTLY, JB! In fact, that crusty old HTC using Vlingo also uses Sensory’s TrulyHandsfree approach! Vlingo was our first licensee in the mobile space.

Scott: But this is talking about OS integration instead of app integration. And as I’m sure you’ve seen on your phone, and as the article noted, wake with voice options currently use a lot of power, which means I can’t see a lot of people willing to use it.

My response: Precisely, Scott! This is why we are implementing the “deeply embedded” approach that will take power consumption down by a factor of 10! Nevertheless, users LOVE it even if it consumes power:

JB – I use it all the time and since my phone plugs into the car’s adapter, I don’t really worry at all about power usage. It’s never been a problem.

My response – Yes, Vlingo and Samsung did a very nice implementation by having an “always listening” mode, particularly useful while driving. In the future we also expect to see intelligent, sensor-based approaches, so the phone knows when to listen and when not to (e.g., why not have it turn on and listen whenever you start traveling faster than 20 MPH?).
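The sensor-gated idea above can be sketched in a few lines. The `current_speed_mph` input and the gating function are hypothetical stand-ins for a real sensor API; the 20 MPH figure comes from the example in the post:

```python
# Sketch of sensor-gated listening: enable the always-listening wake-word
# engine only when the user allows it AND the device appears to be in a
# moving vehicle. current_speed_mph would come from a real location or
# motion-sensor API in practice.

SPEED_THRESHOLD_MPH = 20  # threshold suggested in the post

def should_listen(current_speed_mph, user_enabled=True):
    """Return True when the wake-word engine should be active."""
    return user_enabled and current_speed_mph > SPEED_THRESHOLD_MPH

print(should_listen(3))                       # walking: stays off → False
print(should_listen(65))                      # highway speed → True
print(should_listen(65, user_enabled=False))  # user opted out → False
```

The `user_enabled` flag reflects the point made later in the post: always-listening should be a user choice, with sensors only deciding *when* an enabled feature wakes up.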

Is there anything to prevent me from messing with another person’s phone?

Fillfill – Ha ha, imagine being in an auditorium and yelling “Hi Galaxy! … Erase Address Book! … Confirm!”

My comment – Funny! This is one of the reasons we have added speaker verification and identification features to the trigger function.

DhanB – Siri doesn’t require a button. It can be activated by lifting the phone up to your face.

Great reader responses:

Darkreaper – …..while driving? (Right! That’s illegal in California and other states!)

Tone – Yes, but with the Samsung Galaxy II, I don’t have to touch it at all. As the article states, this is crucial when you’re in a situation, such as driving. I’ve dropped the phone on the floor while driving and I was still able to send a text message, an email and place a call with it sliding around the back seat. (Bluetooth) iPhone can’t compete, sorry. :-/

…and of course the old “butt dialing” problem:

Jason – This makes me think of the old “butt dialing” problem when you sat down on your phone cause I’d much prefer a manual trigger to prevent accidental usage.

My comment: Once again, I agree with the readers. Sensory isn’t pushing to force “always listening” modes on users; we just want to allow them the choice. We strongly recommend that products offer multiple options for anything that can be done by voice or touch. We believe users should have the right and the ability to access the power of mobile devices without being forced to touch them. And if they want to turn off this ability, that is certainly their choice! We turn off our ringers (at least we should) when we enter a meeting or go to the movies. Likewise, we can turn off hands-free voice control when it’s not appropriate… and with the growing presence and power of intelligent sensors, it will get easier and easier (albeit with some mishaps along the way!) for phones to know when they should listen!

A lot of people commented about Siri. Apple isn’t stupid. They get that hitting buttons isn’t the most convenient way to access voice control. That’s why there’s a sensor that activates Siri when you lift the phone to your face (of course still requiring touch); it’s also why Siri can speak back. Apple pushed the Voice User Interface forward with Siri… Samsung pushed it further with TrulyHandsfree wake-up. There will be a lot of back and forth over the coming years, and voice features will continue to be a major battleground.

As devices offer increasing utility WITHOUT being touched (e.g. remote control functions, accessing and receiving data by voice, etc.), the need for a TrulyHandsfree approach will grow stronger and stronger, and Sensory will continue to have the BEST solution – More Accurate, Lower Power, Faster Response Times, and NOW with built-in speaker verification or speaker ID!