HEAR ME - Speech Blog


Archive for the ‘speaker verification’ Category

Sensory Demos Awesome AI Mashup at Finovate!

September 28, 2017

Finovate is one of those shows where you get up on stage and give a short intro and live demo. They are selective in who they allow to present, and many applicants are rejected. Sensory demonstrated some really cutting-edge, perhaps bleeding-edge, stuff by combining animated talking avatars with text-to-speech, lip movement synchronization, natural language speech recognition, and face and voice biometrics. I don’t know of any company ever combining so many AI technologies into a single product or demo!

Speech recognition has a long history of failing on stage, and one of the ways Sensory has always differentiated itself is that our demos always work! And all our AI technologies worked here too! Even with bright backlighting, our TrulySecure face recognition was so fast and accurate that some missed it. With the microphones and echoes in the large room, our TrulyNatural speech recognition was perfect! That said, we did have a user error… before Jeff and I got on stage he put his demo phone in DND mode, which cut our audio output, but we quickly recovered from that mishap.


Banks Looking to Biometrics for Improved Customer Security

October 16, 2015

I saw a LinkedIn message to one of the biometrics groups in which I’m a member linking to a new video on biometrics:

I was quite surprised to see that I am actually in it!

It’s a great topic…Banks turning to biometrics. The video doesn’t talk too much about what’s really happening and why, so I’ll blog about a few salient points worth understanding:

1)    Passwords are on their deathbed. This is old news and everyone gets it, but it is worth repeating. They are too easy to crack and/or too hard to remember.

2)    Mobile is everything, and mobile biometrics will be the entry point. Our mobile phones will be the tools to control and open a variety of things. Our phones will know who we are and keep track of the probability of that changing as we use them. Mobile banking apps will be accessed through biometrics and that will allow us to not only check balances, but pay or send money or speed ATM transactions.

3)    EMV credit cards are here…Biometric credit confirmation is next! Did you get a smart card from your bank? Europay, MasterCard, and Visa decided to reduce fraud by shifting fraud risk based on the security implemented. Smart cards are here now; biometrics will be added next to aid fraud prevention.

4)    It’s all about convenience & security. So much focus has been on security that convenience was often overlooked. There was a perception that you can’t have both! With biometrics you actually can have an extremely fast and convenient solution that is also highly accurate.

5)    Layered biometrics will rule. Any one biometric or authentication approach in isolation will fail. The key is to layer a variety of authentication techniques that enhance the system’s security but don’t hurt convenience. Voice and face authentication can be used together, passwords can be thrown on top if the biometric confirmation is unsure, and tokens, fingerprint, or iris scans can also be deployed if the security isn’t high enough. The key is knowing the accuracy of the match and stepping up to the desired security level only as far as needed, so as to maximize user convenience.
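To make the stepped idea in point 5 concrete, here is a minimal sketch in Python. It is an illustration only, not Sensory’s implementation: the `Factor` class, the `authenticate` function, the confidence numbers, and the assumption that the factors are statistically independent are all hypothetical. Factors are tried from most to least convenient, and the chain stops as soon as the combined confidence reaches the level the transaction requires.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Factor:
    name: str                    # e.g. "face", "voice", "passcode"
    check: Callable[[], float]   # hypothetical matcher: returns confidence in [0, 1]


def authenticate(factors: List[Factor], required: float) -> Tuple[bool, List[str]]:
    """Try factors from most to least convenient; stop once confidence is high enough."""
    p_impostor = 1.0   # assumed chance an impostor has passed everything so far
    used: List[str] = []
    for factor in factors:
        used.append(factor.name)
        # Assumption: factors are independent, so impostor chances multiply.
        p_impostor *= 1.0 - factor.check()
        if 1.0 - p_impostor >= required:
            return True, used    # secure enough; ask for nothing more
    return False, used           # every layer used and still unsure


# Made-up confidence values, fixed here so the example is repeatable.
chain = [
    Factor("face", lambda: 0.90),
    Factor("voice", lambda: 0.80),
    Factor("passcode", lambda: 0.90),
]
print(authenticate(chain, required=0.85))   # (True, ['face'])
print(authenticate(chain, required=0.95))   # (True, ['face', 'voice'])
```

A balance check might set a low `required` value and pass on face alone, while sending money would set it higher and pull in voice or a passcode only when needed.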

Sensory’s CEO, Todd Mozer, interviewed on FutureTalk

October 1, 2015

Todd Mozer’s interview with Martin Wasserman on FutureTalk

TrulyHandsfree 4.0… Maintaining the big lead!

August 6, 2015

We first came out with TrulyHandsfree about five years ago. I remember talking to speech tech executives at MobileVoice as well as other industry tradeshows, and when talking about always-on hands-free voice control, everybody said it couldn’t be done. Many had attempted it, but their offerings suffered from too many false fires, or not working in noise, or consuming too much power to be always listening. Seems that everyone thought a button was necessary to be usable!

In fact, I remember the irony of being on an automotive panel and giving a presentation about how we’ve eliminated the need for a trigger button, while the guy from Microsoft on the same panel presented on the importance of where to put the trigger button in the car.

Now, five years later, voice activation is the norm… we see it all over the place with OK Google, Hey Siri, Hey Cortana, Alexa, Hey Jibo, and of course if you’ve been watching Sensory’s demos over the years, Hello BlueGenie!

Sensory pioneered the button-free, touch-free, always-on voice trigger approach with TrulyHandsfree 1.0, using a unique, patented keyword spotting technology we developed in-house. From its inception, it was highly robust to noise and ultra-low power. Over the years we have ported it to dozens of platforms, including DSP/MCU IP cores from ARM, Cadence, CEVA, NXP CoolFlux, Synopsys and Verisilicon, as well as integrated circuits from Audience, Avnera, Cirrus Logic, Conexant, DSPG, Fortemedia, Intel, Invensense, NXP, Qualcomm, QuickLogic, Realtek, STMicroelectronics, TI and Yamaha.

This vast platform compatibility has allowed us to work with numerous OEMs to ship TrulyHandsfree in over a billion products!

Sensory didn’t just innovate a novel keyword spotting approach; we’ve continually improved it by adding features like speaker verification and user-defined triggers. Working with partners, we lowered the draw on the battery to less than 1mA, and Sensory introduced hardware and software IP to enable ultra-low-power voice wakeup with TrulyHandsfree. All the while, our accuracy has remained the best in the industry for voice wakeup.

We believe the bigger, more capable companies trying to make voice triggers have been forced to use deep learning speech techniques to try to catch up with Sensory in the accuracy department. They have yet to catch up. Deep learning has brought their products to a very usable accuracy level, but they have lost many of the advantages of small footprint and low power in the process.

Sensory has been architecting solutions for neural nets in consumer electronics since we opened the doors more than 20 years ago. With TrulyHandsfree 4.0 we are applying deep learning to improve accuracy even further, pushing the technology even further ahead of all other approaches, while keeping an architecture that can remain small and ultra-low power. We are enabling new feature extraction approaches, as well as improved training in reverb and echo. The end result is a 60-80% boost in what was already considered industry-leading accuracy.

I can’t wait for TrulyHandsfree 5.0…we have been working on it in parallel with 4.0, and although it’s still a long way off, I am confident we will make the same massive improvements in speaker verification with 5.0 that we are making for speech recognition in 4.0! Once again, we will be further advancing the state of the art in embedded speech technologies!

Random Blogger Thoughts

June 30, 2014

  • TrulySecure™ is now announced!!!! This is the first on-device fusion of voice and vision for authentication, and it really works AMAZINGLY well. I’m so proud of our new computer vision team and of Sensory’s expansion from speech recognition to speech and vision technologies. Now we are much more than “The Leader in Speech Technologies for Consumer Electronics”; we are “The Leader in Speech and Vision Technology for Consumer Products!” Hey, check out the new TrulySecure video on our home page, and our new TrulySecure Product Brief. We hope and expect that TrulySecure will have the same HUGE impact on the market as Sensory had with TrulyHandsfree, the technology that pioneered always-on touchless control!
  • Google I/O. Android wants to be everywhere: in our cars, in our homes, and in our phones. They are willing to spend billions of dollars to do it. Why? To observe our behaviors, which in turn will help provide us more of what we want…and they will also assist in those purchases. Of course this is what Microsoft and Apple and others want as well, but right now Google has the best cloud-based voice experience, and if you ask me it’s the best user experience that will win the game. Seems like they should try to move ahead on the client, but lucky for Sensory we are staying ahead!
  • Rumors about Samsung acquiring Nuance…Why would they spend $7B for Nuance when they can pick up a more unique solution from Sensory for only $1B? Yeah, that’s a joke, and is definitely not intended as an offer or solicitation to sell Sensory!
  • OH! Sensory has a new logo! We made it to celebrate our 20-year anniversary!

Biometrics – The Studies Don’t Reveal the Truth

May 7, 2014

If you read through the biometrics literature you will see a general security based ranking of biometric techniques starting with retinal scans as the most secure, followed by iris, hand geometry and fingerprint, voice, face recognition, and then a variety of behavioral characteristics.

The problem is that these studies have more to do with “in theory” than “in practice” on a mobile phone, but they nevertheless mislead many companies into thinking that a single biometric can provide the results required. This is really not the case in practice. Most companies will require that False Accepts (errors caused by the wrong person or thing getting in) and False Rejects (errors caused by the right person not getting in) be so low that the rate where these two are equal (the equal error rate, or EER) would be well under 1% across all conditions. Here’s why the studies don’t reflect the real world of a mobile phone user:

  1. Cost is key. Mobile phone manufacturers will not be willing to invest in the highest-end approaches for capturing and measuring biometrics that are used by academic studies. This means fewer MIPS, less memory, and poorer quality readers.
  2. Size matters. Mobile phone manufacturers have extremely limited real estate, so larger systems cannot be properly deployed. Further complicating things, extremely fast enrollment and usage are required without a form factor change.
  3. Conditions are uncontrollable. Noisy environments, lighting, dirty hands, and oily screens/cameras/readers are all uncontrollable and will affect performance.
  4. User compliance cannot be assumed. The careful placement of an eye, finger or face does not always happen.

A great case in point is the fingerprint readers now deployed by Apple and Samsung. These are extremely expensive devices, and the literature would make one think that they are highly accurate, but Apple doesn’t have the confidence to allow them to be used in the iTunes store for ID, and San Jose Mercury News columnist Troy Wolverton says:

“I’ve not been terribly happy with the fingerprint reader on my iPhone, but it puts the one on the S5 to shame. Samsung’s fingerprint sensor failed repeatedly. At best, I would get it to recognize my print on the second try. But quite often, it would fail so many times in a row that I’d be prompted to enter my password instead. I ended up turning it off because it was so unreliable (full article).”

There is a solution to this problem: use sensors already on the phone to minimize cost, and deploy a biometric chain combining face verification, voice verification, or other techniques that can be implemented easily and in a user-friendly manner. The combined usage creates a very low equal error rate, and a series of biometric and other secure backup systems makes the chain “immune” to conditions and compliance issues.
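For readers who want to see how the error rates defined earlier are actually measured, here is a small Python sketch. It assumes you have match scores from genuine users and from impostors (the scores below are made up for illustration) and sweeps a threshold to find where false accepts and false rejects are closest to equal. The function names are my own, not from any particular toolkit.

```python
from typing import List, Tuple


def error_rates(genuine: List[float], impostor: List[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false accept rate, false reject rate) at a given threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)  # wrong person gets in
    frr = sum(s < threshold for s in genuine) / len(genuine)     # right person kept out
    return far, frr


def equal_error_rate(genuine: List[float],
                     impostor: List[float]) -> Tuple[float, float]:
    """Sweep every observed score as a threshold; return (threshold, approximate EER)."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    return best[1], best[2]


# Made-up scores: one genuine attempt scored badly, one impostor scored well.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
print(equal_error_rate(genuine_scores, impostor_scores))   # (0.6, 0.25)
```

With only four scores per class the EER here is a coarse 25%; a real evaluation needs many attempts collected across the noisy, badly lit, non-compliant conditions listed above.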

Sensory has an approach we call SMART (Sensory Methodology for Adaptive Recognition Thresholding). It looks at environmental and usage conditions and intelligently deploys thresholds across a multitude of biometric technologies. The result is a highly accurate solution that is easy to use and fast to respond, yet robust to environmental and usage models, AND it uses existing hardware to keep costs low.
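The general idea of condition-aware thresholds can be sketched in a few lines of Python. To be clear, this is not the SMART algorithm, whose details aren’t described here; the `decide` function, the noise and light cutoffs, and the threshold values are all invented for illustration. The sketch only shows the principle: when conditions make one biometric unreliable, ignore it and demand a stronger match from the other.

```python
def decide(voice: float, face: float, noise_db: float, lux: float,
           base: float = 0.70, solo_margin: float = 0.15) -> bool:
    """Accept or reject using thresholds that depend on the environment.

    voice, face: match scores in [0, 1] from hypothetical verifiers.
    noise_db, lux: measured ambient noise and light (cutoffs below are assumptions).
    """
    voice_usable = noise_db < 70.0   # assumed: too loud beyond this for voice
    face_usable = lux > 20.0         # assumed: too dark below this for face

    if voice_usable and face_usable:
        # Both usable: a strong score on one can offset a weaker score on the other.
        return (voice + face) / 2 >= base
    if face_usable:
        # Voice is unreliable, so require a stronger face match on its own.
        return face >= base + solo_margin
    if voice_usable:
        return voice >= base + solo_margin
    return False                     # neither usable: fall back to a passcode


print(decide(voice=0.8, face=0.7, noise_db=50, lux=200))   # True
print(decide(voice=0.2, face=0.9, noise_db=85, lux=200))   # True
print(decide(voice=0.2, face=0.8, noise_db=85, lux=200))   # False
```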

KitKat’s Listening!

November 15, 2013

Android introduced the new KitKat OS for the Nexus 5, and Sensory has gotten lots of questions about the new “always listening” feature that allows a user to say “OK Google” followed by a Google Now search. Here are some of the common questions:

  1. Is it Sensory’s? Did it come from LG (like the hardware)? Is it Google’s in-house technology? I believe it was developed within the speech team at Android. LG does use Sensory’s technology in the G2, but this does not appear to be an implementation of Sensory. Google has one of the smartest, most capable, and largest speech recognition groups in the industry, and they certainly have the chops to build a keyword spotting technology. Actually, developing a voice-activated trigger is not very hard. There are several dozen companies that can do this today (including Qualcomm!). However, making it usable in an “always on” mode, where accuracy is really important, is very difficult.
  2. The KitKat trigger is just like the one on Moto X, right? Ugh, definitely not. Moto X really has “always on” capabilities. This requires low power operation. The Android approach consumes too much power to be left “always on”. Also, the Moto X approach adds speaker verification so the “wrong” users can’t just take over the phone with their voice. Motorola is a Sensory licensee; Android isn’t.
  3. How is Sensory’s trigger word technology different from others?
    • First of all, Sensory’s approach is ultra low power. We have IC partners like Cirrus Logic, DSPG, Realtek, and Wolfson that are measuring current consumption in the 1.5-2mA range. My guess is that the KitKat implementation consumes 10-100 times more power than this. There are two reasons: 1) we have implemented a “deeply embedded” approach on these tiny DSPs, and 2) Sensory’s approach requires as little as 5 MIPS, whereas most other recognizers need 10 to 100 times more processing power and must run on the power-hungry Android processor!
    • Second…Sensory’s approach requires minimal memory. These small DSPs that run at ultra low power have less RAM and more limited memory access. The traditional approach to speech recognition is to collect tons of data and build huge models that take a lot of memory…it is very difficult to move this approach onto low power silicon.
    • Thirdly, being left always on really pushes accuracy, and Sensory is unique in the accuracy of its triggers. Accuracy is usually measured by looking at the two types of errors: “false accepts,” when it fires unintentionally, and “false rejects,” when it doesn’t let a person in when they say the right phrase. When there’s a short listening window, “false accepts” aren’t too much of an issue, and the KitKat implementation has very intentionally allowed a “loose” setting, which I suspect would produce too many false accepts if it were left “always on”. For example, I found this YouTube video that shows “OK Google” works great, but so do “OK Barry” and “OK Jarvis”.
    • Finally, Sensory has layered other technologies on top of the trigger, like speaker verification and speaker identification. Also, Sensory has implemented a “user defined trigger” capability that allows the end customer to define their own trigger, so the phone can respond accurately and at ultra low power to the user’s personalized commands!
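Two of the points above, power and false accepts, come down to simple arithmetic, sketched here in Python. The 2 mA figure is the upper end of the range quoted above and the 10x and 100x multipliers are my guess from the same bullet; the 2000 mAh battery, the false accept rate, and the listening durations are hypothetical numbers chosen only to show the scale. The battery estimate ignores everything else the phone is doing.

```python
def hours_of_listening(battery_mah: float, trigger_ma: float) -> float:
    """Hours a battery could power the trigger alone (capacity / current draw)."""
    return battery_mah / trigger_ma


def expected_false_accepts(rate_per_hour: float, listening_hours: float) -> float:
    """Expected unintended wake-ups for a given false accept rate and listening time."""
    return rate_per_hour * listening_hours


BATTERY_MAH = 2000.0   # hypothetical phone battery

for label, current_ma in [("2 mA", 2.0), ("10x", 20.0), ("100x", 200.0)]:
    print(label, hours_of_listening(BATTERY_MAH, current_ma), "hours")
# 2 mA 1000.0 hours
# 10x 100.0 hours
# 100x 10.0 hours

LOOSE_RATE = 0.5       # hypothetical false accepts per hour for a "loose" trigger
print(expected_false_accepts(LOOSE_RATE, 10 / 3600))   # ten-second listening window
print(expected_false_accepts(LOOSE_RATE, 16))          # always on for a 16-hour day: 8.0
```

The same loose setting that almost never misfires in a ten-second window would misfire several times a day when left always on, which is why always-on operation pushes accuracy so hard.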