This month had three very different announcements about face recognition from Alibaba, Google, and Microsoft. Nice to see that Sensory is in good company!!!
Alibaba’s CEO Jack Ma discussed and demoed the possibility of using face verification for the very popular Alipay.
A couple of interesting things about this announcement… First, I have to say, with a name like Alibaba, I am a little let down that they’re not using “Open Sesame” as a voice password to go with or instead of the face authentication… All joking aside, I do think relying on facial recognition as the sole means of user authentication is risky, and I think they would be better served by a solution that integrates both face and voice recognition (something like our own TrulySecure) to ensure the utmost security of their customers’ linked bank accounts.
Face is considered one of the more “convenient” methods of biometrics because you just hold your phone out and it works! Well, at least it should… A couple of things I noticed in the Alibaba announcement: Look at the picture… Jack Ma is using both hands to carefully center his photo, and looking at the image of the phone screen tells us why. He needs to get his face very carefully centered on this outline to make it work. Why? Well, it’s a technique used to improve accuracy, but this improved accuracy trades off the key advantage of face recognition, convenience, to make the solution more robust.
Also, the article notes that it’s a cloud-based solution. To me, cloud-based means slower, dependent on a connection, and putting personal privacy more at risk. At Sensory, we believe in keeping data secure, especially when it comes to something like mobile payments, which is why we design our technologies to be “embedded” on the device, meaning no biometric data has to be sent to the cloud, and our solutions don’t require an internet connection to function. Additionally, with TrulySecure, we combine face and voice recognition, making authentication quick and simple, not to mention more secure and less spoofable than face-only solutions. With a multi-biometric authentication solution like TrulySecure, the biometric is far less environmentally sensitive and even more convenient!
Mobile pay solutions are on the rise, and as more hit the market, differentiators like authentication approach, solution accuracy, convenience, and most of all data security will continue to be looked at more closely. We believe that the embedded multi-biometric approach to user authentication is best for mobile pay solutions.
Also, Google announced that its deep learning FaceNet is nearly 100% accurate.
Everybody (even Sensory) is using deep learning neural net techniques for things like face and speech recognition. Google’s announcement seems to have almost no bearing on their Android-based face authentication, which came in the middle of the pack of the five different face authentication systems we recently tested. So, why does Google announce this? Two reasons: 1) as a reaction to Baidu’s recent announcement that their deep learning speech recognition is the best in the world; 2) to counter Facebook’s announcement last year that their DeepFace is the best face recognition in the world. My take: it’s really hard to tell whose solution is best on these kinds of things, and the numbers and percentages can be deceiving. However, Google is clearly doing research experiments on high-accuracy face matching and NOT real-world implementation, and Facebook is using face recognition in a real-world setting to tag photos of you. Real-world facial recognition is WAY harder to perfect, so my praise goes out to Facebook for their skill in tagging everyone’s picture to reveal to our friends and family things they might not have otherwise seen us doing!
Lastly, Microsoft announced Windows Hello.
This is an approach to getting into your Windows device with a biometric (face, iris, or fingerprint). Microsoft has done a very nice job with this. They joined the FIDO alliance and are using an on-device biometric. This approach is what made sense to us at Sensory, because you can’t just hack into it remotely; you must have the device AND the biometric! They also addressed privacy by storing a representation of the biometric. I think their approach of using a 3D IR camera for Face ID is a good approach for the future. This extra definition and data should yield much better accuracy than what is possible with today’s standard 2D cameras and should HELP with convenience because it could be better at angles and can work in the dark. Microsoft claims 1 in 100,000 false accepts (letting the wrong person in). I always think it’s silly when companies make false accept claims without stating the false reject numbers (when the right person doesn’t get in). There’s always a tradeoff. For example, I could say my coffee mug uses a biometric authenticator to let the right user telepathically levitate it and it has less than a 1 in a billion false accepts (it happens to also have a 100% false reject since even the right biometric can’t telepathically levitate it!). Nevertheless, with a 3D camera I think Microsoft’s face authentication can be more accurate than Sensory’s 2D face authentication. BUT, it’s unlikely that the face recognition on its own will ever be more accurate than our TrulySecure, which still offers a lower False Accept rate than Microsoft, and less than 10% False Reject rate to boot!
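To make the false accept / false reject tradeoff concrete, here is a minimal Python sketch. It assumes an authenticator that produces a match score and accepts anything at or above a threshold. The scores and the `error_rates` helper are made up for illustration; they are not Microsoft’s or Sensory’s data or code.

```python
from typing import Sequence, Tuple


def error_rates(genuine: Sequence[float],
                impostor: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false_accept_rate, false_reject_rate) at one threshold.

    A score at or above the threshold counts as "accept".
    """
    false_accepts = sum(1 for s in impostor if s >= threshold)
    false_rejects = sum(1 for s in genuine if s < threshold)
    return false_accepts / len(impostor), false_rejects / len(genuine)


# Hypothetical match scores in [0, 1], invented purely for illustration.
GENUINE = [0.91, 0.85, 0.78, 0.66, 0.59, 0.95, 0.72, 0.88]   # right user
IMPOSTOR = [0.12, 0.35, 0.48, 0.61, 0.22, 0.09, 0.55, 0.30]  # wrong user

if __name__ == "__main__":
    # The last threshold is above every possible score: the "coffee mug" case.
    for threshold in (0.3, 0.5, 0.7, 1.01):
        far, frr = error_rates(GENUINE, IMPOSTOR, threshold)
        print(f"threshold={threshold:.2f}  FAR={far:.3f}  FRR={frr:.3f}")
```

Raising the threshold pushes false accepts toward zero and false rejects toward 100%, which is why a false accept number quoted by itself says very little.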
Nevertheless, I like the announcement of 3D cameras for face recognition and am excited to see how their system performs.
It feels like I had a whole week’s worth of the trade show wrapped into one day! By the time mid week hits, I’ll surely be ready to head home! Here are some of the highlights from the first day of Mobile World Congress 2015:
First a word about Catalonia. That’s where Barcelona is… in the heart of Catalonia, an autonomous community of Spain. Don’t expect delayed meetings, inefficiencies, relaxed long lunches or anything like that. The Catalonians have the precision of Germans (to continue my gross stereotyping!), and my experience with one of the largest trade shows on the planet is that it’s going off without a hitch! I picked up my badge at the airport in a five-minute line that was well staffed and moved rapidly. I could just about walk into the show yesterday morning. The subways and trains, though crowded and overheated, ran extremely smoothly. Kudos to the show management for pulling off such a difficult feat!
I’d be remiss without mentioning the Galaxy S6. Samsung invited us to the launch and of course they continue to use Sensory in a relationship that has grown quite strong over the years. Samsung continues to innovate with the Edge, and other products that everyone is talking about. It’s amazing how far Apple took the mantle in the first iPhone and how companies like Samsung and the Android system seem to now be leading the charge on innovation!
My favorite product that doesn’t feature Sensory technology that I bumped into was an electronic jump rope. They put sensors in the handles and a visual display shows across the field of the rope, kind of like those clocks that rapidly flash LEDs as the pendulum quickly moves back and forth in order to display the time. I talked with Alex Woo from Tangram and he said they were going to launch a crowdfunding campaign. I gave Alex a demo of our TrulyHandsfree with jump ropers jumping and all the show noise, and of course it worked flawlessly. It would be really cool to be able to ask things like “How much time,” “How many jumps,” “What’s my heart rate,” or “How many calories burned” and so on, and the display would make voice control so much more functional!
We had a couple of partnership announcements here at the show, supporting both Qualcomm and Synopsys. Both are great partners to add to our support mix, and it’s always nice when it’s customers driving our platform directions. The Qualcomm platform is interesting because it’s not their standard platform for 3rd parties to support. As far as I know they opened it up to Sensory and ONLY Sensory, and already we are seeing much interest!
Last night ZTE had a press party to induct Sensory and NXP into its Smart Voice Alliance. ZTE is really putting some forward thinking into the user experience, and their research shows how much people want a voice interface but how dissatisfying the current state of the art actually is. Sensory’s hoping to change that! We’ll make one of our biggest announcements in history over the next month… and I’ll let you in on the secret (it’s on our website already!). We call it TrulyNatural, and it will be the highest accuracy large vocabulary embedded speech engine that the world has ever seen!
I see a bit of irony that a great Saturday Night Live alumnus is launching a campaign to decrease spoofing. I’m talking about Senator Al Franken, who has been looking into the problem of stolen fingerprints, see article.
Senator Franken challenges Samsung and Apple with some fair concerns about the problem of stolen or spoofed biometrics. The issue is that most biometrics that could be stolen can’t be easily replaced. We only have one face, two eyes, and 10 fingers, so not a lot of chances to replace or change them if they are stolen.
The mobile phone companies, challenged on the fingerprint issue, had two responses:
1) The biometric data is ON DEVICE. This is very important because when it’s stored in the cloud it becomes much more accessible to a hacker AND much more desirable because the payoff is a whole lot of user information. Cloud security is often hacked into, such as the recent break-in of the European Central Bank. In fact, many banks I have spoken to insist that passwords can’t be stored in the cloud because they are just too easy to hack that way.
2) The fingerprint biometric is not stored as a fingerprint image, but as some sort of mathematical representation. I’m not sure I understand this argument because if the digital representation can be copied and replicated, then the system is cracked whether or not it looks like a fingerprint.
I think Franken is right to question the utility of biometric fingerprints, because a product like Sensory’s TrulySecure (combining voice and vision authentication) offers a large number of advantages:
1) The TrulySecure biometric is not easy to copy or find. Unlike a fingerprint, which gets left everywhere, a voice print with a video image of a person saying a particular phrase is NOT easy to find, and even if well recorded, would fall apart with Sensory’s anti-spoofing technology that requires a live image.
2) The TrulySecure biometric is readily changeable. Unlike the nine chances that a user has to replace a fingerprint, there is a virtually unlimited number of TrulySecure password phrases that can be used. If by some nearly impossible chance a TrulySecure biometric phrase is copied, it can be changed in a matter of seconds and a virtually unlimited number of times.
3) TrulySecure works across conditions. Every biometric seems to have a failure mode. Fingerprint sensors seem to require a highly directionalized swipe of a very clean finger. If I cut my finger or have a little peanut butter on it, it just doesn’t work. Likewise a voiceprint by itself might fail in high noise, and a faceprint might fail in low lighting, but that magical dual biometric fusion in TrulySecure seems immune to conditions.
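For readers wondering why combining two biometrics can ride through a bad condition, here is a generic score-level fusion sketch in Python. To be clear, this is NOT how TrulySecure works internally; the function names, the quality weights, and the 0.7 threshold are all assumptions chosen for illustration. The idea is simply that each matcher’s score is weighted by how trustworthy its input looks, so a dark room down-weights the face and a noisy room down-weights the voice.

```python
def fuse_scores(face_score: float, voice_score: float,
                face_quality: float, voice_quality: float) -> float:
    """Quality-weighted average of two match scores, each in [0, 1].

    The quality values (also in [0, 1]) are hypothetical estimates of how
    usable each input is, e.g. low face quality in poor lighting.
    """
    total = face_quality + voice_quality
    if total == 0:
        return 0.0  # neither input is usable, so never accept
    return (face_score * face_quality + voice_score * voice_quality) / total


def authenticate(face_score: float, voice_score: float,
                 face_quality: float, voice_quality: float,
                 threshold: float = 0.7) -> bool:
    """Accept when the fused score reaches the (assumed) threshold."""
    fused = fuse_scores(face_score, voice_score, face_quality, voice_quality)
    return fused >= threshold


if __name__ == "__main__":
    # Dark room: weak face match with low quality, strong and clean voice match.
    print(authenticate(face_score=0.4, voice_score=0.9,
                       face_quality=0.2, voice_quality=0.8))
    # Both matchers weak: fusion does not rescue a poor match.
    print(authenticate(face_score=0.4, voice_score=0.3,
                       face_quality=0.5, voice_quality=0.5))
```

In the first call a face-only system with the same threshold would have rejected the user, while the fused score still clears it because the clean voice input carries most of the weight.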
Here’s a demo I gave to UberGizmo in a somewhat dark and very noisy hotel lobby. I like this demo because it shows a real world situation and how FAST TrulySecure works.
Here’s a more canned demo on Sensory’s home page that better showcases some of the anti-spoofing features.
TrulySecure™ is now announced!!!! This is the first on-device fusion of voice and vision for authentication, and it really works AMAZINGLY well. I’m so proud of our new computer vision team and of Sensory’s expansion from speech recognition to speech and vision technologies. Now we are much more than “The Leader in Speech Technologies for Consumer Electronics”; we are “The Leader in Speech and Vision Technology for Consumer Products!” Hey, check out the new TrulySecure video on our home page, and our new TrulySecure Product Brief. We hope and expect that TrulySecure will have the same HUGE impact on the market as Sensory had with TrulyHandsfree, the technology that pioneered always-on touchless control!
Google I/O. Android wants to be everywhere: in our cars, in our homes, and in our phones. They are willing to spend billions of dollars to do it. Why? To observe our behaviors, which in turn will help provide us more of what we want…and they will also assist in those purchases. Of course this is what Microsoft and Apple and others want as well, but right now Google has the best cloud based voice experience, and if you ask me it’s the best user experience that will win the game. Seems like they should try and move ahead on the client, but lucky for Sensory we are staying ahead!
Rumors about Samsung acquiring Nuance…Why would they spend $7B for Nuance when they can pick up a more unique solution from Sensory for only $1B? Yeah, that’s a joke, and is definitely not intended as an offer or solicitation to sell Sensory!
OH! Sensory has a new logo! We made it to celebrate our 20 year anniversary!
Nick Bilton, in a New York Times article, cites Forrester Research analysts who point out the importance of software in differentiating and creating value in the wearables market while avoiding commoditization.
While the new hardware is fun and exciting for consumers, the ultimate value will come from creating a connection and engaging the consumers with effective and useful analysis of all the data collected. And in the small wearable form factor, the user interface is always going to be critical. With little or no room for buttons and displays, and not always having a smartphone handy to run an app, voice will increasingly become the user interface of choice for these devices.
Sensory is very well positioned to support voice user interfaces for wearables with ultra-low power implementations that can be woken by a gesture, and quickly respond to commands or shut down to minimize impact on battery life. Watch this space (pun intended) for product announcements of wearables with great voice user interfaces!
Android introduced the new KitKat OS for the Nexus 5, and Sensory has gotten lots of questions about the new “always listening” feature that allows a user to say “OK Google” followed by a Google Now search. Here are some of the common questions:
Is it Sensory’s? Did it come from LG (like the hardware)? Is it Google’s in-house technology? I believe it was developed within the speech team at Android. LG does use Sensory’s technology in the G2, but this does not appear to be an implementation of Sensory. Google has one of the smartest, most capable, and one of the larger speech recognition groups in the industry, and they certainly have the chops to build a keyword spotting technology. Actually, developing a voice activated trigger is not very hard. There are several dozen companies that can do this today (including Qualcomm!). However, making it usable in an “always on” mode, where accuracy is really important, is very difficult.
The KitKat trigger is just like the one on MotoX, right? Ugh, definitely not. Moto X really has “always on” capabilities. This requires low power operation. The Android approach consumes too much power to be left “always on”. Also, the Moto X approach combines speaker verification so the “wrong” users can’t just take over the phone with their voice. Motorola is a Sensory licensee, Android isn’t.
How is Sensory’s trigger word technology different than others?
First of all, Sensory’s approach is ultra low power. We have IC partners like Cirrus Logic, DSPG, Realtek, and Wolfson that are measuring current consumption in the 1.5-2mA range. My guess is that the KitKat implementation consumes 10-100 times more power than this. This is for two reasons: 1) we have implemented a “deeply embedded” approach on these tiny DSPs, and 2) Sensory’s approach requires as little as 5 MIPS, whereas most other recognizers need 10 to 100 times more processing power and must run on the power-hungry Android processor!
Second… Sensory’s approach requires minimal memory. These small DSPs that run at ultra low power allow less RAM and more limited memory access. The traditional approach to speech recognition is to collect tons of data and build huge models that take a lot of memory… very difficult to move this approach onto low power silicon.
Thirdly, being left always on really pushes accuracy, and Sensory is VERY unique in the accuracy of its triggers. Accuracy is usually measured by looking at the two types of errors: “false accepts” when it fires unintentionally, and “false rejects” when it doesn’t let a person in when they say the right phrase. When there’s a short listening window, “false accepts” aren’t too much of an issue, and the KitKat implementation has very intentionally allowed a “loose” setting which I suspect would produce too many false accepts if it was left “always on”. For example, I found this YouTube video that shows “OK Google” works great, but so does “OK Barry” and “OK Jarvis”.
Finally, Sensory has layered other technologies on top of the trigger, like speaker verification and speaker identification. Also, Sensory has implemented a “user defined trigger” capability that allows the end customer to define their own trigger, so the phone can accurately and at ultra low power respond to the user’s personalized commands!
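To put the power numbers above in perspective, here is a back-of-the-envelope Python sketch. The 2 mA figure is the upper end of the range quoted above, and the 10x and 100x multipliers are my guess from above. The 2300 mAh battery capacity is an assumption picked for illustration, and treating “10-100 times more power” as 10-100 times more current assumes the same supply voltage. It also ignores everything else the phone is doing.

```python
def standby_hours(battery_mah: float, drain_ma: float) -> float:
    """Hours until the battery is empty if the trigger were the only load."""
    return battery_mah / drain_ma


BATTERY_MAH = 2300.0  # assumed phone battery capacity, for illustration only
TRIGGER_MA = 2.0      # upper end of the 1.5-2mA range quoted in the post

if __name__ == "__main__":
    for multiplier in (1, 10, 100):
        drain = TRIGGER_MA * multiplier
        hours = standby_hours(BATTERY_MAH, drain)
        print(f"{multiplier:>3}x  drain={drain:6.1f} mA  "
              f"battery lasts {hours:7.1f} h ({hours / 24:.1f} days)")
```

Under these assumptions the always-listening trigger alone would take weeks to drain the battery at 2 mA, but less than half a day at 100 times that, which is the difference between a feature you can leave on and one you can’t.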
Samsung was kind enough to invite me to their roll-out of Galaxy Gear and Galaxy Note 3, but I had no plans to be at IFA Berlin, and I couldn’t justify the time to get out to New York. I did catch some of the roll-out live on my computer…a few misc. thoughts:
Who was that guy with the weird glasses? Was that a European thing, or jab at Google Glass?
I remembered a few years back when the first Note was introduced. Everyone thought it was crazy big. Samsung was right! Samsung won, and foresaw the direction of the mobile phone.
Does anybody think it’s a coincidence that Google’s acquisition of WIMM (smart Android watch) and Qualcomm’s move into the Smartwatch space with Toq all happen in the same week as Samsung intros its Galaxy Gear watch?
S-Voice is in Note 3 and Galaxy Gear! Great move for Samsung! Wearables, with their smaller displays and almost non-existent keyboards, definitely need speech recognition as part of a multi-modal interface.
Seems like Steve Jobs had it right about the close integration of consumer hardware and software. Everyone seems to be following in Apple’s footsteps. Google/Moto, Microsoft/Nokia, and now Qualcomm, with Toq, are getting into consumer hardware. Although maybe Toq is just an attempt to promote their display tech from Mirasol.
Qualcomm is expanding its business models these days. Along with their move into smart watches, they also recently announced they are licensing chip IP. They even have their own in-house speech recognizer. I wonder what Samsung thinks of Qualcomm’s announcement of Toq?
One of the leakiest announcements in recent memory, Motorola’s new Moto X is expected to be officially announced today. Rather than trying to one-up Apple and Samsung with the highest resolution screen and fastest processor, the Moto X competes on its ability to be customized and its intelligent use of low power sensors. With my background, it’s no surprise that I’m excited to see the “always listening” technology enabling the wake-up command “OK Google Now”. With this feature, speech recognition is enabled but in an ultra low power state, so it can be on and responsive without draining the battery. From other “press leaks”, I’m looking forward to a line of Droid phones with similar “always listening” functionality.
Motorola isn’t the only one rolling out interesting new “always listening” kinds of functions. Samsung did this first in the mobile phone, but implemented it in a “driving mode” so that it was sometimes always listening. The new Moto phones have been compared with Google’s Glass and the “OK Glass” function which some hackers have noted can be put in an “always listening” mode. Qualcomm has even implemented a speech technology on their chips and Android has released a function like this in their OS. Motorola’s use of the “always listening” trigger is especially cool because it calls up Google Now for a seamless flow from client to server speech recognition.
Here’s a demo of Sensory’s use of a very similar approach that we call “trigger to search” from a video we posted around a year ago:
So what’s Sensory’s involvement in these “always on” features from Android, Glass, Motorola, Nuance, Qualcomm, Samsung, etc.? I can’t say much except we have licensed our technology to Google/Motorola, Samsung and many others. We have not licensed it to Android or Qualcomm, but Qualcomm has commented on its interest in a partnership with Sensory for more involved applications.
With a mass market device like the Moto X, I’m excited to see more people experiencing the convenience of voice recognition that is always listening for your OK. Tomorrow I’m going to discuss leading voice recognition apps on the top mobile environments and then over the next few days and weeks, I’ll cover more topics around voice triggering technology such as pricing models (it’s free right?), power drain, privacy concerns with an “always listening” product, security and personalization. This is an exciting time for TrulyHandsfree™ voice control and I’d welcome your thoughts.
It was interesting to me because they put a motion/proximity sensor under the trunk so the user could open the trunk in a hands-free manner. The commercial highlights the benefit of hands-free access when a woman walks up with her hands full of luggage and she just wiggles her foot around and the trunk pops open! Cool…except the user has to do a little one legged dance with their hands full, and as the commercial highlights (which is another reason why I found it interesting), other things can accidentally open the trunk, like a dog wagging its tail. Wouldn’t a hands-free voice trigger do a much better job? Especially an ultra-low-power implementation on a standalone processor with built in speaker verification for security…sounds like a challenge for Sensory’s TrulyHandsfree approach.
Fast forward to this year’s Super Bowl, and Kia comes out with the “space babies” ad for its Sorento and the Uvo entertainment system. Kid asks dad “where do babies come from” and dad concocts an elaborate and humorous lie.
Then after dad’s tall tale the kid says “But Jake said that babies are made when mommies and daddies…” and dad quickly interrupts the kid by saying “Uvo, play Wheels on the Bus”. The Uvo system hears dad and immediately plays the music drowning out the kid’s question. Cool commercial and nice use of voice activation to control music while driving!
Many of Sensory’s customers have told us that they don’t want to have to say the brand name as a command word, and they would really like to name their products themselves, and even better, have the products know who they are when they talk so that settings and controls can be customized to their use…Another job for Sensory’s TrulyHandsfree!
On February 19th we will announce our TrulyHandsfree 3.0 which will enable all of the voice control scenarios I have described, enabling better user experiences that are more customized and more secure! Stay tuned for the details!
I’ve been going to CES for about 30 years now. More than half of that has been with Sensory selling speech recognition. This year I reminisced with Jeff Rogers (Sensory’s VP Sales, who has been at Sensory almost as long as me) about Sensory’s first CES back in 1995 where we walked around with briefcases that said “Ask Me About Speech Recognition for Consumer Electronics”. A lot of people did ask! There’s always been a lot of interest in speech recognition for consumer electronics, but today it goes beyond interest… it’s in everything from TVs to cars to Bluetooth devices… and a lot of that is with Sensory technology. Often we are paired with Nuance, Google and increasingly AT&T as the cloud speech solution, while Sensory is the client. In 2013, Sensory counted about 20 companies showing its technology on the floor or in private meeting rooms. An increasing percentage of our products are now connected to the cloud and using client/cloud speech schemes. Here’s just a short summary of some of the new things here at the show:
Bluetooth: BlueAnt, Bluetrek, Drive and Talk, Monster Cable, Motorola, and Plantronics all showed products using Sensory’s BlueGenie speech technologies for Bluetooth devices. I noticed Plantronics won a show award for one of their new devices with Sensory technology. This market seems to have flattened and stopped growing, and Sensory is lucky to be working with the leaders who appear to be gaining in marketshare against their competition… correlation or causation?? ;-) Our customers in this segment introduced a dozen or more new products ranging from carkits to headsets to Bluetooth speaker systems.
Chip Companies: Conexant announced their new DSP CX20865 running Sensory’s TrulyHandsfree and gave demos in their suite at the LVH. Tensilica announced their new HIFI Mini and gave some of the best demos on the showroom floor of speech recognition (Sensory’s of course!) working in adverse noise conditions at ultra low power.
Automotive: QNX showed off their beautiful Bentley concept car with built-in graphics and speech recognition, including Sensory’s TrulyHandsfree Voice Control paired with AT&T’s cloud-based Watson ASR engine. Visteon did some pretty neat demos that we can’t discuss other than to say they featured Sensory’s TrulyHandsfree Voice Control! The car companies love us because WE WORK in noise!
Other: Samsung had a huge booth showing Galaxy products (Note, S3, etc.) using Sensory’s TrulyHandsfree triggers as a part of the S-Voice system. VTech showed a variety of phone products using Sensory technologies, including our micro-TTS solutions for caller ID. IVEE paired a Sensory IC for local command and operation with the AT&T cloud recognizer to create a very impressive demo that got nice coverage on NPR! (scroll down to “heard on the air”)
Behind closed doors: Around half a dozen other companies showed cool new things in private suites. Unfortunately I can’t discuss these, but I will say that 2013 will see some major product releases with interesting user experiences and Sensory will be very proud to be a part of these!
My favorite non-Sensory things: Yeah, the 4K/8K TVs were pretty amazing. Crisper than real life, which doesn’t seem possible but it’s true. The new 3D printers and services to make hardware prototypes are amazing (why isn’t HP dominating this market???). But… my favorite stuff is robotics. There was a robot glass cleaner that climbs vertically around windows and cleans them off without falling. Kinda like a Roomba for windows. I met some hacker guys that as a hobby make giant servo/mechanical/electro robot snakes and creatures they can ride in. Think MadMax/Burning Man kinds of artistic technology. I have some neat videos of this I’ll send anyone who wants them.