Posts Tagged ‘Samsung’
February 11, 2015
The advent of “always on” speech processing has raised concerns about organizations spying on us from the cloud.
In this Money/CNN article, Samsung is quoted as saying, “Samsung does not retain voice data or sell it to third parties.” But does this mean your voice data isn’t being saved at all? Not necessarily. In a separate article, the speech recognition system in Samsung’s TVs is shown to be an always-learning, cloud-based solution from Nuance. My guess is that voice data is being saved, and that Nuance is the one saving it.
This doesn’t mean Nuance is doing anything evil; this is just the way machine learning works. There has been a big movement toward “deep” learning, and what “deep” really means is more sophisticated learning algorithms that require more data to work well. In the case of speech recognition, the data needed is speech data, or speech feature data, that can be used to train and adapt the deep nets.
But just because captured voice data has a legitimate technical use doesn’t mean companies should invade privacy to get it. This isn’t just a cloud-based voice recognition issue; it’s an issue for everyone doing cloud-based deep learning. We all know that Google’s goal in life is to collect data on everything so Google can better assist you in spending money on the right things. We in fact sign away our privacy to get these free services!
I admit guilt too. When Sensory first achieved usable results for always-on voice triggers, the basis of our TrulyHandsfree technology, I applied for a patent on a “background recognition system” that listens to what you are talking about in private and puts together different things spoken at different times to figure out what you want…. without you directly asking for it.
Can speech recognition be done without sending all this private data to the cloud? Sure it can! There are two parts in today’s recognition systems: 1) the wake-up phrase; 2) the cloud-based deep net recognizer – AND NOW THEY CAN BOTH BE DONE ON DEVICE!
Sensory pioneered the low-power, on-device wake-up phrase (item 1), and now we have a big team working on an EMBEDDED deep learning speech recognition system (item 2) so that no personal data needs to be sent to the cloud. We call this approach TrulyNatural, and it’s going to hit the market very soon! We have benchmarked TrulyNatural against state-of-the-art cloud-based deep learning systems and have matched, and in some cases bested, their performance!
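To make the two-part split concrete, here is a minimal sketch of a fully on-device pipeline: a tiny always-on spotter gates a larger embedded recognizer, so no audio ever leaves the device. All class and function names here are invented for illustration, not Sensory’s actual API, and real systems would of course operate on audio features rather than text.

```python
# Toy sketch of a two-stage, fully on-device recognition pipeline.
# Stage 1 (always on, low power): wake-phrase spotting.
# Stage 2 (run only after a wake): a larger embedded recognizer.

class WakeWordSpotter:
    """Small, low-power model: answers only 'was the phrase spoken?'"""
    def __init__(self, phrase):
        self.phrase = phrase.lower()

    def detect(self, utterance):
        # Stand-in for an acoustic model: match on transcribed text.
        return self.phrase in utterance.lower()

class EmbeddedRecognizer:
    """Larger on-device model, invoked only after the wake phrase fires."""
    def transcribe(self, utterance):
        return utterance.strip()

def on_device_pipeline(stream, spotter, recognizer):
    """Process utterances locally; nothing is sent to a server."""
    for utterance in stream:
        if spotter.detect(utterance):
            yield recognizer.transcribe(utterance)

spotter = WakeWordSpotter("hello device")
recognizer = EmbeddedRecognizer()
stream = ["random chatter", "hello device what time is it"]
print(list(on_device_pipeline(stream, spotter, recognizer)))
# → ['hello device what time is it']
```

The key design point is the asymmetry: the spotter must be cheap enough to run continuously, while the recognizer can be as large as the device’s memory and compute allow, since it runs only occasionally.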
June 30, 2014
June 4, 2014
It was about 4 years ago that Sensory partnered with Vlingo to create a voice assistant with a special “in car” mode that would allow the user to just say “Hey Vlingo” then ask any question. This was one of the first “TrulyHandsfree” voice experiences on a mobile phone, and it was this feature that was often cited for giving Vlingo the lead in the mobile assistant wars (and helped lead to their acquisition by Nuance).
About 2 years ago Sensory introduced a few new concepts, including “trigger to search” and our “deeply embedded” ultra-low-power always listening (now down to under 2mW, including the audio subsystem!). Motorola took advantage of these approaches and created what I, most biasedly, think is the best voice experience on a mobile phone. Samsung too has taken the Sensory technology and used it in a number of very innovative ways, going beyond mere triggers and using the same noise-robust technology for what I call “sometimes always listening”. For example, when the camera is open it is always listening for “shoot,” “photo,” “cheese,” and a few other words.
So I’m curious about what Google, Microsoft, and Apple will do to push the boundaries of voice control further. Clearly all three like this “sometimes always on” approach, as they don’t appear to be offering the low-power options that Motorola has enabled. At Apple’s WWDC there wasn’t much talk about Siri, but what they did say seemed quite similar to what Sensory and Vlingo did together 4 years ago: enable an in-car mode that can be triggered by “Hey Siri” when the phone is plugged in and charging.
I don’t think that will be all…I’m looking forward to seeing what’s really in store for Siri. They have hired a lot of smart people, and I know something good is coming that will make me go back to the iPhone, but for now it’s Moto and Samsung for me!
May 7, 2014
If you read through the biometrics literature you will see a general security-based ranking of biometric techniques: retinal scans as the most secure, followed by iris, hand geometry and fingerprint, voice, face recognition, and then a variety of behavioral characteristics.
The problem is that these studies have more to do with “in theory” than “in practice” on a mobile phone, but they nevertheless mislead many companies into thinking that a single biometric can provide the results required. This is really not the case in practice. Most companies will require that False Accepts (the wrong person or thing getting in) and False Rejects (the right person not getting in) be so low that the rate at which these two errors are equal (the equal error rate, or EER) would be well under 1% across all conditions. Here’s why the studies don’t reflect the real world of a mobile phone user:
A great case in point is the fingerprint readers now deployed by Apple and Samsung. These are extremely expensive devices, and the literature would make one think that they are highly accurate, but Apple doesn’t have the confidence to allow them to be used in the iTunes store for ID, and San Jose Mercury News columnist Troy Wolverton says:
“I’ve not been terribly happy with the fingerprint reader on my iPhone, but it puts the one on the S5 to shame. Samsung’s fingerprint sensor failed repeatedly. At best, I would get it to recognize my print on the second try. But quite often, it would fail so many times in a row that I’d be prompted to enter my password instead. I ended up turning it off because it was so unreliable (full article).”
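For readers unfamiliar with the equal error rate mentioned above, here is a small illustrative sketch of how an EER is estimated from verification scores: sweep a decision threshold and find the point where the false-accept and false-reject rates cross. The score data and helper names are invented for illustration.

```python
# Estimate the equal error rate (EER) of a verifier from two score
# lists: genuine (true user) and impostor attempts. Higher score means
# the system is more confident the user is genuine.

def far_frr(genuine, impostor, threshold):
    """False-accept rate: impostors scoring at/above the threshold.
       False-reject rate: genuine users scoring below it."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

def estimate_eer(genuine, impostor):
    """Scan candidate thresholds; return the rate where FAR ≈ FRR."""
    best = None
    for t in sorted(set(genuine + impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]

# Toy data: six genuine attempts, six impostor attempts.
genuine  = [0.9, 0.8, 0.75, 0.6, 0.95, 0.85]
impostor = [0.2, 0.4, 0.55, 0.3, 0.1, 0.65]
print(round(estimate_eer(genuine, impostor), 3))  # → 0.167
```

An EER of 16.7% would be hopeless for unlocking a phone; the point of the discussion above is that getting a single biometric well under 1% across real-world conditions is much harder than the literature suggests.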
There is a solution to this problem: utilize sensors already on the phone to minimize cost, and deploy a biometric chain that combines face verification, voice verification, and other techniques in a user-friendly manner. The combined usage creates a very low equal error rate and becomes “immune” to conditions and compliance issues by having a series of biometric and other secure backup systems.
Sensory has an approach we call SMART (Sensory Methodology for Adaptive Recognition Thresholding). It looks at environmental and usage conditions and intelligently deploys thresholds across a multitude of biometric technologies, yielding a solution that is highly accurate, easy to use, fast to respond, robust to environmental and usage models, AND built on existing hardware to keep costs low.
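The internals of SMART are Sensory’s own and aren’t described here. Purely to illustrate the general idea — adapting per-biometric thresholds to measured conditions and chaining modalities — a toy sketch might look like this, with every name, number, and rule invented for the example:

```python
# Toy sketch of condition-adaptive multi-biometric thresholding
# (NOT Sensory's actual SMART algorithm). Each modality's threshold
# tightens when its measured conditions (lighting for face, noise for
# voice) are poor, and the final decision chains both modalities.

def adaptive_threshold(base, condition_quality):
    """condition_quality in [0, 1]: 1.0 = ideal conditions.
    Poor conditions interpolate toward a stricter cutoff, since a
    modality is less trustworthy when its signal is degraded."""
    strict = min(1.0, base + 0.2)
    return strict - (strict - base) * condition_quality

def verify(face_score, voice_score, light_quality, noise_quality,
           face_base=0.7, voice_base=0.7):
    face_t = adaptive_threshold(face_base, light_quality)
    voice_t = adaptive_threshold(voice_base, noise_quality)
    # Accept when both modalities clear their adapted thresholds...
    if face_score >= face_t and voice_score >= voice_t:
        return True
    # ...or when the fused score clears a stricter combined check.
    combined = (face_score + voice_score) / 2
    return combined >= (face_t + voice_t) / 2 + 0.05

# Bright room, quiet background: thresholds stay at their base values.
print(verify(0.8, 0.75, light_quality=1.0, noise_quality=1.0))  # → True
```

The general principle is that fusing two modest biometrics with condition-aware thresholds can reach a lower combined error rate than either modality alone, using only sensors the phone already has.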
April 25, 2014
It’s not often that I rave about articles I read, but Ian Mansfield of Cellular News hit the nail on the head with this article.
Not only is it a well-written and concise article, but it’s chock full of recent data (primarily from JD Power research), and most importantly it’s data that tells a very interesting story that nicely aligns with Sensory’s strategy in mobile. So, thanks Ian, for getting me off my butt to start blogging again!
A few key points from the article:
Now, let me dive one step deeper into the problem, and explore whether customer satisfaction can be achieved with minimal impact on cost:
Seamless voice control is here, soon every phone will have it, and it doesn’t add any hardware cost. Sensory introduced it with our TrulyHandsfree technology, which allows users to just start talking, and our “trigger to search” approach has been nicely deployed by companies like Motorola, which pioneered this “seamless voice control” in many of their recent releases. Seamless voice control really doesn’t add much cost, and with excellent engines from Google, Apple, and Microsoft sitting in the cloud, it can and will be nicely implemented without affecting handset pricing.
Sensors are a different story. By their nature they must be embedded into the phones, and they increase cost. Some “sensors,” in the broadest sense of the term, are no-brainers and necessities; microphones and cameras are must-haves, and six-axis motion sensors (accelerometer plus gyroscope) along with GPS are arguably must-haves as well. Magnetometers and barometers are getting increasingly common, and to differentiate further, leading manufacturers are embedding things like heartbeat monitors; stereo 3D cameras are just around the corner. To address the desire for biometric security, Samsung and Apple have embedded fingerprint sensors in the two bestselling phones in the world!
The problem is that all these sensors add cost, and the fingerprint sensors in particular are the most expensive, adding $5-$15 to the cost of goods. It’s kind of ironic that after spending all that money on biometric security, Apple doesn’t even allow the sensor as a security measure for purchasing on iTunes. And both Samsung and Apple have been chastised for fingerprint sensors that can be cracked with gummy bears or glue!
A much more accurate and cost-effective biometric solution can be achieved by using the EXISTING sensors on the phone rather than adding special-purpose biometric sensors. In particular, the must-have sensors like microphones, cameras, and six-axis sensors can create a more secure environment that is just as seamless but much more difficult to crack. I’ll talk more about that in my next blog.
August 16, 2013
I saw a post recently in the Android Central forum that talked about Sensory’s technology as used by Samsung:
What makes it different from any other voice app is its part of the OS. e.g. get a call, you can say ‘Answer’ or ‘Ignore’. alarm rings, just say ‘Snooze’. You don’t have to launch an app or press buttons to do this, the phone is always active and listening. No one else does this!
It’s an astute comment but not 100% accurate. When people talk about “always listening,” what they really mean is that it appears to be always listening. At Sensory we call it TrulyHandsfree, and the idea is that there are certain “modes” or “windows” where it listens for specific words. Like when the alarm goes off, it listens for “snooze,” etc. If you say “snooze” when the alarm isn’t going off, you find it’s not really always listening.
Glass has a similar usage model. It’s “always listening,” but for different things at different times and only for short periods. I put my Glass on and timed it: the “OK Glass” trigger window seems to last 3-4 seconds, and the next set of commands (like “Get Directions to”) stays on for 10-11 seconds.
What’s really cool about Glass is that during those listening windows you can say other things and it doesn’t “false fire” on them. I let my wife try out my Glass, and she said “You mean I just say OK Glass and then I can say any of these things like get directions to Chef Chu’s and…woah it works!” It ignores everything it’s not listening for and picks out the things it is listening for. The technology is known as “keyword spotting” for this reason.
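The windowed keyword-spotting behavior described above can be sketched roughly like this. It’s a toy model with invented names — not the actual Glass or Samsung implementation — but it captures the two properties: a mode only listens for its own small keyword set, and only while its window is open.

```python
import time

# Toy model of "sometimes always listening": the recognizer spots a
# fixed keyword set, and only while a mode's time window is open.
# Anything outside the keyword set is ignored (no false fires).

class ListeningWindow:
    def __init__(self, keywords, duration_s):
        self.keywords = {k.lower() for k in keywords}
        self.duration_s = duration_s
        self.opened_at = None          # None means the window is closed

    def open(self):
        """Open the window, e.g. when the alarm starts ringing."""
        self.opened_at = time.monotonic()

    def spot(self, heard):
        """Return the keyword if the window is open and the utterance
        matches; otherwise return None."""
        if self.opened_at is None:
            return None
        if time.monotonic() - self.opened_at > self.duration_s:
            self.opened_at = None      # window expired
            return None
        word = heard.lower()
        return word if word in self.keywords else None

# Alarm mode: listens for "snooze"/"stop" only while the alarm rings.
alarm = ListeningWindow({"snooze", "stop"}, duration_s=10)
alarm.open()
print(alarm.spot("snooze"))   # → snooze  (in the set, window open)
print(alarm.spot("pizza"))    # → None    (ignored: not a keyword)
```

This is why saying “snooze” with no alarm ringing does nothing: the window is simply closed, so the spotter isn’t comparing against anything.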
To save power, Hallmark’s use of Sensory’s technology kicks into gear when the product is turned on. If it doesn’t hear one of the words it’s listening for within a certain time frame, it automatically powers down and stops “always listening” until it’s turned back on with a button press.
Sensory recently introduced a low-power sound detection technology that cuts power consumption further by keeping the device “always listening” in a low-power mode in which it doesn’t perform speech recognition. When it hears something, it quickly powers up the recognizer for further analysis. By “always listening” but not always recognizing, power consumption drops to about 1mA.
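That staged design can be sketched as follows: a cheap energy check runs all the time, and the costlier recognizer is invoked only when there is sound worth analyzing. This is toy code with an invented stand-in for the recognizer, intended only to show the gating structure.

```python
# Toy sketch of staged low-power listening: an inexpensive energy
# detector runs continuously; the power-hungry keyword recognizer
# runs only when a frame's energy crosses a floor.

def energy(frame):
    """Mean squared amplitude of an audio frame (a list of samples)."""
    return sum(s * s for s in frame) / len(frame)

def staged_listen(frames, energy_floor, recognize):
    """Yield recognizer results only for frames loud enough to matter;
    silent frames cost almost nothing to dismiss."""
    for frame in frames:
        if energy(frame) >= energy_floor:   # wake the recognizer
            result = recognize(frame)
            if result is not None:
                yield result

# Invented stand-in recognizer: "detects" a keyword on a high peak.
recognize = lambda f: "hello-device" if max(f) > 0.9 else None

silence = [0.0] * 8
loud    = [0.2, 0.95, 0.4, 0.1, 0.0, 0.3, 0.2, 0.1]
print(list(staged_listen([silence, loud], 0.01, recognize)))
# → ['hello-device']
```

The power win comes from the asymmetry: the energy check is a handful of multiply-adds per sample, while full recognition runs a neural net, so gating the latter on the former keeps average current draw near the detector’s floor.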
August 5, 2013
I often get the question, “If Android and Qualcomm offer voice activation for free, why would anyone license from Sensory?” While I’m not sure about Android’s and Qualcomm’s business models, I do know that decisions are based on accuracy, total added cost (royalties plus the hardware required to run), power consumption, support, and other variables. Sensory seems to be consistently winning the shootouts it enters for embedded voice control. Some approaches that appear lower cost require much more memory or MIPS, driving up total cost and power consumption.
It’s interesting to note that companies like Nuance face a similar challenge on the server side, where Google and Microsoft “give it away.” Because Google’s engine is so good, it creates a high hurdle for Nuance. I’d guess Google’s rapid progress helps Nuance with its Apple license, but may have made it more challenging to license Samsung. Samsung actually licensed Vlingo AND Nuance AND Sensory; then Nuance bought Vlingo.
Why doesn’t Samsung use Google recognition if it’s free? On the server it’s not power consumption affecting decisions, but cost, quality, and in this case CONTROL. On the cost side, it could be that Samsung actually MAKES money by using Nuance through some sort of ad-revenue kickback, which I’d guess Google doesn’t allow. This is of course just hypothesizing; I don’t really know, and if I did know, I couldn’t say. The control issue is big too: companies like Sensory and Nuance will sell to everyone, and in that sense offer platform independence and more control. Working with a Microsoft or Google engine forces an investment in a specific platform implementation, and therefore less flexibility to have a uniform cross-platform solution.
August 2, 2013
What about Microsoft and Amazon? Both have good cloud-based recognition engines in house, but neither seems particularly relevant in mobile…YET!
Kudos to Microsoft for its always-listening feature in Xbox! It’s actually the best implementation I’ve seen that doesn’t use Sensory technology. I’ll blog more about how they do it, and why they can’t do a low-power implementation, in the weeks ahead.
January 15, 2013
I’ve been going to CES for about 30 years now. More than half of that has been with Sensory selling speech recognition. This year I reminisced with Jeff Rogers (Sensory’s VP Sales, who has been at Sensory almost as long as me) about Sensory’s first CES back in 1995, where we walked around with briefcases that said “Ask Me About Speech Recognition for Consumer Electronics”. A lot of people did ask! There’s always been a lot of interest in speech recognition for consumer electronics, but today it goes beyond interest…it’s in everything from TVs to cars to Bluetooth devices…and a lot of that is with Sensory technology. Often we are paired with Nuance, Google, and increasingly AT&T as the cloud speech solution, while Sensory is the client.
August 5, 2011
I recently learned about 2 awards that Sensory has won over the past year. The contrast is in how we learned about them, and the different nature of these awards. It’s really amusing, so I thought I’d share my take.
Both awards were for our TrulyHandsfree™ Voice Control. One was for the significance of Sensory’s truly hands-free trigger in implementing speech recognition without using buttons, and the other was for Sensory’s chip-based implementation of a truly hands-free interface.
The first award came from Speech Technology Magazine. Sensory won their Star Performer award for 2011, and I didn’t even know we had been nominated. In fact, nobody ever told me that we had won; I found out purely by chance (thanks, Bernie!). They only gave out four of these awards this year, and I’m honored and thrilled that Sensory won one of them. It’s really a testament to our team behind TrulyHandsfree… IT’S THE MOST AMAZING TECHNOLOGY. I sent kudos to Speech Tech for having the insight to understand the significance of this technology! Speech Technology Magazine has gotten so independent and non-self-serving in their awards process that they didn’t even take the opportunity to call us and let us know! Now we know, so thanks again, Speech Tech!
In contrast… the second award came from a market research firm I’ll call the Cold Irishman. Why don’t I use their real name? Well, I can’t, or they might sue me. I received a call from their “Manager of IP and Copyrights” to congratulate me, and to let me know about their thoroughly independent and fair process that looked at the entire speech market and decided that Sensory stood out… blah blah blah…
I knew there was something funny going on from the guy’s title. Yeah, you guessed it. To be able to tell people you won their award costs a certain price; you pay more the more you want to use it, and you can even pay extra to go to an awards banquet. He offered me programs starting at $10K and going up to WAY more than that. One of the more expensive programs included a video of us receiving the award, with lots of praise from their esteemed analysts. So, I decided to go onto YouTube and see for myself how many hits last year’s award winners were getting… my memory said low double digits, but that didn’t seem possible (Sensory’s little home-made videos often get thousands of hits). Just for fun I looked at this year’s award winners – one of them had only 10 (yes, TEN) hits. Most of them must have been employees… Pretty hefty price to stroke your own ego and get almost nothing in return! I’ve always wondered who pays to be in Whoever’s Whatever? It’s probably the same CEOs that pay to go to award dinners!
So…Many Thanks to Leonard Klie and Speech Technology Magazine…and Cold Irishman…thanks, but no thanks! Sensory deserves recognition for innovation in speech technologies based on our hard work, not on how much we pay to market it.