HEAR ME -
Speech Blog

Posts Tagged ‘Samsung’

Deep Listening in the Cloud

February 11, 2015

The advent of “always on” speech processing has raised concerns about organizations spying on us from the cloud.

In this Money/CNN article, Samsung is quoted as saying, “Samsung does not retain voice data or sell it to third parties.” But, does this also mean that your voice data isn’t being saved at all? Not necessarily. In a separate article, the speech recognition system in Samsung’s TVs is shown to be an always-learning cloud-based solution from Nuance. I would guess that there is voice data being saved, and that Nuance is doing it.

This doesn’t mean Nuance is doing anything evil; this is just the way that machine learning works. There has been this big movement towards “deep” learning, and what “deep” really means is more sophisticated learning algorithms that require more data to work. In the case of speech recognition, the data needed is speech data, or speech features data that can be used to train and adapt the deep nets.

But the fact that there is a necessary use for capturing voice data, and with it invading privacy, doesn’t mean that companies should do it. This isn’t just a cloud-based voice recognition software issue; it’s an issue with everyone doing cloud-based deep learning. We all know that Google’s goal in life is to collect data on everything so Google can better assist you in spending money on the right things. We in fact sign away our privacy to get these free services!

I admit guilt too. When Sensory first achieved usable results for always-on voice triggers, the basis of our TrulyHandsfree technology, I applied for a patent on a “background recognition system” that listens to what you are talking about in private and puts together different things spoken at different times to figure out what you want…. without you directly asking for it.

Can speech recognition be done without having to send all this private data to the cloud? Sure it can! There are two parts in today’s recognition systems: 1) The wake up phrase; 2) The cloud based deep net recognizer – AND NOW THEY CAN BOTH BE DONE ON DEVICE!

Sensory pioneered the low-power wake up phrase on device (item 1), and now we have a big team working on making an EMBEDDED deep learning speech recognition system so that no personal data needs to be sent to the cloud. We call this approach TrulyNatural, and it’s going to hit the market very soon! We have benchmarked TrulyNatural against state-of-the-art cloud-based deep learning systems and have matched and in some cases bested the performance!

Random Blogger Thoughts

June 30, 2014

  • TrulySecure™ is now announced!!!! This is the first on device fusion of voice and vision for authentication, and it really works AMAZINGLY well. I’m so proud of our new computer vision team and of Sensory’s expansion from speech recognition to speech and vision technologies. Now we are much more than “The Leader in Speech Technologies for Consumer Electronics”- we are “The Leader in Speech and Vision Technology for Consumer Products!” Hey check out the new TrulySecure video on our home page, and our new TrulySecure Product Brief. We hope and expect that TrulySecure will have the same HUGE impact on the market as Sensory had with TrulyHandsfree, the technology that pioneered always on touchless control!
  • Google I/O. Android wants to be everywhere: in our cars, in our homes, and in our phones. They are willing to spend billions of dollars to do it. Why? To observe our behaviors, which in turn will help provide us more of what we want…and they will also assist in those purchases. Of course this is what Microsoft and Apple and others want as well, but right now Google has the best cloud based voice experience, and if you ask me it’s the best user experience that will win the game. Seems like they should try and move ahead on the client, but lucky for Sensory we are staying ahead!
  • Rumors about Samsung acquiring Nuance…Why would they spend $7B for Nuance when they can pick up a more unique solution from Sensory for only $1B? Yeah, that’s a joke, and is definitely not intended as an offer or solicitation to sell Sensory!
  • OH! Sensory has a new logo! We made it to celebrate our 20 year anniversary!

Hey Siri what’s really in iOS8?

June 4, 2014

It was about 4 years ago that Sensory partnered with Vlingo to create a voice assistant with a special “in car” mode that would allow the user to just say “Hey Vlingo” then ask any question. This was one of the first “TrulyHandsfree” voice experiences on a mobile phone, and it was this feature that was often cited for giving Vlingo the lead in the mobile assistant wars (and helped lead to their acquisition by Nuance).

About 2 years ago Sensory introduced a few new concepts including “trigger to search” and our “deeply embedded” ultra-low power always listening (now down to under 2mW, including audio subsystem!). Motorola took advantage of these excellent approaches from Sensory and created what I most biasedly think is the best voice experience on a mobile phone. Samsung too has taken the Sensory technology and used it in a number of very innovative ways, going beyond mere triggers and using the same noise robust technology for what I call “sometimes always listening”. For example, when the camera is open it is always listening for “shoot”, “photo”, “cheese” and a few other words.

So I’m curious about what Google, Microsoft, and Apple will do to push the boundaries of voice control further. Clearly all 3 like this “sometimes always on” approach, as they don’t appear to be offering the low power options that Motorola has enabled. At Apple’s WWDC there wasn’t much talk about Siri, but what they did say seemed quite similar to what Sensory and Vlingo did together 4 years ago…enable an in car mode that can be triggered by “Hey Siri” when the phone is plugged in and charging.

I don’t think that will be all…I’m looking forward to seeing what’s really in store for Siri. They have hired a lot of smart people, and I know something good is coming that will make me go back to the iPhone, but for now it’s Moto and Samsung for me!

Biometrics – The Studies Don’t Reveal the Truth

May 7, 2014

If you read through the biometrics literature you will see a general security based ranking of biometric techniques starting with retinal scans as the most secure, followed by iris, hand geometry and fingerprint, voice, face recognition, and then a variety of behavioral characteristics.

The problem is that these studies have more to do with “in theory” than “in practice” on a mobile phone, but they nevertheless mislead many companies into thinking that a single biometric can provide the results required. This is really not the case in practice. Most companies will require that False Accepts (error caused by the wrong person or thing getting in) and False Rejects (error caused by the right person not getting in) be so low that the rate where these two are equal (equal error rate or EER) would be well under 1% across all conditions. Here’s why the studies don’t reflect the real world of a mobile phone user:

  1. Cost is key. Mobile phone manufacturers will not be willing to invest in the highest end approaches for capturing and measuring biometrics that are used by academic studies. This means fewer MIPS, less memory, and poorer quality readers.
  2. Size matters. Mobile phone manufacturers have extremely limited real estate, so larger systems cannot be properly deployed. Further complicating things, extremely fast enrollment and usage are required without a form factor change.
  3. Conditions are uncontrollable. Noisy environments, lighting, dirty hands, and oily screens/cameras/readers are all uncontrollable and will affect performance.
  4. User compliance cannot be assumed. The careful placement of an eye, finger or face does not always happen.
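
To make the False Accept / False Reject trade-off concrete, here is a minimal sketch of how an equal error rate can be estimated from match scores. The scores are made up for illustration, and the function names are my own rather than anything from a particular biometrics toolkit. A threshold sweep like this is only a rough estimate; real evaluations use far more trials.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FAR, FRR) when every score >= threshold is accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)  # wrong person gets in
    frr = sum(s < threshold for s in genuine) / len(genuine)     # right person kept out
    return far, frr


def equal_error_rate(genuine, impostor):
    """Sweep the observed scores as thresholds; return (threshold, FAR, FRR)
    at the point where FAR and FRR are closest to each other."""
    def gap(t):
        far, frr = error_rates(genuine, impostor, t)
        return abs(far - frr)

    candidates = sorted(set(genuine) | set(impostor))
    best = min(candidates, key=gap)
    far, frr = error_rates(genuine, impostor, best)
    return best, far, frr


# Made-up match scores, for illustration only (higher = better match).
genuine_scores = [0.91, 0.85, 0.78, 0.66, 0.95, 0.72, 0.88, 0.59, 0.81, 0.93]
impostor_scores = [0.12, 0.35, 0.48, 0.61, 0.27, 0.40, 0.55, 0.70, 0.18, 0.33]

threshold, far, frr = equal_error_rate(genuine_scores, impostor_scores)
print(f"threshold={threshold:.2f}  FAR={far:.0%}  FRR={frr:.0%}")
```

With these toy numbers the two error rates meet at 10%, an order of magnitude worse than the "well under 1%" target, which is the kind of gap that shows up once conditions stop being ideal.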

A great case in point is the fingerprint readers now deployed by Apple and Samsung. These are extremely expensive devices, and the literature would make one think that they are highly accurate, but Apple doesn’t have the confidence to allow them to be used in the iTunes store for ID, and San Jose Mercury News columnist Troy Wolverton says:

“I’ve not been terribly happy with the fingerprint reader on my iPhone, but it puts the one on the S5 to shame. Samsung’s fingerprint sensor failed repeatedly. At best, I would get it to recognize my print on the second try. But quite often, it would fail so many times in a row that I’d be prompted to enter my password instead. I ended up turning it off because it was so unreliable (full article).”

There is a solution to this problem: use sensors already on the phone to minimize cost, and deploy a biometric chain that combines face verification, voice verification, and other techniques that are easy to implement in a user-friendly manner. Used together, they can reach a very low equal error rate, and a series of biometric and other secure backup systems makes the chain “immune” to conditions and compliance issues.

Sensory has an approach we call SMART (Sensory Methodology for Adaptive Recognition Thresholding) that looks at environmental and usage conditions and intelligently deploys thresholds across a multitude of biometric technologies. The result is a highly accurate solution that is easy to use and fast in responding, yet robust to environmental and usage models, AND it uses existing hardware to keep costs low.
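
To illustrate the general idea of letting conditions steer a multi-biometric decision, here is a toy sketch. It is not Sensory's SMART algorithm, whose details are not described here. It assumes a simple weighted score fusion in which each sensor's score counts for less when its environment is poor; the `Conditions` scales, the weighting rule, and the 0.7 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Conditions:
    noise_level: float  # 0.0 (quiet) .. 1.0 (very noisy); hypothetical scale
    light_level: float  # 0.0 (dark)  .. 1.0 (well lit);   hypothetical scale


def fused_decision(face_score, voice_score, cond, accept_threshold=0.7):
    """Accept or reject by weighting each biometric by how usable its sensor is."""
    voice_weight = 1.0 - cond.noise_level  # noisy room -> trust the voice less
    face_weight = cond.light_level         # dark room  -> trust the face less
    total = voice_weight + face_weight
    if total == 0:
        return False  # neither sensor is usable; fall back to a PIN or password
    fused = (voice_weight * voice_score + face_weight * face_score) / total
    return fused >= accept_threshold


# Quiet but dark: the strong voice match carries the decision.
print(fused_decision(face_score=0.3, voice_score=0.9,
                     cond=Conditions(noise_level=0.1, light_level=0.1)))
# Noisy but well lit: the same voice score no longer outweighs a poor face match.
print(fused_decision(face_score=0.3, voice_score=0.9,
                     cond=Conditions(noise_level=0.9, light_level=0.9)))
```

The point of the sketch is only that the same pair of scores can lead to different decisions depending on what the microphone and camera are up against, using nothing but hardware the phone already has.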

Mobile phones – It doesn’t have to be Cost OR Quality!

April 25, 2014

It’s not often that I rave about articles I read, but Ian Mansfield of Cellular News hit the nail on the head with this article.

Not only is it a well written and concise article, but it’s chock-full of recent data (primarily from JD Power research), and most importantly it’s data that tells a very interesting story that nicely aligns with Sensory’s strategy in mobile. So, thanks Ian, for getting me off my butt to start blogging again!

A few key points from the article:

  1. Price is becoming increasingly important in the choice of mobile phones, and simultaneously the prices of mobile phones are increasing.
  2. Although price might be the most important factor in choice, the overall customer satisfaction is driven by features.
  3. The features customers want are seamless voice control (36%); built-in sensors that can gauge temperature, lighting, noise and moods to customize settings to the environment (35%); and facial recognition and biometric security (28%).
  4. As everyone knows, Samsung and Apple have the overwhelming market share in mobile phones, but interesting to me was that they also both lead in customer satisfaction.

Now, let me dive one step deeper into the problem, and explore whether customer satisfaction can be achieved with minimal impact on cost:

Seamless voice control is here and soon every phone will have it, and it doesn’t add any hardware cost. Sensory introduced it with our TrulyHandsfree technology that allows users to just start talking, and our “trigger to search” technology has been nicely deployed by companies like Motorola that pioneered this “seamless voice control” in many of their recent releases. The seamless voice control really doesn’t add much cost, and with excellent engines from Google and Apple and Microsoft sitting in the clouds, it can and will be nicely implemented without affecting handset pricing.

Sensors are a different story. By their nature they will be embedded into the phones and will increase cost. Some “sensors” in the broadest sense of the term are no brainers and necessities, for example microphones and cameras are a must have, and the six-axis motion sensors combining accelerometers and gyroscopes are arguably must haves as well. Magnetometers and barometers are getting increasingly common, and to differentiate further, leading manufacturers are embedding things like heartbeat monitors; stereo 3D cameras are just around the corner. To address the desire for biometric security Samsung and Apple have the 2 bestselling phones in the world embedded with fingerprint sensors!

The problem is that all these sensors add cost, and in particular those fingerprint sensors are the most expensive and can add $5-$15 to the cost of goods. It’s kind of ironic that after spending all that money on biometric security, Apple doesn’t even allow them as a security measure for purchasing on iTunes. And both Samsung and Apple have been chastised for fingerprint sensors that can be cracked with gummy bears or glue!

A much more accurate and cost effective solution can be achieved for biometrics by using the EXISTING sensors on the phones and not adding special purpose biometric sensors. In particular, the “must have sensors” like microphones, cameras, and 6-axis sensors can create a more secure environment that is just as seamless but much more difficult to crack. I’ll talk more about that in my next blog.

“Always Listening” doesn’t have to be always listening

August 16, 2013

I saw a post recently in the Android Central forum that talked about Sensory’s technology as used by Samsung:

What makes it different from any other voice app is its part of the OS. e.g. get a call, you can say ‘Answer’ or ‘Ignore’. alarm rings, just say ‘Snooze’. You don’t have to launch an app or press buttons to do this, the phone is always active and listening. No one else does this!

It’s an astute comment but not 100% accurate. When people talk about “always listening” what they really mean is that it appears to be “always listening”. At Sensory we call it TrulyHandsfree, and the idea is that there can be certain “modes” or “windows” where it listens for specific words. Like when the alarm goes off, it listens for “snooze” etc. If you say “snooze” when the alarm isn’t going off you find it’s not really “always listening”.

Glass has a similar usage model. It’s “always listening” but for different things at different times and only for short periods of time. I put my Glass on and timed it. The OK Glass trigger window seems to last 3-4 seconds, then the next set of commands (like Get Directions to) stays on 10-11 seconds.

What’s really cool about Glass is that during those listening windows you can say other things and it doesn’t “false fire” on them. I let my wife try out my Glass, and she said “You mean I just say OK Glass and then I can say any of these things like get directions to Chef Chu’s and…woah it works!” It ignores everything it’s not listening for and picks out the things it is listening for. The technology is known as “keyword spotting” for this reason.
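
The “listening window” idea can be sketched in a few lines. This is a toy model, not how TrulyHandsfree or Glass is actually implemented: it assumes words arrive already recognized as text, and the class name, vocabulary, and durations are hypothetical. It only shows that a device can look “always listening” while really accepting a small vocabulary for a limited time.

```python
class WindowedListener:
    """Listens only for a small vocabulary, and only while a window is open."""

    def __init__(self):
        self.vocabulary = set()
        self.closes_at = 0.0

    def open_window(self, words, duration_s, now):
        """Start listening for `words` until `now + duration_s` (seconds)."""
        self.vocabulary = {w.lower() for w in words}
        self.closes_at = now + duration_s

    def hear(self, word, now):
        """Return the word if it is spotted inside an open window, else None."""
        if now >= self.closes_at:
            self.vocabulary = set()  # window expired: stop listening
            return None
        word = word.lower()
        return word if word in self.vocabulary else None


listener = WindowedListener()

# The alarm goes off at t=0 and opens a 10-second window for two commands.
listener.open_window(["snooze", "stop"], duration_s=10, now=0.0)
print(listener.hear("hello", now=1.0))    # ignored: not in the vocabulary
print(listener.hear("Snooze", now=2.0))   # spotted inside the window
print(listener.hear("snooze", now=12.0))  # ignored: the window has closed
```

Timestamps are passed in explicitly so the behavior is easy to follow; a real device would read a clock and work on audio rather than text.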

To save power, Hallmark’s use of Sensory’s technology kicks into gear when the product is turned on. If it doesn’t hear one of the words it’s listening for spoken within a certain time frame, it will automatically power down, and stop “always listening” until it’s turned back on with a button press.

Sensory recently introduced a low power sound detection technology that further cuts power consumption by having the device “always listening” in a low power mode, where it doesn’t perform speech recognition. When it hears something it quickly powers up the recognizer for further analysis. By “always listening” but not always recognizing, this can cut consumption down to 1mA or so.
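
Here is a rough sketch of that “detect first, recognize later” gating. It assumes a plain RMS energy check as the sound detector, which is my stand-in and not Sensory's actual detector, and the recognizer is a hypothetical placeholder that only records when it gets invoked.

```python
def rms(frame):
    """Root-mean-square level of one audio frame (a list of samples)."""
    return (sum(x * x for x in frame) / len(frame)) ** 0.5


def gated_recognition(frames, recognizer, wake_level=0.1):
    """Run the expensive recognizer only on frames loud enough to matter."""
    results = []
    for frame in frames:
        if rms(frame) < wake_level:
            continue  # stay in the low-power, detect-only mode
        results.append(recognizer(frame))  # sound detected: power up the recognizer
    return results


# Hypothetical stand-in for a real recognizer; it just records its invocations.
calls = []


def fake_recognizer(frame):
    calls.append(frame)
    return "analyzed"


silence = [0.01, -0.01, 0.01, -0.01]
speech = [0.5, -0.4, 0.6, -0.5]
results = gated_recognition([silence, silence, speech, silence], fake_recognizer)
print(len(calls), "of 4 frames reached the recognizer")
```

The savings come from how rarely the second stage runs: the cheap check sees every frame, while the recognizer only wakes for the ones that contain sound.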

The price of free phone features

August 5, 2013

I often get the question, “If Android and Qualcomm offer voice activation for free, why would anyone license from Sensory?” While I’m not sure about Android and Qualcomm’s business models, I do know that decisions are based on accuracy, total added cost (royalties plus hardware requirements to run), power consumption, support, and other variables. Sensory seems to be consistently winning the shootouts it enters for embedded voice control. Some approaches that appear lower cost require a lot more memory or MIPS, driving up total cost and power consumption.

It’s interesting to note that companies like Nuance have a similar challenge on the server side where Google and Microsoft “give it away”. Because Google’s engine is so good it creates a high hurdle for Nuance. I’d guess Google’s rapid progress helps Nuance with their licensing to Apple, but may have made it more challenging to license to Samsung. Samsung actually licensed Vlingo AND Nuance AND Sensory, then Nuance bought Vlingo.

Why doesn’t Samsung use Google recognition if it’s free? On the server it’s not power consumption affecting decisions, but cost, quality, and in this case CONTROL. On the cost side it could be that Samsung MAKES more money by using Nuance in some sort of ad revenue kickbacks, which I’d guess Google doesn’t allow. This is of course just hypothesizing. I don’t really know, and if I did know I couldn’t say. The control issue is big too as companies like Sensory and Nuance will sell to everyone and in that sense offer platform independence and more control. Working with a Microsoft or Google engine forces an investment in a specific platform implementation, and therefore less flexibility to have a uniform cross platform solution.

Apple Siri vs Android Voice Actions/Google Now vs Samsung S Voice.

August 2, 2013

  • Kudos to Apple for kicking it off with Siri, but shame on Apple for so quickly allowing its competitors to get ahead.
  • Kudos to Android for the speech technology development. It’s really great, but Android – get a clue and hire some marketing people: Apple and Samsung have kicked your butt in branding the speech technology. I still don’t know what to call it!
  • Kudos to Samsung for being first to have an always listening feature that worked!! You can’t stop there… Moto is beating you to the punch in a low power version and in the simple “trigger to search” flow with voice.

What about Microsoft and Amazon? Both have good cloud based recognition engines in house but neither seems particularly relevant in Mobile…YET!

Kudos to Microsoft for its always listening feature in XBox! It’s actually the best implementation I’ve seen that doesn’t use Sensory technology. I’ll blog more about how they do it and why they can’t do a low power implementation in the weeks ahead.

CES 2013

January 15, 2013

I’ve been going to CES for about 30 years now. More than half of that has been with Sensory selling speech recognition. This year I reminisced with Jeff Rogers (Sensory’s VP Sales who has been at Sensory almost as long as me) about Sensory’s first CES back in 1995 where we walked around with briefcases that said “Ask Me About Speech Recognition for Consumer Electronics”. A lot of people did ask! There’s always been a lot of interest in speech recognition for consumer electronics, but today it goes beyond interest…it’s in everything from TVs to cars to Bluetooth devices…and a lot of that is with Sensory technology. Often we are paired with Nuance, Google and increasingly AT&T as the cloud speech solution, while Sensory is the client.

In 2013, Sensory counted about 20 companies showing its technology on the floor or in private meeting rooms. An increasing percentage of our products are now connected to the cloud and using client/cloud speech schemes. Here’s just a short summary of some of the new things here at the show:

Bluetooth
BlueAnt, Bluetrek, Drive and Talk, Monster Cable, Motorola, and Plantronics all showed products using Sensory’s BlueGenie speech technologies for Bluetooth devices. I noticed Plantronics won a show award for one of their new devices with Sensory technology. This market seems to have flattened and stopped growing, and Sensory is lucky to be working with the leaders who appear to be gaining in market share against their competition…correlation or causation?? ;-) Our customers in this segment introduced a dozen or more new products ranging from carkits to headsets to Bluetooth speaker systems.

Chip Companies
Conexant announced their new DSP CX20865 running Sensory’s TrulyHandsfree and gave demos in their Suite at the LVH.
Tensilica announced their new HIFI Mini and gave some of the best demos on the showroom floor of speech recognition (Sensory’s of course!) working in adverse noise conditions at ultra low power.

Automotive
QNX showed off their beautiful Bentley concept car with built in graphics and speech recognition including Sensory’s TrulyHandsfree Voice Control paired with AT&T’s cloud based Watson ASR engine.
Visteon – Did some pretty neat demos that we can’t discuss other than to say they featured Sensory’s TrulyHandsfree Voice Control! The car companies love us because WE WORK in noise!

Other
Samsung had a huge booth showing Galaxy products (Note, S3, etc.) using Sensory’s TrulyHandsfree triggers as a part of the S-Voice system
VTech showed a variety of phone products using Sensory technologies including our micro-TTS solutions for caller ID
IVEE paired a Sensory IC for local command and operation with the AT&T cloud recognizer to create a very impressive demo that got nice coverage on NPR!
Behind closed doors – around half a dozen other companies showed cool new things in private suites. Unfortunately I can’t discuss these, but I will say that 2013 will see some major product releases with interesting user experiences and Sensory will be very proud to be a part of these!
My favorite non Sensory things – Yeah the 4K/8K TVs were pretty amazing. Crisper than real life, which doesn’t seem possible but it’s true. The new 3D printers and services to make hardware prototypes are amazing (why isn’t HP dominating this market???). But…my favorite stuff is robotics. There was a robot glass cleaner that climbs vertically around windows and cleans them off without falling. Kinda like a Roomba for windows. I met some hacker guys who as a hobby make giant servo/mechanical/electro robot snakes and creatures they can ride in. Think MadMax/Burning Man kinds of artistic technology. I have some neat videos of this I’ll send anyone who wants them.

A Tale of Two Awards

August 5, 2011

I recently learned about 2 awards that Sensory has won over the past year. The contrast is in how we learned about them, and the different nature of these awards. It’s really amusing, so I thought I’d share my take.

Both awards were for our TrulyHandsfree™ Voice Control. One was for the significance of Sensory’s truly hands-free trigger in implementing speech recognition without using buttons, and the other was for Sensory’s chip-based implementation of a truly hands-free interface.

The first award came from Speech Technology Magazine. Sensory won their Star Performer award for 2011, and I didn’t even know we had been nominated. In fact, nobody ever told me that we had won; I found out really by chance (thanks, Bernie!) They only gave out four of these awards this year, and I’m honored and thrilled that Sensory won one of them. It’s really a testament to our team behind TrulyHandsfree… IT’S THE MOST AMAZING TECHNOLOGY. I sent kudos to Speech Tech for having the insight to understand the significance of this technology! Speech Technology Magazine has gotten so independent and non-self-serving in their awards process, that they didn’t even take the opportunity to call us and let us know! Now we know, so thanks again, Speech Tech!

In contrast…The second award came from a market research firm I’ll call the Cold Irishman. Why don’t I use their real name? Well I can’t or they might sue me. I received a call from their “Manager of IP and Copyrights” to congratulate me, and to let me know about their thoroughly independent and fair process that looked at the entire speech market and decided that Sensory stood out… blah blah blah…

I knew there was something funny going on by the guy’s title. Yeah you guessed it. To be able to tell people we won their award costs a certain price; you pay more the more you want to use it, and you can even pay more to go to an awards banquet. He offered me programs for as little as $10K, which went up in price to WAY more than that. One of the more expensive programs was that they’d make a video for us receiving the award with lots of praise from their esteemed analysts. So, I decided to go onto YouTube and see for myself how many hits last year’s award winners were getting…my memory said low double digits, but that didn’t seem possible (Sensory’s little home-made videos often get thousands of hits.) Just for fun I looked just now at this year’s award winners – one of them had only 10 (yes TEN) hits. Most of them must have been employees… Pretty hefty price to stroke your own ego and get almost nothing in return! I’ve always wondered who pays to be in Whoever’s Whatever? It’s probably the same CEOs that pay to go to award dinners!

So…Many Thanks to Leonard Klie and Speech Technology Magazine…and Cold Irishman…thanks, but no thanks! Sensory deserves recognition for innovation in speech technologies based on our hard work, not on how much we pay to market it.

Todd
sensoryblog@sensoryinc.com