Archive for the ‘ICs’ Category
August 6, 2015
We first came out with TrulyHandsfree about five years ago. I remember talking to speech tech executives at MobileVoice and other industry tradeshows, and when the subject of always-on, hands-free voice control came up, everybody said it couldn’t be done. Many had attempted it, but their offerings suffered from too many false fires, didn’t work in noise, or consumed too much power to be always listening. It seemed everyone thought a button was necessary for a usable product!
In fact, I remember the irony of being on an automotive panel, giving a presentation about how we’d eliminated the need for a trigger button, while the guy from Microsoft, on the same panel, presented on the importance of where to put the trigger button in the car.
Now, five years later, voice activation is the norm… we see it all over the place with OK Google, Hey Siri, Hey Cortana, Alexa, Hey Jibo, and of course if you’ve been watching Sensory’s demos over the years, Hello BlueGenie!
Sensory pioneered the button-free, touch-free, always-on voice trigger approach with TrulyHandsfree 1.0, using a unique, patented keyword spotting technology we developed in-house. From its inception, it was highly robust to noise and ultra-low power. Over the years we have ported it to dozens of platforms, including DSP/MCU IP cores from ARM, Cadence, CEVA, NXP CoolFlux, Synopsys, and VeriSilicon, as well as to integrated circuits from Audience, Avnera, Cirrus Logic, Conexant, DSPG, Fortemedia, Intel, InvenSense, NXP, Qualcomm, QuickLogic, Realtek, STMicroelectronics, TI, and Yamaha.
This vast platform compatibility has allowed us to work with numerous OEMs to ship TrulyHandsfree in over a billion products!
Sensory didn’t just pioneer a novel keyword spotting approach; we’ve continually improved it by adding features like speaker verification and user-defined triggers. Working with partners, we lowered the battery draw to less than 1 mA, and Sensory introduced hardware and software IP to enable ultra-low-power voice wakeup with TrulyHandsfree. All the while, our accuracy has remained the best in the industry for voice wakeup.
We believe the bigger, more capable companies trying to build voice triggers have been forced to adopt deep learning speech techniques to try to catch up with Sensory on accuracy. They have yet to catch up; through deep learning they have pushed their products to a very usable accuracy level, but they lost much of the advantage of a small footprint and low power in the process.
Sensory has been architecting neural net solutions for consumer electronics since we opened our doors more than 20 years ago. With TrulyHandsfree 4.0 we are applying deep learning to improve accuracy even further, pushing the technology further ahead of all other approaches while keeping an architecture that remains small and ultra-low power. We are enabling new feature extraction approaches, as well as improved training in reverb and echo. The end result is a 60-80% boost in what was already considered industry-leading accuracy.
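Sensory doesn’t publish its algorithms, but the general shape of a deep-learning voice trigger is well known: an acoustic model scores each audio frame with a keyword posterior, the posteriors are smoothed over a short window, and the trigger fires when the smoothed confidence crosses a threshold. A minimal sketch of that decision stage (the window size, threshold, and synthetic posteriors are illustrative assumptions, not Sensory’s values):

```python
import numpy as np

def keyword_decision(posteriors, smooth_win=10, threshold=0.8):
    """Smooth per-frame keyword posteriors with a moving average,
    then report the first frame where confidence crosses the threshold."""
    kernel = np.ones(smooth_win) / smooth_win
    smoothed = np.convolve(posteriors, kernel, mode="valid")
    hits = np.where(smoothed >= threshold)[0]
    return int(hits[0]) if hits.size else None

# Synthetic posteriors: quiet, then a sustained burst while the keyword
# is spoken, then quiet again. The trigger fires shortly into the burst.
frames = np.concatenate([np.full(30, 0.05), np.full(20, 0.95), np.full(30, 0.05)])
print(keyword_decision(frames))             # fires near frame 30
print(keyword_decision(np.full(50, 0.05)))  # None: no keyword, no false fire
```

The smoothing is what keeps a single noisy high-posterior frame from causing a false fire, which is exactly the trade-off between false accepts and false rejects described above.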
I can’t wait for TrulyHandsfree 5.0…we have been working on it in parallel with 4.0, and although it’s still a long way off, I am confident we will make the same massive improvements in speaker verification with 5.0 that we are making for speech recognition in 4.0, once again advancing the state of the art in embedded speech technologies!
June 3, 2015
When I started Sensory over 20 years ago, I knew how difficult it would be to sell software to cost-sensitive consumer electronics OEMs that would know my cost of goods. A chip-based approach to packaging the technology made a lot of sense as a turnkey solution that could maintain a floor price by adding the features of a microcontroller or DSP, with the added benefit of providing speech I/O. The idea was “buy Sensory’s micro or DSP and get speech I/O thrown in for free.”
After about 10 years it was becoming clear that Sensory’s value add in the market was really in technology development, and particularly in developing technologies that could run on low-cost chips with smaller footprints, lower power, and better accuracy than other solutions. Our strategy of using trailing IC technologies to get the best price point was becoming useless: we lacked the scale to negotiate the best pricing, more cutting-edge process technologies were increasingly out of reach, and even getting the supply commitments we needed was difficult in a world of continuing flux between over- and under-capacity.
So Sensory began porting our speech technologies onto other companies’ chips. Last year only about 10% of our sales came from our internal ICs! Sensory’s DSP, IP, and platform partners have become the most strategic of our partnerships.
Today the semiconductor industry is undergoing a consolidation that somewhat mirrors Sensory’s thinking over the past 10 years, albeit at a much larger scale. Avago pays $37 billion for Broadcom, Intel pays $16.7B for Altera, NXP pays $12B for Freescale, and the list goes on, dwarfing the acquisitions of earlier eras.
It used to be that the multi-billion-dollar chip companies gobbled up the smaller fabless companies, but now even the multibillion-dollar chip companies are being gobbled up. There are a lot of reasons for this, but economies of scale is probably #1. As chips get smaller and smaller, the costs of design tools, tape-outs, and prototyping keep rising; although the actual variable per-chip cost drops, the fixed costs are skyrocketing, making consolidation and scale more attractive.
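A toy amortization model makes those economics concrete (the dollar figures and volumes below are hypothetical, chosen purely for illustration):

```python
def unit_cost(fixed_nre, variable_per_chip, volume):
    """Amortized cost per chip: fixed NRE (tools, tape-outs, masks)
    spread over the production volume, plus the variable cost per die."""
    return fixed_nre / volume + variable_per_chip

# Same $50M of fixed costs and $1 variable cost, at two different scales:
print(unit_cost(50e6, 1.0, 10e6))   # small player, 10M units:  $6.00/chip
print(unit_cost(50e6, 1.0, 500e6))  # consolidated giant, 500M units: $1.10/chip
```

The more volume the fixed costs are spread over, the closer the unit cost gets to the variable cost, which is why scale wins as fixed costs skyrocket.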
That sort of consolidation strategy is very much a hardware-centered philosophy. I think the real value will come to these chip giants through in-house technology differentiation. It’s that differentiation that will add value to their chips, enabling better margins and/or more sales.
I expect that over time the chip giants will realize what Sensory concluded 10 years ago: that machine learning, algorithmic differentiation, and software skills are where the majority of the value-added equation on “smart” chips needs to come from, and that improving the user experience on devices can be a pot of gold! In fact, we have already seen Intel, Qualcomm, and many other chip giants investing in speech recognition, biometrics, and other user experience technologies, so the change is underway!
November 15, 2013
Google introduced the new Android KitKat OS with the Nexus 5, and Sensory has gotten lots of questions about the new “always listening” feature that allows a user to say “OK Google” followed by a Google Now search. Here are some of the common questions:
August 7, 2013
Running recognition at the OS level does drain power. Even using the baseband or application processor can be inefficient. This is why Sensory has ported to processors from Cirrus Logic, Conexant, DSPG, Realtek, Texas Instruments, Wolfson, and many more, as well as all of the leading IP platforms (ARM, CEVA, Tensilica, VeriSilicon, etc.). Our chip and IP partners are reporting power consumption as low as 1.7 mA, including the microphone and preamp circuit, and it can go even lower by using a sound detection front end to turn off processing in quiet conditions. Tensilica has even introduced the HiFi Mini IP core targeting Sensory’s TrulyHandsfree approach for companies that want to use always-on technology without extra bells and whistles.
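The sound detection front end mentioned above can be as simple as an energy gate: a cheap always-on loop watches frame energy and only powers up the recognizer when the room is no longer quiet. A rough software sketch (the frame size and threshold are illustrative assumptions):

```python
import numpy as np

def sound_detected(frame, energy_threshold=1e-4):
    """Cheap energy gate: wake the recognizer only when the frame's
    mean-square energy rises above a noise-floor threshold."""
    return float(np.mean(np.asarray(frame, dtype=np.float64) ** 2)) > energy_threshold

rng = np.random.default_rng(0)
quiet = rng.normal(0, 0.001, 160)   # 10 ms of near-silence at 16 kHz
speech = rng.normal(0, 0.1, 160)    # a much louder frame
print(sound_detected(quiet), sound_detected(speech))  # False True
```

In silicon this gate can be an analog comparator drawing microamps, so the recognizer itself only runs when there is actually something to hear.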
May 1, 2013
January 15, 2013
I’ve been going to CES for about 30 years now. More than half of those have been with Sensory selling speech recognition. This year I reminisced with Jeff Rogers (Sensory’s VP of Sales, who has been at Sensory almost as long as me) about Sensory’s first CES back in 1995, where we walked around with briefcases that said “Ask Me About Speech Recognition for Consumer Electronics”. A lot of people did ask! There’s always been a lot of interest in speech recognition for consumer electronics, but today it goes beyond interest…it’s in everything from TVs to cars to Bluetooth devices…and a lot of that is with Sensory technology. Often we are paired with Nuance, Google, and increasingly AT&T as the cloud speech solution, while Sensory is the client.
May 30, 2012
Sensory’s had a lot of press lately. We made three big announcements pretty much all at once:
1) Announcing speaker verification
2) Announcing speaker identification
3) Announcing that Sensory is in the Samsung Galaxy S III
Sensory announced these just before CTIA in New Orleans. We had a small booth at the show, and gave demos at several events (on the CTIA stage and floor, at the Mobility Awards dinner, and at the excellent Pepcom Mobile Focus event).
We got a lot of nice press from this. I was thrilled that the Speech Technology email newsletter ran our verification release as its featured lead story. One of the articles I like best, though, just came out last week, by Pete Pachal at Mashable: http://mashable.com/2012/05/29/sensory-galaxy-s-iii/
This article is great for several key reasons. One is that Pete gets it. He didn’t just reprint our press release, but he added his commentary and wrapped it up in a nice story that hits some of the key issues.
However, what’s best is what the readers wrote in. I LOVE their insights and comments. Here are a few of the dialogs with my commentary attached:
JB – Seriously??? You still need to push a button to use Siri? I’ve had the “wake with voice” option on my crusty old HTC Incredible, via VLingo inCar, for about 2 years now. Hard to believe Apple is that far behind.
My response: EXACTLY, JB! In fact, that crusty old HTC running Vlingo also uses Sensory’s TrulyHandsfree approach! Vlingo was our first licensee in the mobile space.
Scott: But this is talking about OS integration instead of app integration. And as I’m sure you’ve seen on your phone, and as the article noted, wake with voice options currently use a lot of power, which means I can’t see a lot of people willing to use it.
My response: Precisely, Scott! This is why we are implementing the “deeply embedded” approach that will take power consumption down by a factor of 10! Nevertheless, users LOVE it even if it consumes power:
JB – I use it all the time and since my phone plugs into the car’s adapter, I don’t really worry at all about power usage. It’s never been a problem.
My response – Yes, Vlingo and Samsung did a very nice implementation by having an “always listening” mode, particularly useful while driving. Other approaches we expect to see in the future are intelligent sensor based approaches so the phone knows when to listen and when not to (e.g. why not have it turn on and listen whenever you start traveling past 20 MPH, etc.)
Is there anything to prevent me from messing with another person’s phone?
Fillfill – Ha ha, imagine being in an auditorium and yelling “Hi Galaxy! … Erase Address Book! … Confirm!”
My comment – Funny! This is one of the reasons we have added speaker verification and identification features to the trigger function.
DhanB – Siri doesn’t require a button. It can be activated by lifting the phone up to your face.
Great reader responses:
Darkreaper – …..while driving? (Right! That’s illegal in California and other states!)
Tone – Yes, but with the Samsung Galaxy II, I don’t have to touch it at all. As the article states, this is crucial when you’re in a situation, such as driving. I’ve dropped the phone on the floor while driving and I was still able to send a text message, an email and place a call with it sliding around the back seat. (Bluetooth) iPhone can’t compete, sorry. :-/
…and of course the old “butt dialing” problem:
Jason – This makes me think of the old “butt dialing” problem when you sat down on your phone cause I’d much prefer a manual trigger to prevent accidental usage.
My comment: Once again, I agree with the readers. Sensory isn’t pushing to force “always listening” modes on users, we just want to allow them the choice. We strongly recommend that products have multiple options for anything that can be done by voice or touch. We believe the users should have the right and the ability to access the power of mobile devices without being forced to touch them. And if they want to turn off this ability, that is certainly their choice! We turn off our ringers (at least we should) when we enter a meeting or go to the movies. Likewise, we can turn off hands free voice control when it’s not appropriate…and with the growing presence and power of intelligent sensors, it will get easier and easier (albeit with some mishaps along the way!) for the phones to know when they should listen!
A lot of people commented about Siri. Apple isn’t stupid. They get that hitting buttons isn’t the most convenient way to always access voice control. That’s why there’s a sensor that activates Siri when you lift the phone to your face (of course still requiring touch), and it’s also why Siri can speak back. Apple pushed the voice user interface forward with Siri…Samsung pushed it further with TrulyHandsfree wake-up. There will be a lot of back and forth over the coming years, and voice features will continue as a major battleground.
As devices gain more and more utility WITHOUT the user touching them (e.g., remote control functions, accessing and receiving data by voice, etc.), the need for a TrulyHandsfree approach will grow stronger and stronger, and Sensory will continue to have the BEST solution: more accurate, lower power, faster response times, and NOW with built-in speaker verification or speaker ID!
January 27, 2012
Lots of thoughts…no time to share them…so I’ll be brief in a few different areas:
September 17, 2011
I decided to pop up to San Francisco this week for the Intel Developer Forum. It’s open to the public, but it’s really more of a show-and-tell for Intel employees than for the outside world.
One of the sessions was entitled “Enhanced Experiences with Low Power Speech Recognition,” and it was my main reason for being there. Intel’s Devon Worrell gave a very nice presentation, focusing on the importance of a closed computer not being just a brick, but still having functionality in a low-power state. He put up a lot of compelling slides about using speech recognition in this mode, and emphasized the need for low-power command and control with an always-on, always-listening device that responds to commands…hmmmm…sounds like a page right out of the Sensory bible!
Realtek appears to have been selected by Intel as a chip provider for the low-power speech recognition; they presented at the session and even demoed their in-house speech recognition technology. I wasn’t very impressed. The idea was for it to work with music playing and the user not speaking directly into the microphone, but for the demo the music was so quiet the audience could barely tell it was on, and the speaker spoke only a few inches from the mic. I had a hard time telling whether it was working or not (and that’s giving it the benefit of the doubt).
Jean-Marc Jot from DTS also spoke and gave an impressive presentation and demo. Of course, I’m very biased…the DTS speech recognition demo used Sensory’s TrulyHandsfree™ Voice Control. I was a bit nervous because of Jean-Marc’s French accent and the fact that DTS had created their own TrulyHandsfree trigger phrase, “Hello Jennifer,” without any assistance from Sensory. (As a side note, Sensory’s TrulyHandsfree 2.0 SUBSTANTIALLY improves performance, but there are a number of complex variables in our algorithm that are not accessible through our SDKs, and therefore our customers cannot yet use the latest technology to its fullest extent unless Sensory fine-tunes the vocabularies in-house.) So…Jean-Marc was demoing our earliest incarnation of TrulyHandsfree Voice Control, with a French accent, in a noisy room, and with a command set that Sensory had never reviewed.
The demo was AWESOME. Jean-Marc spoke about 3 feet from the mic and said commands like “Hello Jennifer…play Lady Gaga.” The music was cranked up really loud, and Jean-Marc spoke commands like “fast forward” and other music controls, as well as calling up songs by name. I have a habit of counting speech recognition errors… On the trigger there were no false positives (accidental firings) and only 2 false negatives (where Jean-Marc needed to repeat the trigger phrase). That was 2 out of about 30 or 40 uses, indicating a 93% to 95% acceptance accuracy in high noise, and the phrases following the trigger had about the same high accuracy.
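For the record, the quick arithmetic behind that estimate (the attempt totals are my rough in-the-audience count):

```python
# Acceptance accuracy = accepted triggers / total trigger attempts.
def acceptance_accuracy(attempts, false_negatives):
    return (attempts - false_negatives) / attempts

# Two misses out of roughly 30 to 40 attempts brackets the demo's accuracy:
print(f"{acceptance_accuracy(30, 2):.1%}")  # 93.3%
print(f"{acceptance_accuracy(40, 2):.1%}")  # 95.0%
```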
Sweet Demo of how speech recognition can work in a low-power mode and be always on and listening for commands even in high noise situations!
July 12, 2011
I have a new favorite toy. It’s Watson the Raccoon, one of Hallmark’s new Interactive Storybook and Story Buddy™ characters.
I used to have the time to buy every product that featured one of Sensory’s technologies. I have to admit I don’t do that much anymore, and as my kids have gotten older, I buy fewer and fewer toys in general, much less ones with speech recognition. Luckily I was visiting my dad, and he had purchased one out of curiosity.
For many years, my favorite Sensory-based toy was Radar the Robot from Fisher-Price, from the very early days of Sensory. I have fond memories of my kids imitating Radar, and I also remember biking around Bali on my honeymoon, going for a night ride through the jungle by myself just to find a fax machine so I could send an agreement draft back to Fisher-Price (Design Win for Radar!)
Sorry Radar…Watson has removed you from your pedestal. Watson now reigns supreme!
Hallmark’s Interactive Storybooks use Sensory’s NLP-based processors with TrulyHandsfree™ Voice Control. As you read the book, the Story Buddy™ listens and interacts appropriately when it hears you say different phrases from the book.
I knew the concept was great when we first did Jingle the Husky Pup with Hallmark. I also knew that these products were selling really well, and I even knew that TrulyHandsfree™ Voice Control is the MOST AMAZING technology to ever come out of Sensory (Hey – I just got an email from the manager of one of the larger speech organizations in the world and he said “we keep trying to break your TrulyHandsfree™ 2.0 beta technology, but we just can’t seem to make it fail!”)
What I didn’t know is what an EXCELLENT job Hallmark does in story writing, character creation, and putting the whole thing together to make a really fun experience that really works! The book starts off with Watson wondering why grass grows up and rain falls down… I love that line!
Kudos to you Hallmark…now I gotta go buy a Watson for my lobby!