Posts Tagged ‘Intel’
June 22, 2016
I’ve written a series of blogs about consumer devices with speech recognition, like the Amazon Echo. I mentioned that everyone is getting into the “always listening” game (Alexa, OK Google, Hey Siri, Hi Galaxy, Assistant, Hey Cortana, OK Hound, etc.), and I’ve explained that privacy concerns are addressed, at least in part, by running the “always listening” mode on the device rather than in the cloud.
Let’s now look deeper into the “always listening” approaches and compare some of the different methods and platforms available for embedded triggers.
There are a few basic approaches for running embedded voice wakeup triggers:
First is running on an embedded DSP, microprocessor, and/or smart microphone. I like to think of this as a “deeply embedded” approach, as opposed to running embedded on the operating system (OS). Knowles recently announced a design with a smart mike that provides low-power wake-up assistance.
Many leading chip companies have small DSPs enabled for “wake-up word” detection. These vendors include Audience, Avnera, Cirrus Logic, Conexant, DSPG, Fortemedia, Intel, InvenSense, NXP, Qualcomm, QuickLogic, Realtek, STMicroelectronics, TI, and Yamaha. Many of these companies add noise suppression or acoustic echo cancellation so these chips provide value beyond speech recognition. QuickLogic recently announced availability of an “always listening” sensor fusion hub, the EOS S3, which lets the sensor listen while consuming very little power.
Next is DSP IP availability. The concept of low-power voice wakeup has gotten so popular amongst processor vendors that the leading DSP/MCU IP cores from ARM, Cadence, CEVA, NXP CoolFlux, Synopsys, and Verisilicon all offer this capability, and some even offer special versions targeting this function.
Running on an embedded OS is another option. Bigger systems like Android, Windows, or Linux can also run voice wake-up triggers. The bigger systems might not be so applicable for battery-operated devices, but they offer the advantage of being able to implement larger and more powerful voice models that can improve accuracy. The DSPs and MCUs might run a 50-kbyte trigger at 1 mA, while bigger systems can cut error rates in half by increasing models to hundreds of megabytes and power consumption to hundreds of milliamps. Apple used this approach in its initial implementation of Siri, thus explaining why the iPhone needed to be plugged in to be “always listening.”
Finally, one can try combinations and multi-level approaches. Some companies are implementing low-power wake-up engines that, once triggered, hand off to a more powerful system to confirm the detection. This confirmation can run on the device itself or in the cloud. The approach works well for more complex uses of speech technology, like speaker verification or identification, where DSPs are often crippled in performance and a larger system can implement a more state-of-the-art approach. It basically delivers the accuracy of bigger models and systems while lowering power consumption by running a smaller, less accurate wakeup system first.
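The multi-level idea can be sketched in a few lines of Python. The scoring functions below are pure placeholders (no real acoustic models), and the names and thresholds `t1`/`t2` are my own illustration, not any vendor’s actual parameters:

```python
# Two-stage wake-word cascade (illustrative sketch only).

def small_trigger_score(audio):
    """Stage 1: stands in for a tiny always-on model (~50 KB class)
    running on the DSP. Returns a rough confidence in [0, 1]."""
    # Placeholder heuristic; a real detector runs a small acoustic model.
    return 0.8 if sum(abs(s) for s in audio) > 1000 else 0.1

def large_verifier_score(audio):
    """Stage 2: stands in for a larger model on the application
    processor or in the cloud, run only after stage 1 fires."""
    return 0.95 if max(audio, default=0) > 50 else 0.2

def wake(audio, t1=0.5, t2=0.9):
    """Accept only when the cheap detector fires AND the big model
    confirms; stage 2 filters stage 1's false alarms."""
    if small_trigger_score(audio) < t1:
        return False  # stay asleep: near-zero average power
    return large_verifier_score(audio) >= t2
```

The power win comes from the control flow: the expensive verifier only runs on the tiny fraction of audio that the always-on detector flags.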
A variant of this approach uses a low-power speech-detection block as an always-listening front end that then wakes up the deeply embedded recognizer. Some companies have erred by using traditional speech-detection blocks that work fine for starting a recording of a sentence (as in an answering machine) but fail when the job is to recognize a single word, where losing the first 100 ms can have a huge effect on accuracy. Sensory has developed a very low power hardware sound-detection block that runs on systems like the Knowles mike and the QuickLogic sensor hub.
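A minimal sketch of why such a front end must keep a pre-roll buffer, assuming a simple energy-threshold detector (the frame sizes, threshold, and class name are my own illustration, not Sensory’s or Knowles’ design):

```python
from collections import deque

FRAME_MS = 10     # analysis frame length
PREROLL_MS = 200  # audio retained from before the detector fires

class SoundDetector:
    """Energy-based always-listening front end (illustrative sketch)."""

    def __init__(self, sample_rate=16000, threshold=1e6):
        self.frame_len = sample_rate * FRAME_MS // 1000
        # Ring buffer of recent frames: when we wake the recognizer we
        # hand over this pre-roll too, so the first ~100 ms of the
        # keyword is not lost.
        self.preroll = deque(maxlen=PREROLL_MS // FRAME_MS)
        self.threshold = threshold

    def feed(self, frame):
        """Returns buffered audio (pre-roll plus current frame) when
        sound is detected; returns None while staying asleep."""
        energy = sum(s * s for s in frame)
        self.preroll.append(frame)
        if energy > self.threshold:
            # Flatten the buffer and wake the recognizer with it.
            return [s for f in self.preroll for s in f]
        return None
```

Without the `preroll` buffer, the recognizer would only see audio from the moment the energy threshold tripped, which is exactly the “lost 100 ms” failure mode described above.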
August 6, 2015
We first came out with TrulyHandsfree about five years ago. I remember talking to speech tech executives at MobileVoice and other industry tradeshows, and when I talked about always-on, hands-free voice control, everybody said it couldn’t be done. Many had attempted it, but their offerings suffered from too many false fires, didn’t work in noise, or consumed too much power to be always listening. It seems everyone thought a button was necessary for usability!
In fact, I remember the irony of being on an automotive panel, giving a presentation about how we had eliminated the need for a trigger button, while the presenter from Microsoft on the same panel talked about the importance of where to put the trigger button in the car.
Now, five years later, voice activation is the norm… we see it all over the place with OK Google, Hey Siri, Hey Cortana, Alexa, Hey Jibo, and of course if you’ve been watching Sensory’s demos over the years, Hello BlueGenie!
Sensory pioneered the button-free, touch-free, always-on voice trigger approach with TrulyHandsfree 1.0, using a unique, patented keyword-spotting technology we developed in-house. From its inception, it was highly robust to noise and ultra-low power. Over the years we have ported it to dozens of platforms, including DSP/MCU IP cores from ARM, Cadence, CEVA, NXP CoolFlux, Synopsys, and Verisilicon, as well as integrated circuits from Audience, Avnera, Cirrus Logic, Conexant, DSPG, Fortemedia, Intel, InvenSense, NXP, Qualcomm, QuickLogic, Realtek, STMicroelectronics, TI, and Yamaha.
This vast platform compatibility has allowed us to work with numerous OEMs to ship TrulyHandsfree in over a billion products!
Sensory didn’t just innovate a novel keyword-spotting approach; we’ve continually improved it by adding features like speaker verification and user-defined triggers. Working with partners, we lowered the draw on the battery to less than 1 mA, and Sensory introduced hardware and software IP to enable ultra-low-power voice wakeup of TrulyHandsfree. All the while, our accuracy has remained the best in the industry for voice wakeup.
We believe the bigger, more capable companies trying to make voice triggers have been forced to use deep learning speech techniques to try and catch up with Sensory in the accuracy department. They have yet to catch up, though deep learning has brought their products to a very usable accuracy level; in the process, they lost much of the advantage of a small footprint and low power.
Sensory has been architecting solutions for neural nets in consumer electronics since we opened our doors more than 20 years ago. With TrulyHandsfree 4.0 we are applying deep learning to improve accuracy even further, pushing the technology further ahead of all other approaches while keeping an architecture that remains small and ultra-low power. We are enabling new feature-extraction approaches, as well as improved training in reverb and echo. The end result is a 60-80% boost in what was already considered industry-leading accuracy.
I can’t wait for TrulyHandsfree 5.0…we have been working on it in parallel with 4.0, and although it’s still a long way off, I am confident we will make the same massive improvements in speaker verification with 5.0 that we are making for speech recognition in 4.0, once again advancing the state of the art in embedded speech technologies!
June 3, 2015
When I started Sensory over 20 years ago, I knew how difficult it would be to sell software to cost-sensitive consumer electronics OEMs that would know my cost of goods. A chip-based method of packaging up the technology made a lot of sense as a turnkey solution that could maintain a floor price by adding the features of a microcontroller or DSP, with the added benefit of providing speech I/O. The idea was “buy Sensory’s micro or DSP and get speech I/O thrown in for free.”
After about 10 years it was becoming clear that Sensory’s real value-add in the market was technology development, and particularly developing technologies that could run on low-cost chips with smaller footprints, lower power, and better accuracy than other solutions. Our strategy of using trailing IC technologies to get the best price point was becoming useless because we lacked the scale to negotiate the best pricing, and more cutting-edge process technologies were further out of reach; even getting the supply commitments we needed was difficult in a world of continuing flux between over- and under-capacity.
So Sensory began porting our speech technologies onto other people’s chips. Last year only about 10% of our sales came from our internal ICs! Sensory’s DSP, IP, and platform partners have turned into the most strategic of our partnerships.
Today in the semiconductor industry there is a wave of consolidation that somewhat mirrors Sensory’s thinking over the past 10 years, albeit at a much larger scale: Avago pays $37 billion for Broadcom, Intel pays $16.7 billion for Altera, NXP pays $12 billion for Freescale, and the list goes on, dwarfing the acquisitions of earlier periods.
It used to be that the multi-billion-dollar chip companies gobbled up the smaller fabless companies, but now even the multi-billion-dollar chip companies are being gobbled up. There are many reasons for this, but economies of scale are probably #1. As chip geometries get smaller and smaller, the costs of design tools, tape-outs, and prototyping keep rising; although the variable per-chip cost drops, the fixed costs are skyrocketing, making consolidation and scale more attractive.
That sort of consolidation strategy is a very hardware-centered philosophy. I think the real value will come to these chip giants through in-house technology differentiation. It’s that differentiation that will add value to their chips, enabling better margins and/or more sales.
I expect that over time the chip giants will realize what Sensory concluded 10 years ago: that machine learning, algorithmic differentiation, and software skills are where the majority of the value-added equation on “smart” chips needs to come from, and that improving the user experience on devices can be a pot of gold! In fact, we have already seen Intel, Qualcomm, and many other chip giants investing in speech recognition, biometrics, and other user-experience technologies, so the change is underway!
January 21, 2015
I know it’s been months since Sensory has blogged, and I thank you for pinging me to ask what’s going on…Well, lots is going on at Sensory. There are really three areas we are putting a strategic focus on, and I’ll briefly mention each:
Of course, there’s a lot more going on than just this…we recently announced partnerships with Intel and Nok Nok Labs, and we have further lowered power consumption in touchless control and always-on voice systems with the addition of our hardware block for low power sound detection.
February 5, 2014
Everyone seems to be talking about this as the year of the wearable. I don’t think so. Even if Apple does introduce a watch, and Google widely releases Glass, will they really go mainstream and sell hundreds of millions of units? I don’t think so. At least not for a few years. IMHO there needs to be a few major breakthroughs:
I’ll be leading a Wearables panel at the Mobile Voice show with an AWESOME group of people representing thought leaders from Google, Pebble, Intel, Xowi, and reQall. Here’s the press release.
January 15, 2014
I spent last week at CES in Las Vegas. What a show!
The big keynote speech was the night before the show started and was given by Brian Krzanich, Intel’s new CEO. His talk focused on wearables, and he demonstrated three wearable devices (charger, in-ear, and platform architecture). The platform demo included live on-stage use of speech recognition with the low-power wake-up provided by Sensory. The demo was a smashing success! Several bloggers called it a “canned” demo, assuming it couldn’t be live speech recognition if it worked so flawlessly!
I had a chance to walk through the Wearables area. Holy smoke, there must have been 20 or 30 smart watches, a similar number of health bands, and even a handful of glasses vendors. In fact, seeing attendees wearing Google’s Glass was quite commonplace. The smart watches mostly communicate over Bluetooth, and some of the smaller, lighter devices use Zigbee, ultra-low-power Bluetooth, or ANT+ for wireless communications.
Sensory was all over CES. Here are some of the places our salespeople caught Sensory in action:
Overall, a great show for Sensory. Jeff Rogers, Sensory’s VP of Sales, told me, “A few people said they had searched out speech recognition products on the show floor to find the various speech vendors, and found that they all were using Sensory.”
September 17, 2011
I decided to pop up to San Francisco this week for the Intel Developer Forum. It’s open to the public, but it’s really more of a show-and-tell to Intel employees than from them.
One of the sessions was entitled “Enhanced Experiences with Low Power Speech Recognition,” and this was my main reason for being there. Intel’s Devon Worrell gave a very nice presentation, focusing on the importance of a closed computer not being just a brick, but still having functionality in a low-power state. He put up a lot of compelling slides about using speech recognition in this mode, and emphasized the need for low-power command and control with an always-on, always-listening device that responds to commands…hmmmm…sounds like a page right out of the Sensory bible!
Realtek appears to have been selected by Intel as a chip provider for the low-power speech recognition, and they presented at the session and even gave a demo of their in-house speech recognition technology. I wasn’t very impressed; the idea was for it to work with music playing and without the user speaking directly into the microphone. In the demo, however, the music was so quiet the audience could barely tell it was on, and the speaker talked only a few inches from the mic. I had a hard time telling whether it was working or not (and that’s giving it the benefit of the doubt).
Jean-Marc Jot from DTS also spoke and gave an impressive presentation and demo. Of course, I’m very biased…the DTS speech recognition demo used Sensory’s TrulyHandsfree™ Voice Control. I was a bit nervous because of Jean-Marc’s French accent and the fact that DTS had created their own TrulyHandsfree trigger phrase, “Hello Jennifer,” without any assistance from Sensory. (As a side note, Sensory’s TrulyHandsfree 2.0 SUBSTANTIALLY improves performance, but there are a number of complex variables in our algorithm that are not accessible through our SDKs, and therefore our customers cannot yet use the latest technology to its fullest extent unless Sensory fine-tunes the vocabularies in-house.) So…Jean-Marc was demoing our earliest incarnation of TrulyHandsfree Voice Control, with a French accent, in a noisy room, and with a command set that Sensory had never reviewed.
The demo was AWESOME. Jean-Marc spoke about 3 feet from the mic and said commands like “Hey Jennifer…play Lady Gaga.” The music was cranked up really loud, and Jean-Marc spoke commands like “fast forward” and other music controls, as well as calling up songs by name. I have a habit of counting speech recognition errors… On the trigger there were no false positives (accidental firings) and only 2 false negatives (where Jean-Marc needed to repeat the trigger phrase). That was 2 out of about 30 or 40 uses, indicating roughly 93-95% acceptance accuracy in high noise, and the phrases following the trigger had about the same high accuracy.
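For the record, the arithmetic behind that estimate is just misses over attempts (a toy helper I wrote for this post, not part of any product):

```python
# Back-of-the-envelope check on the trigger accuracy counted above:
# 2 missed detections out of roughly 30 to 40 attempts.
def acceptance_rate(misses, attempts):
    """Fraction of attempts the trigger accepted."""
    return 1 - misses / attempts

low = acceptance_rate(2, 30)   # worst case, ~0.93
high = acceptance_rate(2, 40)  # best case, 0.95
```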
Sweet demo of how speech recognition can work in a low-power mode and be always on, listening for commands even in high-noise situations!