Speech Blog
HEAR ME - Speech Blog



Posts Tagged ‘voice trigger’

Out Today: Moto X is “Always Listening”

August 1, 2013

One of the leakiest announcements in recent memory, Motorola’s new Moto X is expected to be officially announced today. Rather than trying to one-up Apple and Samsung with the highest resolution screen and fastest processor, the Moto X competes on its ability to be customized and its intelligent use of low-power sensors. With my background, it’s no surprise that I’m excited to see the “always listening” technology enabling the wake-up command “OK Google Now”. With this feature, speech recognition is enabled but in an ultra-low-power state, so it can be on and responsive without draining the battery. From other “press leaks”, I’m looking forward to a line of Droid phones with similar “always listening” functionality.

Motorola isn’t the only one rolling out interesting new “always listening” kinds of functions. Samsung did this first in the mobile phone, but implemented it in a “driving mode” so that it was sometimes always listening. The new Moto phones have been compared with Google’s Glass and the “OK Glass” function which some hackers have noted can be put in an “always listening” mode. Qualcomm has even implemented a speech technology on their chips and Android has released a function like this in their OS. Motorola’s use of the “always listening” trigger is especially cool because it calls up Google Now for a seamless flow from client to server speech recognition.

Here’s a demo of Sensory’s use of a very similar approach that we call “trigger to search” from a video we posted around a year ago:


So what’s Sensory’s involvement in these “always on” features from Android, Glass, Motorola, Nuance, Qualcomm, Samsung, etc.? I can’t say much except we have licensed our technology to Google/Motorola, Samsung and many others. We have not licensed it to Android or Qualcomm, but Qualcomm has commented on its interest in a partnership with Sensory for more involved applications.

With a mass market device like the Moto X, I’m excited to see more people experiencing the convenience of voice recognition that is always listening for your OK. Tomorrow I’m going to discuss leading voice recognition apps on the top mobile environments and then over the next few days and weeks, I’ll cover more topics around voice triggering technology such as pricing models (it’s free right?), power drain, privacy concerns with an “always listening” product, security and personalization. This is an exciting time for TrulyHandsfree™ voice control and I’d welcome your thoughts.

TrulyHandsfree™ – The Important First Step in a Voice User Interface

October 10, 2011

An interesting blog post (from PC World) came out following Apple’s iPhone 4S intro with Siri. I think everyone knows what Siri is…it’s the Apple acquisition that has turned into a big part of the Apple user experience. Siri technology allows a user to not only search but control various aspects of a smartphone by voice in a “natural language” manner.

The blog post depicts a looming showdown between Sensory and Apple’s Siri. It is quite kind to Sensory, pointing out our near-flawless performance in noise and how TrulyHandsfree™ does not require button presses. While those points are true, Sensory is certainly NOT a competitor to Siri. We do partner with companies like Vlingo that might be considered a Siri competitor, but Sensory’s TrulyHandsfree is just the first part of a multi-stage process for creating a true Voice User Interface.

Here is the basic process:



It’s just that first step that Sensory does better than anyone else. However, it’s an important step that requires a few critical characteristics:

  1. Extremely fast response time. Since it basically competes with a button press, it has to have a similar or faster response time. Because TrulyHandsfree uses a probabilistic approach, it can respond without having to wait for the recognizer to determine if the word is even finished! Slow response times lead users to speak before the Step 2 recognizer is ready to listen, which is a major cause of failure.
  2. Low power consumption. If it’s always on and always listening, it can’t be a power hog. Sensory can perform wake-up triggers with as little as 15 MIPS, and has the ability to operate in the 1-10 mA range on today’s smartphones.
  3. Highly accurate with poor S/N ratios. This means several things:
    • Works in high noise. TrulyHandsfree Voice Control performs flawlessly in extremely loud environments, including music playing in the background or even outdoors in downtown Portland!
    • Works without a microphone in close proximity. TrulyHandsfree is responsive even at distances of 20 feet (in a relatively quiet environment) and at arm’s length in noise. This is critical because many VUI-based applications of the future will become commonplace in a wide variety of consumer electronics devices, and users won’t want to get up and walk over to their devices to control them.
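
To put the 1-10 mA figure from point 2 in perspective, here is a rough back-of-the-envelope sketch. The 1500 mAh battery capacity is an assumption picked purely for illustration (it is not a figure from this post), and the calculation pretends the trigger is the only thing drawing current, which is never true on a real phone.

```python
def standby_hours(battery_mah: float, trigger_ma: float) -> float:
    """Hours a battery would last if the voice trigger were the only load."""
    if trigger_ma <= 0:
        raise ValueError("current draw must be positive")
    return battery_mah / trigger_ma


# Hypothetical battery capacity, chosen only to make the arithmetic concrete.
ASSUMED_BATTERY_MAH = 1500.0

# The two ends of the 1-10 mA range quoted above.
for draw_ma in (1.0, 10.0):
    hours = standby_hours(ASSUMED_BATTERY_MAH, draw_ma)
    print(f"{draw_ma:.0f} mA -> about {hours:.0f} hours of listening")
```

Even at the high end of the range, the assumed battery would keep the trigger listening for days rather than hours, which is the point of keeping this first step so lightweight.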

Companies like Nuance, Vlingo, Google and Microsoft are pretty good at the second step, which is a more powerful (often cloud-based) recognition system.

The third step “Understanding Meaning” is what the original Siri was all about. This was an AI component developed under DARPA funding at SRI and later spun off and acquired by Apple. Apple is rumored to be using Nuance as the “Step 2” in Siri.

Vlingo does a really nice job of implementing Steps 1-3 (using Sensory as its partner for Step 1.) I’m sure Google, Microsoft, Apple and Nuance all have efforts underway in the area of AI and natural language understanding. It’s really not that different than what they have needed for text-based “meaning” recognition during traditional searches.

The SEARCH in Step 4 is done via typical search engines (Google, Microsoft, Apple) and I’d guess Vlingo and other independent players (are there any still around???) have developed partnerships in these areas.

Step 5 is basically a good quality TTS engine. Providers like Nuance, Ivona, AT&T, NeoSpeech, and Acapela all have nice TTS engines, and I believe Apple, Microsoft and Google all have in-house solutions as well!
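
The five steps described above can be sketched as a simple chain of stages. This is only a toy illustration: every function name is hypothetical, the "audio" is faked as a text string, the wake phrase "hey device" is made up, and each stage is a stub standing in for a real engine (trigger, recognizer, meaning, search, TTS).

```python
from typing import Optional

WAKE_PHRASE = "hey device"  # made-up trigger phrase for this sketch


def step1_trigger(audio: str) -> bool:
    """Step 1: was the wake-up phrase spoken? (audio is faked as text)"""
    return audio.lower().startswith(WAKE_PHRASE)


def step2_recognize(audio: str) -> str:
    """Step 2: transcribe whatever follows the trigger phrase."""
    return audio[len(WAKE_PHRASE):].strip(" ,")


def step3_understand(text: str) -> dict:
    """Step 3: turn the words into a crude 'meaning' structure."""
    return {"intent": "search", "query": text}


def step4_search(meaning: dict) -> str:
    """Step 4: stand-in for a call to a search engine."""
    return f"top result for '{meaning['query']}'"


def step5_speak(result: str) -> str:
    """Step 5: stand-in for a TTS engine; returns what would be spoken."""
    return f"[TTS] {result}"


def voice_ui(audio: str) -> Optional[str]:
    """Run all five steps; nothing happens unless the trigger fires."""
    if not step1_trigger(audio):
        return None
    text = step2_recognize(audio)
    meaning = step3_understand(text)
    result = step4_search(meaning)
    return step5_speak(result)


print(voice_ui("Hey device, weather in Portland"))
print(voice_ui("weather in Portland"))  # no trigger, so nothing runs
```

The shape is what matters here: Steps 2 through 5 never run unless Step 1 fires, which is why the trigger has to be fast, cheap and accurate.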

The important point in comparing Sensory’s technology is that we provide the logical entryway to a successful Voice User Interface experience–with a lightning-fast voice trigger that replaces tactile button presses. It is a given that noise immunity and extremely high accuracy are also required, and TrulyHandsfree accomplishes this without requiring a prohibitive amount of power to function reliably and consistently.

AND…while we appreciate the comparison to the most profitable company on the planet, we’d like to focus on what we do better…making Truly Hands-Free really mean TrulyHandsfree™.


Truly Handsfree™ Trigger Technology Taking Over Sensory!

February 24, 2011

I haven’t had much time to blog lately, and you may have noticed that when I do, I often write about our revolutionary new Truly Handsfree™ Trigger speech technology. Technically it’s a phrase-spotting technology, but Sensory is using a revolutionary new multi-patent pending approach that’s changing the way we do speech recognition. The Truly Handsfree™ Trigger doesn’t use typical techniques like background noise modeling or speech detection (i.e., finding where speech starts and ends). In operation, it ends up being MUCH more noise robust, yet still very efficient as it consumes less current than it would if we also included all the traditional approaches. The basic idea is that it’s on and listening all the time, and able to reject all of the wrong words and correctly identify the right words! This eliminates the need for activation via button pressing.
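
For readers who haven’t seen phrase spotting before, here is a generic toy sketch of the “always on, always listening” idea. To be clear, this is NOT Sensory’s patent-pending approach: it simply assumes some model already produces a per-frame score (0 to 1) for how much the audio looks like the trigger phrase, and it fires when the average over a short sliding window crosses a threshold. The function name, window size and threshold are all made up for illustration.

```python
from collections import deque
from typing import Iterable, Iterator


def spot_phrase(frame_scores: Iterable[float],
                window: int = 5,
                threshold: float = 0.8) -> Iterator[int]:
    """Yield the frame index each time the trigger phrase is spotted.

    There is no separate start/end-of-speech detector: every incoming
    frame is scored and a decision is made continuously.
    """
    recent = deque(maxlen=window)
    for index, score in enumerate(frame_scores):
        recent.append(score)
        if len(recent) == window and sum(recent) / window >= threshold:
            yield index
            recent.clear()  # don't fire again on the same utterance


# Made-up scores: background noise, one trigger phrase, more noise.
scores = [0.1] * 10 + [0.9] * 5 + [0.1] * 10
print(list(spot_phrase(scores)))
```

The spotter stays silent through all of the low-scoring frames and fires once on the high-scoring run, which mirrors the goal described above: reject the wrong words, catch the right ones, and never need a button press.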

A lot of companies are using our technology now as a voice trigger for other speech recognition applications. At the recent Mobile World Congress, Samsung introduced the first Truly Handsfree Smartphone, the Galaxy S II, which uses a Truly Handsfree™ Trigger followed by the Vlingo experience. You say “Hey Galaxy” and it wakes up, no touching necessary! I tried this on the noisy showroom floor at Mobile World Congress, and it nailed my “Hey Galaxy” every time, even from a distance of 5 feet away!

Chris Schreiner over at Strategy Analytics recently tried out an early beta demo for Android, and in a blog late last year he said, “In a demo experience on my Android phone, the hands-free trigger worked remarkably well with varying types of background noise.”

With Truly Handsfree™ Trigger’s noise-robust nature and the ability to always be on listening, we are able to do more natural language-like schemes. A couple of great examples are in the toy space (and we do love toys at Sensory!)

  • I mentioned Hallmark in my last blog…now they are rolling out a whole new product line built with Sensory chips because of the huge success of Jingle, the Husky Pup.
  • Mattel has pushed us to deploy this phrase spotting technology even in our lowest cost, entry level processor. They have a new product line coming out this year that’s sure to be a BIG HIT called Fijit. The Fijits are these cute wiggly characters with amazing skin, and they do the TOUGHEST speech recognition feats ever. They listen for a bunch (30??) of short key words like “hungry” so you can say a variety of things to it (Like…Hungry?…I’m Hungry…Are you Hungry?) and it can intelligently respond and interact. (Actually I don’t know if “Hungry” is one of its actual words, that’s for example only.) SpeechTech just did a nice summary on Fijit Friends in their blog, and Mattel has some nice YouTube videos and websites where you can learn all about Fijits.
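
The “say it any way you like” behavior described for Fijit can be illustrated with a tiny text-only sketch. This is an assumption-laden toy, not how the toy actually works: the real product spots key words in audio, while this example just scans a text string, and the key words and responses below are invented.

```python
import re
from typing import Optional

# Invented key words and replies, for illustration only.
RESPONSES = {
    "hungry": "Let's eat!",
    "dance": "Watch my moves!",
}


def respond(utterance: str) -> Optional[str]:
    """Reply to the first known key word found anywhere in the utterance."""
    for word in re.findall(r"[a-z']+", utterance.lower()):
        if word in RESPONSES:
            return RESPONSES[word]
    return None  # no key word spotted, so stay quiet


for phrase in ("Hungry?", "I'm Hungry", "Are you Hungry?", "Hello there"):
    print(phrase, "->", respond(phrase))
```

Because only the key word matters, all three “hungry” phrasings get the same reply, and anything without a key word is ignored.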

So what’s happening here at Sensory is that this technology initially invented as a trigger is migrating into being an amazingly noise-robust speech solution for any command and control application! It’s nominated for awards by MobileTrax in both the Speech Processing and Software Technology innovation categories!

Sensory has developed a whole product roadmap around our new approach, and this includes speaker adaptive recognition, larger vocabulary solutions, improvements in accuracy, and consumer created triggers. A funny thing about consumer created triggers…Our initial release was NOT INTENDED for this, but one of our customers, Adelavoice, did a few tricks and allowed end users to create their own triggers. Know what’s the most common trigger phrase?? “Yo Bitch”…I guess that says something about the demographic of the user base!

OK…I could go on and on about this new phrase spotting technology, but I gotta get some real work done!


Voice Search, M&A, and the Economy

May 3, 2010

Haven’t blogged in a long time…I have plenty to say but have just been too busy. That’s good news. Sensory is signing up new deals at a very rapid rate, so 2011 should be an excellent year for us. I declare the economic recovery in full swing (although I do have some trepidation it could be short lived). Right now my biggest issue is chip SUPPLY! We’ve actually had some trouble getting enough chips (this is endemic to the entire chip market right now!). Luckily, our software business is exploding and a growing percentage of overall revenues is not dependent on buying silicon!

The cool thing is for the first time in Sensory’s 15-year history we are putting text-to-speech into products. We’ve done a handful of deals in just the last couple of months, and I expect that within 2 years we’ll have over 10 million TTS devices that will have hit the market (we’re at around 60 million speech recognition products right now).

I went to Voice Search last week. This is the show that Bill Meisel and AVIOS co-host every year. It’s my favorite speech industry show and pretty much the only one I attend. At the show I spoke on a consumer speech panel and demonstrated Sensory’s Truly Hands-Free Voice Trigger. Nobody thinks that wordspotting can be always on and always listening without false firing – and still catch the trigger word when it’s spoken. Sensory’s spotting technology WORKS! It’s my pet technology right now and I think it will change the world, by making speech recognition TRULY HANDSFREE (that was the title of my presentation)…anyways…I demoed it live.

Nobody is supposed to do live speech recognition demos because they always fail (Microsoft has had the misfortune of proving that more than once!), so most people at the conferences show video clips. I know Sensory’s stuff works well, but I got a little nervous when I started talking and I could hear the echo of the microphone, and as I spoke I was hoping it wouldn’t false trigger and totally embarrass me. It didn’t false trigger…then it just had to recognize my trigger words.

It got the first and the second one right. Then on the 3rd time the small device started sliding down the podium and the mic got covered up and for a brief moment my heart froze and I thought I was going to need to repeat my trigger word…then all of a sudden I felt my heart exploding as I waited microseconds for the response…then it spoke and it got it! No false fires and 3/3 triggers accurately recognized. Oh the trials and tribulations of a speech industry veteran! The technology is great and in a car it’s nearly flawless; it was this new acoustical environment that made me nervous. It came through!!!

So…Apple acquired SIRI, Inc., an iPhone developer that supplies a personal assistant application featuring speech recognition. Cool. That means Apple is in the game – the speech game, with apparently a slightly different twist than Microsoft or Google. All 3 companies are investing in speech recognition. But Apple is doing very light investing while Google and Microsoft are HEAVILY invested. Apple apparently isn’t using any of its home grown technologies as they keep licensing Nuance…and SIRI uses a Nuance engine as well. SIRI is a voice concierge type service that uses the Nuance recognizer then throws a layer of “meaning” interpretation or “intelligence” into the process. Anyways, I’m glad Apple is taking voice control seriously…they’re gonna have a tough time catching up with Google. My take is the Google stuff works best right now. I was playing with a Nexus One phone and the recognition on it is really amazing. BING is pretty good too and has wrapped better apps around their technology in BING411.

I remember a Keynote talk 15 years ago at a speech conference titled something like “the Ever-Imminent-Never-Arriving Speech Bonanza”…well it’s finally here, and I have to thank Google and Microsoft (and Vlingo too!) for clearly taking us over the hurdle and making speech recognition accessible and usable by the masses. Now it’s time for Apple to kick in and do its part…and now that HP has acquired Palm, it will need to get in the game too. I don’t even know if HP has a speech recognition team, but if they don’t they will soon. So will Cisco. So will all the major consumer electronics and automotive companies! Our time has come!!! Speech Recognition has arrived and is working for the masses! It will just get better!