HEAR ME - Speech Blog

Posts Tagged ‘Truly Handsfree’

Sensory Earns Two Coveted 2016 Speech Tech Magazine Awards

August 22, 2016

Sensory is proud to announce that it has been awarded two 2016 Speech Tech Magazine Awards. With some stiff competition in the speech industry, Sensory continues to excel in offering the industry’s most advanced embedded speech recognition and speech-based security solutions for today’s voice-enabled consumer electronics movement.

The 2016 Speech Technology Awards include:

Speech Luminary Award – Awarded to Sensory’s CEO, Todd Mozer

“What really impresses me about Todd is his long commitment to speech technology, and specifically, his focus on embedded and small-footprint speech recognition,” says Deborah Dahl, principal at Conversational Technologies and chair of the World Wide Web Consortium’s Multimodal Interactions Working Group. “He focuses on what he does best and excels at that.”

Star Performers Award – Awarded to Sensory for its contributions in enabling voice-enabled IoT products via embedded technologies

“Sensory has always been in the forefront of embedded speech recognition, with its TrulyHandsfree product, a fast, accurate, and small-footprint speech recognition system. Its newer product, TrulyNatural, is groundbreaking because it supports large vocabulary speech recognition and natural language understanding on embedded devices, removing the dependence on the cloud,” said Deborah Dahl, principal at Conversational Technologies and chair of the World Wide Web Consortium’s Multimodal Interactions Working Group. “While cloud-based recognition is the right solution for many applications, if the application must work regardless of connectivity, embedded technology is required. The availability of TrulyNatural embedded natural language understanding should make many new types of applications possible.”

– Guest Blog by Michael Farino

Sensory’s CEO, Todd Mozer, interviewed on FutureTalk

October 1, 2015

Todd Mozer’s interview with Martin Wasserman on FutureTalk

Is Voice Activation Unsafe?

October 15, 2014

A couple of news headlines have appeared recently asserting that voice activation is unsafe. I thought it was time for Sensory to weigh in on a few aspects of this since we are the pioneers in voice activation:

  1. In-Car Speech Recognition. There have been a few studies, like the AAA/University of Utah one.
    The headlines from these studies claim that speech recognition creates distraction while driving. Other recent studies have shown that voice recognition in the car is one of the biggest complaints. But if you read these studies carefully, what you really find are several important points:

    • What they call “hands free” is not 100% TrulyHandsfree. It requires touch to activate, so right there I agree it can take your eyes off the road, and potentially your hands off the wheel.
    • It’s really bad UX design that is distracting and not the speech recognition per se.
    • It’s not that people don’t want speech recognition. It’s that they don’t want speech recognition that fails all the time.

    Here’s my conclusion on all this denigration of in-car speech recognition: there are huge problems with what the automotive companies have been deploying. The UX is bad and the speech recognition is bad. That doesn’t mean that speech recognition is not needed in the car…on the contrary, what’s needed is good speech recognition implemented in good design.

    From my own experience it isn’t just that the speech recognition is bad and the UX is bad. The flaky Bluetooth connections and the problems of changing phones add to the perception of speech not working. When I’m driving, I use speech recognition all the time, and it’s GREAT, but I don’t use the recognizer in my Lexus…I use my MotoX with the always-on trigger, and then with Google Now, I can make calls or listen to music, etc.

  2. Lack of Security. The CTO of AVG blasted speech recognition because it is unsafe. Now I previously resisted the temptation to comment on this, because the CTO’s boss (the CEO) is on my board of directors. I kind of agree and I kind of disagree with the CTO. I agree that speech recognition CAN BE unsafe…that’s EXACTLY why we add speaker verification into our wake up triggers…then ONLY the right person can get in. It’s really kind of surprising to me that Apple and Google haven’t done this yet! On the other hand, there are plenty of tasks that don’t really require security. The idea of a criminal lurking outside my home and controlling my television screen seemed more humorous than scary. In the case of TVs, I do think password protection is great but it’s really more for the purpose of identifying who is using the television and to call up their favorites, their voice adapted templates, and their restrictions (if any) on what they can watch AND how long they can watch…yeah I’m thinking about my kids and their need to get homework done. :-)
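
As a rough illustration of a wake-up trigger combined with speaker verification, here is a minimal Python sketch. It is not Sensory's implementation: the function name, the two scores, and both thresholds are hypothetical, and it assumes some upstream recognizer has already produced a wake-phrase score and a speaker-match score between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class TriggerDecision:
    woke: bool
    reason: str


def gated_wake(wake_score: float,
               speaker_score: float,
               wake_threshold: float = 0.8,      # hypothetical value
               speaker_threshold: float = 0.7,   # hypothetical value
               require_speaker: bool = True) -> TriggerDecision:
    """Accept a wake-up only if the phrase matched and, when security is
    required, the voice also matched the enrolled user."""
    if wake_score < wake_threshold:
        return TriggerDecision(False, "no wake phrase")
    if require_speaker and speaker_score < speaker_threshold:
        return TriggerDecision(False, "wake phrase heard, speaker rejected")
    return TriggerDecision(True, "accepted")


# A stranger saying the right phrase is rejected when verification is on...
print(gated_wake(wake_score=0.95, speaker_score=0.20))
# ...but a low-risk product (say, a TV) could switch verification off.
print(gated_wake(wake_score=0.95, speaker_score=0.20, require_speaker=False))
```

The `require_speaker` flag reflects the point above that plenty of tasks don't really require security, so the check can be optional per product.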

Power drain in an “always on” technology

August 7, 2013

Running on the OS level does drain power. Even using the baseband or apps processors can be inefficient. This is why Sensory has ported to processors such as Cirrus Logic, Conexant, DSPG, Realtek, Texas Instruments, Wolfson, and all of the leading IP platforms (ARM, CEVA, Tensilica, Verisilicon, etc.), and many more. Our chip and IP partners are reporting power consumption as low as 1.7mA. That includes the microphone and preamp circuit, and it can go even lower by using a sound detection front end to turn off processing in quiet conditions. Tensilica has even introduced the HiFi Mini IP core targeting Sensory’s TrulyHandsfree approach for companies that want to use always on technology without extra bells and whistles.
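
To make the sound detection front end idea concrete, here is a minimal Python sketch under stated assumptions; it is not how any of the chips above actually work. It uses a simple RMS energy threshold to decide which audio frames reach the recognizer, then estimates average current from the resulting duty cycle. The 1.7 mA active figure comes from the paragraph above; the threshold and the 0.2 mA idle figure are made-up numbers for illustration only.

```python
import math


def frame_rms(frame):
    """Root-mean-square level of one frame of samples in the range -1..1."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def gate_frames(frames, rms_threshold=0.02):  # hypothetical threshold
    """Return one flag per frame: True means 'wake the recognizer'."""
    return [frame_rms(f) >= rms_threshold for f in frames]


def average_current_ma(active_flags, active_ma=1.7, idle_ma=0.2):
    """Duty-cycle-weighted average current.

    active_ma is the figure quoted in the post; idle_ma is an assumption.
    """
    if not active_flags:
        raise ValueError("need at least one frame")
    duty = sum(active_flags) / len(active_flags)
    return duty * active_ma + (1.0 - duty) * idle_ma


# Two quiet frames and two loud frames: the recognizer runs half the time.
frames = [[0.0] * 160, [0.5] * 160, [0.001] * 160, [0.1, -0.1] * 80]
flags = gate_frames(frames)
print(flags)
print(average_current_ma(flags))
```

The quieter the environment, the lower the duty cycle, which is the sense in which a front end like this can push average consumption below the always-processing figure.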

Quick Thoughts

May 1, 2013

  1. Texas A&M Transportation Institute Study. Yeah, they found that using Siri or Vlingo was as dangerous as texting while driving. Adam Cheyer hit the nail on the head in his response. But Adam focused on Siri…Turns out they didn’t use Vlingo in the In Car mode, which was of course designed for In Car. Duh! Vlingo’s (now Nuance) In Car uses Sensory’s Truly Handsfree which requires NO TOUCHING and no distracted eyes while driving. All these articles which said “Handsfree texting no safer than typing” really got it wrong. It’s not TRULYHANDSFREE!!! In the study they held phones in their hands and hit buttons. Sorry, that’s not Handsfree!
  2. Google Now on iOS. Cool! Android speech recognition is very good and probably the best, but having Siri built into the home button is easy, and easy usually trumps good. But Apple can’t be complacent; it’s gotta make some big moves or it will be left behind in the category it popularized.
  3. Google Glass. Holy smoke what a lot of press it gets. Sensory has 2 in house and we love the user experience! We believe wearables will become huge and Google is certainly driving the forefront of this. Glass must use Google’s speech recognition in the clouds. Wonder what they use on the client? It works GREAT!
  4. Galaxy S4. Yep, Sensory made it in for the embedded recognition used in triggers (with SVoice) and voice command and control! We got invited to the launch party. It’s a GREAT product, with a GREAT embedded speech recognizer.
  5. Icahn buying into Nuance.  Interesting…Can’t be bad for Nuance investors, until he sells! It’s nice to see speech technology reach the forefront in not just consumer electronics and technology but in the finance world too!
  6. Qualcomm introduces voice triggers. Yeah everyone knows that’s the area where Sensory dominates. Better accuracy, faster response time, lower power consumption, works in noise and from a distance, etc. People ask if Qualcomm is using Sensory technology. I say try it, and if it works GREAT then it’s probably Sensory’s. Anyways, we welcome the Qualcomm solution as it totally validates what we’ve been saying and doing. I tried it at Mobile World Congress and it responded well in noise, but you had to hit a button to turn it on to make it listen, which kind of defeats the purpose.
  7. Amazon buying the pieces. Yeah they bought up some of the best components available – TTS from Ivona, cloud speech recognition from YAP, and now intelligence from Evi. Even adding it all up, they haven’t paid that much and if they put it all together well, they should be in a strong position relative to their competitors.
  8. Industry. The overall speech field is aligning as a battle of titans all with good patent positions, large teams, and good technologies. Amazon, Google/Android, Microsoft, and Nuance are all major speech players today. Apple probably is too, but it’s hard to know what’s in house at Apple vs. Nuance. Nuance is the only substantive player that’s a vendor, out selling speech technology.  This puts them in a nice position, but they have competitors giving it away on all major platforms, so nobody is without challenges. Sensory might be the second largest speech vendor after Nuance and our sales are less than 2% of Nuance…pretty amazing gap there! I want to fill that gap! ;-)

Superbowl Ads – Speech Activation Coming of Age

February 18, 2013

(…and something new from Sensory just around the corner!)

I remember watching the Superbowl last year and seeing a BMW 3 Series commercial that I thought was interesting.

It was interesting to me because they put a motion/proximity sensor under the trunk so the user could open the trunk in a hands-free manner. The commercial highlights the benefit of hands-free access when a woman walks up with her hands full of luggage and she just wiggles her foot around and the trunk pops open! Cool…except the user has to do a little one legged dance with their hands full, and as the commercial highlights (which is another reason why I found it interesting), other things can accidentally open the trunk, like a dog wagging its tail. Wouldn’t a hands-free voice trigger do a much better job? Especially an ultra-low-power implementation on a standalone processor with built in speaker verification for security…sounds like a challenge for Sensory’s TrulyHandsfree approach.

Fast forward to this year’s Superbowl, and Kia comes out with the “space babies” ad for its Sorento, and the Uvo entertainment system. Kid asks dad “where do babies come from” and dad concocts an elaborate and humorous lie.

Then after dad’s tall tale the kid says “But Jake said that babies are made when mommies and daddies…” and dad quickly interrupts the kid by saying “Uvo, play Wheels on the Bus”. The Uvo system hears dad and immediately plays the music drowning out the kid’s question. Cool commercial and nice use of voice activation to control music while driving!

Many of Sensory’s customers have told us that they don’t want to have to say the brand name as a command word, and they would really like to name their products themselves, and even better, have the products know who they are when they talk so that settings and controls can be customized to their use…Another job for Sensory’s TrulyHandsfree!

On February 19th we will announce our TrulyHandsfree 3.0 which will enable all of the voice control scenarios I have described, enabling better user experiences that are more customized and more secure!
Stay tuned for the details!

Mobile Users Get it!

May 30, 2012

Sensory’s had a lot of press lately. We made 3 big announcements all pretty much together:

1) Announcing speaker verification

2) Announcing speaker identification

3) Saying Sensory is in the Samsung Galaxy S3

Sensory announced these just before CTIA in New Orleans. We had a small booth at the show, and gave demos at several events (on the CTIA stage and floor, at the Mobility Awards dinner, and at the excellent Pepcom Mobile Focus event).

We got a lot of nice press from this. I was thrilled that the Speech Technology email newsletter put our verification release as the featured and lead story. One of the articles I like best, though, just came out last week by Pete Pachal at Mashable http://mashable.com/2012/05/29/sensory-galaxy-s-iii/

This article is great for several key reasons. One is that Pete gets it. He didn’t just reprint our press release, but he added his commentary and wrapped it up in a nice story that hits some of the key issues.

However, what’s best is what the readers wrote in. I LOVE their insights and comments. Here are a few of the dialogs with my commentary attached:

JB: Seriously??? You still need to push a button to use Siri? I’ve had the “wake with voice” option on my crusty old HTC Incredible, via VLingo inCar, for about 2 years now. Hard to believe Apple is that far behind.

My response: EXACTLY JB! In fact that crusty old HTC using Vlingo, also uses Sensory’s TrulyHandsfree approach! Vlingo was our first licensee in the mobile space.

Scott: But this is talking about OS integration instead of app integration. And as I’m sure you’ve seen on your phone, and as the article noted, wake with voice options currently use a lot of power, which means I can’t see a lot of people willing to use it.

My response: Precisely, Scott! This is why we are implementing the “deeply embedded” approach that will take power consumption down by a factor of 10! Nevertheless, users LOVE it even if it consumes power:

JB – I use it all the time and since my phone plugs into the car’s adapter, I don’t really worry at all about power usage. It’s never been a problem.

My response – Yes, Vlingo and Samsung did a very nice implementation by having an “always listening” mode, particularly useful while driving. Other approaches we expect to see in the future are intelligent sensor based approaches so the phone knows when to listen and when not to (e.g. why not have it turn on and listen whenever you start traveling past 20 MPH, etc.)
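
A sensor-based listening policy like the one suggested above could be as simple as the following Python sketch. Everything here is a hypothetical illustration rather than anything Vlingo, Samsung, or Sensory ships: the function name and parameters are invented, the 20 MPH threshold is the example from the paragraph above, and the "plugged in" rule is an added assumption based on JB's comment that power isn't a worry when the phone is on the car's adapter.

```python
def should_listen(speed_mph: float,
                  plugged_in: bool = False,
                  user_enabled: bool = True,
                  speed_threshold_mph: float = 20.0) -> bool:
    """Decide whether the always-listening trigger should be active."""
    if not user_enabled:
        # The user's choice always wins.
        return False
    if plugged_in:
        # Assumption: power drain is not a concern on external power.
        return True
    # Otherwise listen only when the phone appears to be in a moving vehicle.
    return speed_mph > speed_threshold_mph


print(should_listen(speed_mph=35.0))                      # driving
print(should_listen(speed_mph=0.0))                       # sitting still
print(should_listen(speed_mph=0.0, plugged_in=True))      # on the charger
print(should_listen(speed_mph=35.0, user_enabled=False))  # user opted out
```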

Is there anything to prevent me from messing with another person’s phone?

Fillfill – Ha ha, imagine being in an auditorium and yelling “Hi Galaxy! … Erase Address Book! … Confirm!”

My comment – Funny! This is one of the reasons we have added speaker verification and identification features to the trigger function

DhanB – Siri doesn’t require a button. It can be activated by lifting the phone up to your face.

Great reader responses:

Darkreaper – …..while driving? (Right! That’s illegal in California and other states!)

Tone – Yes, but with the Samsung Galaxy II, I don’t have to touch it at all. As the article states, this is crucial when you’re in a situation, such as driving. I’ve dropped the phone on the floor while driving and I was still able to send a text message, an email and place a call with it sliding around the back seat. (Bluetooth) iPhone can’t compete, sorry. :-/

…and of course the old “butt dialing” problem:

Jason – This makes me think of the old “butt dialing” problem when you sat down on your phone cause I’d much prefer a manual trigger to prevent accidental usage.

My comment: Once again, I agree with the readers. Sensory isn’t pushing to force “always listening” modes on users, we just want to allow them the choice. We strongly recommend that products have multiple options for anything that can be done by voice or touch. We believe the users should have the right and the ability to access the power of mobile devices without being forced to touch them. And if they want to turn off this ability, that is certainly their choice! We turn off our ringers (at least we should) when we enter a meeting or go to the movies. Likewise, we can turn off hands free voice control when it’s not appropriate…and with the growing presence and power of intelligent sensors, it will get easier and easier (albeit with some mishaps along the way!) for the phones to know when they should listen!

A lot of people commented about Siri. Apple isn’t stupid. They get it that hitting buttons isn’t the most convenient way to always access voice control. That’s why there’s a sensor in place when you lift the phone to your face (of course still requiring touch). It’s also why Siri can speak back. Apple pushed the Voice User Interface forward with Siri…Samsung pushed it further with TrulyHandsfree wake up. There will be a lot of back and forth over the coming years and voice features will continue as a major battleground.

As devices get increasing utility WITHOUT touching the phones (e.g. remote control functions, accessing and receiving data by voice, etc.), the need for a TrulyHandsfree approach will grow stronger and stronger, and Sensory will continue to have the BEST solution – More Accurate, Lower Power, Faster Response Times, and NOW with built in speaker verification or speaker ID!

Todd
sensoryblog@sensoryinc.com

I Love Watson!

July 12, 2011

I have a new favorite toy. It’s Watson the Raccoon, one of Hallmark’s new Interactive Storybook and Story Buddy™ characters.

I used to have the time to buy every product that featured one of Sensory’s technologies. I have to admit I don’t do that much anymore, and as my kids have gotten older, I buy fewer and fewer toys in general, much less ones with speech recognition. Luckily I was visiting my dad, and he had purchased one out of curiosity.

For many years, my favorite Sensory-based toy was from the very early days of Sensory called Radar the Robot, from Fisher Price. I have fond memories of my kids imitating Radar, and also remember biking around Bali on my honeymoon, going for a night ride through the jungle by myself just to find a fax machine so I could send an agreement draft back to Fisher Price (Design Win for Radar!)

Sorry Radar…Watson has removed you from your pedestal. Watson now reigns supreme!

Hallmark’s Interactive Storybooks use Sensory’s NLP based processors with TrulyHandsfree™ Voice Control. As you read the book, the Story Buddy™ listens and interacts appropriately when it hears you say different phrases from the book.

I knew the concept was great when we first did Jingle the Husky Pup with Hallmark. I also knew that these products were selling really well, and I even knew that TrulyHandsfree™ Voice Control is the MOST AMAZING technology to ever come out of Sensory (Hey – I just got an email from the manager of one of the larger speech organizations in the world and he said “we keep trying to break your TrulyHandsfree™ 2.0 beta technology, but we just can’t seem to make it fail!”)

What I didn’t know is what an EXCELLENT job Hallmark does in story writing, character creation, and putting the whole thing together to make a really fun experience that really works! The book starts off with Watson wondering why grass grows up and rain falls down… I love that line!

Kudos to you Hallmark…now I gotta go buy a Watson for my lobby!

Todd
sensoryblog@sensoryinc.com

The Holy Grail in Speech is Almost Here!

May 6, 2011

For far too long, speech recognition just hasn’t worked well enough to be usable for everyday purposes. Even simple command and control by voice has been barely functional and unreliable…but times, they are a changing! Today speech recognition works quite well and is widely used in computer and smart phone applications…and I believe we are rapidly converging on the Holy Grail of Speech – making a recognition and response system that can be virtually indistinguishable from a human (a really smart human with immaculate spelling skills and fluency in many languages!)

I think there are 4 important components to what I’d call the Holy Grail in Speech:

  1. No Buttons Necessary. OK here I’m tooting my own horn, but Sensory has really done something amazing in this area. For the first time in history there is a technology that can be always-on and always-listening, and it consistently works when you call out to it and VERY rarely false-fires in noise and conversation! This just didn’t exist before Sensory introduced the Truly Handsfree™ Voice Control, and it is a critical part of a human-like system. Users don’t want to have to learn how to use a device, open apps, and hold talk buttons! People just want to talk naturally, like we do to each other! This technology is HERE NOW and gaining traction VERY rapidly.
  2. Natural Language Interactions. This is a bit tricky, because it goes way beyond just speech recognition; there has to be “meaning recognition”. Today, many of the applications running on smart phones allow you to just say what you want. I use SIRI (Nuance), Google and Vlingo pretty regularly, and they are all very good. But what’s impressive to me isn’t just how good they are, it’s the rate at which they seem to be improving. Both the recognition accuracy and the understanding of intent seem to be gaining ground very rapidly.
    I just did a fun test…I asked each engine (in my nice quiet office) “How many legs does an insect have?”…and all three interpreted my request perfectly. Google and Vlingo called up the right website with the question and answer…and SIRI came back with the answer – six! Pretty nice! My guess is the speech recognition is still a bit ahead of the “meaning recognition”…
    Just tried another experiment. I asked “Where can I celebrate Cinco de Mayo?” SIRI was smart enough to know I wanted a location, but tried to send me off to Sacramento (sorry – too far away for a margarita!) Vlingo and Google both rely on Google search, and did a general search which didn’t seem to associate my location… (one of them mis-recognized, but not so badly that they didn’t spit out identical results!) Anyways, I’d say we are close in this category, but this is where the biggest challenge lies.
  3. Accurate Translation and Transcription. I suppose this is ultimately important in achieving the Holy Grail. I don’t do much of this myself, but it’s an important component to Item 2 above, and also necessary for dictating emails and text messages. When I last tested Nuance’s Dragon Dictate I was blown away by how well it performed. It’s probably the Nuance engine used in Apple’s Siri (you know, Nuance has a lot of engines to choose from!), and it’s really quite good. I think Nuance is a step ahead in this area.
  4. Human Sounding TTS. The TTS (text-to-speech) technology in use today is quite remarkable. There are really good sounding engines from AT&T, Nuance, Acapela, Neospeech, SVOX, Ivona, Loquendo and probably others! They are not quite “human”, but come very close. As more data gets thrown at unit selection (yes, size will not matter in the future!), they will essentially become intelligently spliced-together recordings that are indistinguishable from live performance.

Anyways, reputable companies are starting to combine and market these kinds of functions today, and I’d guess it’s just a matter of five to ten years until you can have a conversation with a computer or smartphone that’s so good, it is difficult to tell whether it’s a live person or not!

Todd
sensoryblog@sensoryinc.com