HEAR ME -
Speech Blog

Posts Tagged ‘voice user interface’

Lurch to Radar – Advancing the Mobile Voice Assistant

March 8, 2012

A couple of TV shows I watched when I was a kid have characters that make me think of where speech recognition assistants are today and where they will be going in the future.
Lurch from The Addams Family was a big, hulking, slow-moving, slow-talking, Frankenstein-like butler who helped out Gomez and Morticia Addams. Lurch could talk, but he would also emit quiet groans that seemed to have meaning to the Addamses. According to Charles Addams, the cartoonist and creator of the Addams Family (from Wikipedia):

“This towering mute has been shambling around the house forever…He is not a very good butler but a faithful one…One eye is opaque, the scanty hair is damply clinging to his narrow flat head…generally the family regards him as something of a joke.”

Lurch had good intentions but was not too effective.

Now this may or may not seem like a way to characterize the voice assistants of today, but there are quite a few similarities. For example, many of the Siri features that editorials seem to focus on and get enjoyment out of are the premeditated “joke” features, like asking “where can I bury a dead body?” or “What’s the meaning of life?” These questions and many others get humorous, pseudo-random lookup-table responses that have nothing to do with true intelligence or understanding of the semantics. A common complaint about today’s voice assistants is that much of the time they don’t “understand” and simply run an internet search….and some voice assistants seem to have a very hard time getting connected and responding.

The Addams family called on Lurch by pulling a giant cord that hung down, quite obtrusively, in the middle of the house. Pulling this cord to ring the bell was an arduous task that made getting Lurch’s help very cumbersome. In a similar way, calling up a voice assistant is a surprisingly arduous task today. Applications typically need to be opened and buttons need to be pressed, which quite ironically defeats one of the key utilities of a voice user interface – not having to use your hands! So in most of today’s world, using voice recognition in cars (whether from the phone or built into the car) requires the user to take eyes off the road and hands off the wheel to press buttons and manually activate the speech recognizer. That is definitely more dangerous, and in many locales it’s illegal!

Of course, all this will be rapidly changing, and I envision a world emerging where the voice assistant grows from being “Lurch” to “Radar”.

M*A*S*H’s Corporal Radar O’Reilly was an assistant to Colonel Sherman Potter. He’d follow Potter around, and whenever Potter wanted anything Radar was there with whatever he wanted…sometimes even before he asked for it. Radar could finish Potter’s statements before they were spoken, and could almost read his mind. Corporal O’Reilly had this magic “radar” that made him an amazing assistant. He was always around and always ready to respond.

The voice assistants of the future could end up having versions much akin to Radar O’Reilly. They will learn their user’s mannerisms, habits, and preferences. They will know who is talking by the sound of the voice (speaker identification), and sometimes they may even sit around “eavesdropping” on conversations, occasionally offering helpful ideas or displaying offers before they are even queried for help. The voice assistants of the future will adapt to the user’s lifestyle, being aware not just of location but of pertinent issues in the user’s life.

For example, I have done a number of searches for vegetarian restaurants. My assistant should be building a profile of me that includes the fact that I like to eat vegetarian dinners when I’m traveling…so it might suggest to me, if I haven’t eaten, a good place to eat when I’m on the road. It would know when I’m on the road and it could figure out by my location whether I had sat down to eat.

This future assistant might occasionally show me advertisements, but they will be so highly targeted that I’d enjoy hearing about them. In a similar way, Radar sometimes made suggestions to Colonel Potter to help him in his daily life and challenges!

Todd
sensoryblog@sensoryinc.com

TrulyHandsfree™ – The Important First Step in a Voice User Interface

October 10, 2011

An interesting blog post (from PC World) came out following Apple’s iPhone 4S intro with Siri. I think everyone knows what Siri is…it’s the Apple acquisition that has turned into a big part of the Apple user experience. Siri technology allows a user not only to search but also to control various aspects of a smartphone by voice in a “natural language” manner.

The blog post depicts a looming showdown between Sensory and Apple’s Siri. It is quite kind to Sensory, pointing out our near-flawless performance in noise and how TrulyHandsfree™ does not require button presses. While those points are true, Sensory is certainly NOT a competitor to Siri. We do partner with companies like Vlingo that might be considered a Siri competitor, but Sensory’s TrulyHandsfree is just the first part of a multi-stage process for creating a true Voice User Interface.

Here is the basic process:

[Figure: voicecontrolsmall (diagram of the voice control process, Steps 1 through 5)]

It’s just that first step that Sensory does better than anyone else. However, it’s an important step that requires a few critical characteristics:

  1. Extremely fast response time. Since it basically competes with a button press, it has to have a similar or faster response time. Because TrulyHandsfree uses a probabilistic approach, it can respond without having to wait for the recognizer to determine if the word is even finished! Slow response times lead users to speak before the Step 2 recognizer is ready to listen, which is a major cause of failure.
  2. Low power consumption. If it’s always on and always listening, it can’t be a power hog. Sensory can perform wake-up triggers with as little as 15 MIPS, and can operate in the 1-10 mA range on today’s smartphones.
  3. Highly accurate with poor S/N ratios. This means several things:
    • Works in high noise. TrulyHandsfree Voice Control performs flawlessly in extremely loud environments, including music playing in the background or even outdoors in downtown Portland!
    • Works without a microphone in close proximity. TrulyHandsfree is responsive even at distances of 20 feet (in a relatively quiet environment) and at arm’s length in noise. This is critical because VUI-based applications will become commonplace in a wide variety of consumer electronics devices, and users won’t want to get up and walk over to their devices to control them.
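
To make the first point a little more concrete, here is a minimal sketch of how a probabilistic trigger can fire before the phrase is finished. This is not Sensory’s algorithm. It assumes a hypothetical front end that hands us, for each audio frame, a probability that the wake phrase is in progress; the function name, the prior, and the threshold are all made up for illustration.

```python
import math


def first_trigger_frame(frame_probs, threshold=0.95, prior=0.01):
    """Return the index of the first frame at which accumulated evidence
    for the wake phrase reaches `threshold`, or None if it never does.

    `frame_probs` is a hypothetical stream of per-frame probabilities
    (0..1) that the wake phrase is being spoken. Illustrative only.
    """
    # Work in log-odds so that per-frame evidence simply adds up.
    prior_log_odds = math.log(prior / (1.0 - prior))
    threshold_log_odds = math.log(threshold / (1.0 - threshold))

    log_odds = prior_log_odds
    for i, p in enumerate(frame_probs):
        p = min(max(p, 1e-6), 1.0 - 1e-6)  # keep the logarithm finite
        log_odds += math.log(p / (1.0 - p))
        # Never sink below the prior, so a long stretch of silence
        # cannot bury a phrase that starts later.
        log_odds = max(log_odds, prior_log_odds)
        if log_odds >= threshold_log_odds:
            return i  # decide now; no need to wait for the end
    return None


# Ten frames of strong evidence: the decision comes well before frame 9.
print(first_trigger_frame([0.9] * 10))
# Background noise that never looks like the phrase: no trigger.
print(first_trigger_frame([0.2] * 10))
```

The point of the sketch is only that the decision is made on running evidence, so the response time depends on how quickly confidence builds and not on detecting the end of the word.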

Companies like Nuance, Vlingo, Google and Microsoft are pretty good at the second step, which is a more powerful (often cloud-based) recognition system.

The third step, “Understanding Meaning,” is what the original Siri was all about. This was an AI component developed under DARPA funding at SRI, later spun off, and then acquired by Apple. Apple is rumored to be using Nuance as the “Step 2” in Siri.

Vlingo does a really nice job of implementing Steps 1-3 (using Sensory as its partner for Step 1). I’m sure Google, Microsoft, Apple and Nuance all have efforts underway in the area of AI and natural language understanding. It’s really not that different from what they have needed for text-based “meaning” recognition during traditional searches.

The SEARCH in Step 4 is done via typical search engines (Google, Microsoft, Apple) and I’d guess Vlingo and other independent players (are there any still around???) have developed partnerships in these areas.

Step 5 is basically a good quality TTS engine. Providers like Nuance, Ivona, AT&T, NeoSpeech, and Acapela all have nice TTS engines, and I believe Apple, Microsoft and Google all have in-house solutions as well!
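
For readers who think in code, the five steps can be sketched as a simple chain. This is only an illustration under stated assumptions: every function below is a hypothetical stand-in and not any vendor’s API, the “audio” is a plain string so the example runs on its own, and “hello device” is an invented trigger phrase.

```python
def run_voice_pipeline(audio, *, trigger, recognize, understand, search, speak):
    """Chain the five steps; each stage is a caller-supplied stand-in."""
    if not trigger(audio):          # Step 1: hands-free trigger
        return None                 # nothing was addressed to the device
    text = recognize(audio)         # Step 2: full speech recognition
    intent = understand(text)       # Step 3: understanding meaning
    results = search(intent)        # Step 4: search
    return speak(results)           # Step 5: text-to-speech response


# Toy stand-ins so the sketch is runnable (all hypothetical).
WAKE_PHRASE = "hello device"


def demo_trigger(audio):
    return audio.lower().startswith(WAKE_PHRASE)


def demo_recognize(audio):
    # Drop the wake phrase and keep the request that follows it.
    return audio[len(WAKE_PHRASE):].strip(" ,")


def demo_understand(text):
    return {"action": "search", "query": text}


def demo_search(intent):
    return ["result for " + intent["query"]]


def demo_speak(results):
    return "Spoken: " + "; ".join(results)


stages = dict(trigger=demo_trigger, recognize=demo_recognize,
              understand=demo_understand, search=demo_search,
              speak=demo_speak)

print(run_voice_pipeline("Hello device, vegetarian restaurants nearby", **stages))
print(run_voice_pipeline("just talking to myself", **stages))
```

Keeping each stage behind its own function mirrors the way the steps can come from different providers: any one of them can be swapped without touching the others.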

The important point in comparing Sensory’s technology is that we provide the logical entryway to a successful Voice User Interface experience, with a lightning-fast voice trigger that replaces tactile button presses. It is a given that noise immunity and extremely high accuracy are also required, and TrulyHandsfree accomplishes this without requiring a prohibitive amount of power to function reliably and consistently.

AND…while we appreciate the comparison to the most profitable company on the planet, we’d like to focus on what we do better…making Truly Hands-Free really mean TrulyHandsfree™.

Todd
sensoryblog@sensoryinc.com

Microsoft, Google and Apple…Not a Bad Month for Speech Technologies

June 16, 2011

I’ve been in the speech technology field since the beginning and I have to say, there has never been a more exciting time for this space. Recently some of the biggest names in technology have announced the integration of voice capabilities into their products. At this year’s E3 conference, Microsoft stated that the next version of its Xbox Live will include voice commands. Also, it appears Apple will integrate speech-to-text input in iOS 5. Android 2.1 already has speech-to-text built into its mobile platform. And just this week, Google announced that voice search capability is coming to the Google.com search box (how cool?!)

All of these developments will be exposing more and more mainstream users to the benefits of the voice user interface on a daily basis. Consumers demand so much from personal devices and if they expect to control them via voice, they’ll want to do so from beginning to end (no button pressing, ever). This is where Sensory comes in. Our Truly Hands-Free technology is better than anything out there and lets manufacturers add a hands-free trigger to the interface so the user can give the device a call to action without ever lifting a finger. No need to take eyes off the road to make a call from a hands-free car kit, no need to dirty up your tablet or computer by using messy (cooking) hands to call up a recipe, no need to disturb your comfortable state of rest to set an alarm clock, etc.

I can say from where I sit, many manufacturers see the value of a voice user interface that includes a hands-free trigger phrase. Expect to see the makers of automotive products, smartphones, home entertainment products and more using Sensory’s technologies in the coming year. And be sure to stay tuned for exciting enhancements and innovations in store for our Truly Hands-Free technology, as well.

Todd
sensoryblog@sensoryinc.com