Posts Tagged ‘voice recognition’
June 17, 2016
Hey Siri, Cortana, Google Assistant, Alexa, BlueGenie, Hound, Galaxy, Ivee, Samantha, Jarvis, or any other voice-recognition assistant out there.
Now that Google and Apple have announced that they’ll be following Amazon into the home far-field voice assistant business, I’m wondering how many things in my home will always be on, listening for voice wakeup phrases. And how will they work together, if at all? Let’s look at some possible alternatives:
Co-existence. We’re heading down a path where we as consumers will have multiple devices on and listening in our homes, and each device will respond to its name when spoken to. This works well with my family; we just talk to each other, and if we need to, we use each other’s names to differentiate. I can have friends and family over or even a big party, and calling different people by different names never becomes a problem.
The issue with all of a household’s computer assistants being on simultaneously is that false fires will grow in direct proportion to the number of devices listening. With Amazon’s Echo, I get a false fire about every other day, and Alexa does a great job of listening to what I say after the false fire and ignoring it if it doesn’t seem to be an intended command. It’s actually the best-performing system I’ve used, and the fact that it starts playing music or talking only every other week is a testament to what a good job Amazon has done. However, interrupting my family every other week is not good enough. If I have five always-listening devices interrupting us 10 times a month, that becomes unacceptable; and if they don’t do as good a job as Alexa and interrupt more frequently, it becomes quite problematic.
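The scaling claim above can be made concrete with a back-of-the-envelope sketch. The per-device rate is the post’s anecdotal figure, not a measurement, and the linear scaling assumes each device false-fires independently:

```python
# Back-of-envelope model of audible interruptions from always-listening devices.
# The default rate (about 2 interruptions a month, i.e. "every other week")
# is anecdotal; linear scaling assumes each device false-fires independently.
def monthly_interruptions(devices: int, per_device_rate: float = 2.0) -> float:
    """Expected audible interruptions per month across all devices."""
    return devices * per_device_rate
```

One Echo works out to about 2 interruptions a month; five devices behaving the same way reach the 10-a-month figure above.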
Functional winners. Maybe each device could own a functional category. For example, all my music systems could use Alexa, my TVs could use “Hi Galaxy,” and all my appliances could be Bosch. Then I’d have fewer names to call out, and there would be some big benefits: 1) devices sharing the same trigger phrase could communicate and compare what they heard to improve performance; 2) more relevant data could be collected on the specific usage models, further improving performance; and 3) with fewer names to call out, I’d have fewer false fires. Of course, this would force me as a consumer to stick to certain brands in certain categories.
Winner take all. Amazon is adopting a multi-pronged strategy of developing its own products (Echo, Dot, Tap, etc.) and also letting its products control other products. In addition, Amazon is offering the backend Alexa voice service to independent product developers. It’s unclear whether competitors will follow suit, but one thing is clear—the big guys want to own the home, not share it.
Amazon has a nice lead as it gets other products to be controlled by Echo. The company even launched an investment fund to spur more startups writing to Alexa. As consumers, we might choose an assistant we like (and think performs well) and just stick with it across the household. The more we share with that assistant, the better it knows us, and the better it serves us. This knowledge base could carry across products and make our lives easier.
Just Talk. In the “co-existence” case previously mentioned, there are six people in my household, so it can be a busy place. But when I speak to someone, I don’t always start with their name. In fact, I usually don’t. If there’s just one other person in the room, it’s obvious who I’m speaking to. If there are multiple people in the room, I tend to look at or gesture toward the person I’m addressing. This is more natural than speaking their name.
An “always listening” device should have other sensors so it knows things like how many people are in the room, where they’re standing, where they’re looking, how they’re gesturing, and so on. These are the subconscious cues humans use to know who is talking to us, and our devices would be smarter and more capable if they could pick up on them.
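As a thought experiment, that kind of cue fusion might look something like the sketch below. The function, thresholds, and sensor inputs are all hypothetical, just to illustrate the decision logic, not any shipping system:

```python
# Hypothetical addressee-detection logic: fuse a wake-word confidence score
# with presence and gaze cues to decide whether speech is meant for the device.
# All thresholds are illustrative, not tuned values.
def is_addressing_device(wake_score: float,
                         people_in_room: int,
                         looking_at_device: bool) -> bool:
    if wake_score >= 0.9:           # explicit trigger phrase: always respond
        return True
    if people_in_room == 1:         # alone in the room: likely talking to the device
        return wake_score >= 0.5
    # With several people present, gaze disambiguates who is being addressed.
    return looking_at_device and wake_score >= 0.5
```

The point of the sketch is that a weak trigger score can be rescued by non-audio cues, so users wouldn’t need to speak a name every time.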
May 4, 2015
I was at the Mobile Voice Conference last week, on a keynote panel with Adam Cheyer (Siri, Viv, etc.) and Phil Gray (Interactions), with Bill Meisel moderating. One of Bill’s questions was about the best speech products, and of course there was a lot of banter about Siri, Cortana, and Voice Actions (or Google Now, as it’s often called). When it was my turn to chime in, I spoke about Amazon’s Echo and heaped lots of praise on it. I had done a bit of testing on it before the conference, but I didn’t own one. I decided to buy one on eBay, since Amazon didn’t ever seem to get around to selling me one. It arrived yesterday.
Here are some miscellaneous thoughts:
OK, Amazon… here’s my free advice (admittedly self-serving but nevertheless accurate):
August 1, 2013
One of the leakiest announcements in recent memory, Motorola’s new Moto X is expected to be officially announced today. Rather than trying to one-up Apple and Samsung with the highest-resolution screen and fastest processor, the Moto X competes on its ability to be customized and its intelligent use of low-power sensors. With my background, it’s no surprise that I’m excited to see the “always listening” technology enabling the wake-up command “OK Google Now”. With this feature, speech recognition is enabled but in an ultra-low-power state, so it can be on and responsive without draining the battery. From other “press leaks”, I’m looking forward to a line of Droid phones with similar “always listening” functionality.
Motorola isn’t the only one rolling out interesting new “always listening” functions. Samsung did this first in a mobile phone, but implemented it in a “driving mode”, so it was only sometimes “always listening”. The new Moto phones have been compared with Google’s Glass and its “OK Glass” function, which some hackers have noted can be put in an “always listening” mode. Qualcomm has implemented speech technology on its chips, and Android has released a similar function in the OS. Motorola’s use of the “always listening” trigger is especially cool because it calls up Google Now for a seamless flow from client to server speech recognition.
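That client-to-server flow can be sketched as a two-stage pipeline: a tiny always-on spotter checks every audio frame for the trigger phrase, and only after the trigger fires does the full (often server-side) recognizer run. The names and string-matching “recognizers” below are illustrative stand-ins, not Sensory’s or Google’s actual implementation:

```python
# Two-stage "trigger to search" pipeline (illustrative stand-ins only).
TRIGGER = "ok google now"

def low_power_trigger(frame: str) -> bool:
    # Stand-in for an ultra-low-power on-device phrase spotter;
    # real systems score audio frames, not text strings.
    return frame.strip().lower() == TRIGGER

def full_recognizer(frames):
    # Stand-in for the large client/server recognizer invoked after the trigger.
    return " ".join(frames)

def pipeline(frames):
    for i, frame in enumerate(frames):
        if low_power_trigger(frame):                 # cheap check on every frame
            return full_recognizer(frames[i + 1:])   # expensive step runs once
    return None                                      # no trigger: stay asleep
```

The battery win comes from the asymmetry: the cheap check runs continuously, while the expensive recognizer only ever wakes up on demand.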
Here’s a demo of Sensory’s use of a very similar approach that we call “trigger to search” from a video we posted around a year ago:
So what’s Sensory’s involvement in these “always on” features from Android, Glass, Motorola, Nuance, Qualcomm, Samsung, etc.? I can’t say much, except that we have licensed our technology to Google/Motorola, Samsung, and many others. We have not licensed it to Android or Qualcomm, but Qualcomm has expressed interest in a partnership with Sensory for more involved applications.
With a mass-market device like the Moto X, I’m excited to see more people experiencing the convenience of voice recognition that is always listening for your OK. Tomorrow I’m going to discuss leading voice-recognition apps on the top mobile environments, and then over the next few days and weeks I’ll cover more topics around voice-triggering technology, such as pricing models (it’s free, right?), power drain, privacy concerns with an “always listening” product, security, and personalization. This is an exciting time for TrulyHandsfree™ voice control, and I’d welcome your thoughts.
March 3, 2010
I was in Barcelona last month at the Mobile World Congress. Here are some of my speech-centric observations:
I went by the Microsoft booth on the first day of the show and asked when WinMobile7 would be announced. The guy on the floor acted like he had no clue what I was talking about. He wouldn’t even confirm it hadn’t been announced yet. The really ironic thing is that EVERYWHERE I went I saw Windows 7 advertisements…subways, stairs, hotel lobbies, etc. My friend Dan had a couple of corporate suites at the hotel across from the show, and asked about putting up a flier to say what floor they were on. He found out the entire hotel advertising space was taken by Microsoft! They had gotten an exclusive from the hotel.
Speaking of Dan…we’re old friends from school and decided to meet up for dinner. He said “Are you OK with a Tapas Bar?” and I said “Actually, I’m kinda hungry, if you really want to go, let’s do it after we eat.” I had made a speech recognition error…think about it.
Anyways… WinMobile 7 was announced on Day 2, and I saw some of the demos. I must say that Microsoft is taking a brave approach by completely redesigning the interface to be more focused on data (people, places) than on functions (applications, etc.). However, even with the new look and feel, I didn’t hear any mention of new speech-recognition features, like, um, a voice interface. I asked a guy on the floor, and he said the voice search was much improved. I like Bing search, Google search, and Vlingo search too, as they’re all getting more useful and robust. A couple of years ago, I was trying one of these search engines to find my hotel in downtown Boston, and after three or four failed attempts on a street corner, a woman pointed down the street and said, “Your hotel is just down there.” A memory flashback: a cabbie on that trip asked me what I did, and I said “speech recognition.” He said, “Oh, I’ve been trying that for years… my wife talks to me and sometimes I respond properly.” But I digress…
Back to Barcelona. I saw a nice demo of MOTONAV at the Motorola booth. With a new independent consumer-products company spun out and Sanjay Jha in charge, they really seem to have turned things around. The people on the show floor seemed very upbeat and excited about where Motorola is right now. In addition to the 23 phones they currently offer, they have new ones coming out, including the Devour and the Cliq XT, both based on the Android OS. I didn’t see much new in the Bluetooth space, however. They are doing PNDs (portable navigation devices) and cell phones with MOTONAV. It’s a nice voice-controlled driving application, and the speech recognition in the demo I saw worked quite well on the hard stuff (addresses, etc.) but messed up on the easy things (a simple two-word set). Then again, small sets aren’t always easier than big ones. The yes/no response is one of the hardest sets to get right; I’ve heard there are more than 50 ways to say no and almost as many ways to say yes, like unh-unh and unh-huh (I can’t even get those right spelling them!).
The big thing missing from MOTONAV is a Truly Hands-Free Trigger. In fact, that’s what’s missing from the entire cell phone industry. All these products have built-in speech recognition, but the only way to activate it is with button presses. I found an article about “The First Truly Hands-Free Phone.” However, when you read through it, you find it really requires two button presses: one to turn the phone on and a second to activate the voice recognition. Sensory can get rid of one of those button presses, which is a HUGE win for products that can be turned on and always listening. As battery technology improves and more “smart” listening windows are deployed, Truly Hands-Free Triggers will become increasingly important for all products with speech technologies.