HEAR ME - Speech Blog



Archive for the ‘voice search’ Category

Assistant vs Alexa: 8 things not discussed (enough)

October 14, 2016

I watched Sundar and Rick and the team at Google announce all the great new products from Google. I’ve read a few reviews and comparisons with Alexa/Assistant and Echo/Home, but it struck me that there’s quite an overlap in the reports I’m reading and some of the more interesting things aren’t being discussed. Here are a few of them, roughly in increasing order of importance:

  1. John Denver. Did anybody notice that the Google Home advertisement uses John Denver’s Country Roads song? Really? Couldn’t they have found something better? Country Roads didn’t make PlayBuzz’s list of the 15 best “home” songs or Jambase’s top 10 Home Songs. Couldn’t someone have Googled “best home songs” to find something better?
  2. Siri and Cortana. With all the buzz about Amazon vs. Google, I’m wondering what’s up with Siri and Cortana? Didn’t see much commentary on that.
  3. AI acquisitions. Anybody notice that Google acquired API.ai? API.ai always claimed to have the highest rated voice assistant in the Play Store. They called it “Assistant.” Hm. Samsung just acquired VIV – that’s Adam, Dag, Marco, and company, who were behind the original Siri. Samsung has known for a while that they couldn’t trust Google and they always wanted to keep a distance.
  4. Assistant is a philosophical change. Google’s original positioning for its voice services was that Siri and Cortana could be personal assistants, but Google was just about getting to the information fast, not about personalities or conversations. The name “assistant” implies this might be changing.
  5. Google: a marketing company? Seems like Google used to pride itself on being devoid of marketing. They had engineers. Who needs marketing? This thinking came through loud and clear in the naming of their voice recognizer. Was it Google Voice, Google Now, OK Google? Nobody knew. This historical lack of marketing and market focus was probably harmful. It would be fatal in an era of moving more heavily into hardware. That’s probably why they brought on Rick Osterloh, who understands hardware and marketing. Rick, did you approve that John Denver song?
  6. Data. Deep learning is all about data. Data that’s representative and labeled is the key. Google has been collecting and classifying all sorts of data for a very long time. Google will have a huge leg up on data for speech recognition, dialogs, pictures, video, searching, etc. Amazon is relatively new to the voice game, and it is at quite a disadvantage in the data game.
  7. Shopping. The point of all these assistants isn’t about making our lives better; it’s about getting our money. Google and Amazon are businesses with a profit motive, right? Google is very good at getting advertising dollars through search. Amazon is, among other things, very good at getting shoppers’ money (and they probably have a good amount of shopping data). If Amazon knows our buying habits and preferences and has the review system to know what’s best, then who wants ads? Just ship me what I need and if you get it wrong, let me return it hassle free. I don’t blame Google for trying to diversify. The ad model is under attack by Amazon through Alexa, Dash, Echo, Dot, Tap, etc.
  8. Personalization, privacy, embedded. Sundar talked a bit about personalization. He’s absolutely right that this is the direction assistants need to move (even if speaker verification isn’t built into the first Home units). Personalization occurs by collecting a lot of data about each individual user – what you sound like, how you say things, what music you listen to, what you control in your house, etc. Sundar didn’t talk much about privacy, but if you read user commentary on these home devices, the top issue by far relates to an invasion of privacy, which directly goes against personalization. The more privacy you give up, the more personalization you get. Unless… What if your data isn’t going to the cloud? What if it’s stored on your device in your home? Then privacy is at less risk, but the benefits of personalization can still exist. Maybe this is why Google briefly hit on the Embedded Assistant! Google gets it. More of the smarts need to move onto the device to ensure more privacy!

Touch-less Control Wins!

June 9, 2014

I still subscribe to the San Jose Mercury News, as they do a good job of tech business reporting. One of my favorite Mercury News writers is a true critic in the literary sense of the term, Troy Wolverton. Troy rarely raves and is typically critical, but in a smart, logical, and unemotional way.

A few days back he started writing about Microsoft’s Cortana and said “Watch out Siri, someone wants your job.”

I was eager to read his review of Cortana this morning and in particular his comparison with Siri. He ended up giving it a 7/10, and concluding Siri was still ahead. What I thought was most interesting though was that in his final summary, he compared three products and three assistants based on the ease of calling up each of those assistants:

  • Cortana – required two touch steps to activate the personal voice assistant
  • Siri – required one touch step to activate the personal voice assistant
  • MotoX – The best, because you can just start talking with the keyword phrase “OK Google Now” making a TrulyHandsfree experience!!

Motorola is Sensory’s customer, and I am happy to read that Troy gets it and considers this front end activation an important metric in comparing personal assistants!

Mobile phones – It doesn’t have to be Cost OR Quality!

April 25, 2014

It’s not often that I rave about articles I read, but Ian Mansfield of Cellular News hit the nail on the head with this article.

Not only is it a well-written and concise article, but it’s chock-full of recent data (primarily from JD Power research), and most importantly it’s data that tells a very interesting story that nicely aligns with Sensory’s strategy in mobile. So, thanks Ian, for getting me off my butt to start blogging again!

A few key points from the article:

  1. Price is becoming increasingly important in the choice of mobile phones, and simultaneously the prices of mobile phones are increasing.
  2. Although price might be the most important factor in choice, the overall customer satisfaction is driven by features.
  3. The features customers want are seamless voice control (36%); built-in sensors that can gauge temperature, lighting, noise and moods to customize settings to the environment (35%); and facial recognition and biometric security (28%).
  4. As everyone knows, Samsung and Apple have the overwhelming market share in mobile phones, but interesting to me was that they also both lead in customer satisfaction.

Now, let me dive one step deeper into the problem, and explore whether customer satisfaction can be achieved with minimal impact on cost:

Seamless voice control is here and soon every phone will have it, and it doesn’t add any hardware cost. Sensory introduced it with our TrulyHandsfree technology that allows users to just start talking, and our “trigger to search” technology has been nicely deployed by companies like Motorola that pioneered this “seamless voice control” in many of their recent releases. The seamless voice control really doesn’t add much cost, and with excellent engines from Google and Apple and Microsoft sitting in the clouds, it can and will be nicely implemented without affecting handset pricing.

Sensors are a different story. By their nature they will be embedded into the phones and will increase cost. Some “sensors” in the broadest sense of the term are no-brainers and necessities; for example, microphones and cameras are a must-have, and the six-axis sensors combining GPS and accelerometers are arguably must-haves as well. Magnetometers and barometers are getting increasingly common, and to differentiate further, leading manufacturers are embedding things like heartbeat monitors; stereo 3D cameras are just around the corner. To address the desire for biometric security, Samsung and Apple have the 2 bestselling phones in the world embedded with fingerprint sensors!

The problem is that all these sensors add cost, and in particular those fingerprint sensors are the most expensive and can add $5-$15 to the cost of goods. It’s kind of ironic that after spending all that money on biometric security, Apple doesn’t even allow them as a security measure for iTunes purchases. And both Samsung and Apple have been chastised for fingerprint sensors that can be cracked with gummy bears or glue!

A much more accurate and cost-effective solution can be achieved for biometrics by using the EXISTING sensors on the phones and not adding special purpose biometric sensors. In particular, the “must have sensors” like microphones, cameras, and 6-axis sensors can create a more secure environment that is just as seamless but much more difficult to crack. I’ll talk more about that in my next blog.

Have I told you how much I love those Moto X ads?

September 27, 2013

I think everybody in the speech industry must know about Motorola’s touchless control feature. Their ad campaign using comedian/actor TJ Miller has been a smashing success. Although their ads started off a bit racy (“touch each other not phones”), the switch to Miller introduced the “lazy phone guy” (which appears to be a knock on Apple) and better showcases key features and advantages of Moto X. The big advantage is in the low power speech activation technology that calls up Google Now without touching the phone!

The lazy phone campaign has ads for each of the device’s key features – Touchless Control, Quick Capture, Active Notifications, and the “Design It Yourself” concept. They are all entertaining, but it’s the touchless control that brings the most laughs. The first video went viral with over 15 million views, making it one of the most popular mobile phone ads ever.

The new touchless control ad is pretty funny with hundreds of thousands of views and growing!

Hello MOTO!

September 24, 2013

Motorola, who just happens to be a Sensory customer, launched a suite of new phones including Moto X and three Droids – Maxx, Ultra, and Mini – all with this awesome feature called “touchless control.” The “touchless control” uses a technology to wake up the phone by voice from a low power state, so the phone is always on and listening. Sorta like TrulyHandsfree! It links into Google Now so you can control pretty much anything and access information without touching the phone.

  • Moto launched an advertising campaign around the Lazy Phone Guy. These are my favorite ads ever, and the best of all these ads is the “no touching” Moto X phone. It’s already hit about 15M views!
  • Just saw this AdAge article about the Lazy Phone gone viral and beating out iPhone at its new launch. Says the touch ad has hit about 20M!
  • Even more impressive are the customer reviews for the “touchless control” technology. It’s one of the highest rated apps in the Google Play store.

“Always Listening” doesn’t have to be always listening

August 16, 2013

I saw a post recently in the Android Central forum that talked about Sensory’s technology as used by Samsung:

What makes it different from any other voice app is its part of the OS. e.g. get a call, you can say ‘Answer’ or ‘Ignore’. alarm rings, just say ‘Snooze’. You don’t have to launch an app or press buttons to do this, the phone is always active and listening. No one else does this!

It’s an astute comment but not 100% accurate. When people talk about “always listening” what they really mean is that it appears to be “always listening”. At Sensory we call it TrulyHandsfree, and the idea is that there can be certain “modes” or “windows” where it listens for specific words. Like when the alarm goes off, it listens for “snooze” etc. If you say “snooze” when the alarm isn’t going off you find it’s not really “always listening”.

Glass has a similar usage model. It’s “always listening” but for different things at different times and only for short periods of time. I put my Glass on and timed it. The OK Glass trigger window seems to last 3-4 seconds, then the next set of commands (like Get Directions to) stays on 10-11 seconds.

What’s really cool about Glass is that during those listening windows you can say other things and it doesn’t “false fire” on them. I let my wife try out my Glass, and she said “You mean I just say OK Glass and then I can say any of these things like get directions to Chef Chu’s and…woah it works!” It ignores everything it’s not listening for and picks out the things it is listening for. The technology is known as “keyword spotting” for this reason.
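The "modes" or "windows" idea above can be sketched as a tiny state machine. This is only an illustration of the concept, not how Sensory, Samsung, or Google actually implement it: the class names, the phrase lists, and the 10-second window are all assumptions made up for the example, and the "recognizer" is reduced to a string match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListeningWindow:
    """A short period during which only a fixed set of phrases is accepted."""
    name: str
    phrases: frozenset
    duration_s: float  # hypothetical window length, in seconds


class WindowedSpotter:
    """Toy model of 'always listening' that really listens only in windows."""

    def __init__(self):
        self.window = None      # no window open: nothing is recognized
        self.opened_at = 0.0

    def open(self, window, now):
        """Open a listening window, e.g. when the alarm starts ringing."""
        self.window = window
        self.opened_at = now

    def hear(self, phrase, now):
        """Return the accepted phrase, or None if it is ignored."""
        if self.window is None:
            return None
        if now - self.opened_at > self.window.duration_s:
            self.window = None  # window expired, stop listening
            return None
        spoken = phrase.lower()
        # Anything outside the window's vocabulary is ignored (no "false fire").
        return spoken if spoken in self.window.phrases else None
```

With a hypothetical alarm window containing "snooze", saying "snooze" before the alarm rings returns nothing, saying something unrelated during the window is ignored, and "snooze" inside the window is accepted.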

To save power, Hallmark’s use of Sensory’s technology kicks into gear when the product is turned on. If it doesn’t hear one of the words it’s listening for spoken within a certain time frame, it will automatically power down, and stop “always listening” until it’s turned back on with a button press.

Sensory recently introduced a low power sound detection technology that further cuts power consumption by having the device “always listening” in a low power mode, where it doesn’t perform speech recognition. When it hears something it quickly powers up the recognizer for further analysis. By “always listening” but not always recognizing, this can cut the power consumption down to 1mA or so.
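Here is a rough sketch of the "always listening but not always recognizing" idea. It assumes a simple energy threshold as the sound detector, which is my stand-in for illustration only; the post doesn't describe how the real detector works, and the function names, frame format, and threshold value are all hypothetical.

```python
def frame_energy(samples):
    """Mean squared amplitude of one audio frame (a list of floats)."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def gated_recognize(frames, recognizer, threshold=0.01):
    """Run the (expensive) recognizer only on frames that contain sound.

    `recognizer` is any callable taking a frame and returning a word or None.
    Returns the recognized words and how many times the recognizer woke up.
    """
    words = []
    wakeups = 0
    for frame in frames:
        if frame_energy(frame) < threshold:
            continue  # quiet frame: stay in the cheap, low power path
        wakeups += 1  # sound detected: power up the recognizer
        word = recognizer(frame)
        if word is not None:
            words.append(word)
    return words, wakeups
```

The point of the design is that the cheap check runs on every frame while the costly recognizer runs only on the few frames that pass it, so most of the time the device does almost no work.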

The price of free phone features

August 5, 2013

I often get the question, “If Android and Qualcomm offer voice activation for free, why would anyone license from Sensory?” While I’m not sure about Android and Qualcomm’s business models, I do know that decisions are based on accuracy, total added cost (royalties plus hardware requirements to run), power consumption, support, and other variables. Sensory seems to be consistently winning the shootouts it enters for embedded voice control. Some approaches that appear lower cost require a lot more memory or MIPS, driving up total cost and power consumption.
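To make the "total added cost" point concrete, here is a back-of-the-envelope sketch. Every number in it is made up purely for illustration (these are not Sensory, Android, or Qualcomm figures), and the function and parameter names are hypothetical. Amounts are in whole cents to keep the arithmetic exact.

```python
def total_added_cost_cents(royalty, extra_memory, extra_processing):
    """Per-unit added cost: royalty plus the hardware needed to run the engine."""
    return royalty + extra_memory + extra_processing


# Hypothetical per-unit figures, in cents, invented for this example only.
free_engine = total_added_cost_cents(royalty=0, extra_memory=20, extra_processing=10)
licensed_engine = total_added_cost_cents(royalty=10, extra_memory=3, extra_processing=2)

print(free_engine, licensed_engine)
```

Under these invented numbers the "free" engine ends up costing more per unit once its heavier memory and processing needs are counted, which is the shape of the argument above; real comparisons would also have to weigh accuracy, power consumption, and support.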

It’s interesting to note that companies like Nuance have a similar challenge on the server side where Google and Microsoft “give it away”. Because Google’s engine is so good it creates a high hurdle for Nuance. I’d guess Google’s rapid progress helps Nuance with their licensing to Apple, but may have made it more challenging to license to Samsung. Samsung actually licensed Vlingo AND Nuance AND Sensory, then Nuance bought Vlingo.

Why doesn’t Samsung use Google recognition if it’s free? On the server it’s not power consumption affecting decisions, but cost, quality, and in this case CONTROL. On the cost side it could be that Samsung MAKES more money by using Nuance in some sort of ad revenue kickbacks, which I’d guess Google doesn’t allow. This is of course just hypothesizing. I don’t really know, and if I did know I couldn’t say. The control issue is big too as companies like Sensory and Nuance will sell to everyone and in that sense offer platform independence and more control. Working with a Microsoft or Google engine forces an investment in a specific platform implementation, and therefore less flexibility to have a uniform cross platform solution.

Apple Siri vs Android Voice Actions/Google Now vs Samsung S Voice.

August 2, 2013

  • Kudos to Apple for kicking it off with Siri, but shame on Apple for so quickly allowing its competitors to get ahead.
  • Kudos to Android for the speech technology development. It’s really great, but Android – get a clue and hire some marketing people: Apple and Samsung have kicked your butt in branding the speech technology. I still don’t know what to call it!
  • Kudos to Samsung for being first to have an always listening feature that worked!! You can’t stop there… Moto is beating you to the punch in a low power version and in the simple “trigger to search” flow with voice.

What about Microsoft and Amazon? Both have good cloud based recognition engines in house but neither seems particularly relevant in Mobile…YET!

Kudos to Microsoft for its always listening feature in XBox! It’s actually the best implementation I’ve seen that doesn’t use Sensory technology. I’ll blog more about how they do it and why they can’t do a low power implementation in the weeks ahead.

Out Today: Moto X is “Always Listening”

August 1, 2013

One of the leakiest announcements in recent memory, Motorola’s new Moto X is expected to be officially announced today. Rather than trying to one up Apple and Samsung with the highest resolution screen and fastest processor, the Moto X competes on its ability to be customized and its intelligent use of low power sensors. With my background, it’s no surprise that I’m excited to see the “always listening” technology enabling the wake-up command “OK Google Now”. With this feature, speech recognition is enabled but in an ultra low power state, so it can be on and responsive without draining the battery. From other “press leaks”, I’m looking forward to a line of Droid phones with similar “always listening” functionality.

Motorola isn’t the only one rolling out interesting new “always listening” kinds of functions. Samsung did this first in the mobile phone, but implemented it in a “driving mode” so that it was sometimes always listening. The new Moto phones have been compared with Google’s Glass and the “OK Glass” function which some hackers have noted can be put in an “always listening” mode. Qualcomm has even implemented a speech technology on their chips and Android has released a function like this in their OS. Motorola’s use of the “always listening” trigger is especially cool because it calls up Google Now for a seamless flow from client to server speech recognition.

Here’s a demo of Sensory’s use of a very similar approach that we call “trigger to search” from a video we posted around a year ago:


So what’s Sensory’s involvement in these “always on” features from Android, Glass, Motorola, Nuance, Qualcomm, Samsung, etc.? I can’t say much except we have licensed our technology to Google/Motorola, Samsung and many others. We have not licensed to Android or Qualcomm, but Qualcomm has commented on its interest in a partnership with Sensory for more involved applications.

With a mass market device like the Moto X, I’m excited to see more people experiencing the convenience of voice recognition that is always listening for your OK. Tomorrow I’m going to discuss leading voice recognition apps on the top mobile environments and then over the next few days and weeks, I’ll cover more topics around voice triggering technology such as pricing models (it’s free right?), power drain, privacy concerns with an “always listening” product, security and personalization. This is an exciting time for TrulyHandsfree™ voice control and I’d welcome your thoughts.

Quick Thoughts

May 1, 2013

  1. Texas A&M Transportation Institute Study. Yeah they found that using Siri or Vlingo was as dangerous as texting while driving. Adam Cheyer hit the nail on the head in his response. But Adam focused on Siri…Turns out they didn’t use Vlingo in the In Car mode, which was of course designed for In Car. Duh! Vlingo’s (now Nuance) In Car uses Sensory’s Truly Handsfree which requires NO TOUCHING and no distracted eyes while driving. All these articles which said “Handsfree texting no safer than typing” really got it wrong. It’s not TRULYHANDSFREE!!! In the study they held phones in their hands and hit buttons. Sorry that’s not Handsfree!
  2. Google Now on iOS. Cool! Android speech recognition is very good and probably the best, but having Siri built into the home button is easy, and easy usually trumps good. But Apple can’t be complacent; it’s gotta make some big moves or it will be left behind in the category it popularized.
  3. Google Glass. Holy smoke what a lot of press it gets. Sensory has 2 in house and we love the user experience! We believe wearables will become huge and Google is certainly driving the forefront of this. Glass must use Google’s speech recognition in the clouds. Wonder what they use on the client? It works GREAT!
  4. Galaxy S4. Yep, Sensory made it in for the embedded recognition used in triggers (with SVoice) and voice command and control! We got invited to the launch party. It’s a GREAT product, with a GREAT embedded speech recognizer.
  5. Icahn buying into Nuance. Interesting…Can’t be bad for Nuance investors, until he sells! It’s nice to see speech technology reach the forefront in not just consumer electronics and technology but in the finance world too!
  6. Qualcomm introduces voice triggers. Yeah everyone knows that’s the area where Sensory dominates. Better accuracy, faster response time, lower power consumption, works in noise and from a distance, etc. People ask if Qualcomm is using Sensory technology. I say try it, and if it works GREAT then it’s probably Sensory’s. Anyways, we welcome the Qualcomm solution as it totally validates what we’ve been saying and doing. I tried it at Mobile World Congress and it responded well in noise, but you had to hit a button to turn it on to make it listen, which kind of defeats the purpose.
  7. Amazon buying the pieces. Yeah they bought up some of the best components available – TTS from Ivona, cloud speech recognition from YAP, and now intelligence from Evi. Even adding it all up, they haven’t paid that much and if they put it all together well, they should be in a strong position relative to their competitors.
  8. Industry. The overall speech field is aligning as a battle of titans all with good patent positions, large teams, and good technologies. Amazon, Google/Android, Microsoft, and Nuance are all major speech players today. Apple probably is too, but it’s hard to know what’s in house at Apple vs. Nuance. Nuance is the only substantive player that’s a vendor, out selling speech technology. This puts them in a nice position, but they have competitors giving it away on all major platforms, so nobody is without challenges. Sensory might be the second largest speech vendor after Nuance and our sales are less than 2% of Nuance’s…pretty amazing gap there! I want to fill that gap! ;-)