HEAR ME -
Speech Blog
HEAR ME - Speech Blog  |  Read more March 11, 2019 - Taking Back Control of Our Personal Data
HEAR ME - Speech Blog

Archives

Categories

Posts Tagged ‘Cortana’

Assistant vs Alexa: 8 things not discussed (enough)

October 14, 2016

I watched Sundar and Rick and the team at Google announce all the great new products from Google. I’ve read a few reviews and comparisons with Alexa/Assistant and Echo/Home, but it struck me that there’s quite an overlap in the reports I’m reading and some of the more interesting things aren’t being discussed. Here are a few of them, roughly in increasing order of importance:

  1. John Denver. Did anybody notice that the Google Home advertisement using John Denver’s Country Road song? Really? Couldn’t they have found something better? Country Roads didn’t make PlayBuzz’s list of the 15 best “home” songs or Jambase’s top 10 Home Songs Couldn’t someone have Googled “best home songs” to find something better?
  2. Siri and Cortana. With all the buzz about Amazon vs. Google, I’m wondering what’s up with Siri and Cortana? Didn’t see much commentary on that.
  3. AI acquisitions. Anybody notice that Google acquired API.ai? API.ai always claimed to have the highest rated voice assistant in the playstore. They called it “Assistant.” Hm. Samsung just acquired VIV – that’s Adam, Dag, Marco, and company that were behind the original Siri. Samsung has known for a while that they couldn’t trust Google and they always wanted to keep a distance.
  4. Assistant is a philosophical change. Google’s original positioning for its voice services were that Siri and Cortana could be personal assistants, but Google was just about getting to the information fast, not about personalities or conversations. The name “assistant” implies this might be changing.
  5. Google: a marketing company? Seems like Google used to pride itself of being void of marketing. They had engineers. Who needs marketing? This thinking came through loud and clear in the naming of their voice recognizer. Was it Google Voice, Google Now, OK Google? Nobody new. This historical lack of marketing and market focus was probably harmful. It would be fatal in an era of moving more heavily into hardware. That’s probably why they brought on Rick Osterloh, who understands hardware and marketing. Rick, did you approve that John Denver song?
  6. Data. Deep learning is all about data. Data that’s representative and labeled is the key. Google has been collecting and classifying all sorts of data for a very long time. Google will have a huge leg up on data for speech recognition, dialogs, pictures, video, searching, etc. Amazon is relatively new to the voice game, and it is at quite a disadvantage in the data game.
  7. Shopping. The point of all these assistants isn’t about making our lives better; it’s about getting our money. Google and Amazon are businesses with a profit motive, right? Google is very good at getting advertising dollars through search. Amazon is, among other things, very good at getting shoppers money (and they probably have a good amount of shopping data). If Amazon knows our buying habits and preferences and has the review system to know what’s best, then who wants ads? Just ship me what I need and if you get it wrong, let me return it hassle free. I don’t blame Google for trying to diversify. The ad model is under attack by Amazon through Alexa, Dash, Echo, Dot, Tap, etc.
  8. Personalization, privacy, embedded. Sundar talked a bit about personalization. He’s absolutely right that this is the direction assistants need to move (even if speaker verification isn’t built into the first Home units). Personalization occurs by collecting a lot of data about each individual user – what you sound like, how you say things, what music you listen to, what you control in your house, etc. Sundar didn’t talk much about privacy, but if you read user commentary on these home devices, the top issue by far relates to an invasion of privacy, which directly goes against personalization. The more privacy you give up, the more personalization you get. Unless… What if your data isn’t going to the cloud? What if it’s stored on your device in your home? Then privacy is at less risk, but the benefits of personalization can still exist. Maybe this is why Google briefly hit on the Embedded Assistant! Google gets it. More of the smarts need to move onto the device to ensure more privacy!

Speaking the language of the voice assistant

June 17, 2016

Hey Siri, Cortana, Google, Assistant, Alexa, BlueGenie, Hound, Galaxy, Ivee, Samantha, Jarvis, or any other voice-recognition assistant out there.

Now that Google and Apple have announced that they’ll be following Amazon into the home far-field voice assistant business, I’m wondering how many things in my home will always be on, listening for voice wakeup phrases. In addition, how will they work together (if at all). Let’s look at some possible alternatives:

Co-existence. We’re heading down a path where we as consumers will have multiple devices on and listening in our homes and each device will respond to its name when spoken to. This works well with my family; we just talk to each other, and if we need to, we use each other’s names to differentiate. I can have friends and family over or even a big party, and it doesn’t become problematic calling different people by different names.

The issue for household computer assistants all being on simultaneously is that false fires will grow in direct proportion to the number of devices on and listening. With Amazon’s Echo, I get a false fire about every other day, and Alexa does a great job of listening to what I say after the false fire and ignoring if it doesn’t seem to be an intended command. It’s actually the best performing system I’ve used and the fact that its starts playing music or talking every other week is a testament to what a good job they have done. However, interrupting my family every other week is not good enough. And if I have five always-listening devices interrupting us 10 times a month, that becomes unacceptable. And if they don’t do as good a job as Alexa, and interrupt more frequently, it becomes quite problematic.

Functional winners. Maybe each device could own a functional category. For example, all my music systems could use Alexa, my TV’s use Hi Galaxy, and all appliances are Bosch. Then I’d have less “names” to call out to and there would be some big benefits: 1) The devices using the same trigger phrase could communicate and compare what they heard to improve performance; 2) More relevant data could be collected on the specific usage models, thus further improving performance; and 3) With less names to call out, I’d have fewer false fires. Of course, this would force me as a consumer to decide on certain brands to stick to in certain categories.

Winner take all. Amazon is adopting a multi-pronged strategy of developing its own products (Echo, Dot, Tap, etc.) and also letting its products control other products. In addition, Amazon is offering the backend Alexa voice service to independent product developers. It’s unclear whether competitors will follow suit, but one thing is clear—the big guys want to own the home, not share it.

Amazon has a nice lead as it gets other products to be controlled by Echo. The company even launched an investment fund to spur more startups writing to Alexa. Consumers might choose an assistant we like (and we think performs well) and just stick with that across the household. The more we share with that assistant, the better it knows us, and the better it serves us. This knowledge base could carry across products and make our lives easier.

Just Talk. In the “co-existence” case previously mentioned, there six people in my household, so it can be a busy place. But when I speak to someone, I don’t always start with their name. In fact, I usually don’t. If there’s just one other person in the room, it’s obvious who I’m speaking to. If there are multiple people in the room, I tend to look at or gesture toward the person I’m addressing. This is more natural than speaking their name.

An “always listening” device should have other sensors to know things like how many people are in the room, where they’re standing and looking at, how they’re gesturing, and so on. These are the subconscious cues humans use to know who is talking to us, and our devices would be smarter and more capable if they could do it.

Google Assistant vs. Amazon’s Alexa

June 15, 2016

“Credit to the team at Amazon for creating a lot of excitement in this space,” Google CEO Sundar Pichai. He made this comment during his Google I/O speech last week when introducing Google’s new voice-controlled home speaker, Google Home which offers a similar sounding description to Amazon’s Echo. Many interpreted this as a “thanks for getting it started, now we’ll take over,” kind of comment.

Google has always been somewhat marketing challenged in naming its voice assistant. Everyone knows Apple has Siri, Microsoft has Cortana, and Amazon has Alexa. But what is Google’s voice assistant called? Is it Google Voice, Google Now, OK Google, Voice Actions? Even those of us in the speech industry have found Google’s branding to be confusing. Maybe they’re clearing that up now by calling their assistant “Google Assistant.” Maybe that’s the Google way of admitting it’s an assistant without admitting they were wrong by not giving it a human sounding name.

The combination of the early announcement of Google Home and Google Assistant has caused some to comment that Amazon has BIG competition at best, and at worst, Amazon’s Alexa is in BIG trouble.

Forbes called Google’s offering the Echo Killer, while Slate said it was smarter than Amazon’s Echo.

I thought I’d point out a few good reasons why Amazon is in pretty good shape:

  1. Google Home is not shipping. Google has a bit of a chicken-and-egg issue in that it needs to roll out a product that has industry support (for controlling third-party products by voice). How do you get industry partners without a product? You announce early! That was a smart move; now they just need to design it and ship it…not always an easy task.
  2. It’s about Voice Commerce. This is REALLY important. Many people think Google will own this home market because it has a better speech recognizer. Speech recognition capabilities are nice but not the end game. The value here is having a device that’s smart and trusted enough to take money out of our bank accounts and deliver us goods and services that we want when we want them. Amazon has a huge infrastructure lead here in products, reviews, shipping, and other key components of Internet commerce. Adding a convenient voice front end isn’t easy, but it’s also NOT the hardest part of enabling big revenue voice commerce systems.
  3. Amazon has far-field working and devices that always “talk back.” I admit the speech recognition is important, and Google has a lot of data, experience, and technologists in machine learning, AI, and speech recognition. But most of the Google experience is through Android and mobile-phone hardware. Where Amazon has made a mark is in far-field or longer distance recognition that really works, which is not easy to do. Speech recognition has always been about signal/noise ratios and far-field makes the task more difficult and requires acoustic echo cancellation, multiple microphones, plus various gain control and noise filtering/speech focusing approaches. Also, the Google recognizer was established around finding data through voice queries, most of such data being displayed on-screen (and often through search). The Google Home and Amazon Echo are no-screen devices. Having them intelligently talk back means more than just reading the text off a search. Google can handle this, of course, but it’s one more technical barrier that needs to be done right.
  4. Amazon has a head start and already is an industry standard. Amazon’s done a nice job with the Echo. It’s follow-on products, Tap and Dot, were intelligent offshoots. Even its Fire TV took advantage of in-house voice capabilities. The Alexa Voice Services work well and already are acting like a standard for voice control. Roughly three million Amazon devices have already sold, and I’d guess that in the next year, the number of Alexa connected devices will double through both Amazon sales and third parties using AVS. This is not to mention the tens of millions of devices on the market that can be controlled by Echo or other Amazon hardware. Amazon is pretty well entrenched!

Of course, Amazon has its challenges as well, but I’ll leave that for another blog.

TrulyHandsfree 4.0… Maintaining the big lead!

August 6, 2015

We first came out with TrulyHandsfree about five years ago. I remember talking to speech tech executives at MobileVoice as well as other industry tradeshows, and when talking about always-on hands-free voice control, everybody said it couldn’t be done. Many had attempted it, but their offerings suffered from too many false fires, or not working in noise, or consuming too much power to be always listening. Seems that everyone thought a button was necessary to be usable!

In fact, I remember the irony of being on an automotive panel, and giving a presentation about how we’ve eliminated the need for a trigger button, while the guy from Microsoft presented on the same panel the importance of where to put the trigger button in the car.

Now, five years later, voice activation is the norm… we see it all over the place with OK Google, Hey Siri, Hey Cortana, Alexa, Hey Jibo, and of course if you’ve been watching Sensory’s demos over the years, Hello BlueGenie!

Sensory pioneered the button free, touch free, always-on voice trigger approach with TrulyHandsfree 1.0 using a unique, patented keyword spotting technology we developed in-house– and from its inception, it was highly robust to noise and it was ultra-low power. Over the years we have ported it to dozens of platforms, Including DSP/MCU IP cores from ARM, Cadence, CEVA, NXP CoolFlux, Synopsys and Verisilicon, as well as for integrated circuits from Audience, Avnera, Cirrus Logic, Conexant, DSPG, Fortemedia, Intel, Invensense, NXP, Qualcomm, QuickLogic, Realtek, STMicroelectronics, TI and Yamaha.

This vast platform compatibility has allowed us to work with numerous OEMs to ship TrulyHandsfree in over a billion products!

Sensory didn’t just innovate a novel keyword spotting approach, we’ve continually improved it by adding features like speaker verification and user defined triggers. Working with partners, we lowered the draw on the battery to less than 1mA, and Sensory introduced hardware and software IP to enable ultra-low-power voice wakeup of TrulyHandsfree. All the while, our accuracy has remained the best in the industry for voice wakeup.

We believe the bigger, more capable companies trying to make voice triggers have been forced to use deep learning speech techniques to try and catch up with Sensory in the accuracy department. They have yet to catch up, but they have grown their products to a very usable accuracy level, through deep learning, but lost much of the advantages of small footprint and low power in the process.

Sensory has been architecting solutions for neural nets in consumer electronics since we opened the doors more than 20 years ago. With TrulyHandsfree 4.0 we are applying deep learning to improve accuracy even further, pushing the technology even more ahead of all other approaches, yet enabling an architecture that has the ability to remain small and ultra-low power. We are enabling new feature extraction approaches, as well as improved training in reverb and echo. The end result is a 60-80% boost in what was already considered industry-leading accuracy.

I can’t wait for TrulyHandsfree 5.0…we have been working on it in parallel with 4.0, and although it’s still a long ways off, I am confident we will make the same massive improvements in speaker verification with 5.0 that we are doing for speech recognition in 4.0! Once again further advancing the state of the art in embedded speech technologies!

Touch-less Control Wins!

June 9, 2014

I still subscribe to the San Jose Mercury News, as they do a good job of tech business reporting. One of my favorite Mercury News writers is a true critic in the literary sense of the term, Troy Wolverton. Troy rarely raves and is typically critical, but in a smart, logical, and unemotional way.

A few days back he started writing about Microsoft’s  Cortana and said “Watch out Siri, someone wants your job.”

I was eager to read his review of Cortana this morning and in particular his comparison with Siri. He ended up giving it a 7/10, and concluding Siri was still ahead. What I thought was most interesting though was that in his final summary, he compared three products and three assistants based on the ease of calling up each of those assistants:

  • Cortana – required two touch steps to activate the personal voice assistant
  • Siri – required one touch step to activate the personal voice assistant
  • MotoX – The best, because you can just start talking with the keyword phrase “OK Google Now” making a TrulyHandsfree experience!!

Motorola is Sensory’s customer, and I am happy to read that Troy gets it and considers this front end activation an important metric in comparing personal assistants!