HEAR ME -
Speech Blog

Posts Tagged ‘Alexa’

Who (or what) is really listening to your conversation

June 22, 2016

I’ve written a series of blogs about consumer devices with speech recognition, like Amazon Echo. I mentioned that everyone is getting into the “always listening” game (Alexa, OK Google, Hey Siri, Hi Galaxy, Assistant, Hey Cortana, OK Hound, etc.), and I’ve explained that companies attempt to address privacy concerns by putting the “always listening” mode on the device, rather than in the cloud.

Read more at Embedded Computing…

Speaking the language of the voice assistant

June 17, 2016

Hey Siri, Cortana, Google, Assistant, Alexa, BlueGenie, Hound, Galaxy, Ivee, Samantha, Jarvis, or any other voice-recognition assistant out there.

Now that Google and Apple have announced that they’ll be following Amazon into the home far-field voice assistant business, I’m wondering how many things in my home will always be on, listening for voice wakeup phrases. And how will they work together (if at all)? Let’s look at some possible alternatives:

Read more at Embedded Computing…

Google Assistant vs. Amazon’s Alexa

June 15, 2016

“Credit to the team at Amazon for creating a lot of excitement in this space,” said Google CEO Sundar Pichai. He made this comment during his Google I/O speech last week when introducing Google’s new voice-controlled home speaker, Google Home, whose description sounds a lot like Amazon’s Echo. Many interpreted this as a “thanks for getting it started, now we’ll take over” kind of comment.

Google has always been somewhat marketing challenged in naming its voice assistant. Everyone knows Apple has Siri, Microsoft has Cortana, and Amazon has Alexa. But what is Google’s voice assistant called?

Read more at Embedded Computing…

TrulyHandsfree 4.0… Maintaining the big lead!

August 6, 2015

We first came out with TrulyHandsfree about five years ago. I remember talking to speech tech executives at MobileVoice as well as other industry tradeshows, and when talking about always-on hands-free voice control, everybody said it couldn’t be done. Many had attempted it, but their offerings suffered from too many false fires, or not working in noise, or consuming too much power to be always listening. Seems that everyone thought a button was necessary to be usable!

In fact, I remember the irony of being on an automotive panel, and giving a presentation about how we’ve eliminated the need for a trigger button, while the guy from Microsoft presented on the same panel the importance of where to put the trigger button in the car.

Now, five years later, voice activation is the norm… we see it all over the place with OK Google, Hey Siri, Hey Cortana, Alexa, Hey Jibo, and of course if you’ve been watching Sensory’s demos over the years, Hello BlueGenie!

Sensory pioneered the button-free, touch-free, always-on voice trigger approach with TrulyHandsfree 1.0, using a unique, patented keyword spotting technology we developed in-house, and from its inception it was highly robust to noise and ultra-low power. Over the years we have ported it to dozens of platforms, including DSP/MCU IP cores from ARM, Cadence, CEVA, NXP CoolFlux, Synopsys and Verisilicon, as well as integrated circuits from Audience, Avnera, Cirrus Logic, Conexant, DSPG, Fortemedia, Intel, Invensense, NXP, Qualcomm, QuickLogic, Realtek, STMicroelectronics, TI and Yamaha.

This vast platform compatibility has allowed us to work with numerous OEMs to ship TrulyHandsfree in over a billion products!

Sensory didn’t just innovate a novel keyword spotting approach; we’ve continually improved it by adding features like speaker verification and user-defined triggers. Working with partners, we lowered the draw on the battery to less than 1mA, and Sensory introduced hardware and software IP to enable ultra-low-power voice wakeup with TrulyHandsfree. All the while, our accuracy has remained the best in the industry for voice wakeup.

We believe the bigger, more capable companies trying to make voice triggers have been forced to use deep learning speech techniques to try to catch up with Sensory in the accuracy department. They have yet to catch up. Deep learning has brought their products to a very usable accuracy level, but they have lost much of the advantage of small footprint and low power in the process.

Sensory has been architecting solutions for neural nets in consumer electronics since we opened the doors more than 20 years ago. With TrulyHandsfree 4.0 we are applying deep learning to improve accuracy even further, pushing the technology even more ahead of all other approaches, yet enabling an architecture that has the ability to remain small and ultra-low power. We are enabling new feature extraction approaches, as well as improved training in reverb and echo. The end result is a 60-80% boost in what was already considered industry-leading accuracy.

I can’t wait for TrulyHandsfree 5.0… We have been working on it in parallel with 4.0, and although it’s still a long way off, I am confident we will make the same massive improvements in speaker verification with 5.0 that we are making in speech recognition with 4.0, once again advancing the state of the art in embedded speech technologies!

OK, Amazon!

May 4, 2015

I was at the Mobile Voice Conference last week and was on a keynote panel with Adam Cheyer (Siri, Viv, etc.) and Phil Gray (Interactions), with Bill Meisel moderating. One of Bill’s questions was about the best speech products, and of course there was a lot of banter about Siri, Cortana, and Voice Actions (or GoogleNow as it’s often referred to). When it was my turn to chime in, I spoke about Amazon’s Echo and heaped lots of praise on it. I had done a bit of testing on it before the conference, but I didn’t own one. I decided to buy one from eBay since Amazon didn’t seem to ever get around to selling me one. It arrived yesterday.

Here are some miscellaneous thoughts:

  • Echo is a fantastic product! Not so much because of what it is today but for the platform it’s creating for tomorrow. I see it as every bit as revolutionary as Siri.
  • The naming is really confusing. You call it Alexa but the product is Echo. I suspect this isn’t the blunder that Google made (VoiceActions, GoogleNow, GoogleVoice, etc.), but more an indication that they are thinking of Echo as the product and Alexa as the personality, and that new products will ship with the same personality over time. This makes sense!
  • Setup was really nice and easy, the music content integration/access is awesome, and the music quality could be a bit better but is usable. There’s lots of other stuff that normal reviewers will talk about… But I’m not a “normal” reviewer, because I have been working with speech recognition consumer electronics for over 20 years and my kids have grown up using voice products, so I’ll focus on speech…
  • My 11-year-old son, Sam, is pretty used to me bringing home voice products, and is often enthusiastic (he insisted on taking my Vocca voice-controlled light to keep in his room earlier this year). Sam watched me unpack the Echo, immediately got the hang of it, and used it to get stats on sports figures and play songs he likes. Sam wants one for his birthday! Amazon must have included some kids’ voice modeling in their data because it worked pretty well with his voice (unlike the Xbox when it first shipped, which I found particularly ironic since Xbox was targeting kids).
  • The Alexa trigger works VERY well. They have implemented beamforming and echo cancellation in a state-of-the-art way. The biggest issue is that it’s a very bandwidth-intensive approach and is not low power. Green is in! That could be why it’s plug-in/AC only and not battery powered. Noise near the speaker definitely hurts performance, as does distance, but it absolutely represents a new dimension in voice usability from a distance. Unlike with the Xbox, you can move anywhere around it and aren’t forced to stay in a stationary position (thanks to its 7 mics, which surely must be overkill!).
  • The voice recognition in general is good, but like all of the better engines today (Google, Siri, Cortana, and even Sensory’s TrulyNatural) it needs to get better. We did have a number of problems where Alexa got confused. Also, Alexa doesn’t appear to have memory of past events, which I expect will improve with upgrades. I tried playing the band Cake (a short word, making it more difficult) and it took about 4 attempts until it said “Would you like me to play Cake?” Then I made the mistake of trying “uh-huh” instead of “yes” and I had to start all over again!
  • My FAVORITE thing about the recognizer is that it does ignore things very nicely. It’s very hard to know when to respond and when not to. The Voice Assistants (Google, Siri, Cortana) seem to always defer to web searches and say things like “It’s too noisy” no matter what I do, and I thought Echo was good at deciding not to respond sometimes.
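The “uh-huh” problem above is the kind of thing a small normalization step could soften. Here is a minimal sketch in Python, not how Echo actually works: the phrase lists and the function names (`normalize`, `interpret_confirmation`) are made up for illustration, and a real assistant would more likely use a trained intent model than fixed word lists.

```python
import re

# Hypothetical phrase lists for illustration only; a real assistant would
# more likely use a trained intent classifier than fixed sets like these.
AFFIRMATIVE = {"yes", "yeah", "yep", "yup", "sure", "ok", "okay", "uh huh", "mm hmm"}
NEGATIVE = {"no", "nope", "nah", "uh uh", "cancel", "never mind"}


def normalize(utterance: str) -> str:
    """Lowercase, turn punctuation and hyphens into spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z\s]", " ", utterance.lower())
    return " ".join(cleaned.split())


def interpret_confirmation(utterance: str) -> str:
    """Return 'confirm', 'reject', or 'unclear' for a reply to a yes/no prompt."""
    text = normalize(utterance)
    if text in AFFIRMATIVE:
        return "confirm"
    if text in NEGATIVE:
        return "reject"
    return "unclear"


if __name__ == "__main__":
    for reply in ["uh-huh", "Yes!", "Uh-uh.", "play Cake"]:
        print(reply, "->", interpret_confirmation(reply))
```

With something like this, an “unclear” reply could trigger a re-ask of the yes/no question instead of throwing away the whole request.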

OK, Amazon… here’s my free advice (admittedly self-serving but nevertheless accurate):

  • You need to know who is talking and build models of their voices and remember who they are and what their preferences are. Sensory has the BEST embedded speaker identification/verification engine in the world, and it’s embedded so you don’t need to send a bunch of personal data into the cloud. Check out TrulySecure!
  • In fact, if you added a camera to Alexa, it too could be used for many vision features, including face authentication.
  • Make it battery powered and portable! To do this, you’d need an equally good embedded trigger technology that runs at low power – Check out TrulyHandsfree!
  • If it’s going to be portable, then it needs to work even when not connected to the Internet. For this, you’d need an amazing large vocabulary embedded speech engine. Did I tell you about TrulyNatural?
  • Of course, the hope is that the product-line will quickly expand and as a result, you will then add various sensors, microphones, cameras, wheels, etc.; and at the same time, you will also want to develop lower cost versions that don’t have all the mics and expensive processing. You are first to market and that’s a big edge. A lot of companies are trying to follow you. You need to expand the product-line quickly, learning from Alexa. Too many big companies have NIH syndrome… don’t be like them! Look for partnering opportunities with 3rd parties who can help your products succeed – Like Sensory! ;-)