HEAR ME -
Speech Blog

Posts Tagged ‘trulyhandsfree’

Staying Ahead with Advanced AI on Devices

June 8, 2017

Since the beginning, Sensory has been a pioneer in advancing AI technologies for consumer electronics. Not only did Sensory implement the first commercially successful speech recognition chip, but we were also first to bring biometrics to low-cost chips and speech recognition to Bluetooth devices. Perhaps what I am most proud of, though, is that more than a decade ago Sensory introduced its TrulyHandsfree technology and showed the world that wakeup words could really work in real devices, getting around the false accept, false reject, and power consumption issues that had plagued the industry. No longer did speech recognition devices require button presses…and it caught on quickly!

Let me go on boasting because I think Sensory has a few more claims to fame… Do you think Apple developed the first “Hey Siri” wake word? Did Google develop the first “OK Google” wake word? What about “Hey Cortana”? I believe Sensory developed these initial wake words, some as demos and some shipped in real products (like the Motorola MotoX smartphone and certain glasses). Even third-party Alexa and Cortana products today are running Sensory technology to wake up the Alexa cloud service.

Sensory’s roots are in neural nets and machine learning. I know everyone does that today, but it was quite out of favor when Sensory used machine learning to create a neural net speech recognition system in the 1990s and 2000s. Today everyone and their brother is doing deep learning (yeah, that’s tongue in cheek, because my brother is doing it too: http://www.cs.colorado.edu/~mozer/index.php). And a lot of these deep learning companies are huge multi-billion-dollar businesses or extremely well-funded startups.

So, can Sensory stay ahead and continue pioneering innovation in AI now that everyone is using machine learning and doing AI? Of course, the answer is yes!

Sensory is now doing computer vision with convolutional neural nets. We are coming out with deep learning noise models to improve speech recognition performance and accuracy, and are working on small TTS systems using deep learning approaches that help them sound lifelike. And of course, we have efforts in biometrics and natural language that also use deep learning.

We are starting to combine a lot of technologies together to show that embedded systems can be quite powerful. And because we have been around longer and thought through most of these implementations years before others, we have a nice portfolio of over 3 dozen patents covering these embedded AI implementations. Hand in hand with Sensory’s improvements in AI software, companies like ARM, NVidia, Intel, Qualcomm and others are investing in and improving neural net chips that can perform parallel processing for specialized AI functions, so the world will continue seeing better and better AI offerings on “the edge”.

Curious about the kind of on-device AI we can create when combining a bunch of our technologies together? So were we! That’s why we created this demo that showcases Sensory’s natural language speech recognition, chatbots, text-to-speech, avatar lip-sync and animation technologies. It’s our goal to integrate biometrics and computer vision into this demo in the months ahead:

Let me know what you think of that! If you are a potential customer and we sign an NDA, we would be happy to send you an APK of this demo so you can try it yourself! For more information about this exciting demo, please check out the formal announcement we made: http://www.prnewswire.com/news-releases/sensory-brings-chatbot-and-avatar-technology-to-consumer-devices-and-apps-300470592.html

Virtual Assistants coming to an Ear Near You!

January 5, 2017

Virtual handsfree assistants that you can talk to and that talk back have rapidly gained popularity. First, they arrived in mobile phones with Motorola’s MotoX that had an ‘always listening’ Moto Voice powered by Sensory’s TrulyHandsfree technology. The approach quickly spread across mobile phones and PCs to include Hey Siri, OK Google, and Hey Cortana.

Then Amazon took things to a whole new level with the Echo using Alexa. A true voice interface emerged, initially for music but quickly expanding domain coverage to include weather, Q&A, recipes, and the most common queries. On top of that, Amazon took a unique approach by enabling 3rd parties to develop “skills” that now number over 6000! These skills allow Amazon’s Echo line (with Tap, Dot) and 3rd-party Alexa-equipped products (like Nucleus and Triby) to be used to control various functions, from reading heart rates on Fitbits to ordering pizzas and controlling lights.

Until recently, handsfree assistants required a certain minimum power capability to really be always on and listening. Additionally, the hearable market segment, including fitness headsets, hearing aids, stereo headsets and other Bluetooth devices, needed to use touch control because of power limitations. Also, Amazon’s Alexa required WiFi communications, so you could sit on your couch talking to your Echo and query Fitbit information, but you couldn’t go out on a run and ask Alexa what your heart rate was.

All this is changing now with Sensory’s VoiceGenie!

The VoiceGenie runs an embedded recognizer in a low power mode. Initially this is on a Qualcomm/CSR Bluetooth chip, but could be expanded to other platforms. Sensory has taken an SBC music decoder and intertwined a speech recognition system, so that the Bluetooth device can recognize speech while music is playing.

The VoiceGenie is on and listening for 2 keywords:

  • Alexa – this enables Alexa “On the Go” through a cellphone rather than requiring WiFi
  • VoiceGenie – this provides access to all the Bluetooth Device and Handset Device features

For example, a Bluetooth headset’s volume, pairing, battery strength, or connection status can only be controlled by the device itself, so VoiceGenie handles those controls with no touching required. VoiceGenie can also read incoming callers’ names and ask the user if they want to answer or ignore. VoiceGenie can call up the phone’s assistant, like Google Assistant or Siri or Cortana, to ask by voice for a call to be made or a song to be played.
By saying “Alexa”, the user gets access to a mobile Alexa ‘On the Go’, so any of the Alexa skills can be utilized while out and about, whether hiking or running!
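
To make the two-keyword split concrete, here is a minimal Python sketch of routing a command by the wake word that preceded it. The function name, handler strings, and routing table are all hypothetical illustrations and are not part of any Sensory SDK; the real VoiceGenie runs on a Bluetooth chip, not in Python.

```python
# Hypothetical sketch: route a spoken command by the wake word that started it.
# None of these names come from Sensory's software; they are assumptions
# made purely for illustration.

def route(wake_word: str, command: str) -> str:
    """Return a description of where the command would be sent."""
    handlers = {
        # "Alexa" goes out through the paired phone to the Alexa cloud service.
        "alexa": lambda c: f"forward to Alexa via phone: {c}",
        # "VoiceGenie" is handled locally for headset and handset features.
        "voicegenie": lambda c: f"handle on device: {c}",
    }
    handler = handlers.get(wake_word.lower())
    if handler is None:
        return "ignored"  # not a wake word, so the device stays asleep
    return handler(command)


if __name__ == "__main__":
    print(route("Alexa", "what is my heart rate"))
    print(route("VoiceGenie", "volume up"))
```

The point of the split is that device-level controls never need to leave the headset, while anything addressed to Alexa is passed along to the phone.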

Some of the important facts behind the new VoiceGenie include:

  • VoiceGenie is a platform for VoiceAssistants to be used Handsfree on tiny devices
  • VoiceGenie enables Alexa for a whole new range of portable products
  • VoiceGenie enables a movement towards invisible assistants that are with you all the time and help you in your daily lives

This third point is perhaps the least understood, yet the most important. People want a personalized assistant that knows them, keeps their secrets safe, and helps them in their daily lives. This help can be accessing information or controlling your environment. It’s very difficult to accomplish this for privacy and power reasons in a cloud powered environment. There needs to be embedded intelligence. It needs to be low power. VoiceGenie is that low powered voice assistant.

Sensory Winning Awards

October 6, 2016

It’s always nice when Sensory wins an award. 2016 has been a special year for Sensory because we won more awards than in any other year in our 23-year history!!

Check it out:

Sensory Earns Multiple Coveted Awards in 2016
Pioneering embedded speech and machine vision tech company receiving industry accolades

Sensory Inc., a Silicon Valley company that pioneered the hands-free voice wakeup word approach, today announced it has won over half a dozen awards in 2016 across its product line, including awards for products, technologies, and people, covering deep learning, biometric authentication and voice recognition.

The awards presented to Sensory include the following:
AIconics are the world’s only independently judged awards celebrating the drive, innovation and hard work in the international artificial intelligence community. Sensory was initially a finalist along with six other companies in the category of Best Innovation in Deep Learning, and judges determined Sensory to be the overall WINNER at an awards ceremony held in September 2016. The judging panel was comprised of 12 independent professionals spanning leaders in artificial intelligence R&D, academia, investments, journalists and analysts.

CTIA Super Mobility 2016™, the largest wireless event in America, announced more than 70 finalists for its 10th annual CTIA Emerging Technology (E-Tech) Awards. Sensory was nominated in the category of Mobile Security and Privacy for its TrulySecure™ technology, along with Nokia, Samsung, SAP, and others. Sensory was presented with the First Place award for the category in a ceremony in September 2016 at the CTIA Las Vegas event.

Speech Technology magazine, the leading provider of speech technology news and analysis, had its 10th annual Speech Industry Awards to recognize the creativity and notable achievements of key influencers (Luminaries), major innovators (Star Performers), and impressive deployments (Implementation Awards). The editors of Speech Technology magazine selected 2016 award winners based on their industry contributions during the past 12 months. Sensory’s CEO, Todd Mozer, was awarded with a Luminary Award, making it his second time winning the prestigious award. Sensory as a company was awarded the Star Performer award along with IBM, Amazon and others.

Two well-known industry analyst firms issued reports highlighting Sensory’s industry contributions for its TrulyHandsfree product and customer leadership, offering awards for innovations, customer deployment, and strategic leadership.

“Sensory has an incredibly talented team of speech recognition and biometrics experts dedicated to advancing the state-of-the-art of each respective field. We are pleased that our TrulyHandsfree, TrulySecure and TrulyNatural product lines are being recognized in so many categories, across the various industries in which we do business,” said Todd Mozer, CEO of Sensory. “I am also thrilled that Sensory’s research and innovations in the deep learning space has been noticed, generating our company prestigious accolades and management recognition.”

For more information about this announcement, Sensory or its technologies, please contact sales@sensory.com; Press inquiries: press@sensory.com

IoT Roadshow with Open Systems Media

May 6, 2016

Rich Nass and Barbara Quinlan from Open Systems Media visited Sensory on their “IoT Roadshow”.

IoT is a very interesting area. About 10 years ago we saw voice controlled IoT on the way, and we started calling the market SCIDs – Speech Controlled Internet Devices. I like IoT better, it’s certainly a more popular name for the segment! ;-)

I started our meeting off by talking about Sensory’s three products – TrulyHandsfree Voice Control, TrulySecure Authentication, and TrulyNatural large vocabulary embedded speech recognition.

Although TrulyHandsfree is best known for its “always on” capabilities, ideal for listening for key phrases (like OK Google, Hey Cortana, and Alexa), it can be used in a ton of other ways. One of them is hands-free photo taking, so no selfie stick is required. To demonstrate, I put my camera on the table and took pictures of Barbara and Rich. (Normally I might have joined the pictures, but their healthy hair, naturally good looks, and formal attire were too outclassing for my participation.)

 

[Photos: IoT pic 1, IoT pic 2]

There’s a lot of hype about IoT and Wearables and I’m a big believer in both. That said, I think Amazon’s Echo is the perfect example of a revolutionary product that showcases the use of speech recognition in the IoT space, and I am looking forward to some innovative uses of speech in Wearables!

Here’s the article they wrote on their visit to Sensory and an impromptu video showing TrulyNatural performing on-device navigation, as well as a demo of TrulySecure via our AppLock Face/Voice Recognition app.

IoT Roadshow, Santa Clara – Sensory: Look ma, no hands!

Rich Nass, Embedded Computing Brand Director

If you’re an IoT device that requires hands-free operation, check out Sensory, just like I did while I was on OpenSystems Media’s IoT Roadshow. Sensory’s technology worked flawlessly running through the demo, as you can see in the video. We ran through two different products, one for input and one for security.

TrulyHandsfree 4.0… Maintaining the big lead!

August 6, 2015

We first came out with TrulyHandsfree about five years ago. I remember talking to speech tech executives at MobileVoice as well as other industry tradeshows, and when talking about always-on hands-free voice control, everybody said it couldn’t be done. Many had attempted it, but their offerings suffered from too many false fires, or not working in noise, or consuming too much power to be always listening. Seems that everyone thought a button was necessary to be usable!

In fact, I remember the irony of being on an automotive panel, and giving a presentation about how we’ve eliminated the need for a trigger button, while the guy from Microsoft presented on the same panel the importance of where to put the trigger button in the car.

Now, five years later, voice activation is the norm… we see it all over the place with OK Google, Hey Siri, Hey Cortana, Alexa, Hey Jibo, and of course if you’ve been watching Sensory’s demos over the years, Hello BlueGenie!

Sensory pioneered the button free, touch free, always-on voice trigger approach with TrulyHandsfree 1.0 using a unique, patented keyword spotting technology we developed in-house, and from its inception it was highly robust to noise and ultra-low power. Over the years we have ported it to dozens of platforms, including DSP/MCU IP cores from ARM, Cadence, CEVA, NXP CoolFlux, Synopsys and Verisilicon, as well as integrated circuits from Audience, Avnera, Cirrus Logic, Conexant, DSPG, Fortemedia, Intel, Invensense, NXP, Qualcomm, QuickLogic, Realtek, STMicroelectronics, TI and Yamaha.

This vast platform compatibility has allowed us to work with numerous OEMs to ship TrulyHandsfree in over a billion products!

Sensory didn’t just innovate a novel keyword spotting approach; we’ve continually improved it by adding features like speaker verification and user-defined triggers. Working with partners, we lowered the draw on the battery to less than 1mA, and Sensory introduced hardware and software IP to enable ultra-low-power voice wakeup of TrulyHandsfree. All the while, our accuracy has remained the best in the industry for voice wakeup.

We believe the bigger, more capable companies trying to make voice triggers have been forced to use deep learning speech techniques to try to catch up with Sensory in the accuracy department. They have yet to catch up; deep learning has brought their products to a very usable accuracy level, but they lost many of the advantages of small footprint and low power in the process.
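
To put the sub-1mA figure in perspective, always-listening time is roughly battery capacity divided by average current draw. The Python sketch below is a back-of-the-envelope estimate only: the 100 mAh capacity is a hypothetical small-headset battery chosen for illustration, not a figure from any shipped product, and the calculation ignores every other load on the device.

```python
def standby_hours(capacity_mah: float, draw_ma: float) -> float:
    """Idealized listening time: battery capacity (mAh) / average draw (mA)."""
    if draw_ma <= 0:
        raise ValueError("draw_ma must be positive")
    return capacity_mah / draw_ma


if __name__ == "__main__":
    # Hypothetical 100 mAh battery with the wake word engine drawing 1 mA.
    print(standby_hours(100, 1.0))
    # The same hypothetical battery with a 10 mA always-on listener.
    print(standby_hours(100, 10.0))
```

Under those assumptions, cutting the trigger's draw by a factor of ten stretches the listening time by the same factor, which is why the milliamp budget matters so much for battery-powered products.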

Sensory has been architecting solutions for neural nets in consumer electronics since we opened the doors more than 20 years ago. With TrulyHandsfree 4.0 we are applying deep learning to improve accuracy even further, pushing the technology even more ahead of all other approaches, yet enabling an architecture that has the ability to remain small and ultra-low power. We are enabling new feature extraction approaches, as well as improved training in reverb and echo. The end result is a 60-80% boost in what was already considered industry-leading accuracy.

I can’t wait for TrulyHandsfree 5.0…we have been working on it in parallel with 4.0, and although it’s still a long ways off, I am confident we will make the same massive improvements in speaker verification with 5.0 that we are doing for speech recognition in 4.0! Once again further advancing the state of the art in embedded speech technologies!

Sensory Talks AI and Speech Recognition With Popular Science Radio Host Alan Taylor

June 11, 2015

Guest post by: Michael Farino

[Image: Pop Science Radio]

Sensory’s CEO, Todd Mozer, joined Alan Taylor, host of Popular Science Radio, for a fun discussion about artificial intelligence and Sensory’s involvement with the Jibo robot development team, and gave the show’s listeners a look into the past 20 years of speech recognition. Todd and Alan also discussed some of the latest advancements in speech technology, and Todd provided an update on Sensory’s most recent achievements in the field of speech recognition as well as a brief look into what the future holds.

Listen to the full radio show at the link below:

Big Bang Theory, Science, and Robots | FULL EPISODE | Popular Science Radio #269
Ever wondered how accurate the science of the Big Bang Theory TV series is? Curious about how well speech recognition technology and robots are advancing? We interview two great minds to probe for these answers.

OK, Amazon!

May 4, 2015

I was at the Mobile Voice Conference last week and was on a keynote panel with Adam Cheyer (Siri, Viv, etc.) and Phil Gray (Interactions) with Bill Meisel moderating. One of Bill’s questions was about the best speech products, and of course there was a lot of banter about Siri, Cortana, and Voice Actions (or GoogleNow as it’s often referred to). When it was my turn to chime in I spoke about Amazon’s Echo, and heaped lots of praise on it. I had done a bit of testing on it before the conference but I didn’t own one. I decided to buy one from Ebay since Amazon didn’t seem to ever get around to selling me one. It arrived yesterday.

Here are some miscellaneous thoughts:

  • Echo is a fantastic product! Not so much because of what it is today but for the platform it’s creating for tomorrow. I see it as every bit as revolutionary as Siri.
  • The naming is really confusing. You call it Alexa but the product is Echo. I suspect this isn’t the blunder that Google made (VoiceActions, GoogleNow, GoogleVoice, etc.), but more an indication that they are thinking of Echo as the product and Alexa as the personality, and that new products will ship with the same personality over time. This makes sense!
  • Setup was really nice and easy, the music content integration/access is awesome, the music quality could be a bit better but is useable; there’s lots of other stuff that normal reviewers will talk about…But I’m not a “normal” reviewer because I have been working with speech recognition consumer electronics for over 20 years, and my kids have grown up using voice products, so I’ll focus on speech…
  • My 11 year old son, Sam, is pretty used to me bringing home voice products, and is often enthusiastic (he insisted on taking my Vocca voice controlled light to keep in his room earlier this year). Sam watched me unpack it and immediately got the hang of it and used it to get stats on sports figures and play songs he likes. Sam wants one for his birthday! Amazon must have included some kids voice modeling in their data because it worked pretty well with his voice (unlike the Xbox when it first shipped, which I found particularly ironic since Xbox was targeting kids).
  • The Alexa trigger works VERY well. They have implemented state-of-the-art beamforming and echo cancellation. The biggest issue is that it’s a very bandwidth intensive approach and is not low power. Green is in! That could be why it’s plug-in/AC only and not battery powered. Noise near the speaker definitely hurts performance, as does distance, but it absolutely represents a new dimension in voice usability from a distance, and unlike with the Xbox, you can move anywhere around it and aren’t forced to be in a stationary position (thanks to their 7 mics, which surely must be overkill!).
  • The voice recognition in general is good, but like all of the better engines today (Google, Siri, Cortana, and even Sensory’s TrulyNatural) it needs to get better. We did have a number of problems where Alexa got confused. Also, Alexa doesn’t appear to have memory of past events, which I expect will improve with upgrades. I tried playing the band Cake (a short word, making it more difficult) and it took about 4 attempts until it said “Would you like me to play Cake?” Then I made the mistake of trying “uh-huh” instead of “yes” and I had to start all over again!
  • My FAVORITE thing about the recognizer is that it does ignore things very nicely. It’s very hard to know when to respond and when not to. The Voice Assistants (Google, Siri, Cortana) seem to always defer to web searches and say things like “It’s too noisy” no matter what I do, and I thought Echo was good at deciding not to respond sometimes.

OK, Amazon… here’s my free advice (admittedly self-serving but nevertheless accurate):

  • You need to know who is talking and build models of their voices and remember who they are and what their preferences are. Sensory has the BEST embedded speaker identification/verification engine in the world, and it’s embedded so you don’t need to send a bunch of personal data into the cloud. Check out TrulySecure!
  • In fact, if you added a camera to Alexa, it too could be used for many vision features, including face authentication.
  • Make it battery powered and portable! To do this, you’d need an equally good embedded trigger technology that runs at low power – Check out TrulyHandsfree!
  • If it’s going to be portable, then it needs to work even when not connected to the Internet. For this, you’d need an amazing large vocabulary embedded speech engine. Did I tell you about TrulyNatural?
  • Of course, the hope is that the product-line will quickly expand and as a result, you will then add various sensors, microphones, cameras, wheels, etc.; and at the same time, you will also want to develop lower cost versions that don’t have all the mics and expensive processing. You are first to market and that’s a big edge. A lot of companies are trying to follow you. You need to expand the product-line quickly, learning from Alexa. Too many big companies have NIH syndrome… don’t be like them! Look for partnering opportunities with 3rd parties who can help your products succeed – Like Sensory! ;-)

Mobile World Congress Day 1

March 3, 2015

It feels like I had a whole week’s worth of the trade show wrapped into one day! By the time mid week hits, I’ll surely be ready to head home! Here are some of the highlights from the first day of Mobile World Congress 2015:

  • First a word about Catalonia. That’s where Barcelona is…in the heart of Catalonia, an autonomous community of Spain. Don’t expect delayed meetings, inefficiencies, relaxed long lunches or anything like that. The Catalonians have the precision of Germans (to continue my gross stereotyping!), and my experience with one of the largest trade shows on the planet is that it’s going off without a hitch! I picked up my badge at the airport in a five-minute line that was well staffed and moved rapidly. I could just about walk into the show yesterday morning. The subways and trains, though crowded and overheated, ran extremely smoothly. Kudos to the show management for pulling off such a difficult feat!
  • I’d be remiss without mentioning the Galaxy S6. Samsung invited us to the launch and of course they continue to use Sensory in a relationship that has grown quite strong over the years.  Samsung continues to innovate with the Edge, and other products that everyone is talking about. It’s amazing how far Apple took the mantle in the first iPhone and how companies like Samsung and the Android system seem to now be leading the charge on innovation!
  • My favorite product that doesn’t feature Sensory technology that I bumped into was an electronic jump rope. They put sensors in the handles and a visual display shows across the field of the rope, kind of like those clocks that rapidly flash LED’s as the pendulum quickly moves back and forth in order to display the time. I talked with Alex Woo from Tangram and he said they were going to launch a crowdfunding campaign. I gave Alex a demo of our TrulyHandsfree with jump ropers jumping and all the show noise and of course it worked flawlessly. It would be really cool to be able to ask things like “How much time,” “How many jumps,” “What’s my heart rate,” or “How many calories burned” and so on, and the display would make voice control so much more functional!
  • We had a couple of partnership announcements here at the show, supporting both Qualcomm and Synopsys – both great partners to add to our support mix, and always nice when it’s customers driving our platform directions. The Qualcomm platform is interesting because it’s not their standard platform for 3rd parties to support. As far as I know they opened it up to Sensory and ONLY Sensory, and already we are seeing much interest!
  • Last night ZTE had a press party to indoctrinate Sensory and NXP into its Smart Voice Alliance. ZTE is really putting some forward thinking into the user experience and their research shows how much people want a voice interface but how dissatisfying the current state of the art actually is. Sensory’s hoping to change that! We’ll make one of our biggest announcements in history over the next month… and I’ll let you in on the secret (it’s on our website already!) We call it TrulyNatural, and it will be the highest accuracy large vocabulary embedded speech engine that the world has ever seen!

Hasta Luego!!!

Deep Listening in the Cloud

February 11, 2015

The advent of “always on” speech processing has raised concerns about organizations spying on us from the cloud.

In this Money/CNN article, Samsung is quoted as saying, “Samsung does not retain voice data or sell it to third parties.” But, does this also mean that your voice data isn’t being saved at all? Not necessarily. In a separate article, the speech recognition system in Samsung’s TVs is shown to be an always-learning cloud-based system solution from Nuance. I would guess that there is voice data being saved, and that Nuance is doing it.

This doesn’t mean Nuance is doing anything evil; this is just the way that machine learning works. There has been this big movement towards “deep” learning, and what “deep” really means is more sophisticated learning algorithms that require more data to work. In the case of speech recognition, the data needed is speech data, or speech features data that can be used to train and adapt the deep nets.

But just because there is a necessary use for capturing voice data and invading privacy, doesn’t mean that companies should do it. This isn’t just a cloud-based voice recognition software issue; it’s an issue with everyone doing cloud based deep learning. We all know that Google’s goal in life is to collect data on everything so Google can better assist you in spending money on the right things. We in fact sign away our privacy to get these free services!

I admit guilt too. When Sensory first achieved usable results for always-on voice triggers, the basis of our TrulyHandsfree technology, I applied for a patent on a “background recognition system” that listens to what you are talking about in private and puts together different things spoken at different times to figure out what you want…. without you directly asking for it.

Can speech recognition be done without having to send all this private data to the cloud? Sure it can! There are two parts in today’s recognition systems: 1) the wake up phrase; 2) the cloud based deep net recognizer – AND NOW THEY CAN BOTH BE DONE ON DEVICE!

Sensory pioneered the low-power wake up phrase on device (item 1), and now we have a big team working on making an EMBEDDED deep learning speech recognition system so that no personal data needs to be sent to the cloud. We call this approach TrulyNatural, and it’s going to hit the market very soon! We have benchmarked TrulyNatural against state-of-the-art cloud-based deep learning systems and have matched and in some cases bested the performance!
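
For readers who want to picture the two-part structure, here is a toy Python sketch of a wake phrase gate sitting in front of a recognizer, with both stages running locally. Everything in it is an assumption made for illustration: audio "frames" are stood in for by plain strings, and the two callables are placeholders rather than TrulyHandsfree or TrulyNatural APIs.

```python
from typing import Callable, Iterable, List


def run_pipeline(
    frames: Iterable[str],
    is_wake_phrase: Callable[[str], bool],
    recognize: Callable[[str], str],
) -> List[str]:
    """Gate a (hypothetical) on-device recognizer behind a wake phrase.

    Stage 1 inspects every frame cheaply; stage 2 only runs on the single
    frame that follows a detected wake phrase. Nothing leaves the device.
    """
    results: List[str] = []
    awake = False
    for frame in frames:
        if not awake:
            # Stage 1: low-power wake phrase check; all other audio is dropped.
            awake = is_wake_phrase(frame)
            continue
        # Stage 2: full recognition of the command that follows the wake phrase.
        results.append(recognize(frame))
        awake = False  # go back to sleep until the next wake phrase
    return results


if __name__ == "__main__":
    audio = ["noise", "hello device", "play music", "chatter", "hello device", "stop"]
    print(run_pipeline(audio, lambda f: f == "hello device", str.upper))
```

In this toy version, anything spoken without the wake phrase in front of it is never passed to the recognizer at all, which is the privacy property the paragraph above is after.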

Way to go Moto Voice

September 5, 2014

I was very excited to hear Motorola’s announcements today about the new Moto X, MotoG, Moto Hint and Moto 360.

What particularly caught my ear was the statement that they were changing the name from Touchless Control to Moto Voice.  They made this decision because so many people thought the technology came from Google in the form of Android, and Moto wanted everyone to know it DIDN’T come from Google.

Actually…It came from Sensory.  At least we were an important part of it!!! We have been working on the cool new user defined triggers and are excited that Moto has adopted them for the flagship MotoX (Write-up).

This feature was announced in our TrulyHandsfree 3.0

The new Moto Hint headset is really cool too. It’s a bit like Intel’s Jarvis headset that was announced by Intel CEO Brian Krzanich at CES (and of course uses Sensory!).

Of course the Moto360 is AWESOME, and has some pretty cool voice control features. Yes, Sensory has done an “OK Google” trigger…we even benchmarked our trigger against Google’s…I might share the results in an upcoming blog if there is interest.
