Archive for the ‘truly hands-free’ Category
June 11, 2019
I used to blog a lot about wake words and voice triggers. Sensory pioneered this technology for voice assistants, and we evangelized the importance of not hitting buttons to speak to a voice recognizer. Then everybody caught on and the technology went into mainstream use (think Alexa, OK Google, Hey Siri, etc.), and I stopped blogging about it. But I want to reopen the conversation…partly to talk about how important a GREAT wake word is to the consumer experience, and partly to congratulate my team on a recent comparison test that shows Sensory continues to have the most accurate embedded wake word solutions.
Competitive Test Results. The comparison test was done by Vocalize.ai. Vocalize is an independent test house for voice enabled products. For a while, Sensory would contract out to them for independent testing of our latest technology updates. We have always tested in-house but found that our in-house simulations didn’t always sync up with our customers’ experience. Working with Vocalize allowed us to move from our in-house simulations to more real-world product testing. We liked Vocalize so much that we acquired them. So, now we “contract in” to them but keep their data and testing methodology and reporting uninfluenced by Sensory.
Vocalize compared two Sensory TrulyHandsfree wake word models (1MB and 250KB in size) with two external wake words (Amazon’s and Kitt.ai’s Snowboy), all using “Alexa” as the trigger. The results are replicable and show that Sensory’s TrulyHandsfree remains the superior solution on the market. TrulyHandsfree was better (lower) on BOTH false accepts AND false rejects. And in many cases our technology was better by a long shot! If you would like to see the full report and more details on the evaluation methods, please send an email request to either Vocalize (firstname.lastname@example.org) or Sensory (email@example.com).
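For readers less familiar with the two metrics, here is a minimal Python sketch of how false rejects and false accepts are commonly counted from detector scores at a single decision threshold. It is not Vocalize’s methodology; the `Trial` records, scores, and thresholds are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trial:
    score: float        # detector confidence for one test clip (hypothetical)
    is_wake_word: bool  # ground truth: did the clip really contain the wake word?


def error_rates(trials: List[Trial], threshold: float) -> Tuple[float, int]:
    """Return (false-reject rate, false-accept count) at one decision threshold."""
    positives = [t for t in trials if t.is_wake_word]
    negatives = [t for t in trials if not t.is_wake_word]
    # False reject: the wake word was spoken, but the score stayed below threshold.
    false_rejects = sum(1 for t in positives if t.score < threshold)
    # False accept: no wake word was spoken, but the score crossed the threshold.
    false_accepts = sum(1 for t in negatives if t.score >= threshold)
    frr = false_rejects / len(positives) if positives else 0.0
    return frr, false_accepts


if __name__ == "__main__":
    trials = [
        Trial(0.9, True), Trial(0.6, True), Trial(0.3, True),
        Trial(0.7, False), Trial(0.2, False),
    ]
    for threshold in (0.5, 0.8):
        print(threshold, error_rates(trials, threshold))
```

Raising the threshold on the same model trades false accepts for false rejects, so being lower on both at once points to a better model rather than a retuned threshold.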
It’s Not Easy. There are over 20 companies today that offer on-device wake words. Probably half of these have no experience in a commercially shipping product, and they never will; there are a lot of companies that just won’t be taken seriously. The other half can talk a good talk, and in the right environment they can even give a working demo. But this technology is complex: really easy to do badly and really hard to do great. Some demos are carefully planned with the right noise in the right environment with the right person talking. Sensory has been focused on low-power embedded speech for 25 years, and we have 65 of the brightest minds working on the toughest challenges in embedded AI. There’s a reason that companies like Amazon, Google, Microsoft and Samsung have turned to Sensory for our TrulyHandsfree technology. Our stuff works, and they understand how difficult it is to make this kind of technology work on-device! We are happy to provide APKs so you can do your own testing and judge for yourself! OK, enough of the sales pitch…some interesting stuff lies ahead…
It’s Really Important. Getting a wake word to work well is more important than most people realize. It’s like the front door to your house. It might be a small part of your house, but if you aren’t letting the homeowners in, that’s horrible, and if you are letting strangers in by accident, that’s even worse. The name a company gives its wake word is usually the company’s brand name, so imagine the sentiment that comes across when I say a brand name and it doesn’t work. Recently I was at a trade show that had a Mercedes booth. There were big signs that said “Hey Mercedes”…I walked up to the demo area and said “Hey Mercedes,” but nothing happened…the woman working there informed me that they couldn’t demo it on the show floor because it was too noisy. I quickly pulled out my mobile phone and showed her that I could use dozens of wake words and command sets without an error in that same environment. Mercedes has spent over 100 years building up one of the best brand reputations for quality in the car industry. I wonder what will happen to that reputation if their wake word doesn’t respond in noise. Even worse is when devices accidentally go off. If you have family members who listen to music above volume 7, then you already know the shock that a false alarm causes!
It’s about Privacy. Amazon, like Google and a few others, seems to have pretty good wake words, but if you go into your Alexa settings you can see all of the voice data that’s been collected, and a lot of it was collected when you weren’t intentionally talking to Alexa! You can see this performance issue in the Vocalize test report. Sensory substantially outperformed Amazon in the false reject area. A false reject is when a person tries to speak to Alexa and she doesn’t respond. The difference is most apparent in babble noise, where Sensory falsely rejected 3% and Amazon falsely rejected 10% on comparably sized models (250KB). However, the false accept difference is nothing short of AMAZING. Amazon false accepted 13 times in 24 hours of random noise. In the same time period Sensory false accepted ZERO times (on comparably sized 250KB models). How is this possible, you may be wondering? Amazon “fixes” its mistakes in the cloud. Even though the device falsely accepts quite frequently, their (larger and more sophisticated) models in the cloud collect the error. Was that a Freudian slip? They correct the error…AND they COLLECT the error. In effect, they are disregarding privacy to save device cost and collect more data.
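To put the quoted figures in everyday terms, the short Python sketch below converts them into rates. The 3%/10% false-reject and 13-versus-0 false-accept numbers come from the paragraph above; the one-week projection is a hypothetical linear extrapolation that assumes the device hears the same kind of noise around the clock, which real homes will not match exactly.

```python
# Figures quoted above for the ~250KB models.
FR_SENSORY, FR_AMAZON = 0.03, 0.10     # false-reject rate in babble noise
FA_SENSORY_24H, FA_AMAZON_24H = 0, 13  # false accepts in 24 h of random noise
HOURS_PER_WEEK = 24 * 7


def per_hour(count: int, hours: float) -> float:
    """Average false accepts per hour over the test window."""
    return count / hours


def projected(count: int, window_hours: float, horizon_hours: float) -> float:
    """Linear extrapolation; assumes the same noise exposure the whole time."""
    return count * horizon_hours / window_hours


def misses(fr_rate: float, attempts: int) -> float:
    """Expected number of ignored wake-word attempts."""
    return fr_rate * attempts


if __name__ == "__main__":
    print("Amazon false accepts per hour:", round(per_hour(FA_AMAZON_24H, 24), 2))
    print("Projected false accepts per week (Amazon, Sensory):",
          projected(FA_AMAZON_24H, 24, HOURS_PER_WEEK),
          projected(FA_SENSORY_24H, 24, HOURS_PER_WEEK))
    print("Missed attempts per 100 (Sensory, Amazon):",
          misses(FR_SENSORY, 100), misses(FR_AMAZON, 100))
```

Under that assumption, roughly one unintended wake every two hours becomes about 91 per week, each of which, per the argument above, may send audio to the cloud.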
As the voice revolution continues to grow, you can bet that privacy will continue to be a hot topic. What you now understand is that wake word quality has a direct impact on both the user experience and PRIVACY! While most developers and product engineers in the CE industry are aware of wake words and the difficulty of making them work well on-device, they don’t often consider that competing wake word technologies aren’t created equal, and the test results from Vocalize prove it! Sensory is more accurate AND allows more privacy!
June 8, 2017
Since the beginning, Sensory has been a pioneer in advancing AI technologies for consumer electronics. Not only did Sensory implement the first commercially successful speech recognition chip, but we were also the first to bring biometrics to low-cost chips and speech recognition to Bluetooth devices. Perhaps what I am most proud of, though, is that more than a decade ago Sensory introduced its TrulyHandsfree technology and showed the world that wakeup words could really work in real devices, overcoming the false accept, false reject, and power consumption issues that had plagued the industry. No longer did speech recognition devices require button presses…and it caught on quickly!
Let me go on boasting because I think Sensory has a few more claims to fame… Do you think Apple developed the first “Hey Siri” wake word? Did Google develop the first “OK Google” wake word? What about “Hey Cortana”? I believe Sensory developed these initial wake words, some as demos and some shipped in real products (like the Motorola MotoX smartphone and certain glasses). Even third-party Alexa and Cortana products today are running Sensory technology to wake up the Alexa cloud service.
Sensory’s roots are in neural nets and machine learning. I know everyone does that today, but it was quite out of favor when Sensory used machine learning to create a neural net speech recognition system in the 1990s and 2000s. Today everyone and their brother is doing deep learning (yeah, that’s tongue in cheek, because my brother is doing it too: http://www.cs.colorado.edu/~mozer/index.php). And a lot of these deep learning companies are huge multi-billion-dollar businesses or extremely well-funded startups.
So, can Sensory stay ahead and continue pioneering innovation in AI now that everyone is using machine learning and doing AI? Of course, the answer is yes!
Sensory is now doing computer vision with convolutional neural nets. We are coming out with deep learning noise models to improve speech recognition performance and accuracy, and are working on small TTS systems using deep learning approaches that help them sound lifelike. And of course, we have efforts in biometrics and natural language that also use deep learning.
We are starting to combine a lot of technologies together to show that embedded systems can be quite powerful. And because we have been around longer and thought through most of these implementations years before others, we have a nice portfolio of over three dozen patents covering these embedded AI implementations. Hand in hand with Sensory’s improvements in AI software, companies like ARM, NVIDIA, Intel, Qualcomm and others are investing in and improving upon neural net chips that can perform parallel processing for specialized AI functions, so the world will continue seeing better and better AI offerings on “the edge”.
Curious about the kind of on-device AI we can create when combining a bunch of our technologies together? So were we! That’s why we created this demo that showcases Sensory’s natural language speech recognition, chatbots, text-to-speech, avatar lip-sync and animation technologies. It’s our goal to integrate biometrics and computer vision into this demo in the months ahead:
Let me know what you think of that! If you are a potential customer and we sign an NDA, we would be happy to send you an APK of this demo so you can try it yourself! For more information about this exciting demo, please check out the formal announcement we made: http://www.prnewswire.com/news-releases/sensory-brings-chatbot-and-avatar-technology-to-consumer-devices-and-apps-300470592.html
February 1, 2017
The hands-free personal assistant that you can wake by voice and talk to naturally has gained significant popularity over the last couple of years. This kind of technology made its debut not all that long ago as a feature of Motorola’s MotoX, a smartphone with always-listening Moto Voice technology powered by Sensory’s TrulyHandsfree technology. Since then, the always-listening digital assistant has quickly spread across mobile phones and PCs from several different brands, making phrases like “Hey Siri,” “Okay Google,” and “Hey Cortana” commonplace.
Then, out of nowhere, Amazon successfully tried its hand at the personal assistant with the Echo, sporting a true natural language voice interface and Alexa cloud-based AI. It was initially marketed for music, but quickly expanded domain coverage to include weather, Q&A, recipes, and the ability to answer common questions. On top of that, Amazon also opened its platform up to third-party developers, allowing them to proliferate the skill sets available on the Alexa platform, with now more than 10,000 skills accessible to users. These skills allow Amazon’s Echo, Tap, and Dot, as well as the several new third-party Alexa-equipped products like Nucleus and Triby, to be used to access and control various IoT functions, from reading heart rates on Fitbits to ordering pizzas and controlling lights within the home.
Until recently, always-listening, hands-free assistants required a certain minimum power capability, restricting form factors to tabletop speakers or appliances that had to either be plugged into an outlet or have a large battery. Also, Amazon’s Echo, Tap, and Dot all required a Wi-Fi connection to communicate with the Alexa AI engine and make use of its available skills. Unfortunately, this meant you were restricted to using Alexa within range of your home or another Wi-Fi network. If you wanted to go on a run, the only way to ask Alexa for your step count or heart rate was to wait until you got back home.
This is changing now with technology like Sensory’s VoiceGenie, an always-listening embedded speech recognizer for wearables and hearables that runs in a low-power mode on a Qualcomm/CSR Bluetooth chip. The solution takes a subband codec (SBC) music decoder and intertwines it with a speech recognition system so that while music is playing and the decoder is in use, VoiceGenie is on and actively listening, allowing the Bluetooth device to listen for two keywords:
To give an example of how this works, a Bluetooth headset’s volume, pairing process, battery strength, or connection status can only be controlled or monitored through the device itself, so VoiceGenie handles those controls with no touching required. VoiceGenie can also read the incoming caller’s name and ask the user if they want to answer or ignore. Additionally, VoiceGenie can call up the phone’s assistant like Google Assistant, Siri, or Cortana, to ask by voice for a call to be made or a song to be played. By saying, “Alexa,” the user can access the Alexa service directly from their Bluetooth headsets while out and about, using their smartphone as the connection to the Alexa cloud.
Today’s consumer wants a personalized assistant that knows them, is convenient to use, keeps their secrets safe, and helps them in their daily lives. This help can be accessing information, getting answers to questions or intelligently controlling your home environment. It’s very difficult to accomplish this for privacy and power reasons solely using cloud-based AI technology. There needs to be embedded intelligence on devices, and it needs to run at low power. A low-power embedded voice assistant that adds an intelligent voice interface to portable and wearable devices, while also adding Alexa functionality to them, can address those needs.
January 5, 2017
Virtual handsfree assistants that you can talk to and that talk back have rapidly gained popularity. First, they arrived in mobile phones with Motorola’s MotoX that had an ‘always listening’ Moto Voice powered by Sensory’s TrulyHandsfree technology. The approach quickly spread across mobile phones and PCs to include Hey Siri, OK Google, and Hey Cortana.
Then Amazon took things to a whole new level with the Echo using Alexa. A true voice interface emerged, initially for music but quickly expanding domain coverage to include weather, Q&A, recipes, and the most common queries. On top of that, Amazon took a unique approach by enabling 3rd parties to develop “skills” that now number over 6,000! These skills allow Amazon’s Echo line (with Tap, Dot) and 3rd-party Alexa-equipped products (like Nucleus and Triby) to be used to control various functions, from reading heart rates on Fitbits to ordering pizzas and controlling lights.
Until recently, handsfree assistants required a certain minimum power capability to really be always on and listening. Additionally, the hearable market segment, including fitness headsets, hearing aids, stereo headsets and other Bluetooth devices, needed to use touch control because of its power limitations. Also, Amazon’s Alexa required Wi-Fi communications, so you could sit on your couch talking to your Echo and query Fitbit information, but you couldn’t go out on a run and ask Alexa what your heart rate was.
All this is changing now with Sensory’s VoiceGenie!
The VoiceGenie runs an embedded recognizer in a low-power mode. Initially this is on a Qualcomm/CSR Bluetooth chip, but it could be expanded to other platforms. Sensory has taken an SBC music decoder and intertwined it with a speech recognition system, so that the Bluetooth device can recognize speech while music is playing.
The VoiceGenie is on and listening for 2 keywords:
For example, a Bluetooth headset’s volume, pairing, battery strength, or connection status can only be controlled by the device itself, so VoiceGenie handles those controls with no touching required. VoiceGenie can also read incoming callers’ names and ask the user if they want to answer or ignore. VoiceGenie can call up the phone’s assistant, like Google Assistant, Siri, or Cortana, to ask by voice for a call to be made or a song to be played.
Some of the important facts behind the new VoiceGenie include:
This third point is perhaps the least understood, yet the most important. People want a personalized assistant that knows them, keeps their secrets safe, and helps them in their daily lives. This help can be accessing information or controlling your environment. It’s very difficult to accomplish this for privacy and power reasons in a cloud powered environment. There needs to be embedded intelligence. It needs to be low power. VoiceGenie is that low powered voice assistant.
October 14, 2016
I watched Sundar and Rick and the team at Google announce all the great new products from Google. I’ve read a few reviews and comparisons with Alexa/Assistant and Echo/Home, but it struck me that there’s quite an overlap in the reports I’m reading and some of the more interesting things aren’t being discussed. Here are a few of them, roughly in increasing order of importance:
August 22, 2016
Sensory is proud to announce that it has been awarded two 2016 Speech Tech Magazine Awards. With some stiff competition in the speech industry, Sensory continues to excel in offering the industry’s most advanced embedded speech recognition and speech-based security solutions for today’s voice-enabled consumer electronics movement.
The 2016 Speech Technology Awards include:
Speech Luminary Award – Awarded to Sensory’s CEO, Todd Mozer
“What really impresses me about Todd is his long commitment to speech technology, and specifically, his focus on embedded and small-footprint speech recognition,” says Deborah Dahl, principal at Conversational Technologies and chair of the World Wide Web Consortium’s Multimodal Interactions Working Group. “He focuses on what he does best and excels at that.”
Star Performers Award – Awarded to Sensory for its contributions in enabling voice-enabled IoT products via embedded technologies
“Sensory has always been in the forefront of embedded speech recognition, with its TrulyHandsfree product, a fast, accurate, and small-footprint speech recognition system. Its newer product, TrulyNatural, is groundbreaking because it supports large vocabulary speech recognition and natural language understanding on embedded devices, removing the dependence on the cloud,” said Deborah Dahl, principal at Conversational Technologies and chair of the World Wide Web Consortium’s Multimodal Interactions Working Group. “While cloud-based recognition is the right solution for many applications, if the application must work regardless of connectivity, embedded technology is required. The availability of TrulyNatural embedded natural language understanding should make many new types of applications possible.”
– Guest Blog by Michael Farino
June 22, 2016
I’ve written a series of blogs about consumer devices with speech recognition, like Amazon Echo. I mentioned that everyone is getting into the “always listening” game (Alexa, OK Google, Hey Siri, Hi Galaxy, Assistant, Hey Cortana, OK Hound, etc.), and I’ve explained that companies attempt to address privacy concerns by putting the “always listening” mode on the device rather than in the cloud.
Let’s now look deeper into the “always listening” approaches and compare some of the different methods and platforms available for embedded triggers.
There are a few basic approaches for running embedded voice wakeup triggers:
The first is running on an embedded DSP, microprocessor, and/or smart microphone. I like to think of this as a “deeply embedded” approach, as opposed to running embedded on the operating system (OS). Knowles recently announced a design with a smart mike that provides low-power wake-up assistance.
Many leading chip companies have small DSPs that are enabled for “wake up word” detection. These vendors include Audience, Avnera, Cirrus Logic, Conexant, DSPG, Fortemedia, Intel, InvenSense, NXP, Qualcomm, QuickLogic, Realtek, STMicroelectronics, TI, and Yamaha. Many of these companies combine wake word detection with noise suppression or acoustic echo cancellation so that their chips add value beyond speech recognition. QuickLogic recently announced availability of an “always listening” sensor fusion hub, the EOS S3, which lets the sensor listen while consuming very little power.
Next is DSP IP availability. The concept of low-power voice wakeup has gotten so popular amongst processor vendors that the leading DSP/MCU IP cores from ARM, Cadence, CEVA, NXP CoolFlux, Synopsys, and Verisilicon all offer this capability, and some even offer special versions targeting this function.
Running on an embedded OS is another option. Bigger systems like Android, Windows, or Linux can also run voice wake-up triggers. The bigger systems might not be so applicable for battery-operated devices, but they offer the advantage of being able to implement larger and more powerful voice models that can improve accuracy. The DSPs and MCUs might run a 50-kbyte trigger at 1 mA, while bigger systems can cut error rates in half by increasing models to hundreds of megabytes and power consumption to hundreds of milliamps. Apple used this approach in its initial implementation of Siri, thus explaining why the iPhone needed to be plugged in to be “always listening.”
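The gap between 1 mA and hundreds of milliamps is easiest to see as idealized battery life. The Python sketch below assumes a hypothetical 500 mAh battery, takes 200 mA as a stand-in for “hundreds of milliamps,” and ignores every other load on the device, so the numbers illustrate the ratio rather than any real product’s runtime.

```python
def idealized_hours(battery_mah: float, current_ma: float) -> float:
    """Runtime if the wake-word engine were the only thing drawing current."""
    return battery_mah / current_ma


BATTERY_MAH = 500.0  # hypothetical battery capacity, for illustration only

# 1 mA is the DSP/MCU figure quoted above; 200 mA is an assumed stand-in
# for "hundreds of milliamps" on a full OS.
SCENARIOS = {
    "DSP/MCU trigger (~50 KB model)": 1.0,
    "OS-level trigger (large model)": 200.0,
}

if __name__ == "__main__":
    for label, current_ma in SCENARIOS.items():
        print(f"{label}: {idealized_hours(BATTERY_MAH, current_ma):.1f} h")
```

Under these assumptions the DSP path could listen for about 500 hours versus about 2.5 hours for the OS path, which illustrates why an OS-level trigger tends to need wall power to stay “always listening.”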
Finally, one can try combinations and multi-level approaches. Some companies are implementing low-power wake-up engines that hand off to a more powerful system, once woken, to confirm the detection. This can be done on the device itself or in the cloud. This approach works well for more complex uses of speech technology like speaker verification or identification, where DSPs are often crippled in performance and a larger system can implement a more state-of-the-art approach. It’s basically getting the accuracy of bigger models and systems while lowering power consumption by running a smaller, less accurate wake-up system first.
A variant of this approach uses a low-power speech detection block as an always-listening front end, which then wakes up the deeply embedded recognizer. Some companies have erred by using traditional speech-detection blocks that work fine for starting a recording of a sentence (like an answering machine) but fail when the job is to recognize a single word, where losing 100 ms can have a huge effect on accuracy. Sensory has developed a very low-power hardware sound-detection block that runs on systems like the Knowles mike and QuickLogic sensor hub.
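The multi-level approach can be sketched as a simple cascade: a cheap first stage screens every clip, and an expensive second stage runs only on the clips the first stage lets through. In this Python sketch the two “models” are hypothetical stand-in scoring functions (peak and mean of a few numbers), not real recognizers; in a product the second stage might be a larger on-device model or a cloud service.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer type: takes a clip (here just a list of numbers) and
# returns a confidence score.
Scorer = Callable[[Sequence[float]], float]


def cascade(
    clips: List[Sequence[float]],
    small_model: Scorer,
    big_model: Scorer,
    small_threshold: float,
    big_threshold: float,
) -> Tuple[List[int], int]:
    """Return (indices of clips accepted by both stages, number of big-model runs)."""
    accepted: List[int] = []
    big_calls = 0
    for i, clip in enumerate(clips):
        if small_model(clip) < small_threshold:
            continue  # cheap stage rejects; the expensive stage never wakes
        big_calls += 1
        if big_model(clip) >= big_threshold:
            accepted.append(i)  # both stages agree: treat as a real wake word
    return accepted, big_calls


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


if __name__ == "__main__":
    clips = [[0.1, 0.2], [0.9, 0.1], [0.8, 0.9], [0.0, 0.3]]
    # Stand-ins: "small" model = peak value, "big" model = mean value.
    print(cascade(clips, max, mean, 0.5, 0.6))
```

Counting `big_calls` shows where the power saving comes from: the expensive stage only wakes up for the fraction of audio the cheap stage passes, while its stricter check filters out the cheap stage’s false accepts.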
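One common way to avoid losing the start of a word is a pre-roll buffer: keep the most recent slice of audio in a ring buffer, and when the speech detector fires, hand the recognizer the buffered frames along with the live ones. This Python sketch is a software illustration, not Sensory’s hardware block; the 10 ms frame size, 200 ms look-back, and simple energy-threshold detector are all assumptions.

```python
from collections import deque
from typing import List, Sequence

FRAME_MS = 10     # assumed audio frame length
PREROLL_MS = 200  # assumed look-back, comfortably more than the ~100 ms at risk

Frame = Sequence[float]


class PrerollBuffer:
    """Ring buffer that always holds the most recent PREROLL_MS of audio."""

    def __init__(self, frame_ms: int = FRAME_MS, preroll_ms: int = PREROLL_MS):
        # A deque with maxlen silently drops the oldest frame when full.
        self._frames = deque(maxlen=preroll_ms // frame_ms)

    def push(self, frame: Frame) -> None:
        self._frames.append(frame)

    def drain(self) -> List[Frame]:
        frames = list(self._frames)
        self._frames.clear()
        return frames


def energy(frame: Frame) -> float:
    """Mean squared amplitude: a deliberately crude speech detector."""
    return sum(x * x for x in frame) / len(frame)


def detect_with_preroll(frames: List[Frame], threshold: float) -> List[Frame]:
    """Return the audio handed to the recognizer once the detector fires.

    The result starts with the buffered pre-roll, so quiet onset frames that
    never crossed the threshold still reach the recognizer. Returns an empty
    list if the detector never fires."""
    buf = PrerollBuffer()
    for i, frame in enumerate(frames):
        if energy(frame) >= threshold:
            return buf.drain() + frames[i:]
        buf.push(frame)
    return []
```

Without the buffer, a detector that fires a few frames late would pass along only the loud tail of the word; with it, the quiet onset frames are kept at the cost of a small, fixed amount of memory.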
May 6, 2016
Rich Nass and Barbara Quinlan from Open Systems Media visited Sensory on their “IoT Roadshow”.
IoT is a very interesting area. About 10 years ago we saw voice-controlled IoT on the way, and we started calling the market SCIDs: Speech Controlled Internet Devices. I like IoT better; it’s certainly a more popular name for the segment! ;-)
I started our meeting off by talking about Sensory’s three products – TrulyHandsfree Voice Control, TrulySecure Authentication, and TrulyNatural large vocabulary embedded speech recognition.
Although TrulyHandsfree is best known for its “always on” capabilities, ideal for listening for key phrases (like OK Google, Hey Cortana, and Alexa), it can be used in a ton of other ways. One of them is hands-free photo taking, so no selfie stick is required. To demonstrate, I put my camera on the table and took pictures of Barbara and Rich. (Normally I might have joined them in the pictures, but their healthy hair, naturally good looks, and formal attire outclassed me too thoroughly.)
There’s a lot of hype about IoT and wearables, and I’m a big believer in both. That said, I think Amazon’s Echo is the perfect example of a revolutionary product that showcases the use of speech recognition in the IoT space, and I’m looking forward to some innovative uses of speech in wearables!
Here’s the article they wrote on their visit to Sensory and an impromptu video showing TrulyNatural performing on-device navigation, as well as a demo of TrulySecure via our AppLock Face/Voice Recognition app.
Rich Nass, Embedded Computing Brand Director
If you’re building an IoT device that requires hands-free operation, check out Sensory, just like I did while I was on OpenSystems Media’s IoT Roadshow. Sensory’s technology worked flawlessly running through the demo, as you can see in the video. We ran through two different products, one for input and one for security.
October 1, 2015
Todd Mozer’s interview with Martin Wasserman on FutureTalk
August 26, 2015
Guest post: Sensory’s Marketing Team
The editors of Inc. identified Sensory as one of America’s fastest growing companies. The annual ranking of the 5,000 fastest-growing private companies in the United States put Sensory at 3,301 on the list with over 100% growth over three years and 30 new jobs added.
Sensory’s growth has been driven by a breadth of software products on the market, including TrulyHandsfree, TrulySecure and TrulyNatural, which can be found in over a billion consumer electronics devices around the world.
Congratulations to the Sensory team for making the Inc. 5000 list this year!