Archive for the ‘Uncategorized’ Category
August 21, 2019
At a recent meeting Sensory was credited for “inventing the wake word”. I explained that Sensory certainly helped to evangelize and popularize it, but we didn’t “invent” it. What we really did was substantially improve upon the state of the art so that it became useable. And it was a VERY hard challenge since we did it in an era before deep learning allowed us to further improve the performance.
Today Sensory is taking on the challenge of sound and scene identification. There are many dozens of companies working on this challenge…and it’s another HUGE challenge. There are some similarities with wake words and dealing with speech but a lot of differences too! I’m writing this to provide an update on our progress, to share some of our techniques, to compare a bit with wake words and speech, and to bring clearer metrics to the table for looking at accuracy!
Sensory announced our initial SoundID solution at CES 2019.
Since then we have been working on accuracy improvements and adding gunshot identification to the mix of sounds we identify (CO and smoke alarms, glass break, baby cry, snoring, door knock/bell, scream/yell, etc.).
Sensory is very proud of our progress in sound identification. We welcome and encourage others to share their accuracy reporting…I couldn’t find much online to determine “state of the art”.
Now we will begin work on scene analysis…and I expect Sensory to lead in this development as well!
June 11, 2019
I used to blog a lot about wake words and voice triggers. Sensory pioneered this technology for voice assistants, and we evangelized the importance of not hitting buttons to speak to a voice recognizer. Then everybody caught on and the technology went into mainstream use (think Alexa, OK Google, Hey Siri, etc.), and I stopped blogging about it. But I want to reopen the conversation…partly to talk about how important a GREAT wake word is to the consumer experience, and partly to congratulate my team on a recent comparison test that shows how Sensory continues to have the most accurate embedded wake word solutions.
Competitive Test Results. The comparison test was done by Vocalize.ai. Vocalize is an independent test house for voice enabled products. For a while, Sensory would contract out to them for independent testing of our latest technology updates. We have always tested in-house but found that our in-house simulations didn’t always sync up with our customers’ experience. Working with Vocalize allowed us to move from our in-house simulations to more real-world product testing. We liked Vocalize so much that we acquired them. So, now we “contract in” to them but keep their data and testing methodology and reporting uninfluenced by Sensory.
Vocalize compared two Sensory TrulyHandsfree wake word models (1MB size and 250KB size) with two external wake words (Amazon and Kitt.ai’s Snowboy), all using “Alexa” as the trigger. The results are replicable and show that Sensory’s TrulyHandsfree remains the superior solution on the market. TrulyHandsfree was better/lower on BOTH false accepting AND false rejecting. And in many cases our technology was better by a long shot! If you would like to see the full report and more details on the evaluation methods, please send an email request to either Vocalize (email@example.com) or Sensory (firstname.lastname@example.org).
It’s Not Easy. There are over 20 companies today that offer on-device wake words. Probably half of these have no experience in a commercially shipping product and they never will; there are a lot of companies that just won’t be taken seriously. The other half can talk a good talk, and in the right environment they can even give a working demo. But this technology is complex, and really easy to do badly and really hard to do great. Some demos are carefully planned with the right noise in the right environment with the right person talking. Sensory has been focused on low power embedded speech for 25 years, and we have 65 of the brightest minds working on the toughest challenges in embedded AI. There’s a reason that companies like Amazon, Google, Microsoft and Samsung have turned to Sensory for our TrulyHandsfree technology. Our stuff works, and they understand how difficult it is to make this kind of technology work on-device! We are happy to provide APKs so you can do your own testing and judge for yourself! OK, enough of the sales pitch…some interesting stuff lies ahead…
It’s Really Important. Getting a wake word to work well is more important than most people realize. It’s like the front door to your house. It might be a small part of your house, but if you aren’t letting the homeowners in then that’s horrible, and if you are letting strangers in by accident that’s even worse. The name a company gives its wake word is usually the company brand name. Imagine the sentiment that comes across when I say a brand name and it doesn’t work. Recently I was at a tradeshow that had a Mercedes booth. There were big signs that said “Hey Mercedes”…I walked up to the demo area and I said “Hey Mercedes” but nothing happened…the woman working there informed me that they couldn’t demo it on the show floor because it was really too noisy. I quickly pulled out my mobile phone and showed her that I could use dozens of wake words and command sets without an error in that same environment. Mercedes has spent over 100 years building up one of the best quality brand reputations in the car industry. I wonder what will happen to that reputation if their wake word doesn’t respond in noise? Even worse is when devices accidentally go off. If you have family members that listen to music above volume 7 then you already know the shock that a false alarm causes!
It’s about Privacy. Amazon, like Google and a few others, seems to have a pretty good wake word, but if you go into your Alexa settings you can see all of the voice data that’s been collected, and a lot of it is being collected when you weren’t intentionally talking to Alexa! You can see this performance issue in the Vocalize test report. Sensory substantially outperformed Amazon in the false reject area. This is when a person tries to speak to Alexa and she doesn’t respond. The difference is most apparent in babble noise, where Sensory falsely rejected 3% and Amazon falsely rejected 10% in comparably sized models (250KB). However, the false accept difference is nothing short of AMAZING. Amazon false accepted 13 times in 24 hours of random noise. In this same time period Sensory false accepted ZERO times (on comparably sized 250KB models). How is this possible, you may be wondering? Amazon “fixes” its mistakes in the cloud. Even though the device falsely accepts quite frequently, their (larger and more sophisticated) models in the cloud collect the error. Was that a Freudian slip? They correct the error…AND they COLLECT the error. In effect, they are disregarding privacy to save device cost and collect more data.
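For anyone who wants to report numbers in the same way, the two metrics quoted above boil down to simple ratios. The sketch below is only an illustration: the function names and the 100-attempt trial size are my own assumptions, not part of the Vocalize methodology. Only the 3%, 10%, 13-in-24-hours and zero figures come from the report as quoted here.

```python
def false_reject_rate(missed_triggers: int, total_attempts: int) -> float:
    """Fraction of genuine wake word attempts that the device ignored."""
    if total_attempts <= 0:
        raise ValueError("total_attempts must be positive")
    return missed_triggers / total_attempts


def false_accepts_per_hour(false_triggers: int, hours_of_audio: float) -> float:
    """Unintended activations per hour of audio containing no wake word."""
    if hours_of_audio <= 0:
        raise ValueError("hours_of_audio must be positive")
    return false_triggers / hours_of_audio


# Hypothetical trial size of 100 attempts, chosen so the quoted
# percentages map to whole numbers of missed triggers.
for label, missed in (("3% case", 3), ("10% case", 10)):
    print(label, f"{false_reject_rate(missed, 100):.0%}")

# The false accept counts quoted above: 13 and 0 events in 24 hours.
for label, events in (("13 events", 13), ("0 events", 0)):
    print(label, f"{false_accepts_per_hour(events, 24):.2f} per hour")
```

Reporting false accepts per hour (rather than as a percentage) is the natural choice because there is no fixed number of "attempts" in background noise; the denominator is simply listening time.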
As the voice revolution continues to grow, you can bet that privacy will continue to be a hot topic. What you now understand is that wake word quality has a direct impact on both the user experience and PRIVACY! While most developers and product engineers in the CE industry are aware of wake words and the difficulty in making them work well on-device, they don’t often consider that competing wake word technologies aren’t created equal – the test results from Vocalize prove it! Sensory is more accurate AND allows more privacy!
Sensory Releases Embedded AI That Allows Voice Assistants and IoT Devices to Identify a Wide Variety of Home Sounds
January 7, 2019
TrulySecure Sound ID identifies sounds within the home and can send messages to homeowners
Santa Clara, Calif., January 7, 2019 – Sensory, a Silicon Valley-based company focused on improving the user experience and security of consumer electronics through state-of-the-art embedded AI technologies, today announces TrulySecure™ Sound ID, a major breakthrough for cloud-free always listening AI that gives devices the ability to identify a variety of common and critical sounds within the home, and intelligently interpret if action needs to be taken, without the security risk of sending audio recordings to the cloud for processing.
Backed by Sensory’s 25 years of experience developing practical applications for neural networks and deep learning, TrulySecure Sound ID is capable of recognizing a variety of environmental sounds, including glass breaking, babies crying, dogs barking, home security alarms, smoke/CO alarms and low battery warnings, doorbells, knocking, snoring and more. Consumer devices could instantly warn users when these sounds occur and send the owner sound clips to better understand the situation.
“With so many voice-controlled products entering our homes, we saw an excellent opportunity to enable the microphones in these devices to do more than just listen for wake words and recognize speech,” said Todd Mozer, CEO of Sensory. “With TrulySecure Sound ID, we’re making it possible for companies to create smart always-listening products that can alert us of things happening within our homes that we may not hear, and safeguard our home and family from potentially dangerous situations.”
Sharing the same distinguished qualities that make Sensory’s ‘AI at the Edge’ technologies industry-leading, TrulySecure Sound ID boasts performance and privacy by performing all analyses on device. By training each specific sound profile on thousands of real-world samples, similar to how Sensory trains biometric recognition, TrulySecure Sound ID takes several discriminating factors into account when processing sounds. Sound ID learns each sound type using an approach that combines deep learning model training with proprietary discriminant learning techniques; together, the two approaches produce a better solution than either could achieve in isolation.
Sound ID is available today as a component of Sensory’s TrulySecure Speaker Verification (TSSV) suite of
All of the technologies in TSSV pose no privacy or security concerns because they run completely on device and never touch the cloud. TSSV supports all major operating systems and is hardware agnostic, offering nearly limitless implementation flexibility. Additionally, Sensory can customize TSSV to match the exact needs of its customers, enabling only the sound profiles required for specific use cases.
TrulySecure is a trademark of Sensory Inc.
April 3, 2018
Santa Clara, Calif., April 3, 2018 – Sensory’s TrulyHandsfree speech recognition has been re-engineered to run ultra-low-power on Android and iOS mobile applications without special hardware
Sensory, a Silicon Valley-based company focused on improving the user experience and security of consumer electronics through state-of-the-art embedded AI technologies, today announced that it has made a significant breakthrough in running its TrulyHandsfree™ wake word and speech recognition AI engine directly on Android and iOS smartphone applications at low-power. As a software component, TrulyHandsfree can be adapted to any app without requiring special purpose hardware or DSPs to capture efficiencies in computing.
Introduced in 2009, TrulyHandsfree paved the way for the hands-free operation we have come to expect with today’s always-listening personal assistant solutions. When released it revolutionized voice user interfaces by offering the first commercially successful always-listening low power wake word. With each succeeding generation, TrulyHandsfree has continually upped the benchmark for always-listening speech recognition performance, by increasing accuracy, lowering power consumption, and running across an increasing number of hardware platforms at ultra-low-power consumption.
TrulyHandsfree has seen large commercial success by running on special purpose hardware for low-power operation. Companies like Avnera, Cirrus Logic, Conexant/Synaptics, CSR/Qualcomm, DSP Group, Knowles, QuickLogic, Realtek, XMOS and many others have penetrated the market for voice assistants using Sensory TrulyHandsfree technology. This specialized hardware approach has worked well for Sensory’s customers like Samsung, Huawei, LG, Motorola and other Android mobile providers who design their own phones and wearables with their choice of hardware.
Until now, always-listening wake word solutions for apps required too much power to be practical, especially for apps that remain open and active in the background. Additionally, having to maintain the same user experience across operating systems, and across all different devices added an extra layer of complexity. However, this isn’t the case anymore. TrulyHandsfree streamlines the implementation and coding process, allowing developers to quickly and easily deploy apps with power-efficient always-listening wake word and command set capabilities across all popular mobile and PC operating systems.
In 2017 Sensory embarked on investigations of using Qualcomm and ARM as more standard cross-platform solutions to figure out how to lower power consumption for wake words used across mobile platforms. Sensory came up with a series of independent actions that when combined could lower power consumption on a mobile app using a wake word by more than 80%, or a reduction of approximately 200mAh in a 12-hour day. That enables a mobile app wake word to consume approximately one-percent of the smartphone battery in 12 hours. To achieve this outstanding reduction in power consumption, Sensory utilized an approach known as “little-big,” which uses a very small model to identify an interesting event and then revalidates the event on a large model (both events are processed on the Application Processor). This method provides the optimal user experience of the big model only when needed, while maintaining the power consumption of the little model most of the time. Frame stacking approaches further cut certain wake word model processing functions’ MIPS in half with negligible accuracy impact. Additionally, multithreading has been deployed to allow more efficient processing of speech recognition and can significantly improve the speed of execution for larger wake word models.
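The “little-big” idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Sensory’s implementation: the `LittleBigDetector` class, both thresholds, and the two stand-in scoring functions are hypothetical, and a real system would score with trained wake word models rather than simple amplitude statistics.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]
Scorer = Callable[[Frame], float]


class LittleBigDetector:
    """Two-stage cascade: a cheap model gates an expensive one."""

    def __init__(self, little: Scorer, big: Scorer,
                 little_threshold: float, big_threshold: float) -> None:
        self.little = little
        self.big = big
        self.little_threshold = little_threshold
        self.big_threshold = big_threshold
        self.big_calls = 0  # how often the expensive model actually ran

    def process(self, frame: Frame) -> bool:
        # Stage 1: the little model runs on every frame.
        if self.little(frame) < self.little_threshold:
            return False
        # Stage 2: only "interesting" frames are revalidated by the big model.
        self.big_calls += 1
        return self.big(frame) >= self.big_threshold


# Stand-in scorers (assumptions for illustration only).
def little_score(frame: Frame) -> float:
    return max(abs(x) for x in frame)


def big_score(frame: Frame) -> float:
    return sum(abs(x) for x in frame) / len(frame)


detector = LittleBigDetector(little_score, big_score,
                             little_threshold=0.5, big_threshold=0.6)
frames: List[Frame] = [[0.0, 0.01], [0.9, 0.1], [0.8, 0.9]]
results = [detector.process(f) for f in frames]
print(results, "big model ran", detector.big_calls, "times")
```

The power saving comes from the gate: the big model’s cost is paid only for the small fraction of frames the little model flags, so average consumption stays close to that of the little model alone.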
“Hands-free operation for voice control has become the norm, and application developers are now looking to create hands-free wake words for their own apps,” said Todd Mozer, CEO of Sensory. “For example, we recently helped Google’s Waze accept hands-free voice commands by supplying them with Sensory’s ‘OK Waze’ wake word that runs when the app is open. With previous versions of TrulyHandsfree, having our always-on wake word engine listening for the OK Waze wake word during a short trip would have had minimal effect on a smartphone’s battery, but for longer trips a more efficient system was desired – so we created it. Sensory is excited to now offer TrulyHandsfree with excellent low-power performance to all app developers!”
TrulyHandsfree is the most widely deployed embedded speech recognition engine in the world, having enabled a hands-free voice user experience on more than two billion devices from leading brands worldwide. TrulyHandsfree offers support for every voice UI application with several types of wake word options, such as independent fixed wake words, user enrolled fixed wake words, and user defined wake words. Sensory offers off-the-shelf wake word models for all major Assistant services, including Alexa, Hey Siri, OK Google, Hey Cortana, as well as wake word models for third-party devices that support cloud AI systems from Baidu, Alibaba and Tencent. Sensory can also combine multiple wake words into one solution and is the only supplier to have deployed numerous cross-assistant wake word solutions to the market.
Sensory’s TrulyHandsfree currently supports US English, UK English, Australian English, Indian English, Arabic, Dutch, French (EU and Canadian), German, Italian, Japanese, Korean, Mandarin, Portuguese (EU and Brazil), Russian, Spanish (EU, Latin America and US), Swedish and Turkish. An SDK for TrulyHandsfree is available for Android, iOS, Linux, Mac OS, QNX and Windows. Sensory provides developer support for cloud service interfaces on Android, iOS, Linux, Mac OS, Windows as well as support for dozens of proprietary DSPs, microcontrollers, smart microphones and other low-power embedded devices. SDK updates taking advantage of lower power TrulyHandsfree are now being rolled out for Android and iOS in Q2 2018.
TrulyHandsfree is a trademark of Sensory Inc.
August 30, 2017
A few days ago I wrote a blog that talked about assistants and wake words and I said:
“We’ll start seeing products that combine multiple assistants into one product. This could create some strange and interesting bedfellows.”
Interesting that this was just announced:
Here’s another prediction for you…
All assistants will start knowing who is talking to them. They will hear your voice and look at your face and know who you are. They will bring you the things you want (e.g. play my favorite songs), and only allow you to conduct transactions you are qualified for (e.g. order more black licorice). Today there is some training required, but in the near future they will just learn who is who, much like a newborn quickly learns the family members without any formal training.
February 10, 2017
The wonders of deep learning are well utilized in the area of artificial intelligence, aka AI. Massive amounts of training data can be processed on very powerful platforms to create wonderful generalized models, which can be extremely accurate. But this in and of itself is not yet optimal, and there’s a movement afoot to move the intelligence and part of the learning onto the embedded platforms.
Certainly, the cloud offers the most power and data storage, allowing the most immense and powerful of systems. However, when it comes to agility, responsiveness, privacy, and personalization, the cloud looks less attractive. This is where edge computing and shallow learning through adaptation can become extremely effective. “Little” data can have a big impact on a particular individual. Think how accurately a child learns to recognize its mother, and how little data that requires.
A good example of specialized learning is when it comes to accents or speech impediments. Generalized acoustic models often don’t handle this well, resulting in customized models for different markets and accents. However, this customization is difficult to manage, can add to the cost of goods, and may negatively impact the user experience. Yet, this still results in a model generalized for a specific class of people or accents. An alternative approach could begin with a general model built with cloud resources, with the ability to adapt on the device to the distinct voices of the people that use it.
The challenge with embedded deep learning occurs in its limited resources and the need to deal with on-device data collection, which by its nature, will be less plentiful, unlabeled, yet more targeted. New approaches are being implemented such as teacher/student models where smaller models can be built from a wider body of data, essentially turning big powerful models into small powerful models that imitate the bigger ones while getting similar performance.
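To make the teacher/student idea concrete, here is a minimal sketch of the loss a small “student” model is commonly trained on: the cross-entropy between the teacher’s temperature-softened output distribution and the student’s. This is the textbook knowledge distillation formulation, offered as an assumption about what such training typically looks like, not as a description of any particular vendor’s method. The function names and the temperature value are illustrative.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """Cross-entropy between the teacher's soft targets and the student."""
    if len(student_logits) != len(teacher_logits):
        raise ValueError("student and teacher must score the same classes")
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


teacher = [4.0, 1.0, 0.5]        # big model: confident in class 0
good_student = [3.5, 1.2, 0.4]   # small model that imitates it well
poor_student = [0.2, 3.0, 1.0]   # small model that disagrees

print(distillation_loss(good_student, teacher))
print(distillation_loss(poor_student, teacher))
```

A higher temperature flattens the teacher’s distribution, so the student also learns how the teacher ranks the wrong answers, which is much of what lets a small model approach the big model’s accuracy.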
Generative data without supervision can also be deployed for on-the-fly learning and adaptation. Along with improvements in software and technology, the chip industry is going through somewhat of a deep learning revolution, adding more parallel processing and specialized vector math functions. For example, GPU vendor NVIDIA has some exciting products that take advantage of deep learning. Some smaller private embedded deep learning IP companies like Nervana, Movidius, and Apical are getting snapped up in highly valued acquisitions by larger companies like Intel and ARM.
Embedded deep learning and embedded AI are here.
September 9, 2016
We are pleased to announce that Sensory’s TrulySecure technology has earned first place in this year’s CTIA E-Tech Awards. We believe that this recognition serves as a testament to Sensory’s devotion to developing the best embedded speech recognition and biometric security technologies available.
For those of you unfamiliar with TrulySecure – TrulySecure is the result of more than 20 years of Sensory’s industry leading and award-winning experience in the biometric space. The TrulySecure SDK allows application developers concerned about both security and convenience to quickly and easily deploy a multimodal voice and vision authentication solution for mobile phones, tablets, and PCs. TrulySecure is highly secure, environment robust, and user friendly – offering better protection and greater convenience than passwords, PINs, fingerprint readers and other biometric scanners. TrulySecure offers the industry’s best accuracy at recognizing the right user, while keeping unauthorized users out. Sensory’s advanced deep learning neural networks are fine tuned to provide verified users with instant access to protected apps and services, without the all too common false rejections of the right user associated with other biometric authentication methods. TrulySecure features a quick and easy enrollment process – capturing voice and face simultaneously in a few seconds. Authentication is on-device and almost instantaneous.
TrulySecure provides maximum security against attempts by mobile identity thieves to break into a protected mobile device, while ensuring the most accurate verification rates for the actual user. According to data published by Apple, the iPhone’s thumbprint reader offers about a 1:50K chance of a false accept of the wrong user, and the probability of the wrong user getting into the device gets higher when the user enrolls more than one finger. With TrulySecure, face and voice biometrics individually offer a baseline 1:50k false accept rate, but can each be made more secure depending on the security needs of the developer. When both face and voice biometrics are required for user authentication, TrulySecure is virtually impenetrable by anybody but the actual user. TrulySecure’s face+voice authentication offers a baseline 1:100k False Accept Rate, but can be dialed in to reach a 1:1 million False Accept Rate depending on security needs.
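The point about enrolling more than one finger is easy to check with a little arithmetic. The sketch below assumes each enrolled template is an independent chance for an impostor to match, which is a simplification; the function names are illustrative, and only the 1:50K single-finger figure comes from the text above.

```python
def far_any_of(single_far: float, enrolled: int) -> float:
    """Chance an impostor matches at least one of `enrolled` templates,
    assuming each comparison is an independent trial."""
    if not 0.0 <= single_far <= 1.0:
        raise ValueError("single_far must be a probability")
    if enrolled < 1:
        raise ValueError("at least one template must be enrolled")
    return 1.0 - (1.0 - single_far) ** enrolled


def as_one_in(far: float) -> int:
    """Express a false accept rate as the N in '1:N'."""
    return round(1.0 / far)


single_finger = 1 / 50_000
for fingers in (1, 2, 5):
    far = far_any_of(single_finger, fingers)
    print(f"{fingers} finger(s): about 1:{as_one_in(far):,}")
```

Under this assumption, enrolling five fingers moves the odds from about 1:50,000 to about 1:10,000. The face+voice figures quoted above are measured product baselines and should not be derived from a formula like this one.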
TrulySecure is robust to environmental challenges such as low light or high noise – it works in real-life situations that render lesser offerings useless. The proprietary speaker verification, face recognition, and biometric fusion algorithms leverage Sensory’s deep strength in speech processing, computer vision, and machine learning to continually make the user experience faster, more accurate, and more secure. The more the user uses TrulySecure, the more secure it gets.
TrulySecure offers ease-of-mind specifications: no special hardware is required – the solution uses standard microphones and cameras universally installed on today’s phones, tablets and PCs. All processing and encryption is done on-device, so personal data remains secure – no personally identifiable data is sent to the cloud. TrulySecure was also the first biometric fusion technology to be FIDO UAF Certified.
While we are truly honored to be the recipient of this prestigious award, we won’t rest on our laurels. Our engineers are already working on the next generation of TrulySecure, further improving accuracy and security, as well as refining the already excellent user experience.
Guest blog by Michael Farino
August 22, 2016
Sensory is proud to announce that it has been awarded two 2016 Speech Tech Magazine Awards. With some stiff competition in the speech industry, Sensory continues to excel in offering the industry’s most advanced embedded speech recognition and speech-based security solutions for today’s voice-enabled consumer electronics movement.
The 2016 Speech Technology Awards include:
Speech Luminary Award – Awarded to Sensory’s CEO, Todd Mozer
“What really impresses me about Todd is his long commitment to speech technology, and specifically, his focus on embedded and small-footprint speech recognition,” says Deborah Dahl, principal at Conversational Technologies and chair of the World Wide Web Consortium’s Multimodal Interactions Working Group. “He focuses on what he does best and excels at that.”
Star Performers Award – Awarded to Sensory for its contributions in enabling voice-enabled IoT products via embedded technologies
“Sensory has always been in the forefront of embedded speech recognition, with its TrulyHandsfree product, a fast, accurate, and small-footprint speech recognition system. Its newer product, TrulyNatural, is groundbreaking because it supports large vocabulary speech recognition and natural language understanding on embedded devices, removing the dependence on the cloud,” said Deborah Dahl, principal at Conversational Technologies and chair of the World Wide Web Consortium’s Multimodal Interactions Working Group. “While cloud-based recognition is the right solution for many applications, if the application must work regardless of connectivity, embedded technology is required. The availability of TrulyNatural embedded natural language understanding should make many new types of applications possible.”
– Guest Blog by Michael Farino
May 6, 2016
Rich Nass and Barbara Quinlan from Open Systems Media visited Sensory on their “IoT Roadshow”.
IoT is a very interesting area. About 10 years ago we saw voice controlled IoT on the way, and we started calling the market SCIDs – Speech Controlled Internet Devices. I like IoT better, it’s certainly a more popular name for the segment! ;-)
I started our meeting off by talking about Sensory’s three products – TrulyHandsfree Voice Control, TrulySecure Authentication, and TrulyNatural large vocabulary embedded speech recognition.
Although TrulyHandsfree is best known for its “always on” capabilities, ideal for listening for key phrases (like OK Google, Hey Cortana, and Alexa), it can be used in a ton of other ways. One of them is hands-free photo taking, so no selfie stick is required. To demonstrate, I put my camera on the table and took pictures of Barbara and Rich. (Normally I might have joined the pictures, but their healthy hair, naturally good looks, and formal attire were too outclassing for my participation).
There’s a lot of hype about IoT and Wearables and I’m a big believer in both. That said, I think Amazon’s Echo is the perfect example of a revolutionary product that showcases the use of speech recognition in the IoT space and am looking forward to some innovative uses of speech in Wearables!
Here’s the article they wrote on their visit to Sensory and an impromptu video showing TrulyNatural performing on-device navigation, as well as a demo of TrulySecure via our AppLock Face/Voice Recognition app.
Rich Nass, Embedded Computing Brand Director
If you’re building an IoT device that requires hands-free operation, check out Sensory, just like I did while I was on OpenSystems Media’s IoT Roadshow. Sensory’s technology worked flawlessly running through the demo, as you can see in the video. We ran through two different products, one for input and one for security.
June 11, 2015
Guest post by: Michael Farino
Sensory’s CEO, Todd Mozer, joined Alan Taylor, host of Popular Science Radio, for a fun discussion about artificial intelligence and Sensory’s involvement with the Jibo robot development team, and gave the show’s listeners a look into the past 20 years of speech recognition. Todd and Alan also discussed some of the latest advancements in speech technology, and Todd provided an update on Sensory’s most recent achievements in the field of speech recognition as well as a brief look into what the future holds.
Listen to the full radio show at the link below:
Big Bang Theory, Science, and Robots | FULL EPISODE | Popular Science Radio #269