HEAR ME - Speech Blog



Archive for the ‘voice assistant’ Category

In A World of Cookie-Cutter Voice Assistants – Sensory Helps Companies Stand Out

March 11, 2020

For many years, Sensory has been considered the de facto standard for embedded wake words and voice control. In fact, our low-power, small-footprint TrulyHandsfree has shipped in billions of devices and apps around the globe. Sensory’s best-in-class accuracy has made us the only company offering wake word solutions approved by Amazon, Apple, Google, Microsoft, Samsung and LG, as well as international partners like Alibaba, Baidu and Tencent. With global smart speaker sales breaking records year over year, people are clearly getting more comfortable speaking to technology, and this creates new opportunities. Savvy voice-first users now want a next wave of use cases more compelling than the typical smart speaker trivia, weather and music. While smart speakers are handy for introducing the convenience of a voice user interface, they aren’t well suited to actually getting things done. To accomplish meaningful tasks, the entire Sensory team is excited to charge into the next phase of the voice user interface: a charge led by custom branded wake words and domain-specific voice assistants.

Your Brand, Your Wake Word

Most voice industry insiders are familiar with the Sonos/Alexa story. Mom brings home a new Sonos One speaker and sets it up with Alexa for voice control. Every time the kids use it to check the weather, play some music, or hear the news, the conversation starts with “Alexa.” After a few weeks Mom asks, “How do you like that new Sonos speaker?” The kids answer in unison, “What Sonos speaker? Oh, you mean the Alexa speaker!” Enabling Alexa or Google to hijack your brand might seem like a good way to sell a few more units, but it is not a good long-term strategy for building a brand. Unfortunately, Sonos learned this the hard way, but your company doesn’t have to. Sensory’s TrulyHandsfree provides companies with the means to create custom branded wake words that deliver accuracy and performance equal to or better than anything offered by the digital giants. Very soon you will hear about several high-profile brands rolling out their custom wake words. Stay tuned for more on that!

Another business case to consider: what if Amazon and Google don’t want to support your specific use case? For example, a recent article in the Boston Globe explains how Amazon decided not to support LifePod and its quest to create a smart speaker tailored to the elderly. At first glance this may have seemed like a showstopper for LifePod, but it actually enabled them to create their own smart speaker solution, and they turned to Sensory to create the “Hello LifePod” branded wake word.
“LifePod is a domain-specific, purpose-driven voice assistant and our mission is to put proactive-voice caregiving to work for caregivers worldwide. Proactive-voice means the user does not have to wake up the smart speaker – it just talks to them according to a schedule set up by their caregiver! But the care recipient will need to wake LifePod up and ask for general purpose, voice-first services like music or the weather. And they will also need to wake it up to “call for help!” when they’ve fallen or hurt themselves. In all these “reactive-voice” cases, we needed the best custom wake word on the market – and that comes from Sensory. They’ve been leading the wake-word, specialized voice processing market since its inception!” said Stu Patterson, CEO of LifePod.

Using Voice to Get It Done

More than just custom wake words, Sensory also supports clients in creating domain-specific voice assistants. How do these compare to the typical smart speaker? Consider the smart speaker a generalist: it can do many, many things but doesn’t excel at any particular task. A domain-specific assistant focuses on one task, or a collection of tasks, and is specifically designed to accomplish them. To drive the point home: a smart speaker is a mile wide and an inch deep; a domain-specific voice assistant is a mile deep and an inch wide. For example, when Midea MCA, a global leader in microwave appliances, wanted a voice-controlled microwave, they didn’t need it to play music or talk about the weather. The Sensory team leveraged our TrulyNatural embedded speech recognition and created a custom language model to support microwave cooking tasks. The end product is a specialist: a voice-controlled microwave oven that knows how to cook potatoes, pop popcorn, reheat coffee, melt butter, soften ice cream, defrost vegetables and much more. All by voice. By being a specialist, this voice-enabled microwave gives users advanced features that are usually hidden behind confusing menus and semi-secret button press sequences.
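To illustrate what “a mile deep and an inch wide” means in practice, here is a toy sketch of domain-specific intent matching. The command set, timings and power levels are invented for illustration; Sensory’s TrulyNatural uses a trained embedded language model, not simple keyword matching:

```python
# Hypothetical command set for a voice-controlled microwave: a small, fixed
# task domain it handles deeply, while anything outside it is declined.
MICROWAVE_INTENTS = {
    "defrost": {"vegetables": (180, 30), "chicken": (300, 30)},  # (seconds, power %)
    "popcorn": {None: (150, 100)},   # None = no specific item needed
    "reheat":  {"coffee": (60, 80)},
    "soften":  {"ice cream": (15, 30)},
    "melt":    {"butter": (45, 50)},
}

def parse_command(utterance: str):
    """Return (action, item, seconds, power) for an in-domain request, else None."""
    text = utterance.lower()
    for action, items in MICROWAVE_INTENTS.items():
        if action in text:
            for item, (seconds, power) in items.items():
                if item is None or item in text:
                    return action, item, seconds, power
    return None  # out of domain -- a specialist simply declines

print(parse_command("please defrost the vegetables"))  # ('defrost', 'vegetables', 180, 30)
print(parse_command("play some jazz"))                 # None
```

The point of the sketch: a generalist assistant must map any utterance to any of thousands of skills, while the specialist only needs to resolve a handful of cooking intents, which is why it can run entirely on-device.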

“We firmly believe in ‘consumer-first technology’ and strive to maintain a technological advantage through teamwork and innovation. By teaming up with Sensory we are able to not only modernize the consumer experience offered by our products but do so in a way that addresses the increasingly alarming privacy issue that concerns many consumers worldwide,” said Dr. Scott Sun, Deputy Director of Midea MCA.
View a video demo of Sensory’s Domain Specific Voice Assistant for Smart Appliances embedded in a Midea MCA microwave oven here.

Does Your Brand Have A Voice?

Sensory’s embedded AI enables companies to own and control their voice experiences. Pairing a custom branded wake word with a domain-specific voice assistant creates a voice user experience that is unique to your brand. Every interaction reinforces the brand-customer bond while effectively completing the task at hand. Products such as smart appliances, remote controls, wireless headsets, hearables, and wearables can all benefit from this next wave of voice user interface.

If you would like to learn more about Sensory, then please contact us here. We would be happy to discuss how we can best support your company’s custom voice strategy.

Voice Assistants Going Embedded

January 13, 2020

Everyone knows that the use of voice assistants is exploding. Having just returned from CES, I was amazed by the number of booths supporting Amazon’s Alexa and Google’s Assistant. Nevertheless, recent research from Edison-NPR shows that US household adoption of voice assistant applications is slowing down.

Read the full blog at Embedded Computing.

Can Your Assistant Deliver?

December 9, 2019

In the old days of corporate America an “assistant” would bring you coffee. Today the “assistant” is Alexa or Google or some other voice assistant, and they don’t directly bring you coffee…yet.

Over 30% of U.S. adults now have at least one smart speaker, 60% have used a voice assistant on a smartphone, and the market penetration continues to grow. Currently people use their voice assistants mostly for playing music, setting alarms and general queries. Another trend is occurring in parallel with the growth of voice assistants: the market for ordering food ahead for takeout.

Walmart already allows orders through Google Assistant, and Alexa Skills have been built for ordering from dozens if not hundreds of restaurants. A new survey from The MDR Group and Progressive Business Insights shows more than two-thirds of adults are interested in using voice assistants to arrange food orders.

Read the full blog at voicebot.ai.

IFA 2019 Takes Assistants Everywhere to a New Level

September 17, 2019

The IFA (Internationale Funkausstellung) Show occurred this past week in Berlin. It is certainly Europe’s largest tech show, and some would say it has now surpassed the Las Vegas Consumer Electronics Show (CES). I hit CES every year and this was my second IFA, but something radically changed this year in the world of voice assistants. I’m going to make up a conversation based on two imaginary people. For fun, I’ll call them Sundar and Jeff:

Read the full blog at voicebot.ai.

Revisiting Wake Word Accuracy and Privacy

June 11, 2019

I used to blog a lot about wake words and voice triggers. Sensory pioneered this technology for voice assistants, and we evangelized the importance of not having to hit buttons to speak to a voice recognizer. Then everybody caught on, the technology went into mainstream use (think Alexa, OK Google, Hey Siri, etc.), and I stopped blogging about it. But I want to reopen the conversation…partly to talk about how important a GREAT wake word is to the consumer experience, and partly to congratulate my team on a recent comparison test that shows how Sensory continues to have the most accurate embedded wake word solutions.

Competitive Test Results. The comparison test was done by Vocalize.ai. Vocalize is an independent test house for voice enabled products. For a while, Sensory would contract out to them for independent testing of our latest technology updates. We have always tested in-house but found that our in-house simulations didn’t always sync up with our customers’ experience. Working with Vocalize allowed us to move from our in-house simulations to more real-world product testing. We liked Vocalize so much that we acquired them. So, now we “contract in” to them but keep their data and testing methodology and reporting uninfluenced by Sensory.

Vocalize compared two Sensory TrulyHandsfree wake word models (1MB size and 250KB size) with two external wake words (Amazon and Kitt.ai’s Snowboy), all using “Alexa” as the trigger. The results are replicable and show that Sensory’s TrulyHandsfree remains the superior solution on the market. TrulyHandsfree was better (lower) on BOTH false accepts AND false rejects. And in many cases our technology was better by a long shot! If you would like to see the full report and more details on the evaluation methods, please send an email request to either Vocalize (dev@vocalize.ai) or Sensory (sales@sensory.com).


It’s Not Easy. There are over 20 companies today that offer on-device wake words. Probably half of these have no experience with a commercially shipping product and never will; there are a lot of companies that just won’t be taken seriously. The other half can talk a good talk, and in the right environment they can even give a working demo. But this technology is complex, and it’s really easy to do badly and really hard to do great. Some demos are carefully planned with the right noise in the right environment with the right person talking. Sensory has been focused on low-power embedded speech for 25 years; we have 65 of the brightest minds working on the toughest challenges in embedded AI. There’s a reason that companies like Amazon, Google, Microsoft and Samsung have turned to Sensory for our TrulyHandsfree technology. Our stuff works, and they understand how difficult it is to make this kind of technology work on-device! We are happy to provide APKs so you can do your own testing and judge for yourself! OK, enough of the sales pitch…some interesting stuff lies ahead…

It’s Really Important. Getting a wake word to work well is more important than most people realize. It’s like the front door to your house: it might be a small part of the house, but if it isn’t letting the homeowners in, that’s horrible, and if it’s letting strangers in by accident, that’s even worse. The name a company gives its wake word is usually the company’s brand name; imagine the sentiment created when I say a brand name and nothing happens. Recently I was at a tradeshow that had a Mercedes booth. There were big signs that said “Hey Mercedes”…I walked up to the demo area and said “Hey Mercedes,” but nothing happened…the woman working there informed me that they couldn’t demo it on the show floor because it was too noisy. I quickly pulled out my mobile phone and showed her that I could use dozens of wake words and command sets without an error in that same environment. Mercedes has spent over 100 years building one of the best quality brand reputations in the car industry. I wonder what will happen to that reputation if their wake word doesn’t respond in noise? Even worse is when devices accidentally go off. If you have family members who listen to music above volume 7, then you already know the shock that a false alarm causes!

It’s about Privacy. Amazon, like Google and a few others, seems to have a pretty good wake word, but if you go into your Alexa settings you can see all of the voice data that’s been collected, and a lot of it was collected when you weren’t intentionally talking to Alexa! You can see this performance issue in the Vocalize test report. Sensory substantially outperformed Amazon in the false reject area. This is when a person tries to speak to Alexa and she doesn’t respond. The difference is most apparent in babble noise, where Sensory falsely rejected 3% and Amazon falsely rejected 10% on comparably sized models (250KB). However, the false accept difference is nothing short of AMAZING. Amazon false accepted 13 times in 24 hours of random noise. In that same time period, Sensory false accepted ZERO times (on comparably sized 250KB models). How is this possible, you may be wondering? Amazon “fixes” its mistakes in the cloud. Even though the device falsely accepts quite frequently, their (larger and more sophisticated) models in the cloud collect the error. Was that a Freudian slip? They correct the error…AND they COLLECT the error. In effect, they are disregarding privacy to save device cost and collect more data.
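To put those false accept numbers in perspective, here is a back-of-the-envelope extrapolation. It is only a sketch, assuming the 24-hour random-noise test is representative of a device’s day-to-day exposure; each false accept means a clip of unintended audio leaving the device:

```python
# False accepts observed in 24 hours of random noise (comparable 250KB models,
# per the Vocalize report cited above), extrapolated to a year of always-on use.
fa_per_day = {
    "Amazon on-device (250KB)": 13,
    "Sensory TrulyHandsfree (250KB)": 0,
}

for model, per_day in fa_per_day.items():
    per_year = per_day * 365
    print(f"{model}: ~{per_year} unintended activations per year")
# Amazon on-device (250KB): ~4745 unintended activations per year
# Sensory TrulyHandsfree (250KB): ~0 unintended activations per year
```

Thousands of unintended activations a year is thousands of snippets of household audio that only the cloud-side model gets a chance to discard, which is exactly the privacy trade-off described above.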

As the voice revolution continues to grow, you can bet that privacy will remain a hot topic. What you now understand is that wake word quality has a direct impact on both the user experience and PRIVACY! While most developers and product engineers in the CE industry are aware of wake words and the difficulty of making them work well on-device, they don’t often consider that competing wake word technologies aren’t created equal – the test results from Vocalize prove it! Sensory is more accurate AND allows more privacy!

Taking Back Control of Our Personal Data

March 11, 2019

As more and more devices designed to watch and listen to us flood the market, there is rising concern about how the personal data they collect gets used. Facebook has admitted to monetizing our personal data and the personal data of our friends. Google has been duplicitous in its tracking of users even when privacy settings are set to not track, and more recently has admitted to placing microphones in products without informing consumers. In no realm is the issue of privacy more relevant than with today’s voice assistants. A recent PC Magazine survey of over 2,000 people found that privacy was the top concern for smart home devices (more important than cost!). The issue becomes more complex as personal assistant devices become increasingly better equipped with IP cameras and various other sensors to watch, track and listen to us.

Admittedly, people get a lot in return for giving up their privacy. We’ve become numb to privacy policies. It is almost expected that we will tap “accept” on End User License Agreements (EULAs) without reading through terms that are intentionally designed to be difficult to read. We get free software, free services, music and audio feeds, discounted rates and other valuable benefits. What we don’t fully realize is what we are giving up, because big data companies don’t openly share that with us. We fall into two main categories of consumers: trust the giants of industry to do what’s right, or don’t play (and lose out on the benefits). For example, I love having smart speakers in my home, but many of my friends won’t get them — precisely because of these very fair privacy concerns. Our legislators are trying to address privacy concerns, but the government needs to fully understand how data is used before writing legislation. They need help from the tech community on how to deal with these issues. Europe has adopted the new General Data Protection Regulation (GDPR) to address concerns related to businesses’ handling and protection of user data. This is a step in the right direction, as it provides clear rules to companies that handle personal data and offers clearly defined monetary penalties for failure to protect the data of citizens. However, is it enough to slap fines on these companies for failing to comply? What happens when they think the market value of the infringement will be greater than the penalty? Or when human errors are made, as happened when a European user requested their data and was accidentally given someone else’s?

In this day and age, a great deal of our daily tasks require us to share personal data. Whether we are talking to our AI assistants, using our smart phone to navigate around traffic or making a credit card transaction at a local retailer, we’re sharing data. There are benefits to sharing this data, and as time goes on people will see even more benefits from data sharing, but there are so many problems too.

AI personal assistants provide examples of how sharing data can benefit or hurt the end user. The more we use these devices the more they learn about us. As they become capable of recognizing who we are by face or voice and get to know and memorize our histories, preferences and needs, these systems will evolve from devices that answer simple questions into proactive, accurate helpers that offer useful recommendations, unprompted reminders, improved home security — assistance we users find truly helpful. But they can also reveal private information that we don’t want others to know, which is what happened when a girl’s shopping habits triggered advertising for baby products before her parents knew she was pregnant.

As digital assistant technologies get smarter, there will be more concern about what private data they collect, store and share over the internet. One solution to the privacy problem would be keeping whatever is learned by the device on the device by moving the AI processing out of the cloud to the edge (on device), ensuring our assistants never take personal data to the cloud so there is no privacy risk. This would be a good solution as embedded AI becomes more powerful. The biggest disadvantage may be that every new device would have to re-learn who we are.

Another possible solution is more collaboration among data companies. Some of the AI assistant companies are starting to work together; for example, Cortana and Alexa are playing friendly. This current approach, however, is about transferring the baton from one assistant to another so that multiple assistants can be accessed from a single device. It’s unlikely this approach will be widely adopted, and even if it were, it would result in inefficiency in the collection and use of our personal data, because each of the AI assistant providers would have to build their own profile of who we are based on the data they collect. However, because companies will want to eventually monetize the data they collect, sharing done right could actually benefit us.

Could more sharing of our data and preferences improve AI assistants?

Sharing data does create the best and most consistent user experience. It allows consumers to switch AI brands or buy new devices without losing any of the benefits in user experience and device knowledge of who we are. Each assistant having its own unique data and profile for its users skirts the privacy issue and misses the advantages of big data. That doesn’t seem to be the right way for the industry to advance. Ideally, we would create a system where there is shared knowledge of who we are, but we, as individuals, need to control and manage that knowledge.

For example, Google knows that the most frequent search I do on Google Maps is for vegetarian restaurants. Google has probably figured out that I’m a vegetarian. Quite ironically, I found “meat” in my shopping cart at Amazon. Somehow, Alexa thought I asked it to add “meat” to my shopping cart. Perhaps if Alexa had the same knowledge of me as Google it would have made a different and better decision. Likewise, Amazon knows a lot about my shopping details. It knows what pills I take, my shoe size and a whole lot of other things that might be helpful to the Google Assistant. A shared knowledge base would benefit me, but I want to be able to control and oversee this shared database or profile.

Improved privacy without losing the advantages of shared data is achievable through a combination of private devices with embedded intelligence and cloud services restricted and governed by legislation and industry standards.

Legislation should enable user-controlled personal-data sharing systems; then the user can choose their own mix of private devices with embedded intelligence and cloud devices that offer more general and standard protections. Of course, the legislation is the hard part, since our government tends to work more for the industries that fund reelections and bills than for the people…nevertheless, here are some thoughts on how privacy legislation could work:

    1. Companies can’t retain an individual’s private information. That’s it! There is no good reason for them to have it. Period…but they can collect it and provide it to the owner.
    2. Every person has and controls their “Shared Profile” or SP. The SP contains any and all information collected from any company. It is an item list categorized by type (clothing sizes, restaurants, etc.).
    3. The SP is divided into “confidential” (not publicly available) and “non-confidential” (publicly available to any company) sections. The SP owner has direct control over their data and decides the following:
      • Which categories reside in the “non-confidential” SP vs. the “confidential” SP
      • Which items are included in those public categories inside the “non-confidential” SP
      • Which companies can access information in the “confidential” SP and what specific information they have access to
    4. Companies can access the “non-confidential” SP and utilize it for their purposes, but they can’t disclose or sell anything in it to others. It’s our data, not theirs!
    5. Companies can create new categories or items within categories on our SP; however, data added by companies is automatically placed in the “confidential” section. The user gets notified about new information and they can monitor, screen, edit and move such information into the “non-confidential” SP, making it useful for other companies to access.
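As a thought experiment, the rules above could be sketched as a data structure. Everything here is hypothetical — no such standard, format or API exists today — but it shows how the five points fit together:

```python
from dataclasses import dataclass, field

@dataclass
class SharedProfile:
    """Hypothetical user-owned 'Shared Profile' (SP) from the proposal above."""
    owner: str
    non_confidential: dict = field(default_factory=dict)  # readable by any company (point 4)
    confidential: dict = field(default_factory=dict)      # owner-gated (point 3)
    grants: dict = field(default_factory=dict)            # company -> set of allowed confidential categories

    def company_add(self, category: str, item: str, value):
        # Point 5: data added by companies lands in "confidential" by default.
        self.confidential.setdefault(category, {})[item] = value

    def owner_publish(self, category: str):
        # The owner moves a category into the "non-confidential" section.
        self.non_confidential[category] = self.confidential.pop(category, {})

    def read(self, company: str, category: str):
        # Point 4: non-confidential data is open; confidential needs an owner grant.
        if category in self.non_confidential:
            return self.non_confidential[category]
        if category in self.grants.get(company, set()):
            return self.confidential.get(category, {})
        return None  # no access

sp = SharedProfile(owner="me")
sp.company_add("diet", "preference", "vegetarian")
print(sp.read("SomeRetailer", "diet"))   # None -- confidential by default
sp.owner_publish("diet")
print(sp.read("SomeRetailer", "diet"))   # {'preference': 'vegetarian'}
```

The key design point the sketch makes concrete: companies can write into the profile, but nothing becomes visible to other companies until the owner explicitly publishes or grants it.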

Ultimately, this concept is a shared format designed to give companies equal access to data while providing users full ownership and control of their personal data and a clear understanding of what is shared or made public. No longer would the user have to deal with individual companies and complex EULAs, and no longer would they need to fear that their devices are carrying out unscrupulous behavior because of the vagueness of our legal system and the less-than-transparent ways big-data companies use our data.

With a little creative thinking and a shift in regulation, we the people can change the future of data collection and take back ownership and control of our personal data.


The Move Towards On-Device Assistants for Performance and Privacy

February 11, 2019

Voice assistants are growing in both popularity and capability. They are arriving in our homes, cars, and mobile devices, and now seem to be a standard part of American culture, entering our TV shows, movies, music, and Super Bowl ads. However, this popularity is accompanied by a persistent concern over our privacy and the safety of our personal data when these devices are always listening and always watching.

There is a significant distrust of big companies like Facebook, Google, Apple, and Amazon. Facebook and Google have admitted to misusing our private data, and Apple and Amazon have admitted that system failures have led to a loss of private data.

So naturally, there would be an advantage to not sending our voices or videos into the cloud and instead doing the processing on-device. Then no private data is at risk of loss. Cloud-based queries could still occur, but through anonymized text only.

There are forces bringing us closer to edge-based assistants and there are other forces leading to data going through the cloud. Here are a few ideas to consider.

  • Power and Memory. There is no doubt that cloud-based solutions offer more power and memory, and deep learning approaches can certainly take advantage of those resources. However, access speed and available bandwidth are often issues, giving an edge to working on-device. Current state-of-the-art deep net modeling allows limited-domain natural language engines that require substantially less memory and fewer MIPS than general purpose models, making natural language on device realistic today. Furthermore, powerful on-device voice experiences are increasingly realistic as we pack more and more memory and MIPS into smaller and cheaper packages. New chip architectures targeting deep learning methodologies can also lead to on-device breakthroughs, and these designs are now hitting the market.
  • Accuracy. Although power and memory may be key factors in influencing accuracy, an on-device assistant may be able to take advantage of sensor and usage data and other embedded information not available to the cloud-based assistant so that it can better adapt to users and their preferences.
  • Privacy. Not sending data to the cloud is more private.

Some have argued that we have carried microphones and cameras around with us for years without any issues, but I see this thinking as flawed. Just recently, Apple admitted to a FaceTime bug on mobile phones enabling “eavesdropping” on others.

Also, if my phone is listening for a wake word, that’s a very different technology model from an IoT device that’s “always on.” Phones are usually designed to listen at arm’s length, 2 or 3 feet. An IoT speaker is designed to listen from 20 feet away! If we assume constant noise across a room that could make an assistant “false fire” and start listening, then we can compare two listening circles, one with a radius of 3 feet and one with a radius of 20 feet, representing the listening area of the phone and of a far-field IoT device such as a smart speaker. The phone has a listening area of πr², or 9π square feet; the IoT device has a listening area of 400π square feet. So, all else equal, the IoT device is about 44 times more likely to false fire and start listening when it wasn’t intended to.
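The area comparison works out as follows (a quick check of the arithmetic, under the same constant-room-noise assumption):

```python
import math

def listening_area(radius_ft: float) -> float:
    """Area of the circle within which a device can pick up speech or noise."""
    return math.pi * radius_ft ** 2

phone = listening_area(3)     # arm's-length phone: 9*pi sq ft
speaker = listening_area(20)  # far-field smart speaker: 400*pi sq ft

print(f"far-field device covers {speaker / phone:.1f}x the area")  # 44.4x
```

Because area grows with the square of the radius, even a modest increase in listening range multiplies a device’s exposure to false-fire-inducing noise.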

As cloud-based far-field assistants enter the home, there is a definite risk of our private data getting intercepted. It’s not just machine errors but human errors too, like the Amazon employee who accidentally sent the wrong data to a person who requested it.

There are also other means in which we can lose our cloud-connected private data like the “dolphin attack” that can allow outsiders to listen in.

  • The will of Amazon, Google, Apple, government, and others. We should not underestimate the market power and persuasiveness of these tech giants. They want to open our wallets, and the best way to do that is to present us with things we want to buy…whether food, shelter, gifts or whatever. Amazon is pretty good at selling us stuff. Google is pretty good at making money connecting people with things they want and showing them ads. User data makes all of this easier and more effective. More effective means they make more money showing us ads and selling us stuff. I suspect most of these giant players will have strong incentives to keep our assistants and our data flowing into the cloud. Of course, tempering this will are the various government agencies trying to protect consumer privacy. Europe has launched GDPR (ironically leading to the Amazon accident mentioned above!), which could provide some disincentives around using cloud-based services.

My conclusion is that there is a lot of opportunity in bringing assistants onto devices. Doing so can not only protect privacy but also, through adaptation and domain limitation, create a better-customized user experience. I predict more and more products will use on-device voice control and assistants! Of course, I also predict more and more devices will use cloud assistants. Which wins out in the long run will probably depend more on government legislation and individual privacy concerns than anything else.