HEAR ME - Speech Blog

Posts Tagged ‘embedded speech recognition’

Sensory Brings Natural Language Understanding to the Edge with TrulyNatural

April 18, 2019

 

Ideal for Home Appliances, IoT, Set Top Box, Automobiles and More, TrulyNatural Offers a Fast and Reliable Voice Interface Without Privacy Concerns

Santa Clara, Calif., – April 18, 2019 – Sensory Inc., a Silicon Valley company dedicated to pioneering new capabilities for machine learning and embedded AI, today announced the first full feature release of TrulyNatural, the company’s embedded large vocabulary speech recognition platform, with natural language understanding. With more than 50 people-years of development and five years of beta testing behind it, TrulyNatural will help companies move beyond the cloud to create exciting products capable of natural language interaction without compromising their customers’ privacy and without the high memory cost of open source-based solutions.

In March 2019, PCMag.com published results from a consumer survey in which 40 percent of the 2,000 US consumers questioned named privacy as their top concern about smart home devices, far surpassing other concerns like cost, installation, product options and cross-platform interoperability. Furthermore, Bloomberg published an article last week titled “Amazon Workers Are Listening to What You Tell Alexa,” which explains that Amazon’s Alexa team does in fact pay people to listen to recordings for algorithm training purposes. The Bloomberg article stated, “Occasionally the listeners pick up things Echo owners likely would rather stay private: a woman singing badly off key in the shower, say, or a child screaming for help. The teams use internal chat rooms to share files when they need help parsing a muddled word—or come across an amusing recording.”

Privacy has never been a hotter topic than it is today. TrulyNatural is the perfect solution for addressing these consumer concerns, because it provides devices with an extremely intelligent natural language user interface, while keeping voice data private and secure; voice requests never leave the device, nor are they ever stored.

“To benefit from the advantages afforded by cloud-based natural language processing, companies have been forced to risk customer privacy by allowing always listening devices to share voice data with the recognition service providers,” said Todd Mozer, CEO at Sensory. “TrulyNatural does not require any data to leave the device and eliminates the privacy risks associated with sending voice data to the cloud, and as an added benefit it allows product manufacturers to own the customer relationship and experience.”

TrulyNatural can provide a natural language voice UI on devices of all shapes and sizes, and can be deployed for domain-specific applications, such as home appliances, vehicle infotainment systems, set top boxes, home automation, industrial and enterprise applications, mobile apps and more. Sensory is unique in developing its speech recognizer from scratch with the goal of providing the best quality of experience in the smallest footprint. Many companies take open source solutions and resell them. Sensory explored doing this too, but found that it could create its own solution that is an order of magnitude smaller than open source options without sacrificing performance, boasting an excellent task completion rate measured at greater than 90 percent (see note 1). TrulyNatural can fit in under 10 MB in a natural language and large vocabulary setting, but it can also be scaled to support broad-domain applications like virtual assistants and call center chatbots with a virtually unlimited vocabulary. By categorizing speech into unlimited intents and entities, the natural language understanding component of the system enables intelligent interpretation of any speech and does not require scripted grammars.

“Consumer concerns over security and privacy have been growing over time and Sensory’s TrulyNatural platform addresses this by embedding natural language speech recognition locally on device. As a result, TrulyNatural improves response time and delivers a high performing, more secure and reliable solution. Product manufacturers will appreciate TrulyNatural’s speech engine technology because it enables them to implement a highly valued voice experience through their own brand name and avoid surrendering customers to a potential competitor,” said Dennis Goldenson, Research Director, Artificial Intelligence and Machine Learning with SAR Insight and Consulting.

Designed to run completely on an applications processor, TrulyNatural does not require an internet connection, as all of the speech processing is done natively (at the edge), not in the cloud. It enables a safe, secure, consistent, reliable and easy to implement experience for the end-user, without requiring any extra apps or Wi-Fi to be set up or operational. By combining TrulyNatural with other Sensory technologies, such as TrulyHandsfree wake words, product manufacturers can further enhance the user experience offered by their products by utilizing their own branded wake words, or even letting customers create their own. Furthermore, device manufacturers can bolster the security of their devices by pairing TrulyNatural with TrulySecure to restrict user access or features through voice biometrics.

As an added bonus, TrulyNatural can be combined with other Sensory technologies to unlock powerful features and capabilities. These technologies include:

  • TrulyHandsfree custom branded always listening wake words
  • Seamless enrollment of regular users
  • TrulySecure speaker identification and verification
  • TrulySecure face and/or voice biometrics
  • Sound identification

TrulyNatural currently supports US English, with UK English, French, German, Italian, Japanese, Korean, Mandarin Chinese, Portuguese, Russian and Spanish planned for release in 2019 and 2020. SDKs are available for Android, iOS, Windows, Linux and other leading platforms.

For more information about this announcement, Sensory or its technologies, please contact sales@sensory.com. Press inquiries: press@sensory.com.

About Sensory Inc.
Sensory Inc. creates a safer and superior UX through vision and voice technologies. Sensory’s technologies are widely deployed in consumer electronics applications including mobile phones, automotive, wearables, toys, IoT and various home electronics. With its TrulyHandsfree™ voice control, Sensory has set the standard for mobile handset platforms’ ultra-low power “always listening” touchless control. To date, Sensory’s technologies have shipped in over a billion units of leading consumer products.

TrulyNatural is a trademark of Sensory Inc.

1: A home appliance task was analyzed through a spectrum of accented US English speakers across a mix of distances (1-10 ft) with a variety of background noise sources and levels representing realistic home conditions. Tasks included cooking methods, timers, time periods, food types and other possible functions (reset, stop, open/close, etc.), and users were not instructed on things they could or couldn’t request. Multiple types of entities and intents were chosen through NLU, and one or more errors from a single phrase would be counted as an error, such that only completely correct interpretations were counted as accurate task completions. Garbage phrases that were ignored were counted as correct; any action taken on a garbage phrase was counted as a failure. The task completion rate was measured at over 90% accurate.
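
The scoring rule in note 1 can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not Sensory's test harness or any TrulyNatural API: the `Interpretation` class, the function names and the sample phrases are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Interpretation:
    """Hypothetical NLU result: one intent plus its entities."""
    intent: str
    entities: dict = field(default_factory=dict)


def is_task_completed(expected: Optional[Interpretation],
                      actual: Optional[Interpretation]) -> bool:
    """Apply the all-or-nothing rule from note 1 to a single phrase.

    `expected` is None for a garbage phrase (nothing should happen).
    `actual` is None when the system ignored the phrase.
    """
    if expected is None:
        # Garbage phrase: correct only if ignored, any action is a failure.
        return actual is None
    if actual is None:
        # A valid request that was ignored is a failure.
        return False
    # One or more wrong intents or entities makes the whole phrase an error.
    return (expected.intent == actual.intent
            and expected.entities == actual.entities)


def task_completion_rate(trials) -> float:
    """Fraction of (expected, actual) pairs that were fully correct."""
    if not trials:
        raise ValueError("at least one trial is required")
    completed = sum(is_task_completed(e, a) for e, a in trials)
    return completed / len(trials)


# Made-up trials, for illustration only.
timer = Interpretation("set_timer", {"duration": "10 minutes"})
wrong_timer = Interpretation("set_timer", {"duration": "2 minutes"})
trials = [
    (timer, timer),        # fully correct
    (timer, wrong_timer),  # right intent, wrong entity: error
    (None, None),          # garbage phrase ignored: correct
    (None, timer),         # garbage phrase acted on: failure
]
rate = task_completion_rate(trials)
```

With these four made-up trials, two count as completed, so `rate` is 0.5. The point of the strict rule is that a partially correct interpretation earns no credit.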

Good Technology Exists – So Why Does Speech Recognition Still Fall Short?

March 30, 2015

At Mobile World Congress, I participated in ZTE’s Mobile Voice Alliance panel. ZTE presented data researched in China that basically said people want to use speech recognition on their phones, but they don’t use it because it doesn’t work well enough. I have seen similar data on US mobile phone users, and the automotive industry has also shown data supporting the high level of dissatisfaction with speech recognition.

In fact, when I bought my new car last year I wanted the state of the art in speech recognition to make navigation easier… but sadly I’ve come to learn that the system used in my Lexus just doesn’t work well — even the voice dialing doesn’t work well.

As an industry, I feel we must do better than this, so in this blog I’ll provide my two cents as to why speech recognition isn’t where it should be today, even when technology that works well exists:

  1. Many core algorithms, especially the ones provided to the automotive industry, are just not that good. It’s kind of ironic, but the largest independent supplier of speech technologies actually has one of the worst performing speech engines. Sadly, it’s this engine that gets used by many of the automotive companies, as well as some of the mobile companies.
  2. Even many of the good engines don’t work well in noise. In many tests, Google’s speech recognition comes out on top, but when the environment gets noisy even Google fails. I use my Moto X to voice dial while driving (at least I try to). I also listen to music while driving. The “OK Google Now” trigger works great (kudos to Sensory!), but everything I say after that gets lost and I see an “it’s too noisy” message from Google. I end up turning down the radio to voice dial or use Sensory’s VoiceDial app, because Sensory always works… even when it’s noisy!
  3. Speech application designs are really bad. I was using the recognizer last week on a popular phone. The room was quiet, I had a great internet connection and the recognizer was working great, but as a user I was totally confused. I said “set alarm for 4am” and it accurately transcribed “set alarm for 4am” but rather than confirm that the alarm was set for 4am, it asked me what I wanted to do with the alarm. I repeated the command; it accurately transcribed again and asked one more time what I wanted to do with the alarm. Even though it was recognizing correctly, it was interfacing so poorly with me that I couldn’t tell what was happening, and it didn’t appear to be doing what I asked it to do. Simple and clear application designs can make all the difference in the world.
  4. Wireless connections are unreliable. This is a HUGE issue. If the recognizer only works when there’s a strong Internet connection, then the recognizer is going to fail A GREAT DEAL of the time. My prediction – over the next couple of years, the speech industry will come to realize that embedded speech recognition offers HUGE advantages over the common cloud-based approaches used today – and these advantages exist in not just accuracy and response time, but privacy too!

Deep learning nets have enabled some amazing progress in speech recognition over the last five years. The next five years will see embedded recognition with high performance noise cancelling and beamforming coming to the forefront, and Sensory will be leading this charge… and just like how Sensory led the way with the “always on” low-power trigger, I expect to see Google, Apple, Microsoft, Amazon, Facebook and others follow suit.