HEAR ME - Speech Blog



Smart speakers coming from all over

October 12, 2017

Amazon, Google, Sonos, and LINE all introduced smart speakers within a few weeks of each other. Here’s my quick take and commentary on those announcements.

Amazon now has the new Echo, the old Echo, the Echo Plus, Spot, Dot, Show, and Look. The company is improving quality, adding incremental features, lowering cost, and seemingly expanding its leadership position. They make great products for consumers, have a very strong ecosystem, and make products that are very tough to compete with, both for their competitors and for their many platform partners that use Alexa.

Read more at Embedded Computing

Sensory Demos Awesome AI Mashup at Finovate!

September 28, 2017

Finovate is one of those shows where you get up on stage and give a short intro and live demo. They are selective in who they allow to present and many applicants are rejected. Sensory demonstrated some really cutting-, perhaps bleeding-, edge stuff by combining animated talking avatars with text-to-speech, lip movement synchronization, natural language speech recognition, and face and voice biometrics. I don’t know of any company ever combining so many AI technologies into a single product or demo!

Speech recognition has a long history of failing on stage, and one of the ways Sensory has always differentiated itself is that our demos always work! And all our AI technologies worked here too! Even with bright backlighting, our TrulySecure face recognition was so fast and accurate some missed it. With the microphones and echoes in the large room, our TrulyNatural speech recognition was perfect! That said, we did have a user-error… before Jeff and I got on stage he put his demo phone in DND mode, which cut our audio output, but we quickly recovered from that mishap.

Alexa on batteries: a life-changing door just opened

September 25, 2017

Several hundred articles have been written about Amazon’s new moves into Smart Glasses with the Alexa assistant. And it’s not just TechCrunch, Gizmodo, The Verge, Engadget, and all the consumer tech pubs doing the writing. It’s also places like CNBC, USA Today, Fox News, Forbes, and many others.

I’ve read a dozen or more and they all say similar things about Amazon (difficulties in phone hardware), Google (failure in Glass), bone conduction mics, mobility for Alexa, strategy to get Alexa Everywhere, etc. But something big got lost in the shuffle.

Read more at Embedded Computing

Apple erred on facial recognition

September 15, 2017

On the same day that Apple rolled out the iPhone X on the coolest stage of the coolest corporate campus in the world, Sensory gave a demo of an interactive talking and listening avatar that uses a biometric ID to know who’s talking to it. In Trump metrics, the event I attended had a few more attendees than Apple.

Interestingly, Sensory’s face ID worked flawlessly, and Apple’s failed. Sensory used a traditional camera using convolutional neural networks with deep learning anti-spoofing models. Apple used a 3D camera.

Read more at Embedded Computing

I Nailed It!

August 30, 2017

A few days ago I wrote a blog that talked about assistants and wake words and I said:

“We’ll start seeing products that combine multiple assistants into one product. This could create some strange and interesting bedfellows.”

Interesting that this was just announced:


Here’s another prediction for you…

All assistants will start knowing who is talking to them. They will hear your voice and look at your face and know who you are. They will bring you the things you want (e.g. play my favorite songs), and only allow you to conduct transactions you are qualified for (e.g. order more black licorice). Today there is some training required, but in the near future they will just learn who is who, much like a newborn quickly learns the family members without any formal training.

Here’s what’s next for always listening devices

August 28, 2017

Ten years ago, I tried to explain to friends and family that my company Sensory was working on a solution that would allow IoT devices to always be “on” and listening for a key wake up word without “false firing” and doing it at ultra-low power and with very little processing power. Generally, the response was “Huh?”

Today, I say, “Just like Hey Siri, OK Google, Alexa, Hey Cortana, and so on.” Now, everybody gets it and the technology is mainstream. In fact, next year, Sensory will have technology that’s embedded in IoT devices that listens to all those things (and more). But that’s not good enough.

Read more at Embedded Computing

How Hollywood gets biometrics wrong (and what it gets right)

June 26, 2017

Setting aside the question of whether rogue robots will create a dystopian future, there is one point that movies about artificial intelligence (AI) all seem to agree on: biometrics will take over for keys and passwords. There are over 200 movies that show the use of biometrics – here’s a list of 184 of them, and here’s a compilation of clips from several dozen movies.

Whether it’s fingerprint, voiceprint, iris, retina, face, or other biometrics, there always seems to be some sort of physical scanner in Hollywood depictions of biometrics in action. Characters have to hold their face or hand up to a device and the device often shines a laser and makes a noise. When they speak, a pass phrase like, “My voice is my password,” is typically required. In other words, the biometrics aren’t particularly fast or easy. The devices don’t just know who people are; they need to be queried and some sort of physical analysis needs to happen after the query.

Read more at Embedded Computing

Staying Ahead with Advanced AI on Devices

June 8, 2017

Since the beginning, Sensory has been a pioneer in advancing AI technologies for consumer electronics. Not only did Sensory implement the first commercially successful speech recognition chip, but we also were first to bring biometrics to low cost chips, and speech recognition to Bluetooth devices. Perhaps what I am most proud of, though, is that more than a decade ago Sensory introduced its TrulyHandsfree technology and showed the world that wakeup words could really work in real devices, getting around the false accept, false reject, and power consumption issues that had plagued the industry. No longer did speech recognition devices require button presses…and it caught on quickly!

Let me go on boasting because I think Sensory has a few more claims to fame… Do you think Apple developed the first “Hey Siri” wake word? Did Google develop the first “OK Google” wake word? What about “Hey Cortana”? I believe Sensory developed these initial wake words, some as demos and some shipped in real products (like the Motorola MotoX smartphone and certain glasses). Even third-party Alexa and Cortana products today are running Sensory technology to wake up the Alexa cloud service.

Sensory’s roots are in neural nets and machine learning. I know everyone does that today, but it was quite out of favor when Sensory used machine learning to create a neural net speech recognition system in the 1990s and 2000s. Today everyone and their brother is doing deep learning (yeah, that’s tongue in cheek because my brother is doing it too: http://www.cs.colorado.edu/~mozer/index.php). And a lot of these deep learning companies are huge multi-billion-dollar businesses or extremely well-funded startups.

So, can Sensory stay ahead and continue pioneering innovation in AI now that everyone is using machine learning and doing AI? Of course, the answer is yes!

Sensory is now doing computer vision with convolutional neural nets. We are coming out with deep learning noise models to improve speech recognition performance and accuracy, and are working on small TTS systems using deep learning approaches that help them sound lifelike. And of course, we have efforts in biometrics and natural language that also use deep learning.

We are starting to combine a lot of technologies together to show that embedded systems can be quite powerful. And because we have been around longer and thought through most of these implementations years before others, we have a nice portfolio of over 3 dozen patents covering these embedded AI implementations. Hand in hand with Sensory’s improvements in AI software, companies like ARM, NVIDIA, Intel, Qualcomm and others are investing in and improving neural net chips that can perform parallel processing for specialized AI functions, so the world will continue seeing better and better AI offerings on “the edge”.

Curious about the kind of on-device AI we can create when combining a bunch of our technologies together? So were we! That’s why we created this demo that showcases Sensory’s natural language speech recognition, chatbots, text-to-speech, avatar lip-sync and animation technologies. It’s our goal to integrate biometrics and computer vision into this demo in the months ahead:

Let me know what you think of that! If you are a potential customer and we sign an NDA, we would be happy to send you an APK of this demo so you can try it yourself! For more information about this exciting demo, please check out the formal announcement we made: http://www.prnewswire.com/news-releases/sensory-brings-chatbot-and-avatar-technology-to-consumer-devices-and-apps-300470592.html

What Makes the Latest Version of TrulySecure so Different?

May 17, 2017

A key measure of any biometric system is the inherent accuracy of the matching algorithm. Earlier attempts at face recognition were based on traditional computer vision (CV) techniques. The first attempts involved measuring key distances on the face and comparing those across images, from which the idea of the number of “facial features” associated with an algorithm was born. This method turned out to be very brittle, however, especially as the pose angle or expression varied. The next class of algorithms involved parsing the face into a grid, and analyzing each section of the grid individually via standard CV techniques, such as frequency analysis, wavelet transforms, local binary patterns (LBP), etc. Up until recently, these constituted the state of the art in face recognition. Voice recognition has a similar history in the use of traditional signal processing techniques.

Sensory’s TrulySecure uses a deep learning approach in our face and voice recognition algorithms. Deep learning (a subset of machine learning) is a modern variant of artificial neural networks, which Sensory has been using since the very beginning in 1994, and thus we have extensive experience in this area. In just the last few years, deep learning has become the primary technology for many CV applications, and especially face recognition. There have been recent announcements in the news by Google, Facebook, and others on face recognition systems they have developed that outperform humans. This is based on analyzing a data set such as Labeled Faces in the Wild, which has images captured over a very wide-ranging set of conditions, especially larger angles and distances from the face. We’ve trained our network for the authentication case, which has a more limited range of conditions, using our large data set collected via AppLock and other methods. This allows us to perform better than those algorithms would for this application, while also keeping our size and processing power requirements under control (the Google and Facebook deep learning implementations are run on arrays of servers).

One consequence of the deep learning approach is that we don’t use a number of points on the face per se. The salient features of a face are compressed down to a set of coefficients, but they do not directly correspond to physical locations or measurements of the face. Rather these “features” are discovered by the algorithm during the training phase – the model is optimized to reduce face images to a set of coefficients that efficiently separate faces of a particular individual from faces of all others. This is a much more robust way of assessing the face than the traditional methods, and that is why we decided to utilize deep learning as opposed to traditional CV algorithms for face recognition.
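To make the idea of comparing faces by learned coefficients concrete, here is a minimal sketch. It assumes the network has already reduced each face image to a list of numbers, and it compares two such lists with cosine similarity against a fixed threshold. The function names, the similarity measure, and the threshold value are illustrative assumptions; the post does not say how TrulySecure actually compares coefficients.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two coefficient vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no identity information
    return dot / (norm_a * norm_b)


def same_person(enrolled, probe, threshold=0.8):
    """Hypothetical decision rule: accept when the vectors point the same way.

    The 0.8 threshold is an arbitrary value chosen for illustration.
    """
    return cosine_similarity(enrolled, probe) >= threshold


# Toy three-coefficient vectors; real models would produce far longer ones.
enrolled_face = [1.0, 2.0, 3.0]
similar_probe = [1.1, 2.0, 2.9]
different_probe = [3.0, -1.0, 0.0]

print(same_person(enrolled_face, similar_probe))
print(same_person(enrolled_face, different_probe))
```

Note that none of the individual numbers means "distance between the eyes" or any other physical measurement; only the comparison of whole vectors is meaningful.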

Sensory has also developed a great deal of expertise in making these deep learning approaches work in limited memory or processing power environments (e.g., mobile devices). This combination creates a significant barrier for any competitor to try to switch to a deep learning paradigm. Optimizing neural networks for constrained environments has been part of Sensory’s DNA since the very beginning.

One of the most critical elements to creating a successful deep learning based algorithm such as the ones used in TrulySecure is the availability of a large and realistic data set. Sensory has been amassing data from a wide array of real world conditions and devices for the past several years, which has made it possible to train and independently test the TrulySecure system to a high statistical significance, even at extremely low false accept rates (FARs).
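For readers unfamiliar with the metrics, the sketch below shows the usual way false accept and false reject rates are measured from test data: score a set of genuine attempts and a set of impostor attempts, then count the errors at a chosen threshold. The function name and the toy scores are made up for illustration and are not Sensory test results.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) at a given match threshold.

    FAR: fraction of impostor attempts wrongly accepted (score >= threshold).
    FRR: fraction of genuine attempts wrongly rejected (score < threshold).
    """
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    far = false_accepts / len(impostor_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr


# Toy data: four genuine attempts and five impostor attempts.
genuine = [0.9, 0.8, 0.4, 0.95]
impostor = [0.1, 0.2, 0.85, 0.3, 0.05]

far, frr = far_frr(genuine, impostor, threshold=0.5)
print(f"FAR={far:.2f} FRR={frr:.2f}")
```

This is also why data volume matters: to measure a FAR on the order of 1 in 100,000 with any confidence, the impostor set has to contain many more than 100,000 attempts.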

It is important to understand how Sensory’s TrulySecure fuses the face and voice biometrics when both are available. We implement two different combination strategies in our technology. In both cases, we compute a combined score that fuses face and voice information (when both are present). Convenience mode allows the use of either face or voice or the combined score to authenticate. TrulySecure mode requires both face and voice to match individually.

More specifically, Convenience mode checks for one of face, voice, or the combined score to pass the current security level setting. It assumes a willingness by the user to present both biometrics if necessary to achieve authentication, though in most cases, they will only need to present one. For example, when face alone does not succeed, the user would then try saying the passphrase. In this mode the system is extremely robust to environmental conditions, such as relying on voice instead of face when the lighting is very low. TrulySecure mode, on the other hand, requires that both face and voice meet a minimum match requirement, and that the combined score passes the current security level setting.
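The two combination strategies can be sketched in a few lines. This is only an illustration under stated assumptions: scores are taken to lie between 0 and 1, `None` means a biometric was not presented, the fusion formula is a simple "noisy-OR" chosen for the example, and the `min_match` value is arbitrary. The post does not disclose the actual fusion method or thresholds.

```python
def combined_score(face, voice):
    """Hypothetical fusion of two scores in [0, 1]; None if either is missing.

    Uses a noisy-OR rule, so two moderately good matches reinforce each other.
    """
    if face is None or voice is None:
        return None
    return 1.0 - (1.0 - face) * (1.0 - voice)


def convenience_mode(face, voice, security_level):
    """Pass if face, voice, or the combined score meets the security level."""
    candidates = [face, voice, combined_score(face, voice)]
    return any(s is not None and s >= security_level for s in candidates)


def trulysecure_mode(face, voice, security_level, min_match=0.5):
    """Pass only if both biometrics are present, each meets a minimum match,
    and the combined score meets the security level."""
    if face is None or voice is None:
        return False
    if face < min_match or voice < min_match:
        return False
    return combined_score(face, voice) >= security_level


# Face alone falls short, so the user adds the passphrase and the fused score passes.
print(convenience_mode(0.7, None, security_level=0.8))
print(convenience_mode(0.7, 0.7, security_level=0.8))

# A very strong voice match cannot make up for a poor face match in TrulySecure mode.
print(trulysecure_mode(0.3, 0.99, security_level=0.6))
```

The design difference shows up in the last call: Convenience mode would accept on voice alone, while TrulySecure mode rejects because one biometric fails its individual minimum.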

TrulySecure utilizes adaptive enrollment to improve the false rejection rate (FRR) with virtually no change in FAR. Sensory’s Adaptive Enrollment technology can quickly enhance a user profile from the initial single enrollment and dramatically improve the detection rate, and is able to do this seamlessly during normal use. Adaptive enrollment can produce a rapid reduction in the false rejection rate. In testing, after just 2 adaptations, we have seen almost a 40% reduction in FRR. After 6 failed authentication attempts, we see more than 60% reduction. This improvement in FRR comes with virtually no change in FAR. Additionally, adaptive enrollment alleviates the false rejects associated with users wearing sunglasses or hats, or trying to authenticate in low light, during rapid motion, at challenging angles, or with changing expressions and changing facial hair.

Guest post by Michael Farino

Embedded AI is here

February 10, 2017

The wonders of deep learning are well utilized in the area of artificial intelligence, aka AI. Massive amounts of training data can be processed on very powerful platforms to create wonderful generalized models, which can be extremely accurate. But this in and of itself is not yet optimal, and there’s a movement afoot to move the intelligence and part of the learning onto the embedded platforms.

Certainly, the cloud offers the most power and data storage, allowing the most immense and powerful of systems. However, when it comes to agility, responsiveness, privacy, and personalization, the cloud looks less attractive. This is where edge computing and shallow learning through adaptation can become extremely effective. “Little” data can have a big impact on a particular individual. Think how accurately, and with how little data, a child learns to recognize its mother.

Read more at Embedded Computing
