Posts Tagged ‘deep learning’
May 17, 2017
A key measure of any biometric system is the inherent accuracy of the matching algorithm. Earlier attempts at face recognition were based on traditional computer vision (CV) techniques. The first attempts involved measuring key distances on the face and comparing those across images, from which the idea of the number of “facial features” associated with an algorithm was born. This method turned out to be very brittle however, especially as the pose angle or expression varied. The next class of algorithms involved parsing the face into a grid, and analyzing each section of the grid individually via standard CV techniques, such as frequency analysis, wavelet transforms, local binary patterns (LBP), etc. Up until recently, these constituted the state of the art in face recognition. Voice recognition has a similar history in the use of traditional signal processing techniques.
Sensory’s TrulySecure uses a deep learning approach in our face and voice recognition algorithms. Deep learning (a subset of machine learning) is a modern variant of artificial neural networks, which Sensory has been using since the very beginning in 1994, and thus we have extensive experience in this area. In just the last few years, deep learning has become the primary technology for many CV applications, and especially face recognition. There have been recent announcements in the news by Google, Facebook, and others on face recognition systems they have developed that outperform humans. This is based on analyzing a data set such as Labeled Faces in the Wild, which has images captured over a very wide ranging set of conditions, especially larger angles and distances from the face. We’ve trained our network for the authentication case, which has a more limited range of conditions, using our large data set collected via AppLock and other methods. This allows us to perform better than those algorithms would do for this application, while also keeping our size and processing power requirements under control (the Google and Facebook deep learning implementations are run on arrays of servers).
One consequence of the deep learning approach is that we don’t use a number of points on the face per se. The salient features of a face are compressed down to a set of coefficients, but they do not directly correspond to physical locations or measurements of the face. Rather these “features” are discovered by the algorithm during the training phase – the model is optimized to reduce face images to a set of coefficients that efficiently separate faces of a particular individual from faces of all others. This is a much more robust way of assessing the face than the traditional methods, and that is why we decided to utilize deep learning opposed to CV algorithms for face recognition.
Sensory has also developed a great deal of expertise in making these deep learning approaches work in limited memory or processing power environments (e.g., mobile devices). This combination creates a significant barrier for any competitor to try to switch to a deep learning paradigm. Optimizing neural networks for constrained environments has been part of Sensory’s DNA since the very beginning.
One of the most critical elements to creating a successful deep learning based algorithm such as the ones used in TrulySecure is the availability of a large and realistic data set. Sensory has been amassing data from a wide array of real world conditions and devices for the past several years, which has made it possible to train and independently test the TrulySecure system to a high statistical significance, even at extremely low FARs.
It is important to understand how Sensory’s TrulySecure fuses the face and voice biometrics when both are available. We implement two different combination strategies in our technology. In both cases, we compute a combined score that fuses face and voice information (when both are present). Convenience mode allows the use of either face or voice or the combined score to authenticate. TrulySecure mode requires both face and voice to match individually.
More specifically, Convenience mode checks for one of face, voice, or the combined score to pass the current security level setting. It assumes a willingness by the user to present both biometrics if necessary to achieve authentication, though in most cases, they will only need to present one. For example, when face alone does not succeed, the user would then try saying the passphrase. In this mode the system is extremely robust to environmental conditions, such as relying on voice instead of face when the lighting is very low. TrulySecure mode, on the other hand, requires that both face and voice meet a minimum match requirement, and that the combined score passes the current security level setting.
TrulySecure utilizes adaptive enrollment to improve FRR with virtually no change in FAR. Sensory’s Adaptive Enrollment technology can quickly enhance a user profile from the initial single enrollment and dramatically improve the detection rate, and is able to do this seamlessly during normal use. Adaptive enrollment can produce a rapid reduction in the false rejection rate. In testing, after just 2 adaptations, we have seen almost a 40% reduction in FRR. After 6 failed authentication attempts, we see more than 60% reduction. This improvement in FRR comes with virtually no change in FAR. Additionally, adaptive enrollment alleviates the false rejects associated with users wearing sunglasses, hats, or trying to authenticate in low-light, during rapid motion, challenging angles, with changing expressions and changing facial hair.
Guest post by Michael Farino
February 10, 2017
The wonders of deep learning are well utilized in the area of artificial intelligence, aka AI. Massive amounts of training data can be processed on very powerful platforms to create wonderful generalized models, which can be extremely accurate. But this in and of itself is not yet optimal, and there’s a movement afoot to move the intelligence and part of the learning onto the embedded platforms.
Certainly, the cloud offers the most power and data storage, allowing the most immense and powerful of systems. However, when it comes to agility, responsiveness, privacy, and personalization, the cloud looks less attractive. This is where edge computing and shallow learning through adaptation can become extremely effective. “Little” data can have a big impact on a particular individual. Think how accurately and how little data is required for a child to learn to recognize its mother.
A good example of specialized learning is when it comes to accents or speech impediments. Generalized acoustic models often don’t handle this well, resulting in customized models for different markets and accents. However, this customization is difficult to manage, can add to the cost of goods, and may negatively impact the user experience. Yet, this still results in a model generalized for a specific class of people or accents. An alternative approach could begin with a general model built with cloud resources, with the ability to adapt on the device to the distinct voices of the people that use it.
The challenge with embedded deep learning occurs in its limited resources and the need to deal with on-device data collection, which by its nature, will be less plentiful, unlabeled, yet more targeted. New approaches are being implemented such as teacher/student models where smaller models can be built from a wider body of data, essentially turning big powerful models into small powerful models that imitate the bigger ones while getting similar performance.
Generative data without supervision can also be deployed for on-the-fly learning and adaptation. Along with improvements in software and technology, the chip industry is going through somewhat of a deep learning revolution, adding more parallel processing and specialized vector math functions. For example, GPU vendor nVidia taking has some exciting products that take advantage of deep learning. Some smaller private embedded deep learning IP companies like Nervana, Movidius, and Apical are getting snapped up in highly valued acquisitions from larger companies like Intel and ARM.
Embedded deep learning and embedded AI is here.
October 14, 2016
I watched Sundar and Rick and the team at Google announce all the great new products from Google. I’ve read a few reviews and comparisons with Alexa/Assistant and Echo/Home, but it struck me that there’s quite an overlap in the reports I’m reading and some of the more interesting things aren’t being discussed. Here are a few of them, roughly in increasing order of importance:
October 6, 2016
It’s always nice when Sensory wins an award. 2016 has been a special year for Sensory because we won more awards than any other year in our 23 year history!!
Check it out:
Sensory Earns Multiple Coveted Awards in 2016
Sensory Inc., a Silicon Valley company that pioneered the hands-free voice wakeup word approach, today, announced it has won over half a dozen awards in 2016 across its product-line, including awards for products, technologies, and people, covering deep learning, biometric authentication and voice recognition.
The awards presented to Sensory include the following:
CTIA Super Mobility 2016™, the largest wireless event in America, announced more than 70 finalists for its 10th annual CTIA Emerging Technology (E-Tech) Awards. Sensory was nominated in the category of Mobile Security and Privacy for its TrulySecure™ technology, along with Nokia, Samsung, SAP, and others. Sensory was presented with the First Place award for the category in a ceremony on September 2016 at the CTIA Las Vegas event.
Speech Technology magazine, the leading provider of speech technology news and analysis, had its 10th annual Speech Industry Awards to recognize the creativity and notable achievements of key influencers (Luminaries), major innovators (Star Performers), and impressive deployments (Implementation Awards). The editors of Speech Technology magazine selected 2016 award winners based on their industry contributions during the past 12 months. Sensory’s CEO, Todd Mozer, was awarded with a Luminary Award, making it his second time winning the prestigious award. Sensory as a company was awarded the Star Performer award along with IBM, Amazon and others.
Two well-known industry analyst firms issued reports highlighting Sensory’s industry contributions for its TrulyHandsfree product and customer leadership, offering awards for innovations, customer deployment, and strategic leadership.
“Sensory has an incredibly talented team of speech recognition and biometrics experts dedicated to advancing the state-of-the-art of each respective field. We are pleased that our TrulyHandsfree, TrulySecure and TrulyNatural product lines are being recognized in so many categories, across the various industries in which we do business,” said Todd Mozer, CEO of Sensory. “I am also thrilled that Sensory’s research and innovations in the deep learning space has been noticed, generating our company prestigious accolades and management recognition.”
May 1, 2015
Winning on Accuracy & Speed… How can a tiny player like Sensory compete in deep learning technology with giants like Microsoft, Google, Facebook, Baidu and others?
There’s a number of ways, and let me address them specifically:
These 3 items together have provided Sensory with the highest quality embedded speech engines in the world. It’s worth reiterating why embedded is needed, even if speech recognition can all be done in the cloud:
April 22, 2015
How does Big Data and Privacy fit into the whole Deep Learning Puzzle?
Privacy and Big Data have become big concerns in the world of Deep Learning. However, there is an interesting relationship between the Privacy of personal data and information, Big Data, and Deep Learning. That’s because a lot of the Big Data is personal information used as the data source for Deep Learning. That’s right, to make vision, speech and other systems better, many companies invade users’ personal information and the acquired data is used to train their neural networks. So basically, Deep Learning is neural nets learning from your personal data, stats, and usage information. This is why when you sign a EULA (end user license agreement) you typically give up the rights to your data, whether its usage data, voice data, image data, personal demographic info, or other data supplied through the “free” software or service.
Recently, it was brought to consumers’ attention that some TVs and even children’s toys were listening in on consumers, and/or sharing and storing that information to the cloud. A few editors called me to get my input and I explained that there are a few possible reasons for devices to do this kind of “spying” and none of which are the least bit nefarious: The two most common reasons are 1) The speech recognition technology being used needs the voice data to train better models, so it gets sent to the cloud to be stored and used for Deep Learning and/or 2) The speech recognition needs to process the voice data in the cloud because it is unable to do so on the device. (Sensory will change this second point with our upcoming TrulyNatural release!)
The first reason is exactly what I’ve been blogging about when we say Deep Learning. More data is better! The more data that gets collected, the better the Deep Learning can be. The benefits can be applied across all users, and as long as the data is well protected and not released, then it only has beneficial consequences.
Therein lies the challenge: “as long as the data is well protected and not released…” If banks, billion dollar companies and governments can’t protect personal data in the cloud, then who can, and why should people ever assume their data is safe, especially from systems where there is no EULA is place and data is being collected without consent (which happens all the time BTW)?
Having devices listen in on people and share their voice data with the cloud for Deep Learning or speech recognition processing is an invasion of privacy. If we could just keep all of the deep neural net and recognition processing on device, then there would be no need to risk the security of peoples’ personal data by sharing and storing it on the cloud… and its with this philosophy that Sensory pioneered an entirely different, “embedded” approach to deep neural net based speech recognition which we will soon be bringing to market. Sensory actually uses Deep Learning approaches to train our nets with data collected from EULA consenting and often paid subjects. We then take the recognizer built from that research and run it on our OEM customers’ devices and because of that, never have to collect personal data; so, the consumers who buy products from Sensory’s OEM customers can rest assured that Sensory is never putting their personal data at risk!
In my next blog, I’ll address the question about how accurate Sensory can be using deep nets on device without continuing data collection in the cloud. There are actually a lot of advantages for running on device beyond privacy, and it can include not only response time but accuracy as well!
April 15, 2015
Deep Neural Nets, Deep Belief Nets, Deep Learning, DeepMind, DeepFace, DeepSpeech, DeepImage… Deep is all the rage! In my next few blogs I will try to address some of the questions and issues surrounding all of these “deep” thoughts including:
Part 1: What is Deep Learning and is Sensory Just Jumping on the Bandwagon?
Artificial Neural Network approaches have been around for a long time, and have gone in and out of favor. Neural Nets are an approach within the field of Machine Learning and today they are all the rage. Sensory has been working with Neural Net technology since our founding more than 20 years ago, so the approach is certainly not new for us. We are not just jumping on the bandwagon… we are one of the leading carts! ;-)
Neural Networks are very loosely modeled after how our brains work – nonlinear, parallel processing, and learning from exposure to data rather than being programmed. Unlike common computer architectures that separate memory from processing, our brains have billions of neurons that communicate and process all in parallel and with huge quantities of connections. This architecture based on how our brains work turns out to be much better than traditional computer programs at dealing with ambiguous and “sensory” information like vision and speech – a little Trivia: that’s how we came up with the name Sensory!
In the early days of Sensory, we were often asked by engineers, “What kind of neural networks are you running?” They were looking for a simple answer, something like a “Kohonen Net.” I once asked my brother, Mike Mozer, a pioneer in the field of neural nets, a Sensory co-founder, and a professor of computer science at U. Colorado Boulder, for a few one liners to satisfy curious engineers without giving anything away. We had two lines: the first being, “a feed forward multi-layer net” which satisfied 90% of those asking, and the other response for those that asked for more was, “it’s actually a nonlinear and multivariate function.” That quieted pretty much everyone down.
In the last five years Neural Networks have proven to be the best-known approaches for various recognition and ambiguous data challenges like vision and speech. The breakthrough and improvement in performance came from these various terms that use the word “deep.” The “deep” approaches entailed more complex architectures that receive more data. The architecture relates to the ways that information is shared and processed (like all those connections in our brain), and the increased data allows the system to adapt and improve through continuous learning, hence the terms, “Deep Learning” and “Deep Learning Net.” Performance has improved dramatically in the past five years and Deep Learning approaches have far exceeded traditional “expert-based” techniques for programming complex feature extraction and analysis.