A key measure of any biometric system is the inherent accuracy of the matching algorithm. Earlier attempts at face recognition were based on traditional computer vision (CV) techniques. The first attempts involved measuring key distances on the face and comparing those across images, from which the idea of the number of “facial features” associated with an algorithm was born. This method turned out to be very brittle, however, especially as the pose angle or expression varied. The next class of algorithms involved parsing the face into a grid and analyzing each section of the grid individually via standard CV techniques such as frequency analysis, wavelet transforms, and local binary patterns (LBP). Until recently, these constituted the state of the art in face recognition. Voice recognition has a similar history in the use of traditional signal processing techniques.
Sensory’s TrulySecure uses a deep learning approach in our face and voice recognition algorithms. Deep learning (a subset of machine learning) is a modern variant of artificial neural networks, which Sensory has been using since the very beginning in 1994, so we have extensive experience in this area. In just the last few years, deep learning has become the primary technology for many CV applications, especially face recognition. Google, Facebook, and others have recently announced face recognition systems that outperform humans. Those results are based on analyzing a data set such as Labeled Faces in the Wild, which has images captured over a very wide-ranging set of conditions, especially larger angles and distances from the face. We’ve trained our network for the authentication case, which has a more limited range of conditions, using our large data set collected via AppLock and other methods. This allows us to perform better than those algorithms would in this application, while also keeping our size and processing power requirements under control (the Google and Facebook deep learning implementations run on arrays of servers).
One consequence of the deep learning approach is that we don’t use a number of points on the face per se. The salient features of a face are compressed down to a set of coefficients, but these do not directly correspond to physical locations or measurements of the face. Rather, these “features” are discovered by the algorithm during the training phase: the model is optimized to reduce face images to a set of coefficients that efficiently separate faces of a particular individual from faces of all others. This is a much more robust way of assessing the face than the traditional methods, and that is why we decided to use deep learning rather than traditional CV algorithms for face recognition.
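To make the idea of comparing faces by their coefficients concrete, here is a minimal sketch. It is not Sensory's implementation: the article does not say how coefficient sets are compared, so the use of cosine similarity, the function names, the three-element vectors, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two coefficient vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("coefficient vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("coefficient vectors must be non-zero")
    return dot / (norm_a * norm_b)


def same_person(enrolled: Sequence[float], probe: Sequence[float],
                threshold: float = 0.8) -> bool:
    """Hypothetical decision rule: accept when the vectors are similar enough."""
    return cosine_similarity(enrolled, probe) >= threshold


# Made-up coefficients: a trained network would produce many more per face.
enrolled = [0.90, 0.10, 0.30]
same_user = [0.88, 0.12, 0.31]
other_user = [-0.20, 0.95, 0.10]
print(same_person(enrolled, same_user))
print(same_person(enrolled, other_user))
```

Note that none of the numbers in these vectors is a distance between facial landmarks. In a trained model, the coefficients only carry meaning in relation to one another.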
Sensory has also developed a great deal of expertise in making these deep learning approaches work in limited memory or processing power environments (e.g., mobile devices). This combination creates a significant barrier for any competitor to try to switch to a deep learning paradigm. Optimizing neural networks for constrained environments has been part of Sensory’s DNA since the very beginning.
One of the most critical elements in creating a successful deep-learning-based algorithm, such as the ones used in TrulySecure, is the availability of a large and realistic data set. Sensory has been amassing data from a wide array of real-world conditions and devices for the past several years, which has made it possible to train and independently test the TrulySecure system to a high statistical significance, even at extremely low false accept rates (FARs).
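A short calculation shows why a large data set matters at very low FARs. The sketch below uses a standard binomial argument, not a description of Sensory's test procedure. It asks how many independent impostor attempts must all be rejected before one can claim, at a given confidence, that the true FAR is below a target. The function name and the example targets are assumptions.

```python
import math


def impostor_trials_needed(target_far: float, confidence: float = 0.95) -> int:
    """Smallest n such that observing zero false accepts in n independent
    impostor trials supports FAR < target_far at the given confidence.

    Derived from (1 - target_far) ** n <= 1 - confidence.
    """
    if not 0.0 < target_far < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("target_far and confidence must be in (0, 1)")
    # log1p keeps precision when target_far is very small.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-target_far))


for far in (1e-2, 1e-4, 1e-6):
    print(f"target FAR {far:g}: {impostor_trials_needed(far):,} clean impostor trials")
```

The required number of trials grows roughly as 3 divided by the target FAR (the "rule of three"), so each tenfold reduction in FAR needs about ten times as much test data.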
It is important to understand how Sensory’s TrulySecure fuses the face and voice biometrics when both are available. We implement two different combination strategies in our technology. In both cases, we compute a combined score that fuses face and voice information (when both are present). Convenience mode allows the use of either face or voice or the combined score to authenticate. TrulySecure mode requires both face and voice to match individually.
More specifically, Convenience mode checks whether any one of face, voice, or the combined score passes the current security level setting. It assumes a willingness by the user to present both biometrics if necessary to achieve authentication, though in most cases they will only need to present one. For example, when face alone does not succeed, the user would then try saying the passphrase. In this mode the system is extremely robust to environmental conditions; for example, it can rely on voice instead of face when the lighting is very low. TrulySecure mode, on the other hand, requires that both face and voice meet a minimum match requirement, and that the combined score passes the current security level setting.
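The two decision rules described above can be sketched as follows. This is an illustrative sketch only. It assumes scores where higher means a better match, takes the fused score as an input rather than computing it (the article does not describe the fusion method), and uses hypothetical names (`Scores`, `security_level`, `min_match`) that do not come from any Sensory API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scores:
    # None means that biometric was not presented in this attempt.
    face: Optional[float] = None
    voice: Optional[float] = None
    combined: Optional[float] = None  # fused score, available only with both


def convenience_mode(s: Scores, security_level: float) -> bool:
    """Pass if face, voice, OR the combined score meets the security level."""
    candidates = (s.face, s.voice, s.combined)
    return any(c is not None and c >= security_level for c in candidates)


def trulysecure_mode(s: Scores, security_level: float, min_match: float) -> bool:
    """Pass only if face AND voice each meet the minimum match requirement
    and the combined score meets the security level."""
    if s.face is None or s.voice is None or s.combined is None:
        return False  # both biometrics are required in this mode
    return (s.face >= min_match
            and s.voice >= min_match
            and s.combined >= security_level)


# Low light: the face score is weak, but the spoken passphrase matches well.
dark_room = Scores(face=0.30, voice=0.90, combined=0.70)
print(convenience_mode(dark_room, security_level=0.80))
print(trulysecure_mode(dark_room, security_level=0.80, min_match=0.50))
```

In the low-light example, Convenience mode accepts on voice alone, while TrulySecure mode rejects because the face score falls below the minimum match requirement.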
TrulySecure uses adaptive enrollment to improve the false rejection rate (FRR) with virtually no change in FAR. Sensory’s Adaptive Enrollment technology quickly enhances a user profile beyond the initial single enrollment and dramatically improves the detection rate, and it does this seamlessly during normal use. The reduction in FRR is rapid: in testing, after just 2 adaptations, we have seen almost a 40% reduction in FRR. After 6 failed authentication attempts, we see more than a 60% reduction. Additionally, adaptive enrollment alleviates the false rejects associated with users wearing sunglasses or hats, or trying to authenticate in low light, during rapid motion, at challenging angles, or with changing expressions and facial hair.
Guest post by Michael Farino