Winning on Accuracy & Speed… How can a tiny player like Sensory compete in deep learning technology with giants like Microsoft, Google, Facebook, Baidu and others?
There’s a number of ways, and let me address them specifically:
- Personnel: We all know it’s about quality, not quantity. I’d like to think that at Sensory we hire higher-caliber engineers than they do at Google and Microsoft; and maybe to an extent that is true, but probably not true when comparing their best with our best. We probably do however have less “turnover”. Less turnover means our experience and knowledge base is more likely to stay in house rather than walk off to our competitors, or get lost because it wasn’t documented.
- Focus and strategy: Sensory’s ability to stay ahead in the field of speech recognition and vision is because we have remained quite focused and consistent from our start. We pioneered the use of neural networks for speech recognition in consumer products. We were focused on consumer electronics before anyone thought it was a market…more than a dozen years before Siri!
- “Specialized” learning: Deep learning works. But Sensory has a theory that it can also be destructive when individual users fall outside the learned norms. Sensory learns deep on a general usage model, but once we go on device, we learn shallow through a specialized adaptive process. We learn to the specifics of the individual users of the device, rather than to a generalized population.
These 3 items together have provided Sensory with the highest quality embedded speech engines in the world. It’s worth reiterating why embedded is needed, even if speech recognition can all be done in the cloud:
- Privacy: Privacy is at the forefront of todays most heated topics. There is growing concern about “big brother” organizations (and governments) that know the intimate details of our lives. Using embedded speech recognition can help improve privacy by not sending personal data for analysis into the cloud.
- Speed: Embedded speech recognition can be ripping fast and consistently available. Accessing online or cloud based recognition services can be spotty when Internet connections are unstable, and not always available.
- Accuracy: Embedded speech systems have the potential advantage of a superior signal to noise ratio and don’t risk data loss or performance issues due to a poor or non-existent connection.