HEAR ME -
Speech Blog
HEAR ME - Speech Blog  |  Read more August 26, 2015 - Sensory Makes Inc. 5000 2015 List
HEAR ME - Speech Blog

Archives

Categories

Connect

Sensory Makes Inc. 5000 2015 List

August 26, 2015

Guest post: Sensory’s Marketing Team

The editors of Inc. identified Sensory as one of America’s fastest growing companies. The annual ranking of the 5,000 fastest-growing private companies in the United States put Sensory at 3,301 on the list with over 100% growth over three years and 30 new jobs added.

Sensory has a breadth of software products on the market contributing to its growth including TrulyHandsfree, TrulySecure and TrulyNatural, and can be found in over a billion consumer electronics devices around the world.

Congratulations to the Sensory team for making the Inc 5000 list this year!

Sensory Wins Coveted 2015 Speech Technology Magazine’s Industry Star Performer Award for TrulyNatural

August 11, 2015

Guest post by: Sensory’s Marketing Department:

SpeechTeCoverFor the second year in a row, Sensory earns Speech Technology Magazine’s Industry Star Performer Award! Having won the award in 2014 for TrulySecure Speaker Verification and for TrulyHandsfree 3.0, Speech Technology Magazine awarded Sensory the 2015 Speech Industry Star Performer Award for its recently released TrulyNatural technology.

TrulyNatural is a major leap forward for client-based speech recognition and is the first embedded large-vocabulary deep neural nets speech recognition platform capable of supporting natural language. TrulyNatural is a scalable solution that can be implemented on highly constricted devices, supporting hundreds of phrases, with a footprint of under a megabyte, or as a natural language engine on devices with more available memory, like mobile devices, cars, and more.

For more information about TrulyNatural, please visit the technology page.

See official article announcing the award at: http://www.speechtechmag.com

STM15AWARD_starperformbig

TrulyHandsfree 4.0… Maintaining the big lead!

August 6, 2015

We first came out with TrulyHandsfree about five years ago. I remember talking to speech tech executives at MobileVoice as well as other industry tradeshows, and when talking about always-on hands-free voice control, everybody said it couldn’t be done. Many had attempted it, but their offerings suffered from too many false fires, or not working in noise, or consuming too much power to be always listening. Seems that everyone thought a button was necessary to be usable!

In fact, I remember the irony of being on an automotive panel, and giving a presentation about how we’ve eliminated the need for a trigger button, while the guy from Microsoft presented on the same panel the importance of where to put the trigger button in the car.

Now, five years later, voice activation is the norm… we see it all over the place with OK Google, Hey Siri, Hey Cortana, Alexa, Hey Jibo, and of course if you’ve been watching Sensory’s demos over the years, Hello BlueGenie!

Sensory pioneered the button free, touch free, always-on voice trigger approach with TrulyHandsfree 1.0 using a unique, patented keyword spotting technology we developed in-house– and from its inception, it was highly robust to noise and it was ultra-low power. Over the years we have ported it to dozens of platforms, Including DSP/MCU IP cores from ARM, Cadence, CEVA, NXP CoolFlux, Synopsys and Verisilicon, as well as for integrated circuits from Audience, Avnera, Cirrus Logic, Conexant, DSPG, Fortemedia, Intel, Invensense, NXP, Qualcomm, QuickLogic, Realtek, STMicroelectronics, TI and Yamaha.

This vast platform compatibility has allowed us to work with numerous OEMs to ship TrulyHandsfree in over a billion products!

Sensory didn’t just innovate a novel keyword spotting approach, we’ve continually improved it by adding features like speaker verification and user defined triggers. Working with partners, we lowered the draw on the battery to less than 1mA, and Sensory introduced hardware and software IP to enable ultra-low-power voice wakeup of TrulyHandsfree. All the while, our accuracy has remained the best in the industry for voice wakeup.

We believe the bigger, more capable companies trying to make voice triggers have been forced to use deep learning speech techniques to try and catch up with Sensory in the accuracy department. They have yet to catch up, but they have grown their products to a very usable accuracy level, through deep learning, but lost much of the advantages of small footprint and low power in the process.

Sensory has been architecting solutions for neural nets in consumer electronics since we opened the doors more than 20 years ago. With TrulyHandsfree 4.0 we are applying deep learning to improve accuracy even further, pushing the technology even more ahead of all other approaches, yet enabling an architecture that has the ability to remain small and ultra-low power. We are enabling new feature extraction approaches, as well as improved training in reverb and echo. The end result is a 60-80% boost in what was already considered industry-leading accuracy.

I can’t wait for TrulyHandsfree 5.0…we have been working on it in parallel with 4.0, and although it’s still a long ways off, I am confident we will make the same massive improvements in speaker verification with 5.0 that we are doing for speech recognition in 4.0! Once again further advancing the state of the art in embedded speech technologies!

Sensory Talks AI and Speech Recognition With Popular Science Radio Host Alan Taylor

June 11, 2015

Guest post by: Michael Farino

Pop Science Radio

 

 

 

 

 

 

 

Sensory’s CEO, Todd Mozer joined Alan Taylor, host of Popular Science Radio, in a fun discussion about artificial intelligence, Sensory’s involvement with the Jibo robot development team, and also gave the show’s listeners a look into the past 20 years of speech recognition. Todd and Alan additionally discussed some of the latest advancements in speech technology, and Todd provided an update on Sensory’s most recent achievements in the field of speech recognition as well as a brief look into what the future holds.

Listen to the full radio show at the link below:

Big Bang Theory, Science, and Robots | FULL EPISODE | Popular Science Radio #269
Ever wondered how accurate the science of the Big Bang Theory TV series is? Curious about how well speech recognition technology and robots are advancing? We interview two great minds to probe for these answers

Rambling On… Chip Acquisitions and Software Differentiation

June 3, 2015

When I started Sensory over 20 years ago, I knew how difficult it would be to sell software to cost sensitive consumer electronic OEMs that would know my cost of goods. A chip based method of packaging up the technology made a lot of sense as a turnkey solution that could maintain a floor price by adding the features of a microcontroller or DSP with the added benefit of providing speech I/O. The idea was “buy Sensory’s micro or DSP and get speech I/O thrown in for free”.

After about 10 years it was becoming clear that Sensory’s value add in the market was really in technology development, and particularly in developing technologies that could run on low cost chips and with smaller footprints, less power, and superior accuracy than other solutions. Our strategy of using trailing IC technologies to get the best price point was becoming useless because we lacked the scale to negotiate the best pricing, and more cutting edge technologies were becoming further out of reach; even getting the supply commitments we needed was difficult in a world of continuing flux between over and under capacity.

So Sensory began porting our speech technologies onto other people’s chips. Last year about 10% of our sales came from our internal IC’s! Sensory’s DSP, IP, and platform partners have turned into the most strategic of our partnerships.

Today in the semiconductor industry there is a consolidation that is occurring that somewhat mirrors Sensory’s thinking over the past 10 years, albeit at a much larger scale. Avago pays $37 billion dollars for Broadcom, Intel pays $16.7B for Altera, and NXP pays $12B for Freescale, and the list goes on, dwarfing acquisitions of earlier time periods.

It used to be the multi-billion dollar chip companies gobbled up the smaller fabless companies, but now even the multibillion-dollar chip companies are being gobbled up. There’s a lot of reasons for this but economies of scale is probably #1. As chips get smaller and smaller, there are increasing costs for design tools, tape outs, prototyping, and although the actual variable per chip cost drops, the fixed costs are skyrocketing, making consolidation and scale more attractive.

That sort of consolidation strategy is very much a hardware centered philosophy. I think the real value will come to these chip giants through in house technology differentiation. It’s that differentiation that will add value to their chips, enabling better margins and/or more sales.

I expect that over time the chip giants will realize what Sensory concluded 10 years ago…that machine learning, algorithmic differentiation, and software skills, are where the majority of the value added equation on “smart” chips needs to come from, and that improving the user experience on devices can be a pot of gold! In fact, we have already seen Intel, Qualcomm and many other chip giants investing in speech recognition, biometrics, and other user experience technologies, so the change is underway!

OK, Amazon!

May 4, 2015

I was at the Mobile Voice Conference last week and was on a keynote panel with Adam Cheyer (Siri, Viv, etc.) and Phil Gray (Interactions) with Bill Meisel moderating. One of Bills questions was about the best speech products, and of course there was a lot of banter about Siri, Cortana, and Voice Actions (or GoogleNow as it’s often referred to). When it was my turn to chime in I spoke about Amazon’s Echo, and heaped lots of praise on it. I had done a bit of testing on it before the conference but I didn’t own one. I decided to buy one from Ebay since Amazon didn’t seem to ever get around to selling me one. It arrived yesterday.

Here are some miscellaneous thoughts:

  • Echo is a fantastic product! Not so much because of what it is today but for the platform it’s creating for tomorrow. I see it as every bit as revolutionary as Siri.
  • The naming is really confusing. You call it Alexa but the product is Echo. I suspect this isn’t the blunder that Google made (VoiceActions, GoogleNow, GoogleVoice, etc.), but more an indication that they are thinking of Echo as the product and Alexa as the personality, and that new products will ship with the same personality over time. This makes sense!
  • Setup was really nice and easy, the music content integration/access is awesome, the music quality could be a bit better but is useable; there’s lots of other stuff that normal reviewers will talk about…But I’m not a “normal” reviewer because I have been working with speech recognition consumer electronics for over 20 years, and my kids have grown up using voice products, so I’ll focus on speech…
  • My 11 year old son, Sam, is pretty used to me bringing home voice products, and is often enthusiastic (he insisted on taking my Vocca voice controlled light to keep in his room earlier this year). Sam watched me unpack it and immediately got the hang of it and used it to get stats on sports figures and play songs he likes. Sam wants one for his birthday! Amazon must have included some kids voice modeling in their data because it worked pretty well with his voice (unlike the Xbox when it first shipped, which I found particularly ironic since Xbox was targeting kids).
  • The Alexa trigger works VERY well. They have implemented beamforming and echo cancellation in a very state of the art implementation. The biggest issue is that it’s a very bandwidth intensive approach and is not low power. Green is in! That could be why its plug-in/AC only and not battery powered. Noise near the speaker definitely hurts performance as does distance, but it absolutely represents a new dimension in voice usability from a distance and unlike with the Xbox, you can move anywhere around it, and aren’t forced to be in a stationary position (thanks to their 7 mics, which surely must be overkill!)
  • The voice recognition in generally is good, but like all of the better engines today (Google, Siri, Cortana, and even Sensory’s TrulyNatural) it needs to get better. We did have a number of problems where Alexa got confused. Also, Alexa doesn’t appear to have memory of past events, which I expect will improve with upgrades. I tried playing the band Cake (a short word, making it more difficult) and it took about 4 attempts until it said “Would you like me to play Cake?” Then I made the mistake of trying “uh-huh” instead of “yes” and I had to start all over again!
  • My FAVORITE thing about the recognizer is that it does ignore things very nicely. It’s very hard to know when to respond and when not to. The Voice Assistants (Google, Siri, Cortana) seem to always defer to web searches and say things like “It’s too noisy” no matter what I do, and I thought Echo was good at deciding not to respond sometimes.

OK, Amazon… here’s my free advice (admittedly self-serving but nevertheless accurate):

  • You need to know who is talking and build models of their voices and remember who they are and what their preferences are. Sensory has the BEST embedded speaker identification/verification engine in the world, and it’s embedded so you don’t need to send a bunch of personal data into the cloud. Check out TrulySecure!
  • In fact, if you added a camera to Alexa, it too could be used for many vision features, including face authentication.
  • Make it battery powered and portable! To do this, you’d need an equally good embedded trigger technology that runs at low power – Check out TrulyHandsfree!
  • If it’s going to be portable, then it needs to work if even when not connected to the Internet. For this, you’d need an amazing large vocabulary embedded speech engine. Did I tell you about TrulyNatural?
  • Of course, the hope is that the product-line will quickly expand and as a result, you will then add various sensors, microphones, cameras, wheels, etc.; and at the same time, you will also want to develop lower cost versions that don’t have all the mics and expensive processing. You are first to market and that’s a big edge. A lot of companies are trying to follow you. You need to expand the product-line quickly, learning from Alexa. Too many big companies have NIH syndrome… don’t be like them! Look for partnering opportunities with 3rd parties who can help your products succeed – Like Sensory! ;-)

Going Deep Series – Part 3 of 3

May 1, 2015

Going Deep Banner small

 

 

Winning on Accuracy & Speed… How can a tiny player like Sensory compete in deep learning technology with giants like Microsoft, Google, Facebook, Baidu and others?

There’s a number of ways, and let me address them specifically:

  1. Personnel: We all know it’s about quality, not quantity. I’d like to think that at Sensory we hire higher-caliber engineers than they do at Google and Microsoft; and maybe to an extent that is true, but probably not true when comparing their best with our best. We probably do however have less “turnover”. Less turnover means our experience and knowledge base is more likely to stay in house rather than walk off to our competitors, or get lost because it wasn’t documented.
  2. Focus and strategy: Sensory’s ability to stay ahead in the field of speech recognition and vision is because we have remained quite focused and consistent from our start. We pioneered the use of neural networks for speech recognition in consumer products. We were focused on consumer electronics before anyone thought it was a market…more than a dozen years before Siri!
  3. “Specialized” learning: Deep learning works. But Sensory has a theory that it can also be destructive when individual users fall outside the learned norms. Sensory learns deep on a general usage model, but once we go on device, we learn shallow through a specialized adaptive process. We learn to the specifics of the individual users of the device, rather than to a generalized population.

These 3 items together have provided Sensory with the highest quality embedded speech engines in the world. It’s worth reiterating why embedded is needed, even if speech recognition can all be done in the cloud:

  1. Privacy: Privacy is at the forefront of todays most heated topics. There is growing concern about “big brother” organizations (and governments) that know the intimate details of our lives. Using embedded speech recognition can help improve privacy by not sending personal data for analysis into the cloud.
  2. Speed: Embedded speech recognition can be ripping fast and consistently available. Accessing online or cloud based recognition services can be spotty when Internet connections are unstable, and not always available.
  3. Accuracy: Embedded speech systems have the potential advantage of a superior signal to noise ratio and don’t risk data loss or performance issues due to a poor or non-existent connection.

 

Going Deep Series – Part 2 of 3

April 22, 2015

Going Deep Banner small

 

 

How does Big Data and Privacy fit into the whole Deep Learning Puzzle?

Privacy and Big Data have become big concerns in the world of Deep Learning. However, there is an interesting relationship between the Privacy of personal data and information, Big Data, and Deep Learning. That’s because a lot of the Big Data is personal information used as the data source for Deep Learning. That’s right, to make vision, speech and other systems better, many companies invade users’ personal information and the acquired data is used to train their neural networks. So basically, Deep Learning is neural nets learning from your personal data, stats, and usage information. This is why when you sign a EULA (end user license agreement) you typically give up the rights to your data, whether its usage data, voice data, image data, personal demographic info, or other data supplied through the “free” software or service.

Recently, it was brought to consumers’ attention that some TVs and even children’s toys were listening in on consumers, and/or sharing and storing that information to the cloud. A few editors called me to get my input and I explained that there are a few possible reasons for devices to do this kind of “spying” and none of which are the least bit nefarious: The two most common reasons are 1) The speech recognition technology being used needs the voice data to train better models, so it gets sent to the cloud to be stored and used for Deep Learning and/or 2) The speech recognition needs to process the voice data in the cloud because it is unable to do so on the device. (Sensory will change this second point with our upcoming TrulyNatural release!)

The first reason is exactly what I’ve been blogging about when we say Deep Learning. More data is better! The more data that gets collected, the better the Deep Learning can be. The benefits can be applied across all users, and as long as the data is well protected and not released, then it only has beneficial consequences.

Therein lies the challenge: “as long as the data is well protected and not released…” If banks, billion dollar companies and governments can’t protect personal data in the cloud, then who can, and why should people ever assume their data is safe, especially from systems where there is no EULA is place and data is being collected without consent (which happens all the time BTW)?

Having devices listen in on people and share their voice data with the cloud for Deep Learning or speech recognition processing is an invasion of privacy. If we could just keep all of the deep neural net and recognition processing on device, then there would be no need to risk the security of peoples’ personal data by sharing and storing it on the cloud… and its with this philosophy that Sensory pioneered an entirely different, “embedded” approach to deep neural net based speech recognition which we will soon be bringing to market. Sensory actually uses Deep Learning approaches to train our nets with data collected from EULA consenting and often paid subjects. We then take the recognizer built from that research and run it on our OEM customers’ devices and because of that, never have to collect personal data; so, the consumers who buy products from Sensory’s OEM customers can rest assured that Sensory is never putting their personal data at risk!

In my next blog, I’ll address the question about how accurate Sensory can be using deep nets on device without continuing data collection in the cloud. There are actually a lot of advantages for running on device beyond privacy, and it can include not only response time but accuracy as well!

Going Deep Series – Part 1 of 3

April 15, 2015

Going Deep Banner small

 

 

Deep Neural Nets, Deep Belief Nets, Deep Learning, DeepMind, DeepFace, DeepSpeech, DeepImage… Deep is all the rage! In my next few blogs I will try to address some of the questions and issues surrounding all of these “deep” thoughts including:

  • What is Deep Learning and why has it gotten so popular as of late
  • Is Sensory just jumping on the bandwagon with Deep Nets for voice and vision?
  • How does Big Data and Privacy fit into the whole Deep Learning arena?
  • How can a tiny player like Sensory compete in this “deep” technology with giants like Microsoft, Google, Facebook, Baidu and others investing so heavily?

Part 1: What is Deep Learning and is Sensory Just Jumping on the Bandwagon?

Artificial Neural Network approaches have been around for a long time, and have gone in and out of favor. Neural Nets are an approach within the field of Machine Learning and today they are all the rage. Sensory has been working with Neural Net technology since our founding more than 20 years ago, so the approach is certainly not new for us. We are not just jumping on the bandwagon… we are one of the leading carts! ;-)

Neural Networks are very loosely modeled after how our brains work – nonlinear, parallel processing, and learning from exposure to data rather than being programmed. Unlike common computer architectures that separate memory from processing, our brains have billions of neurons that communicate and process all in parallel and with huge quantities of connections. This architecture based on how our brains work turns out to be much better than traditional computer programs at dealing with ambiguous and “sensory” information like vision and speech – a little Trivia: that’s how we came up with the name Sensory!

In the early days of Sensory, we were often asked by engineers, “What kind of neural networks are you running?” They were looking for a simple answer, something like a “Kohonen Net.”  I once asked my brother, Mike Mozer, a pioneer in the field of neural nets, a Sensory co-founder, and a professor of computer science at U. Colorado Boulder, for a few one liners to satisfy curious engineers without giving anything away. We had two lines: the first being, “a feed forward multi-layer net” which satisfied 90% of those asking, and the other response for those that asked for more was, “it’s actually a nonlinear and multivariate function.” That quieted pretty much everyone down.

In the last five years Neural Networks have proven to be the best-known approaches for various recognition and ambiguous data challenges like vision and speech. The breakthrough and improvement in performance came from these various terms that use the word “deep.” The “deep” approaches entailed more complex architectures that receive more data. The architecture relates to the ways that information is shared and processed (like all those connections in our brain), and the increased data allows the system to adapt and improve through continuous learning, hence the terms, “Deep Learning” and “Deep Learning Net.” Performance has improved dramatically in the past five years and Deep Learning approaches have far exceeded traditional “expert-based” techniques for programming complex feature extraction and analysis.

Sensory Joins the FIDO Alliance to Progress the State of Biometric Authentication

April 6, 2015

FIDO Alliance Member LogoLets face it, 20 years ago passwords made sense and were an easy and somewhat secure way for keeping our private stuff private. But today, as a result of countless cyber attacks on the public, minimum password requirements vastly skew from site to site, forcing many people to remember upwards of 20 (some highly complex) passwords. Thankfully, better methods for identity authentication exist, and an organization called the FIDO Alliance is working with numerous players in the space, Sensory being one of them, to change the nature of online authentication by defining an open, scalable, interoperable set of mechanisms that reduce the reliance on passwords.

As many of you already know, Sensory is a leading provider of deep learning face and voice recognition biometric solutions, and we believe that with solutions like TrulySecure, your face or voice alone can serve as a very accurate method for identity authentication, and when combined, offers the strongest level of security feasible. We have learned a great deal about how to utilize deep learning principles for biometric authentication and are working with the FIDO Alliance to have our solutions FIDO-Certified, which will enable us to offer them to customers of end-to-end FIDO solutions.

The FIDO (Fast IDentity Online) Alliance is a 501(c)6 non-profit organization nominally formed in July 2012 to address the lack of interoperability among strong authentication devices as well as the problems users face with creating and remembering multiple usernames and passwords. The FIDO Alliance plans to change the nature of authentication by developing specifications that define an open, scalable, interoperable set of mechanisms that supplant reliance on passwords to securely authenticate users of online services. This new standard for security devices and browser plugins will allow any website or cloud application to interface with a broad variety of existing and future FIDO-enabled devices that the user has for online security.

« Older Entries