HEAR ME -
Speech Blog

Archive for the ‘consumer electronics’ Category

Sensory Winning Awards

October 6, 2016

It’s always nice when Sensory wins an award. 2016 has been a special year for Sensory because we won more awards than in any other year in our 23-year history!

Check it out:

Sensory Earns Multiple Coveted Awards in 2016
Pioneering embedded speech and machine vision tech company receiving industry accolades

Sensory Inc., a Silicon Valley company that pioneered the hands-free voice wakeup word approach, today announced it has won over half a dozen awards in 2016 across its product line, including awards for products, technologies, and people, covering deep learning, biometric authentication and voice recognition.

The awards presented to Sensory include the following:
AIconics are the world’s only independently judged awards celebrating the drive, innovation and hard work in the international artificial intelligence community. Sensory was initially a finalist along with six other companies in the category of Best Innovation in Deep Learning, and judges determined Sensory to be the overall WINNER at an awards ceremony held in September 2016. The judging panel was comprised of 12 independent professionals spanning leaders in artificial intelligence R&D, academia, investments, journalists and analysts.

CTIA Super Mobility 2016™, the largest wireless event in America, announced more than 70 finalists for its 10th annual CTIA Emerging Technology (E-Tech) Awards. Sensory was nominated in the category of Mobile Security and Privacy for its TrulySecure™ technology, along with Nokia, Samsung, SAP, and others. Sensory was presented with the First Place award for the category in a ceremony in September 2016 at the CTIA Las Vegas event.

Speech Technology magazine, the leading provider of speech technology news and analysis, held its 10th annual Speech Industry Awards to recognize the creativity and notable achievements of key influencers (Luminaries), major innovators (Star Performers), and impressive deployments (Implementation Awards). The editors of Speech Technology magazine selected 2016 award winners based on their industry contributions during the past 12 months. Sensory’s CEO, Todd Mozer, was awarded a Luminary Award, his second time winning the prestigious award. Sensory as a company was awarded the Star Performer award along with IBM, Amazon and others.

Two well-known industry analyst firms issued reports highlighting Sensory’s industry contributions for its TrulyHandsfree product and customer leadership, offering awards for innovations, customer deployment, and strategic leadership.

“Sensory has an incredibly talented team of speech recognition and biometrics experts dedicated to advancing the state-of-the-art of each respective field. We are pleased that our TrulyHandsfree, TrulySecure and TrulyNatural product lines are being recognized in so many categories, across the various industries in which we do business,” said Todd Mozer, CEO of Sensory. “I am also thrilled that Sensory’s research and innovations in the deep learning space has been noticed, generating our company prestigious accolades and management recognition.”

For more information about this announcement, Sensory or its technologies, please contact sales@sensory.com; Press inquiries: press@sensory.com

TrulySecure 2.0 Wins First Place in 2016 CTIA E-Tech Awards

September 9, 2016

We are pleased to announce that Sensory’s TrulySecure technology has earned first place in this year’s CTIA E-Tech Awards. We believe that this recognition serves as a testament to Sensory’s devotion to developing the best embedded speech recognition and biometric security technologies available.

For those of you unfamiliar with TrulySecure – TrulySecure is the result of more than 20 years of Sensory’s industry leading and award-winning experience in the biometric space. The TrulySecure SDK allows application developers concerned about both security and convenience to quickly and easily deploy a multimodal voice and vision authentication solution for mobile phones, tablets, and PCs. TrulySecure is highly secure, environment robust, and user friendly – offering better protection and greater convenience than passwords, PINs, fingerprint readers and other biometric scanners. TrulySecure offers the industry’s best accuracy at recognizing the right user, while keeping unauthorized users out. Sensory’s advanced deep learning neural networks are fine tuned to provide verified users with instant access to protected apps and services, without the all too common false rejections of the right user associated with other biometric authentication methods. TrulySecure features a quick and easy enrollment process – capturing voice and face simultaneously in a few seconds. Authentication is on-device and almost instantaneous.

TrulySecure provides maximum security against attempts by mobile identity thieves to break into a protected mobile device, while ensuring the most accurate verification rates for the actual user. For comparison, according to data published by Apple, the iPhone’s thumbprint reader offers about a 1:50k chance of a false accept of the wrong user, and the probability of the wrong user getting into the device gets higher when the user enrolls more than one finger. With TrulySecure, face and voice biometrics individually offer a baseline 1:50k false accept rate, but each can be made more secure depending on the security needs of the developer. When both face and voice biometrics are required for user authentication, TrulySecure is virtually impenetrable by anybody but the actual user. TrulySecure’s face+voice authentication offers a baseline 1:100k false accept rate, but can be dialed in to offer as much as a 1:1 million false accept rate depending on security needs.
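
To make the false accept arithmetic above concrete, here is a minimal sketch. It is not Sensory’s or Apple’s actual model: it assumes every comparison is statistically independent, which real biometrics only approximate, and the function names are made up for illustration.

```python
from typing import List


def false_accept_any(p_single: float, n_templates: int) -> float:
    """Chance an impostor matches at least one of n enrolled templates.

    Assumes each comparison is independent (an idealization).
    """
    return 1.0 - (1.0 - p_single) ** n_templates


def false_accept_all(rates: List[float]) -> float:
    """Chance an impostor passes every required modality.

    Assumes the modalities fail independently (an idealization).
    """
    result = 1.0
    for rate in rates:
        result *= rate
    return result


def as_odds(p: float) -> str:
    """Format a probability as '1:N' odds, rounded to the nearest integer."""
    return f"1:{round(1 / p):,}"


if __name__ == "__main__":
    baseline = 1 / 50_000  # the 1:50k single-biometric figure quoted above
    for fingers in (1, 2, 5):
        print(fingers, "finger(s):", as_odds(false_accept_any(baseline, fingers)))
    print("face AND voice (idealized):", as_odds(false_accept_all([baseline, baseline])))
```

Under this assumption, enrolling five fingers moves the odds from 1:50k to roughly 1:10k, which is why each extra enrolled finger makes a false accept more likely. The idealized face-and-voice product comes out far smaller than the 1:100k to 1:1 million figures quoted above, so treat it as an illustration of why requiring both factors helps, not as a prediction of real-world performance.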

TrulySecure is robust to environmental challenges such as low light or high noise – it works in real-life situations that render lesser offerings useless. The proprietary speaker verification, face recognition, and biometric fusion algorithms leverage Sensory’s deep strength in speech processing, computer vision, and machine learning to continually make the user experience faster, more accurate, and more secure. The more the user uses TrulySecure, the more secure it gets.

TrulySecure offers ease-of-mind specifications: no special hardware is required – the solution uses standard microphones and cameras universally installed on today’s phones, tablets and PCs. All processing and encryption is done on-device, so personal data remains secure – no personally identifiable data is sent to the cloud. TrulySecure was also the first biometric fusion technology to be FIDO UAF Certified.

While we are truly honored to be the recipient of this prestigious award, we won’t rest on our laurels. Our engineers are already working on the next generation of TrulySecure, further improving accuracy and security, as well as refining the already excellent user experience.

Guest blog by Michael Farino

Sensory Earns Two Coveted 2016 Speech Tech Magazine Awards

August 22, 2016

Sensory is proud to announce that it has been awarded two 2016 Speech Tech Magazine Awards. With some stiff competition in the speech industry, Sensory continues to excel in offering the industry’s most advanced embedded speech recognition and speech-based security solutions for today’s voice-enabled consumer electronics movement.

The 2016 Speech Technology Awards include:

Speech Luminary Award – Awarded to Sensory’s CEO, Todd Mozer

“What really impresses me about Todd is his long commitment to speech technology, and specifically, his focus on embedded and small-footprint speech recognition,” says Deborah Dahl, principal at Conversational Technologies and chair of the World Wide Web Consortium’s Multimodal Interactions Working Group. “He focuses on what he does best and excels at that.”

Star Performers Award – Awarded to Sensory for its contributions in enabling voice-enabled IoT products via embedded technologies

“Sensory has always been in the forefront of embedded speech recognition, with its TrulyHandsfree product, a fast, accurate, and small-footprint speech recognition system. Its newer product, TrulyNatural, is groundbreaking because it supports large vocabulary speech recognition and natural language understanding on embedded devices, removing the dependence on the cloud,” said Deborah Dahl, principal at Conversational Technologies and chair of the World Wide Web Consortium’s Multimodal Interactions Working Group. “While cloud-based recognition is the right solution for many applications, if the application must work regardless of connectivity, embedded technology is required. The availability of TrulyNatural embedded natural language understanding should make many new types of applications possible.”

– Guest Blog by Michael Farino

 

Google Assistant vs. Amazon’s Alexa

June 15, 2016

“Credit to the team at Amazon for creating a lot of excitement in this space,” said Google CEO Sundar Pichai. He made this comment during his Google I/O speech last week when introducing Google’s new voice-controlled home speaker, Google Home, which sounds a lot like Amazon’s Echo in its description. Many interpreted this as a “thanks for getting it started, now we’ll take over” kind of comment.

Google has always been somewhat marketing challenged in naming its voice assistant. Everyone knows Apple has Siri, Microsoft has Cortana, and Amazon has Alexa. But what is Google’s voice assistant called? Is it Google Voice, Google Now, OK Google, Voice Actions? Even those of us in the speech industry have found Google’s branding to be confusing. Maybe they’re clearing that up now by calling their assistant “Google Assistant.” Maybe that’s the Google way of admitting it’s an assistant without admitting they were wrong by not giving it a human sounding name.

The combination of the early announcement of Google Home and Google Assistant has caused some to comment that Amazon has BIG competition at best, and at worst, Amazon’s Alexa is in BIG trouble.

Forbes called Google’s offering the Echo Killer, while Slate said it was smarter than Amazon’s Echo.

I thought I’d point out a few good reasons why Amazon is in pretty good shape:

  1. Google Home is not shipping. Google has a bit of a chicken-and-egg issue in that it needs to roll out a product that has industry support (for controlling third-party products by voice). How do you get industry partners without a product? You announce early! That was a smart move; now they just need to design it and ship it…not always an easy task.
  2. It’s about Voice Commerce. This is REALLY important. Many people think Google will own this home market because it has a better speech recognizer. Speech recognition capabilities are nice but not the end game. The value here is having a device that’s smart and trusted enough to take money out of our bank accounts and deliver us goods and services that we want when we want them. Amazon has a huge infrastructure lead here in products, reviews, shipping, and other key components of Internet commerce. Adding a convenient voice front end isn’t easy, but it’s also NOT the hardest part of enabling big revenue voice commerce systems.
  3. Amazon has far-field working and devices that always “talk back.” I admit the speech recognition is important, and Google has a lot of data, experience, and technologists in machine learning, AI, and speech recognition. But most of the Google experience is through Android and mobile-phone hardware. Where Amazon has made a mark is in far-field or longer distance recognition that really works, which is not easy to do. Speech recognition has always been about signal/noise ratios and far-field makes the task more difficult and requires acoustic echo cancellation, multiple microphones, plus various gain control and noise filtering/speech focusing approaches. Also, the Google recognizer was established around finding data through voice queries, most of such data being displayed on-screen (and often through search). The Google Home and Amazon Echo are no-screen devices. Having them intelligently talk back means more than just reading the text off a search. Google can handle this, of course, but it’s one more technical barrier that needs to be done right.
  4. Amazon has a head start and already is an industry standard. Amazon’s done a nice job with the Echo. Its follow-on products, Tap and Dot, were intelligent offshoots. Even its Fire TV took advantage of in-house voice capabilities. The Alexa Voice Services work well and already are acting like a standard for voice control. Roughly three million Amazon devices have already sold, and I’d guess that in the next year, the number of Alexa connected devices will double through both Amazon sales and third parties using AVS. This is not to mention the tens of millions of devices on the market that can be controlled by Echo or other Amazon hardware. Amazon is pretty well entrenched!
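
To put a rough number on the far-field point in item 3, here is a back-of-the-envelope sketch. It assumes an idealized free-field talker (inverse-square spreading, so the direct speech level drops about 6 dB per doubling of distance) and a constant noise floor. Real rooms add reverberation, which this ignores, and the function name and example values are made up for illustration.

```python
import math


def snr_at_distance_db(snr_ref_db: float, ref_distance_m: float, distance_m: float) -> float:
    """Estimate SNR at a new talker-to-mic distance.

    Assumes free-field inverse-square spreading and a constant noise floor,
    so the speech level (and the SNR) falls by 20*log10(d/d_ref) dB.
    """
    if ref_distance_m <= 0 or distance_m <= 0:
        raise ValueError("distances must be positive")
    return snr_ref_db - 20.0 * math.log10(distance_m / ref_distance_m)


if __name__ == "__main__":
    # Hypothetical starting point: 30 dB SNR with a phone held 0.3 m from the mouth.
    for meters in (0.3, 0.6, 1.2, 3.0):
        print(f"{meters:>4} m: {snr_at_distance_db(30.0, 0.3, meters):5.1f} dB")
```

Moving from phone distance to across the room costs tens of dB in this simple model, which is the gap that multiple microphones, echo cancellation and noise filtering have to win back.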

Of course, Amazon has its challenges as well, but I’ll leave that for another blog.

Consumer concerns about being connected

March 28, 2016

Just saw an interesting article on www.eweek.com.

It covers a consumer survey about being connected, particularly with IoT devices. What’s interesting is that those surveyed were technically savvy (70% were self-described as intermediate or advanced with computers, and 83% said they could set up their own router), yet the survey found:

1)    68 percent of consumers expressed concern about security risks such as viruses, malware and hackers;
2)    65 percent of consumers were concerned over data collected by device manufacturers being inappropriately used or stolen; and
3)    51 percent of consumers said they are also anxious about privacy breaches.

These concerns are quite understandable, since we as consumers tend to give away many of our data rights in return for free services and software.

People have asked me whether embedded speech and other embedded technologies will persist as our cloud connections get better and faster. These privacy issues are one of the reasons why embedded is critical.

This is especially true for “always on” devices that listen for triggers; if the always on listening is in the cloud, then everything we discuss around the always on mics goes into the cloud to be analyzed and potentially collected!
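
Here is a toy sketch of that difference. It is not Sensory’s implementation: the trigger detector is a stand-in string match, the frames are text labels rather than audio, and all names are hypothetical. The point is only that when the trigger runs on the device, nothing leaves the device until the trigger fires.

```python
from typing import Callable, Iterable, List


def gate_uploads(
    frames: Iterable[str],
    is_trigger: Callable[[str], bool],
    frames_after_trigger: int,
) -> List[str]:
    """Return only the frames that would be sent to the cloud.

    The on-device trigger check decides when to open the gate; the gate
    then stays open for a fixed number of frames (the spoken request).
    The trigger frame itself is not uploaded in this sketch.
    """
    uploaded: List[str] = []
    remaining = 0
    for frame in frames:
        if remaining > 0:
            uploaded.append(frame)
            remaining -= 1
        elif is_trigger(frame):
            remaining = frames_after_trigger
    return uploaded


if __name__ == "__main__":
    room_audio = ["private chat", "hello device", "what's the weather", "more private chat"]
    # Cloud-side trigger: every frame has to be uploaded to be analyzed.
    print("cloud trigger uploads:", room_audio)
    # Embedded trigger: only the request after the wake phrase is uploaded.
    print("embedded trigger uploads:",
          gate_uploads(room_audio, lambda f: f == "hello device", 1))
```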

Sensory’s CEO, Todd Mozer, interviewed on FutureTalk

October 1, 2015

Todd Mozer’s interview with Martin Wasserman on FutureTalk

TrulyHandsfree 4.0… Maintaining the big lead!

August 6, 2015

We first came out with TrulyHandsfree about five years ago. I remember talking to speech tech executives at MobileVoice as well as other industry tradeshows, and when talking about always-on hands-free voice control, everybody said it couldn’t be done. Many had attempted it, but their offerings suffered from too many false fires, or not working in noise, or consuming too much power to be always listening. Seems that everyone thought a button was necessary to be usable!

In fact, I remember the irony of being on an automotive panel, and giving a presentation about how we’ve eliminated the need for a trigger button, while the guy from Microsoft presented on the same panel the importance of where to put the trigger button in the car.

Now, five years later, voice activation is the norm… we see it all over the place with OK Google, Hey Siri, Hey Cortana, Alexa, Hey Jibo, and of course if you’ve been watching Sensory’s demos over the years, Hello BlueGenie!

Sensory pioneered the button free, touch free, always-on voice trigger approach with TrulyHandsfree 1.0 using a unique, patented keyword spotting technology we developed in-house, and from its inception, it was highly robust to noise and it was ultra-low power. Over the years we have ported it to dozens of platforms, including DSP/MCU IP cores from ARM, Cadence, CEVA, NXP CoolFlux, Synopsys and Verisilicon, as well as integrated circuits from Audience, Avnera, Cirrus Logic, Conexant, DSPG, Fortemedia, Intel, Invensense, NXP, Qualcomm, QuickLogic, Realtek, STMicroelectronics, TI and Yamaha.

This vast platform compatibility has allowed us to work with numerous OEMs to ship TrulyHandsfree in over a billion products!

Sensory didn’t just innovate a novel keyword spotting approach, we’ve continually improved it by adding features like speaker verification and user defined triggers. Working with partners, we lowered the draw on the battery to less than 1mA, and Sensory introduced hardware and software IP to enable ultra-low-power voice wakeup of TrulyHandsfree. All the while, our accuracy has remained the best in the industry for voice wakeup.

We believe the bigger, more capable companies trying to make voice triggers have been forced to use deep learning speech techniques to try to catch up with Sensory in the accuracy department. They have yet to catch up. Deep learning has brought their products to a very usable accuracy level, but they have lost much of the advantage of small footprint and low power in the process.

Sensory has been architecting solutions for neural nets in consumer electronics since we opened the doors more than 20 years ago. With TrulyHandsfree 4.0 we are applying deep learning to improve accuracy even further, pushing the technology even more ahead of all other approaches, yet enabling an architecture that has the ability to remain small and ultra-low power. We are enabling new feature extraction approaches, as well as improved training in reverb and echo. The end result is a 60-80% boost in what was already considered industry-leading accuracy.

I can’t wait for TrulyHandsfree 5.0…we have been working on it in parallel with 4.0, and although it’s still a long ways off, I am confident we will make the same massive improvements in speaker verification with 5.0 that we are doing for speech recognition in 4.0! Once again further advancing the state of the art in embedded speech technologies!

Rambling On… Chip Acquisitions and Software Differentiation

June 3, 2015

When I started Sensory over 20 years ago, I knew how difficult it would be to sell software to cost sensitive consumer electronic OEMs that would know my cost of goods. A chip based method of packaging up the technology made a lot of sense as a turnkey solution that could maintain a floor price by adding the features of a microcontroller or DSP with the added benefit of providing speech I/O. The idea was “buy Sensory’s micro or DSP and get speech I/O thrown in for free”.

After about 10 years it was becoming clear that Sensory’s value add in the market was really in technology development, and particularly in developing technologies that could run on low cost chips and with smaller footprints, less power, and superior accuracy than other solutions. Our strategy of using trailing IC technologies to get the best price point was becoming useless because we lacked the scale to negotiate the best pricing, and more cutting edge technologies were becoming further out of reach; even getting the supply commitments we needed was difficult in a world of continuing flux between over and under capacity.

So Sensory began porting our speech technologies onto other people’s chips. Last year about 10% of our sales came from our internal ICs! Sensory’s DSP, IP, and platform partners have turned into the most strategic of our partnerships.

Today in the semiconductor industry a consolidation is occurring that somewhat mirrors Sensory’s thinking over the past 10 years, albeit at a much larger scale. Avago pays $37 billion for Broadcom, Intel pays $16.7B for Altera, and NXP pays $12B for Freescale, and the list goes on, dwarfing acquisitions of earlier time periods.

It used to be that the multibillion-dollar chip companies gobbled up the smaller fabless companies, but now even the multibillion-dollar chip companies are being gobbled up. There are a lot of reasons for this, but economies of scale are probably #1. As chips get smaller and smaller, there are increasing costs for design tools, tape outs, and prototyping, and although the actual variable per-chip cost drops, the fixed costs are skyrocketing, making consolidation and scale more attractive.

That sort of consolidation strategy is very much a hardware centered philosophy. I think the real value will come to these chip giants through in house technology differentiation. It’s that differentiation that will add value to their chips, enabling better margins and/or more sales.

I expect that over time the chip giants will realize what Sensory concluded 10 years ago…that machine learning, algorithmic differentiation, and software skills, are where the majority of the value added equation on “smart” chips needs to come from, and that improving the user experience on devices can be a pot of gold! In fact, we have already seen Intel, Qualcomm and many other chip giants investing in speech recognition, biometrics, and other user experience technologies, so the change is underway!

OK, Amazon!

May 4, 2015

I was at the Mobile Voice Conference last week and was on a keynote panel with Adam Cheyer (Siri, Viv, etc.) and Phil Gray (Interactions) with Bill Meisel moderating. One of Bill’s questions was about the best speech products, and of course there was a lot of banter about Siri, Cortana, and Voice Actions (or GoogleNow as it’s often referred to). When it was my turn to chime in I spoke about Amazon’s Echo, and heaped lots of praise on it. I had done a bit of testing on it before the conference but I didn’t own one. I decided to buy one from eBay since Amazon didn’t seem to ever get around to selling me one. It arrived yesterday.

Here are some miscellaneous thoughts:

  • Echo is a fantastic product! Not so much because of what it is today but for the platform it’s creating for tomorrow. I see it as every bit as revolutionary as Siri.
  • The naming is really confusing. You call it Alexa but the product is Echo. I suspect this isn’t the blunder that Google made (VoiceActions, GoogleNow, GoogleVoice, etc.), but more an indication that they are thinking of Echo as the product and Alexa as the personality, and that new products will ship with the same personality over time. This makes sense!
  • Setup was really nice and easy, the music content integration/access is awesome, the music quality could be a bit better but is useable; there’s lots of other stuff that normal reviewers will talk about…But I’m not a “normal” reviewer because I have been working with speech recognition consumer electronics for over 20 years, and my kids have grown up using voice products, so I’ll focus on speech…
  • My 11 year old son, Sam, is pretty used to me bringing home voice products, and is often enthusiastic (he insisted on taking my Vocca voice controlled light to keep in his room earlier this year). Sam watched me unpack it and immediately got the hang of it and used it to get stats on sports figures and play songs he likes. Sam wants one for his birthday! Amazon must have included some kids voice modeling in their data because it worked pretty well with his voice (unlike the Xbox when it first shipped, which I found particularly ironic since Xbox was targeting kids).
  • The Alexa trigger works VERY well. They have implemented beamforming and echo cancellation in a very state of the art implementation. The biggest issue is that it’s a very bandwidth intensive approach and is not low power. Green is in! That could be why it’s plug-in/AC only and not battery powered. Noise near the speaker definitely hurts performance as does distance, but it absolutely represents a new dimension in voice usability from a distance and unlike with the Xbox, you can move anywhere around it, and aren’t forced to be in a stationary position (thanks to their 7 mics, which surely must be overkill!).
  • The voice recognition in general is good, but like all of the better engines today (Google, Siri, Cortana, and even Sensory’s TrulyNatural) it needs to get better. We did have a number of problems where Alexa got confused. Also, Alexa doesn’t appear to have memory of past events, which I expect will improve with upgrades. I tried playing the band Cake (a short word, making it more difficult) and it took about 4 attempts until it said “Would you like me to play Cake?” Then I made the mistake of trying “uh-huh” instead of “yes” and I had to start all over again!
  • My FAVORITE thing about the recognizer is that it does ignore things very nicely. It’s very hard to know when to respond and when not to. The Voice Assistants (Google, Siri, Cortana) seem to always defer to web searches and say things like “It’s too noisy” no matter what I do, and I thought Echo was good at deciding not to respond sometimes.

OK, Amazon… here’s my free advice (admittedly self-serving but nevertheless accurate):

  • You need to know who is talking and build models of their voices and remember who they are and what their preferences are. Sensory has the BEST embedded speaker identification/verification engine in the world, and it’s embedded so you don’t need to send a bunch of personal data into the cloud. Check out TrulySecure!
  • In fact, if you added a camera to Alexa, it too could be used for many vision features, including face authentication.
  • Make it battery powered and portable! To do this, you’d need an equally good embedded trigger technology that runs at low power – Check out TrulyHandsfree!
  • If it’s going to be portable, then it needs to work even when not connected to the Internet. For this, you’d need an amazing large vocabulary embedded speech engine. Did I tell you about TrulyNatural?
  • Of course, the hope is that the product-line will quickly expand and as a result, you will then add various sensors, microphones, cameras, wheels, etc.; and at the same time, you will also want to develop lower cost versions that don’t have all the mics and expensive processing. You are first to market and that’s a big edge. A lot of companies are trying to follow you. You need to expand the product-line quickly, learning from Alexa. Too many big companies have NIH syndrome… don’t be like them! Look for partnering opportunities with 3rd parties who can help your products succeed – Like Sensory! ;-)

Mobile World Congress Day 1

March 3, 2015

It feels like I had a whole week’s worth of the trade show wrapped into one day! By the time mid week hits, I’ll surely be ready to head home! Here are some of the highlights from the first day of Mobile World Congress 2015:

  • First a word about Catalonia. That’s where Barcelona is…in the heart of Catalonia, a province of Spain. Don’t expect delayed meetings, inefficiencies, relaxed long lunches or anything like that. The Catalonians have the precision of Germans (to continue my gross stereotyping!), and my experience with one of the largest trade shows on the planet is that it’s going off without a hitch! I picked up my badge at the airport in a five-minute line that was well staffed and moved rapidly. I could just about walk into the show yesterday morning. The subways and trains though crowded and overheated ran extremely smoothly. Kudos to the show management for pulling off such a difficult feat!
  • I’d be remiss without mentioning the Galaxy S6. Samsung invited us to the launch and of course they continue to use Sensory in a relationship that has grown quite strong over the years.  Samsung continues to innovate with the Edge, and other products that everyone is talking about. It’s amazing how far Apple took the mantle in the first iPhone and how companies like Samsung and the Android system seem to now be leading the charge on innovation!
  • My favorite product that doesn’t feature Sensory technology that I bumped into was an electronic jump rope. They put sensors in the handles and a visual display shows across the field of the rope, kind of like those clocks that rapidly flash LEDs as the pendulum quickly moves back and forth in order to display the time. I talked with Alex Woo from Tangram and he said they were going to launch a crowdfunding campaign. I gave Alex a demo of our TrulyHandsfree with jump ropers jumping and all the show noise and of course it worked flawlessly. It would be really cool to be able to ask things like “How much time,” “How many jumps,” “What’s my heart rate,” or “How many calories burned” and so on, and the display would make voice control so much more functional!
  • We had a couple of partnership announcements here at the show, supporting both Qualcomm and Synopsys – both great partners to add to our support mix, and it’s always nice when it’s customers driving our platform directions. The Qualcomm platform is interesting because it’s not their standard platform for 3rd parties to support. As far as I know they opened it up to Sensory and ONLY Sensory, and already we are seeing much interest!
  • Last night ZTE had a press party to indoctrinate Sensory and NXP into its Smart Voice Alliance. ZTE is really putting some forward thinking into the user experience and their research shows how much people want a voice interface but how dissatisfying the current state of the art actually is. Sensory’s hoping to change that! We’ll make one of our biggest announcements in history over the next month… and I’ll let you in on the secret (it’s on our website already!) We call it TrulyNatural, and it will be the highest accuracy large vocabulary embedded speech engine that the world has ever seen!

Hasta Luego!!!
