HEAR ME -
Speech Blog

Posts Tagged ‘Apple’

Apple is Getting Sirious – $1 Trillion is Not the Endgame

August 6, 2018

Apple introduced Siri in 2011 and my world changed. I was running Sensory back then as I am today and suddenly every company wanted speech recognition. Sensory was there to sell it! Steve Jobs, a notorious nay-sayer on speech recognition, had finally given speech recognition the thumbs up. Every consumer electronics company noticed and decided the time had come. Sensory’s sales shot up for a few years driven by this sudden confidence in speech recognition as a user interface for consumer electronics…

Read more on Voicebot.ai

Voice assistant battles, part two: The strategic importance

August 6, 2018

Here’s the basic motivation that I see in creating Voice Assistants…Build a cross platform user experience that makes it easy for consumers to interact, control and request things through their assistant. This will ease adoption and bring more power to consumers who will use the products more and in doing so create more data for the cloud providers. This “data” will include all sorts of preferences, requests, searches, purchases, and will allow the assistants to learn more and more about the users. The more the assistant knows about any given user, the BETTER the assistant can help the user in providing services such as entertainment and assisting with purchases (e.g. offering special deals on things the consumer might want). Let’s look at each of these in a little more detail:….

Read more at Embedded Computing

Smart speakers coming from all over

October 12, 2017

Amazon, Google, Sonos, and LINE all introduced smart speakers within a few weeks of each other. Here’s my quick take and commentary on those announcements.

Amazon now has the new Echo, the old Echo, the Echo Plus, Spot, Dot, Show, and Look. The company is improving quality, adding incremental features, lowering cost, and seemingly expanding its leadership position. They make great products for consumers, have a very strong ecosystem, and make products that are very tough to compete with, both for their competitors and for the many platform partners that use Alexa.

Read more at Embedded Computing

Apple erred on facial recognition

September 15, 2017

On the same day that Apple rolled out the iPhone X on the coolest stage of the coolest corporate campus in the world, Sensory gave a demo of an interactive talking and listening avatar that uses a biometric ID to know who’s talking to it. In Trump metrics, the event I attended had a few more attendees than Apple.

Interestingly, Sensory’s face ID worked flawlessly, and Apple’s failed. Sensory used a traditional camera using convolutional neural networks with deep learning anti-spoofing models. Apple used a 3D camera.

Read more at Embedded Computing

Hey Siri what’s really in iOS8?

June 4, 2014

It was about 4 years ago that Sensory partnered with Vlingo to create a voice assistant with a special “in car” mode that would allow the user to just say “Hey Vlingo” then ask any question. This was one of the first “TrulyHandsfree” voice experiences on a mobile phone, and it was this feature that was often cited for giving Vlingo the lead in the mobile assistant wars (and helped lead to their acquisition by Nuance).

About 2 years ago Sensory introduced a few new concepts including “trigger to search” and our “deeply embedded” ultra-low-power always-listening technology (now down to under 2mW, including the audio subsystem!). Motorola took advantage of these excellent approaches from Sensory and created what I most biasedly think is the best voice experience on a mobile phone. Samsung too has taken the Sensory technology and used it in a number of very innovative ways, going beyond mere triggers and using the same noise-robust technology for what I call “sometimes always listening”. For example, when the camera is open it is always listening for “shoot”, “photo”, “cheese”, and a few other words.
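
The “sometimes always listening” idea can be sketched as a vocabulary that depends on what the phone is doing. This is only an illustration, not Sensory's or Samsung's implementation: the `CONTEXT_VOCABULARY` table and the `match_command` helper are hypothetical names, and a real recognizer would work on audio rather than on already-transcribed words.

```python
from typing import Optional

# Hypothetical table: which words are worth listening for in each app context.
CONTEXT_VOCABULARY = {
    "camera": {"shoot", "photo", "cheese"},
    "idle": set(),  # nothing active, so the recognizer can stay off
}


def match_command(context: str, heard: str) -> Optional[str]:
    """Return the command if the heard word is active in this context, else None."""
    active = CONTEXT_VOCABULARY.get(context, set())
    word = heard.strip().lower()
    return word if word in active else None


print(match_command("camera", "Cheese"))  # word is active while the camera is open
print(match_command("idle", "cheese"))    # same word is ignored when nothing is open
```

Keeping the active vocabulary small and tied to context is what lets the recognizer run only some of the time, as opposed to a true always-on trigger.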

So I’m curious about what Google, Microsoft, and Apple will do to push the boundaries of voice control further. Clearly all 3 like this “sometimes always on” approach, as they don’t appear to be offering the low power options that Motorola has enabled. At Apple’s WWDC there wasn’t much talk about Siri, but what they did say seemed quite similar to what Sensory and Vlingo did together 4 years ago…enable an in car mode that can be triggered by “Hey Siri” when the phone is plugged in and charging.

I don’t think that will be all…I’m looking forward to seeing what’s really in store for Siri. They have hired a lot of smart people, and I know something good is coming that will make me go back to the iPhone, but for now it’s Moto and Samsung for me!

Biometrics – The Studies Don’t Reveal the Truth

May 7, 2014

If you read through the biometrics literature you will see a general security-based ranking of biometric techniques, starting with retinal scans as the most secure, followed by iris, hand geometry and fingerprint, voice, face recognition, and then a variety of behavioral characteristics.

The problem is that these studies have more to do with “in theory” than “in practice” on a mobile phone, but they nevertheless mislead many companies into thinking that a single biometric can provide the results required. This is really not the case in practice. Most companies will require that False Accepts (errors caused by the wrong person or thing getting in) and False Rejects (errors caused by the right person not getting in) be so low that the rate where these two are equal (the equal error rate, or EER) would be well under 1% across all conditions. Here’s why the studies don’t reflect the real world of a mobile phone user:

  1. Cost is key. Mobile phone manufacturers will not be willing to invest in the highest-end approaches for capturing and measuring biometrics that are used by academic studies. This means fewer MIPS, less memory, and poorer quality readers.
  2. Size matters. Mobile phone manufacturers have extremely limited real estate, so larger systems cannot be properly deployed. Further complicating things, extremely fast enrollment and usage are required without a form factor change.
  3. Conditions are uncontrollable. Noisy environments, lighting, dirty hands, and oily screens/cameras/readers are all uncontrollable and will affect performance.
  4. User compliance cannot be assumed. The careful placement of an eye, finger or face does not always happen.
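
For readers who want to see how the equal error rate mentioned above is measured, here is a minimal sketch. It assumes each verification attempt produces a similarity score and that a score at or above the threshold counts as an accept. The function names and the toy score lists are made up for illustration and are not data from any real system.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false accept rate, false reject rate) at the given threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Scan the observed scores as thresholds and pick the one where FAR and FRR are closest."""
    def gap(t):
        far, frr = error_rates(genuine, impostor, t)
        return abs(far - frr)

    best = min(sorted(set(genuine) | set(impostor)), key=gap)
    far, frr = error_rates(genuine, impostor, best)
    return best, far, frr


# Toy scores: the right user (genuine) and the wrong user (impostor).
genuine = [0.9, 0.8, 0.85, 0.6, 0.95, 0.7, 0.4, 0.88, 0.92, 0.75]
impostor = [0.1, 0.3, 0.2, 0.65, 0.5, 0.15, 0.25, 0.35, 0.05, 0.45]

threshold, far, frr = equal_error_rate(genuine, impostor)
print(f"threshold={threshold} FAR={far:.0%} FRR={frr:.0%}")
```

On this toy data the two rates meet at 10%, which is far from the "well under 1%" target. That gap is the kind of result a single biometric can produce once conditions and compliance are no longer controlled.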

A great case in point is the fingerprint readers now deployed by Apple and Samsung. These are extremely expensive devices, and the literature would make one think that they are highly accurate, but Apple doesn’t have the confidence to allow them to be used in the iTunes store for ID, and San Jose Mercury News columnist Troy Wolverton says:

“I’ve not been terribly happy with the fingerprint reader on my iPhone, but it puts the one on the S5 to shame. Samsung’s fingerprint sensor failed repeatedly. At best, I would get it to recognize my print on the second try. But quite often, it would fail so many times in a row that I’d be prompted to enter my password instead. I ended up turning it off because it was so unreliable (full article).”

There is a solution to this problem: use the sensors already on the phone to minimize cost, and deploy a biometric chain combining face verification, voice verification, or other techniques that can be easily implemented in a user-friendly manner. The combined usage creates a very low equal error rate and becomes “immune” to conditions and compliance issues by having a series of biometric and other secure backup systems.

Sensory has an approach we call SMART (Sensory Methodology for Adaptive Recognition Thresholding) that looks at environmental and usage conditions and intelligently deploys thresholds across a multitude of biometric technologies. The result is a highly accurate solution that is easy to use and fast to respond, yet robust to environmental and usage models, AND it uses existing hardware to keep costs low.
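
To make the idea of condition-aware thresholds concrete, here is a rough sketch. It is not the SMART algorithm, whose details are not described here. The `Conditions` fields, the cutoffs (70 dB, 10 lux), the weights, and the thresholds are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Conditions:
    noise_db: float  # ambient noise level (assumed scale)
    lux: float       # ambient light level (assumed scale)


def modality_weights(c: Conditions):
    """Trust voice less in noise and face less in the dark (illustrative cutoffs)."""
    voice_w = 0.2 if c.noise_db > 70 else 0.5
    face_w = 0.2 if c.lux < 10 else 0.5
    total = voice_w + face_w
    return voice_w / total, face_w / total


def verify(voice_score: float, face_score: float, c: Conditions,
           base_threshold: float = 0.7) -> bool:
    """Fuse two biometric scores with condition-dependent weights and threshold."""
    voice_w, face_w = modality_weights(c)
    fused = voice_w * voice_score + face_w * face_score
    # When both sensors are degraded, demand a higher fused score.
    both_degraded = c.noise_db > 70 and c.lux < 10
    threshold = base_threshold + (0.05 if both_degraded else 0.0)
    return fused >= threshold


# A noisy room drags the voice score down, but a strong face match still gets the user in.
print(verify(voice_score=0.3, face_score=0.9, c=Conditions(noise_db=80, lux=300)))
```

With fixed equal weights the same user would score 0.6 and be rejected. Shifting the weight toward the sensor that is working under current conditions is what keeps both false rejects and false accepts low without adding hardware.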

Mobile phones – It doesn’t have to be Cost OR Quality!

April 25, 2014

It’s not often that I rave about articles I read, but Ian Mansfield of Cellular News hit the nail on the head with this article.

Not only is it a well-written and concise article, but it’s chock-full of recent data (primarily from JD Power research), and most importantly it’s data that tells a very interesting story that nicely aligns with Sensory’s strategy in mobile. So, thanks Ian, for getting me off my butt to start blogging again!

A few key points from the article:

  1. Price is becoming increasingly important in the choice of mobile phones, and simultaneously the prices of mobile phones are increasing.
  2. Although price might be the most important factor in choice, the overall customer satisfaction is driven by features.
  3. The features customers want are seamless voice control (36%); built-in sensors that can gauge temperature, lighting, noise and moods to customize settings to the environment (35%); and facial recognition and biometric security (28%).
  4. As everyone knows, Samsung and Apple have the overwhelming market share in mobile phones, but interesting to me was that they also both lead in customer satisfaction.

Now, let me dive one step deeper into the problem, and explore whether customer satisfaction can be achieved with minimal impact on cost:

Seamless voice control is here and soon every phone will have it, and it doesn’t add any hardware cost. Sensory introduced it with our TrulyHandsfree technology that allows users to just start talking, and our “trigger to search” technology has been nicely deployed by companies like Motorola, which pioneered this “seamless voice control” in many of their recent releases. The seamless voice control really doesn’t add much cost, and with excellent engines from Google and Apple and Microsoft sitting in the clouds, it can and will be nicely implemented without affecting handset pricing.

Sensors are a different story. By their nature they will be embedded into the phones and will increase cost. Some “sensors” in the broadest sense of the term are no-brainers and necessities: for example, microphones and cameras are must-haves, and the six-axis sensors combining accelerometers and gyroscopes are arguably must-haves as well. Magnetometers and barometers are getting increasingly common, and to differentiate further, leading manufacturers are embedding things like heartbeat monitors; stereo 3D cameras are just around the corner. To address the desire for biometric security, Samsung and Apple have embedded fingerprint sensors in the 2 bestselling phones in the world!

The problem is that all these sensors add cost, and in particular those fingerprint sensors are the most expensive and can add $5-$15 to the cost of goods. It’s kind of ironic that after spending all that money on biometric security, Apple doesn’t even allow them as a security measure for iTunes purchases. And both Samsung and Apple have been chastised for fingerprint sensors that can be cracked with gummy bears or glue!

A much more accurate and cost-effective solution can be achieved for biometrics by using the EXISTING sensors on the phones and not adding special-purpose biometric sensors. In particular, the “must have sensors” like microphones, cameras, and 6-axis sensors can create a more secure environment that is just as seamless but much more difficult to crack. I’ll talk more about that in my next blog.

The price of free phone features

August 5, 2013

I often get the question, “If Android and Qualcomm offer voice activation for free, why would anyone license from Sensory?” While I’m not sure about Android and Qualcomm’s business models, I do know that decisions are based on accuracy, total added cost (royalties plus hardware requirements to run), power consumption, support, and other variables. Sensory seems to be consistently winning the shootouts it enters for embedded voice control. Some approaches that appear lower cost require a lot more memory or MIPS, driving up total cost and power consumption.

It’s interesting to note that companies like Nuance have a similar challenge on the server side where Google and Microsoft “give it away”. Because Google’s engine is so good it creates a high hurdle for Nuance. I’d guess Google’s rapid progress helps Nuance with their licensing to Apple, but may have made it more challenging to license to Samsung. Samsung actually licensed Vlingo AND Nuance AND Sensory, then Nuance bought Vlingo.

Why doesn’t Samsung use Google recognition if it’s free? On the server it’s not power consumption affecting decisions, but cost, quality, and in this case CONTROL. On the cost side it could be that Samsung MAKES more money by using Nuance in some sort of ad revenue kickbacks, which I’d guess Google doesn’t allow. This is of course just hypothesizing. I don’t really know, and if I did know I couldn’t say. The control issue is big too, as companies like Sensory and Nuance will sell to everyone and in that sense offer platform independence and more control. Working with a Microsoft or Google engine forces an investment in a specific platform implementation, and therefore less flexibility to have a uniform cross-platform solution.

There You Go Again!

June 17, 2011

That’s what America’s most charismatic President used to say! I didn’t necessarily agree with Reagan’s politics, but I sure did like his presentation. Nuance’s Paul Ricci is kind of the inverse of that; a lot of people don’t like him, but it’s hard to argue with his politics (although I will later in this blog…)

Nuance does seem to perform remarkably well. They have an amazing patent position, and are quite highly valued by almost any financial metric you can apply, including their market cap (over $6B and near an all-time high), their revenue multiplier (5-6 range), as well as P/E over 2000 (and although fairly meaningless, it does show they are finally profitable using GAAP rather than their modified accounting policies!!!!)

I’ve never met Ricci. I’ve known a lot of people who have worked for him, with him, and against him. Everybody agrees he’s a tough guy, and I think most would also use words like ruthless and smart. A lot of people might even call him an asshole, and whether true or not, I don’t think he cares about that. He’s a competitive strategy gameplay kind of guy, and he’s done pretty well. However, he has a HUGE challenge being up against the likes of Google, Microsoft, and eventually Apple (let alone the smart little guys like Vlingo, Yap, Loquendo, etc.). But I digress…

I started this blog thinking about Nuance’s recent acquisition of SVOX. And I wanted to congratulate Nuance and Ricci for ACQUIRING SVOX WITHOUT SUING THEM. If I look back a ways (and I can look back VERY FAR!), Nuance (or the company formerly known as Lernout and Hauspie and then Scansoft) has at least 4 embedded speech recognition companies wrapped into it over the years. In rough chronological order: Voice Control Systems (VCS was probably the FIRST embedded speech company and the first and only embedded group to go public), Philips Embedded Speech Division (I think they had acquired VCS for around $50M), Advanced Recognition Technologies, and Voice Signal Technologies. I believe Ricci was at the helm during the Philips embedded acquisition (this was the one closer to 2000 as opposed to the Philips Medical group a few years ago), ART, and VST. Interestingly, 2 of these 3 were lawsuit acquisitions. There are probably some inside stories about SVOX that I don’t know (e.g. threats of lawsuits??), but it appears that Nuance’s acquisitions of embedded companies are now down to 50% lawsuit driven. Thanks, Paul, you’re moving in the right direction! ;-)

OK, so what’s wrong with suing the companies you want to acquire? It probably does lower their price and reduce competitive bidding. Setting aside the legal and moral issues, there is one huge issue that’s clear: if you want to hold onto your star employees and technologists, you need to treat them well. Everyone understands who the “stars” are – they are the 10% of the workforce that contribute 90% of the innovation. They are not going to stick around unless they are treated right, and starting off a relationship by calling them thieves is not a good way to court a long-term relationship.

For example, there’s been a lot of press lately about the Vlingo/Nuance situation and how Ricci offered the top 3 employees/founders $5M each to sell Vlingo (plus a bundle of money for Vlingo!). Well, Mike Phillips used to be Nuance’s CTO (through acquisition of Speechworks)…so wouldn’t it have been more valuable to KEEP Mike there than BUY him back? The “other” Mike…Mike Cohen is Google’s head of speech. He FOUNDED Nuance (well, the company formerly known as Nuance!) and left to join Google, and of course this caused a lawsuit…think either of the Mikes (two of the smartest speech technologists in the industry) would ever go back to Nuance? Google has managed to hold onto Cohen, so it’s not just an issue of the best people leaving big companies because “little companies innovate.” I’ve also seen the recent rumor mill about Nuance’s Head of Smart Phone Architecture leaving for Apple…

By the way, you gotta treat customers nicely too! Strong-arm tactics on customers and competitors might close short-term deals, but I think there are better approaches in the long run.

So it’s the personnel and customer thing that Nuance is missing out on in their competitive gameplay strategy, and my hope is that SVOX’s acquisition represents a significant change in how Nuance does business!

As a point of contrast, Sensory has acquired only one company in our history – Fluent Speech Technologies (and no, we didn’t sue them first). This was a group that spun out of the former Oregon Graduate Institute back in the 1990s. We saw a demo of theirs back in 1997-1998, and thought the technology was great. They offered to sell us the speech recognition technology (not the company), so they could focus on animation opportunities, but we had NO INTEREST in that. We wanted the people that made the technology, not the technology itself. That’s how our Oregon office was born; we acquired the company with the people. The office is now about as big as our headquarters (and some of our people in Silicon Valley have even moved up there!). By the way, ALL the technologists that came with that acquisition are still with us after 12 years, and we’ve kept a very friendly relationship with the former OGI as well.

Time for a breather…Yeah, I do long blogs….if you see a short one, which might start appearing, it’s probably a “ghostwriter” helping me out…. ;-)

So let’s look at Nuance’s acquisition of SVOX. Why did Nuance acquire them?

  1. SVOX was for sale. I don’t mean this tongue in cheek. I suspect SVOX proactively approached Nuance (and probably Google and others as well) to buy them. If you look at SVOX’s Board (many of whom are their investors), it’s a bunch of guys that ran retail empires and huge organizations, so they probably got tired (in the midst of the economic downturn of the last few years) of waiting.
  2. SVOX was affordable. I don’t mean cheap, and I don’t know yet what Nuance paid, but my guess is Nuance probably paid in the 4-7x sales range. As a wildass guess, SVOX was doing in the $20-$30M per year range, so Nuance might have paid $80-$210M…quite affordable for Nuance. Since Nuance trades at around 5-6x sales, that’s not too bad from a revenue multiplier perspective, and I’d guess SVOX has been profitable so the deal should be accretive to Nuance. If the numbers come out and Nuance paid more than $200M (their prior embedded acquisition of VST was about $300M!), that means there was some serious bidding going on – and probably with Google, Microsoft, or Apple (The Big Guys) in the mix, since they all could have used SVOX technology and patents.
  3. SVOX had Patents. SVOX acquired/merged with Siemens’ speech group a few years back, and with this merger came “60 patent families.” That’s a lot of patents, especially when you add on the patents that SVOX got before and after the merger with Siemens. This will continue to fuel Nuance’s tremendous patent position. My opinion is that it was quite a mistake for the Big Guys, especially Apple, to pass up this combination of talent, technology and patents…they could have easily outbid Nuance!
  4. Customer acquisition. OK, this was probably Nuance’s primary motivation, and probably the reason that Nuance would outbid companies wanting SVOX for “in-house” solutions. SVOX had a lot of deals in automotive and mobile handsets! They were very strong in small-to-medium footprint (1-50MB) TTS, and were making fast inroads with their speech recognition. Nuance loves to buy customers. SVOX had customers.
  5. Keeping Apple and Google from Acquiring SVOX. It’s not often that Apple loses, but I think they lost on this one. SVOX would have been a really cheap way for Apple to make a big move into speech with an in-house technology. It’s going to be hard to grow it all internally, but what a nice bootstrap SVOX would have been in patents and technologies! Google is one of SVOX’s customers for TTS (Hey – Nuance was one of the founding members of the Open Handset Alliance that developed Android!), but with Google’s hiring and acquisitions in the speech space, the writing was on the wall for SVOX to go the way of Nuance, and get designed out of Android for Google’s internal solutions. By keeping SVOX away from Apple and Google, Nuance has the opportunity to keep two huge customers (i.e. Google from SVOX and Apple) from jumping ship…but I still think it will happen eventually!
  6. Automotive Industry Contacts. I read the press release about advancing “the proliferation of voice in the automotive market”, and accelerating “the development of new voice capabilities that enable natural, conversational interactions” and about SVOX supplying the Client for Client/Server hybrid solutions. None of that market-speak makes my list. I think the technologies that SVOX had were pretty redundant to what Nuance has. SVOX had better customer relations and accounts in automotive…that was really the driver!
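
The price guess in point 2 is simple range arithmetic, and it can be reproduced in a few lines. The revenue and multiple figures are the guesses stated above, not disclosed numbers, and the helper names are only for illustration.

```python
def price_range(revenue_low, revenue_high, multiple_low, multiple_high):
    """Cheapest and priciest outcome, in $M, for a revenue range and a sales-multiple range."""
    return revenue_low * multiple_low, revenue_high * multiple_high


def implied_multiple(price, revenue):
    """Sales multiple implied by a purchase price at a given annual revenue."""
    return price / revenue


low, high = price_range(20, 30, 4, 7)  # $20-$30M revenue at 4-7x sales
print(f"Estimated price: ${low}M-${high}M")

# What a $200M price would imply across the guessed revenue range.
print(f"{implied_multiple(200, 30):.1f}x to {implied_multiple(200, 20):.1f}x sales")
```

A price above $200M would imply roughly 6.7x to 10x sales, which is above the 5-6x at which Nuance itself trades. That is why such a number would point to competitive bidding.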

Anyways…I suspect the acquisition was a good deal for Nuance and its investors, and probably a GREAT deal for SVOX and its investors. Nuance’s market price didn’t seem to move much, but maybe it will once the price is disclosed. I commend and encourage Nuance to cut the lawsuits…one of them could bite back a lot worse than the pain of losing employees!

Todd
sensoryblog@sensoryinc.com

Conversation with an Analyst

April 21, 2011

I had an interesting email conversation with a blog reader last month, and I thought I’d share some of the dialog. He is an equity analyst (who wishes to remain anonymous) that follows some companies in the speech industry. He emailed me saying:

“I came across your blog some time ago and have been reading it since with great interest. A topic of particular interest to me has been your periodic comments about how Apple has lagged the investments made by Google in speech recognition technology, opting instead to lean on Nuance. I was also struck by your observation that big companies, such as Google, have a history of licensing Nuance technologies before eventually taking those capabilities in-house.”

This makes me feel the need to clarify something…Nuance has great technologies, period. When companies feel the need to bring the technology “in-house”, it’s not driven by a failing of Nuance, but simply the fact that the USER EXPERIENCE IS SO CRITICAL to the success of consumer products. It’s difficult for big companies like Apple, Google, Microsoft, HP and others that depend heavily upon positive consumer experiences to farm out the technology for such a critical component.

The conversation turned to Apple, and the analyst asked the all-too-common question of whether Apple might acquire Nuance. Here’s, roughly, how the conversation went:

Analyst: What is your current view on Apple’s efforts in this space? As a company they seem to take great pride in controlling the user experience and that extends to how they think about key technologies (witness the Flash vs. HTML 5 spat, for example). It makes me wonder if Apple would be satisfied relying on Nuance for such a visible and important capability or whether they’d feel the need to also bring it in-house.

Todd: Apple can definitely afford Nuance. In fact, Apple probably makes enough profit in a good quarter to buy Nuance outright. Nevertheless, it would be a BIG price tag, and not in line with Apple’s traditional acquisition strategy. I wouldn’t rule it out, and I wouldn’t say they “need” Nuance, either. But they do need to do something, and they know it. Apple has been posting job requisitions this year in the area of speech recognition, so they definitely want to bring more of the technology in-house. My guess is they’ll do some M&A in the speech technology area as well. Google and Microsoft have combined aggressive hiring with M&A, so it seems likely that Apple will go beyond the SIRI acquisition (which added an AI layer on top of Nuance) and acquire more core speech technology expertise.

Analyst: I agree with you that Apple makes/has enough cash to acquire Nuance, but that it would be out of character for Apple to do so. Where I’m most interested is whether there are meaningful technical/architectural reasons why Apple must partner with Nuance for SR, or if the gap between Nuance and these smaller players is narrow enough that Apple would acquire or partner more closely with one of the small guys in order to maintain more control over the technology. Many people seem to think that an SR acquisition would have to be of Nuance, but I’ve been told that there are many quality SR start-ups. If you had to bet, do you think that Apple needs the 800-pound gorilla Nuance in order to do a good job in SR, or would one of these smaller companies give Apple a sufficient base upon which to build out a solution?

Todd: I’m confident Apple will eventually own it. I’d say the odds of them buying Nuance though are quite low (10-30% as a wild guess). There’s no technical reason why they can’t use another technology, but the 3 best reasons they’d acquire Nuance are:

  1. Patents
  2. Language coverage
  3. Ease of integration

Apple’s in-house teams are quite familiar with the Nuance engines as they have already implemented them in some products. Apple is engaged in a lot of patent fights, and Nuance has the best portfolio of speech patents in the world. That’s a really valuable asset that the Googles and Microsofts would probably fight over! Of course, for the cost of Nuance, someone could probably buy all of the other TTS and SR tech companies in the world! ;-)

Analyst: Apple really has a phobia about adding third-party software to their products. No Mosaic core in their browser, no audio compression codecs from Dolby or DTS, no Flash from Adobe…. They acquired two microprocessor design companies to create a proprietary stack on ARM chips rather than using broadly available chipsets from Qualcomm or Broadcom. Now comes the question of what to do with SR technology….

Todd: It will be interesting to see how this all unfolds. I suspect a lot of other large companies will want to get into the game as well. It could be that the cloud-based solutions for TTS and SR become generic and replaceable enough that there isn’t a need to bring them “in-house”. Of course, Sensory is hoping and betting on the need for the Client/Server approaches, where an embedded solution (like our TrulyHandsfree Triggers) nicely complements the cloud-based offerings.

Todd
sensoryblog@sensoryinc.com