Posts Tagged ‘Microsoft’
June 15, 2016
“Credit to the team at Amazon for creating a lot of excitement in this space,” Google CEO Sundar Pichai. He made this comment during his Google I/O speech last week when introducing Google’s new voice-controlled home speaker, Google Home which offers a similar sounding description to Amazon’s Echo. Many interpreted this as a “thanks for getting it started, now we’ll take over,” kind of comment.
Google has always been somewhat marketing challenged in naming its voice assistant. Everyone knows Apple has Siri, Microsoft has Cortana, and Amazon has Alexa. But what is Google’s voice assistant called? Is it Google Voice, Google Now, OK Google, Voice Actions? Even those of us in the speech industry have found Google’s branding to be confusing. Maybe they’re clearing that up now by calling their assistant “Google Assistant.” Maybe that’s the Google way of admitting it’s an assistant without admitting they were wrong by not giving it a human sounding name.
The combination of the early announcement of Google Home and Google Assistant has caused some to comment that Amazon has BIG competition at best, and at worst, Amazon’s Alexa is in BIG trouble.
I thought I’d point out a few good reasons why Amazon is in pretty good shape:
Of course, Amazon has its challenges as well, but I’ll leave that for another blog.
November 12, 2015
A really smart guy told me years ago that neural networks would prove to be the second best solution to many problems. While he was right about lots of stuff, he missed that one! Out of favor for years, neural networks have enjoyed a resurgence fueled by advances in deep machine learning techniques and the processing power to implement them. Neural networks are now seen to be the leading solution to a host of challenges around mimicking how the brain recognizes patterns.
Google’s Monday announcement that it was releasing its TensorFlow machine learning system on an open-source basis underscores the significance of these advances, and further validates Sensory’s 22 year commitment to machine learning and neural networks. TensorFlow is intended to be used broadly by researchers and students “wherever researchers are trying to make sense of very complex data — everything from protein folding to crunching astronomy data”. The initial release of TensorFlow will be a version that runs on a single machine, and it will be put into effect for many computers in the months ahead, Google said.
Microsoft also had cloud-based machine learning news on Monday, announcing an upgrade to Project Oxford’s facial recognition API launched in May specifically for the Movember Foundation’s no-shave November fundraising effort: a facial hair recognition API that can recognize moustache and beard growth and assign it a rating (as well as adding a moustache “sticker” to the faces of facial hair posers).
Project Oxford’s cloud-based services are based on the same technology used in Microsoft’s Cortana personal assistant and the Skype Translator service, and also offer emotion recognition, spell check, video processing for facial and movement detection, speaker recognition and custom speech recognition services.
While Google and Microsoft have announced some impressive machine-learning capabilities in the cloud, Sensory uniquely combines voice and face for authentication and improved intent interpretation on device, complementing what the big boys are doing.
From small footprint neural networks for noise robust voice triggers and phrase-spotted commands, to large vocabulary recognition leveraging a unique neural network with deep learning that achieves acoustic models an order of magnitude smaller than the present state-of-the-art, to convolutional neural networks deployed in the biometric fusion of face and voice modalities for authentication, all on device and not requiring any cloud component, Sensory continues to be the leader in utilizing state-of-the-art machine learning technology for embedded solutions.
Not bad company to keep!
May 1, 2015
Winning on Accuracy & Speed… How can a tiny player like Sensory compete in deep learning technology with giants like Microsoft, Google, Facebook, Baidu and others?
There’s a number of ways, and let me address them specifically:
These 3 items together have provided Sensory with the highest quality embedded speech engines in the world. It’s worth reiterating why embedded is needed, even if speech recognition can all be done in the cloud:
March 23, 2015
This month had three very different announcements about face recognition from Alibaba, Google, and Microsoft. Nice to see that Sensory is in good company!!!
Alibaba’s CEO Jack Ma discussed and demoed the possibility of using face verification for the very popular Alipay.
A couple interesting things about this announcement…First, I have to say, with a name like Alibaba, I am a little let down that they’re not using “Open Sesame” as a voice password to go with or instead of the face authentication… All joking aside, I do think relying on facial recognition as the sole means of user authentication is risky, and think they would be better served using a solution that integrates both face and voice recognition (something like our own TrulySecure), to ensure the utmost security of their customers’ linked bank accounts.
Face is considered one of the more “convenient” methods of biometrics because you just hold your phone out and it works! Well, at least it should… A couple of things I noticed in the Alibaba announcement: Look at the picture…Jack Ma is using both hands to carefully center his photo, and looking at the image of the phone screen tells us why. He needs to get his face very carefully centered on this outline to make it work. Why? Well, it’s a technique used to improve accuracy, but this improved accuracy, trades off the key advantage of face recognition, convenience, to make the solution more robust. Also the article notes that it’s a cloud based solution. To me cloud based means slower, dependent on a connection, and putting personal privacy more at risk. At Sensory, we believe in keeping data secure, especially when it comes to something like mobile payments, which is why we design our technologies to be “embedded” on the device – meaning no biometric data has to be sent to the cloud, and our solutions don’t require an internet connection to function. Additionally, with TrulySecure, we combine face and voice recognition, making authentication quick and simple, not to mention more secure, and less spoofable than face-only solutions. By utilizing a multi-biometric authentication solution like TrulySecure, the biometric is far less environmentally sensitive and even more convenient!
Mobile pay solutions are on the rise and as more hit the market differentiators like authentication approach, solution accuracy, convenience and most of all data security will continue to be looked at more closely. We believe that the embedded multi-biometric approach to user authentication is best for mobile pay solutions.
Also, Google announced that its deep learning FaceNet is nearly 100% accurate.
Everybody (even Sensory) is using deep learning neural net techniques for things like face and speech recognition. Google’s announcement seems to have almost no bearing on their Android based face authentication, which came in the middle of the pack of the five different face authentication systems we recently tested. So, why does Google announce this? Two reasons: – 1) Reaction to Baidu’s recent announcement that their deep learning speech recognition is the best in the world: 2) To counter Facebook’s announcement last year that their DeepFace is the best face recognition in world. My take – it’s really hard to tell whose solution is best on these kind of things, and the numbers and percentages can be deceiving. However, Google is clearly doing research experiments on high-accuracy face matching and NOT real world implementation, and Facebook is using face recognition in a real world setting to tag photos of you. Real-world facial recognition is WAY harder to perfect, so my praise goes out to Facebook for their skill in tagging everyone’s picture to reveal to our friends and family things might not have otherwise seen us doing!
Lastly, Microsoft’s announced Windows Hello.
This is an approach to getting into your Windows device with a biometric (face, iris, or fingerprint). Microsoft has done a very nice job with this. They joined the FIDO alliance and are using an on-device biometric. This approach is what made sense to us at Sensory, because you can’t just hack into it remotely, you must have the device AND the biometric! They also addressed privacy by storing a representation of the biometric. I think their approach of using a 3D IR camera for Face ID is a good approach for the future. This extra definition and data should yield much better accuracy than what is possible with today’s standard 2D cameras and should HELP with convenience because it could be better at angles can work in the dark. Microsoft claims 1 in 100,000 false accepts (letting the wrong person in). I always think it’s silly when companies make false accept claims without stating the false reject numbers (when the right person doesn’t get in). There’s always a tradeoff. For example I could say my coffee mug uses a biometric authenticator to let the right user telepathically levitate it and it has less than a 1 in a billion false accepts (it happens to also have a 100% false reject since even the right biometric can’t telepathically levitate it!). Nevertheless, with a 3D camera I think Microsoft’s face authentication can be more accurate than Sensory’s 2D face authentication. BUT, its unlikely that the face recognition on its own will ever be more accurate than our TrulySecure, which still offers a lower False Accept rate than Microsoft – and less than 10% False Reject rate to boot!
Nevertheless, I like the announcement of 3D cameras for face recognition and am excited to see how their system performs.
June 9, 2014
I still subscribe to the San Jose Mercury News, as they do a good job of tech business reporting. One of my favorite Mercury News writers is a true critic in the literary sense of the term, Troy Wolverton. Troy rarely raves and is typically critical, but in a smart, logical, and unemotional way.
A few days back he started writing about Microsoft’s Cortana and said “Watch out Siri, someone wants your job.”
I was eager to read his review of Cortana this morning and in particular his comparison with Siri. He ended up giving it a 7/10, and concluding Siri was still ahead. What I thought was most interesting though was that in his final summary, he compared three products and three assistants based on the ease of calling up each of those assistants:
Motorola is Sensory’s customer, and I am happy to read that Troy gets it and considers this front end activation an important metric in comparing personal assistants!
June 17, 2011
That’s what America’s most charismatic President used to say! I didn’t necessarily agree with Reagan’s politics, but I sure did like his presentation. Nuance’s Paul Ricci is kind of the inverse of that; a lot of people don’t like him, but it’s hard to argue with his politics (although I will later in this blog…)
I’ve never met Ricci. I’ve known a lot of people who have worked for him, with him, and against him. Everybody agrees he’s a tough guy, and I think most would also use words like ruthless and smart. A lot of people might even call him an asshole, and whether true or not, I don’t think he cares about that. He’s a competitive strategy gameplay kind of guy, and he’s done pretty well. However, he has a HUGE challenge being up against the likes of Google, Microsoft, and eventually Apple (let alone the smart little guys like Vlingo, Yap, Loquendo, etc.). But I digress…
I started this blog thinking about Nuance’s recent acquisition of SVOX. And I wanted to congratulate Nuance and Ricci for ACQUIRING SVOX WITHOUT SUING THEM. If I look back a ways (and I can look back VERY FAR!), Nuance (or the company formerly known as Lernout and Hauspie and then Scansoft) has at least 4 embedded speech recognition companies wrapped into it over the years. In rough chronological order: Voice Control Systems (VCS was probably the FIRST embedded speech company and the first and only embedded group to go public), Phillips Embedded Speech Division (I think they had acquired VCS for around $50M), Advanced Recognition Technologies, and Voice Signal Technologies. I believe Ricci was at the helm during the Philips embedded acquisition (this was the one closer to 2000 as opposed to the Philips Medical group a few years ago), ART, and VST. Interestingly, 2 of these 3 were lawsuit acquisitions. There are probably some inside stories about SVOX that I don’t know (e.g. threats of lawsuits??), but it appears that Nuance’s acquisitions of embedded companies are now down to 50% lawsuit driven. Thanks, Paul, you’re moving in the right direction! ;-)
OK, so what’s wrong with suing the companies you want to acquire? It probably does lower their price and reduce competitive bidding. Setting aside the legal and moral issues, there is one huge issue that’s clear- If you want to hold onto your star employees and technologists, you need to treat them well. Everyone understands who the “stars” are – they are the 10% of the workforce that contribute to 90% of the innovation. They are not going to stick around unless they are treated right, and starting off a relationship by calling them thieves is not a good way to court a long term relationship.
For example, there’s been a lot of press lately about the Vlingo/Nuance situation and how Ricci offered the top 3 employee/founders $5M each to sell Vlingo (plus a bundle of money for Vlingo!) Well, Mike Phillips used to be Nuance’s CTO (through acquisition of Speechworks)…so wouldn’t it have been more valuable to KEEP Mike there than BUY him back? The “other” Mike…Mike Cohen is Google’s head of speech. He FOUNDED Nuance (well, the company formerly known as Nuance!) and left to join Google, and of course this caused a lawsuit…think either of the Mike’s (two of the smartest speech technologists in the industry) would ever go back to Nuance? Google has managed to hold onto Cohen, so it’s not just an issue of the best people leaving big companies because “little companies innovate.” I’ve also seen the recent rumor mill about Nuance’s Head of Smart Phone Architecture leaving for Apple…
So it’s the personnel and customer thing that Nuance is missing out on in their competitive gameplay strategy, and my hope is that SVOX’s acquisition represents a significant change in how Nuance does business!
As a point in contrast, Sensory has acquired only one company in our history – Fluent Speech Technologies (and no, we didn’t sue them first.) This was a group that spun out of the former Oregon Graduate Institute back in the 1990’s. We saw a demo of theirs back in 1997-1998, and thought the technology was great. They offered to sell us the speech recognition technology (not the company), so they could focus on animation opportunities, but we had NO INTEREST in that. We wanted the people that made the technology, not the technology itself. That’s how our Oregon office was born; we acquired the company with the people. The office is now about as big as our headquarters (and some of our people in Silicon Valley have even moved up there!) By the way, ALL the technologists that came with that acquisition are still with us after 12 years, and we’ve kept a very friendly relationship with the former OGI as well.
Time for a breather…Yeah, I do long blogs….if you see a short one, which might start appearing, it’s probably a “ghostwriter” helping me out…. ;-)
So let’s look at Nuance’s acquisition of SVOX. Why did Nuance acquire them?
Anyways…I suspect the acquisition was a good deal for Nuance and its investors, and probably a GREAT deal for SVOX and its investors. Nuance’s market price didn’t seem to move much, but maybe it will once the price is disclosed. I commend and encourage Nuance to cut the lawsuits…one of them could bite back a lot worse than the pain of losing employees!