Posts Tagged ‘Vlingo’
June 4, 2014
It was about 4 years ago that Sensory partnered with Vlingo to create a voice assistant with a special “in car” mode that would allow the user to just say “Hey Vlingo” then ask any question. This was one of the first “TrulyHandsfree” voice experiences on a mobile phone, and it was this feature that was often cited for giving Vlingo the lead in the mobile assistant wars (and helped lead to their acquisition by Nuance).
About 2 years ago Sensory introduced a few new concepts including “trigger to search” and our “deeply embedded” ultra-low power always listening (now down to under 2mW, including the audio subsystem!). Motorola took advantage of these excellent approaches from Sensory and created what I, most biasedly, think is the best voice experience on a mobile phone. Samsung too has taken the Sensory technology and used it in a number of very innovative ways, going beyond mere triggers and using the same noise-robust technology for what I call “sometimes always listening”. For example, when the camera is open it is always listening for “shoot,” “photo,” “cheese,” and a few other words.
So I’m curious about what Google, Microsoft, and Apple will do to push the boundaries of voice control further. Clearly all three like this “sometimes always on” approach, as they don’t appear to be offering the low power options that Motorola has enabled. At Apple’s WWDC there wasn’t much talk about Siri, but what they did say seemed quite similar to what Sensory and Vlingo did together 4 years ago…enable an in-car mode that can be triggered by “Hey Siri” when the phone is plugged in and charging.
I don’t think that will be all…I’m looking forward to seeing what’s really in store for Siri. They have hired a lot of smart people, and I know something good is coming that will make me go back to the iPhone, but for now it’s Moto and Samsung for me!
August 5, 2013
I often get the question, “If Android and Qualcomm offer voice activation for free, why would anyone license from Sensory?” While I’m not sure about Android and Qualcomm’s business models, I do know that decisions are based on accuracy, total added cost (royalties plus hardware requirements to run), power consumption, support, and other variables. Sensory seems to be consistently winning the shootouts it enters for embedded voice control. Some approaches that appear lower cost require a lot more memory or MIPS, driving up total cost and power consumption.
It’s interesting to note that companies like Nuance have a similar challenge on the server side where Google and Microsoft “give it away”. Because Google’s engine is so good it creates a high hurdle for Nuance. I’d guess Google’s rapid progress helps Nuance with their licensing of Apple, but may have made it more challenging to license Samsung. Samsung actually licensed Vlingo AND Nuance AND Sensory, then Nuance bought Vlingo.
Why doesn’t Samsung use Google recognition if it’s free? On the server it’s not power consumption affecting decisions, but cost, quality, and in this case CONTROL. On the cost side it could be that Samsung MAKES more money by using Nuance through some sort of ad revenue kickbacks, which I’d guess Google doesn’t allow. This is of course just hypothesizing. I don’t really know, and if I did know I couldn’t say. The control issue is big too, as companies like Sensory and Nuance will sell to everyone and in that sense offer platform independence and more control. Working with a Microsoft or Google engine forces an investment in a specific platform implementation, and therefore offers less flexibility to maintain a uniform cross-platform solution.
October 2, 2012
I really enjoyed reading this article interviewing Vlad Sejnoha, Nuance’s CTO. Most people would consider Nuance the leader in speech recognition today, and Vlad is certainly a very smart, thoughtful, and articulate man.
I enjoyed it for a few different reasons. The first and main reason I liked the article is it helps to push the idea Sensory has been championing for the past several years: that devices don’t have to be touched to enable voice commands, and that you should be able to just start talking to things like we talk to each other. That’s what Sensory calls TrulyHandsfree, and it’s the technology that showed up in the first Bluetooth carkit that requires no touching (by BlueAnt) AND the first mobile phones that responded to voice without touch (Samsung’s Galaxy SII, SIII, and Note). Even hit toys like Mattel’s award-winning Fijit Friends and Hallmark’s Interactive Books use this unique technology that just works when you talk to it. In fact, it really was the TrulyHandsfree feature that made Vlingo so popular, as this Vlingo video nicely states in its comparison between Vlingo and Siri. (Nuance bought Vlingo earlier this year, but the Sensory TrulyHandsfree didn’t come with it!)
The article says “Sejnoha believes that within a year or two you’ll be able to talk to your smartphone even as it lies idle on a desk, asking it questions such as, “When’s my next appointment?” The phone will be able to detect that you are speaking, wake itself up, and accomplish the task at hand.” Check out this Sensory video…this is definitely what Vlad is talking about! Yeah, we can do it today, and it’s REALLY FAST and really accurate.
But is it low power? Well, that’s ABSOLUTELY KEY. That’s why Sensory partnered with Tensilica, a leader in low power audio DSPs for mobile phones. Sensory already has TrulyHandsfree running on chips that draw under 5 mW for a COMPLETE audio system. And that’s without having to wake up to understand the task at hand. We could drop another 1-2mW by not being always on, but turning the recognizer off alone doesn’t do much. That’s because even if the full recognizer is shut down, you still need to run a mic and preamp, and those drive most of the current consumption when you have a recognizer as lean as TrulyHandsfree (it can run on as little as 7 MIPS!). This means it’s REALLY critical to have a low power recognizer as well, and that’s Sensory’s forte. We expect that by next year we will have systems running at 1-3mW!
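To see why killing the recognizer alone barely helps, here is a back-of-envelope sketch. The 5 mW system total comes from the post; the split between the analog front end and the recognizer is purely an illustrative assumption, not a measured figure.

```python
# Back-of-envelope power budget for an always-listening system.
# The 3/2 split below is an ASSUMED illustration, not measured data;
# only the ~5 mW system total comes from the post.
mic_and_preamp_mw = 3.0   # assumed: mic + preamp (analog front end)
recognizer_mw = 2.0       # assumed: low power recognizer's share

total_always_on = mic_and_preamp_mw + recognizer_mw      # 5.0 mW
# Shutting off ONLY the recognizer still leaves the front end running:
total_recognizer_off = mic_and_preamp_mw                  # 3.0 mW

print(f"always on:      {total_always_on:.1f} mW")
print(f"recognizer off: {total_recognizer_off:.1f} mW")
```

Under any split like this, the front end dominates once the recognizer is already efficient, which is why the savings have to come from the whole audio subsystem, not the recognizer alone.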
The article mentions “persistent” listening, but even though I’ve always preached this “always on” concept, I think what will really explode is “intelligent automatic listening”. That is, the device figures out when it needs to listen for what, and turns on to listen for it. So it doesn’t always have to be on…it will just seem that way because the devices are so intelligent. For example, a certain traveling speed could make a phone listen for car commands or car wake up words. An incoming call could cause the recognizer to wake up and listen for Answer/Ignore. For these to work, the device needs to run not only at very low power but also with VERY high accuracy. You don’t want a background conversation triggering the phone call to hang up! Accuracy is another Sensory forte, and the combination of accuracy with low power consumption is a difficult mix to conquer! Sensory’s accuracy holds up not only in noise but also from a distance; when a recognizer works well at a poor S/N ratio, the signal can be lower (as from a distance) and/or the noise can be higher.
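The “intelligent automatic listening” idea above can be sketched as a simple context-to-vocabulary mapping. This is just my hypothetical illustration; the speed threshold and word lists are made-up examples, not any product’s actual behavior.

```python
# Sketch of "intelligent automatic listening": instead of one recognizer
# that is always on, the device picks a context-specific vocabulary
# (or none at all). Thresholds and word lists are hypothetical examples.

def active_vocabulary(speed_mph: float, incoming_call: bool) -> list:
    """Return the word set the device should be listening for right now."""
    if incoming_call:
        return ["answer", "ignore"]       # wake only for call handling
    if speed_mph > 20:                    # probably in a car
        return ["hands free", "navigate", "call"]
    return []                             # nothing to listen for; stay off

print(active_vocabulary(35.0, False))     # car context
print(active_vocabulary(0.0, True))       # ['answer', 'ignore']
print(active_vocabulary(0.0, False))      # []
```

The point of the sketch: the recognizer only seems “always on” because sensors decide when (and for which small vocabulary) it actually runs, which is what keeps both power and false triggers down.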
So it’s really cool that Nuance is getting on the bandwagon behind Sensory’s innovations like TrulyHandsfree at low power. In fact, after Samsung’s release of the Galaxy SII with Sensory, Nuance did come out with an “always on and listening” mobile device; for fun we quickly ported our technology onto the same phone to compare…check out this video.
Something interesting we noticed was that after Sensory announced its speaker verification and speaker ID for mobile devices at CTIA this year, Nuance shortly thereafter came out with their own announcement, but there were no demos available so we couldn’t do a comparison video.
May 30, 2012
Sensory’s had a lot of press lately. We made 3 big announcements all pretty much together:
1) Announcing speaker verification
2) Announcing speaker identification
3) Saying Sensory is in the Samsung Galaxy S3
Sensory announced these just before CTIA in New Orleans. We had a small booth at the show, and gave demos at several events (on the CTIA stage and floor, at the Mobility Awards dinner, and at the excellent Pepcom Mobile Focus event).
We got a lot of nice press from this. I was thrilled that the Speech Technology email newsletter put our verification release as the featured and lead story. One of the articles I like best, though, just came out last week by Pete Pachal at Mashable: http://mashable.com/2012/05/29/sensory-galaxy-s-iii/
This article is great for several key reasons. One is that Pete gets it. He didn’t just reprint our press release, but he added his commentary and wrapped it up in a nice story that hits some of the key issues.
However, what’s best is what the readers wrote in. I LOVE their insights and comments. Here are a few of the dialogs with my commentary attached:
Seriously??? You still need to push a button to use Siri? I’ve had the “wake with voice” option on my crusty old HTC Incredible, via VLingo inCar, for about 2 years now. Hard to believe Apple is that far behind.
My response: EXACTLY, JB! In fact, that crusty old HTC, using Vlingo, also uses Sensory’s TrulyHandsfree approach! Vlingo was our first licensee in the mobile space.
Scott: But this is talking about OS integration instead of app integration. And as I’m sure you’ve seen on your phone, and as the article noted, wake with voice options currently use a lot of power, which means I can’t see a lot of people willing to use it.
My response: Precisely, Scott! This is why we are implementing the “deeply embedded” approach that will take power consumption down by a factor of 10! Nevertheless, users LOVE it even if it consumes power:
JB – I use it all the time and since my phone plugs into the car’s adapter, I don’t really worry at all about power usage. It’s never been a problem.
My response – Yes, Vlingo and Samsung did a very nice implementation by having an “always listening” mode, particularly useful while driving. Other approaches we expect to see in the future are intelligent sensor based approaches so the phone knows when to listen and when not to (e.g. why not have it turn on and listen whenever you start traveling past 20 MPH, etc.)
Is there anything to prevent me from messing with another person’s phone?
Fillfill Ha ha, imagine being in an auditorium and yelling “Hi Galaxy! … Erase Address Book! … Confirm!”
My comment – Funny! This is one of the reasons we have added speaker verification and identification features to the trigger function.
DhanB – Siri doesn’t require a button. It can be activated by lifting the phone up to your face.
Great reader responses:
Darkreaper – …..while driving? (Right! That’s illegal in California and other states!)
Tone – Yes, but with the Samsung Galaxy II, I don’t have to touch it at all. As the article states, this is crucial when you’re in a situation, such as driving. I’ve dropped the phone on the floor while driving and I was still able to send a text message, an email and place a call with it sliding around the back seat. (Bluetooth) iPhone can’t compete, sorry. :-/
…and of course the old “butt dialing” problem:
Jason – This makes me think of the old “butt dialing” problem when you sat down on your phone cause I’d much prefer a manual trigger to prevent accidental usage.
My comment: Once again, I agree with the readers. Sensory isn’t pushing to force “always listening” modes on users; we just want to allow them the choice. We strongly recommend that products offer multiple options for anything that can be done by voice or touch. We believe users should have the right and the ability to access the power of mobile devices without being forced to touch them. And if they want to turn off this ability, that is certainly their choice! We turn off our ringers (at least we should) when we enter a meeting or go to the movies. Likewise, we can turn off hands free voice control when it’s not appropriate…and with the growing presence and power of intelligent sensors, it will get easier and easier (albeit with some mishaps along the way!) for phones to know when they should listen!
A lot of people commented about Siri. Apple isn’t stupid. They get that hitting buttons isn’t always the most convenient way to access voice control. That’s why there’s a sensor for when you lift the phone to your face (of course still requiring touch), and it’s also why Siri can speak back. Apple pushed the Voice User Interface forward with Siri…Samsung pushed it further with TrulyHandsfree wake up. There will be a lot of back and forth over the coming years, and voice features will continue as a major battleground.
As devices gain increasing utility WITHOUT being touched (e.g., remote control functions, accessing and receiving data by voice, etc.), the need for a TrulyHandsfree approach will grow stronger and stronger, and Sensory will continue to have the BEST solution: More Accurate, Lower Power, Faster Response Times, and NOW with built-in speaker verification or speaker ID!
January 27, 2012
Lots of thoughts…no time to share them…so I’ll be brief in a few different areas:
August 17, 2011
I’ll continue on with a few thoughts from yesterday’s blog because I got asked the question: “Why would speech patents be worth so much more than general telecom and other patents?”
There are 2 key reasons:
As an interesting case in point, Sensory has a few key patents on client/server speech recognition approaches. We have a very early initial filing date from 1996 (if you want to know the patent number, drop me an email.) We went through 10 years of revisions and responses to the patent office and finally got 3 patents issued on our initial concepts of using client devices connected to more powerful servers with speech recognition (yeah, that should sound familiar today, but it was a very unique idea in 1996!). These are VERY fundamental patents with a VERY early priority date. Back in the downturn of 2008 we talked to a patent auction house that gave a very thorough evaluation of the patents, and they concluded it would be the highest valued auction they had ever seen. They wanted a “reserve” price in the single-digit millions of dollars, but we wanted it in the double digits, so we never went forward. It just shows the importance of speech patents, and with the recent lawsuits in the mobile and speech community, speech patents have become even more valuable today!
August 16, 2011
Two BIG acquisitions happened over the last week. One is big for the smartphone space, and the other is big for the speech industry. I think they both had something to do with technology patents.
Google acquired Motorola. As everyone knows, Google has been wrapped up in a lot of legal feuds over Android. Android is certainly doing well, its competitors want to knock it down, and patent infringement seems to be the preferred means of fighting. Long established companies like Microsoft, RIM and Apple have had a lot of time to build a patent portfolio…on top of that they recently outbid Google on the Nortel patent acquisition. SO… Google has to beef up its patent portfolio quickly to fight back and eventually do what big companies do – agree to cross license and stop paying the law firms! Or maybe Google just wants a boatload of patents so they can be comfortable indemnifying all the Android users.
So at the end of July, Google bought a boatload (well over 1000) of patents from IBM (Nuance bought a bunch of patents from IBM as well focused on speech tech!)
Now Google buys MOTO. Here’s something really interesting. The price paid for Nortel was about $4.5B for 6000 patents (plus patents applied for, etc). That’s about $750K/patent. Google underbid and didn’t get in on the deal. Then Google bought MOTO Mobility for $12.5B and got a little over 17,000 patents…just under $750K/patent! VERY INTERESTING…it seems like $750K/patent is the going rate for large patent portfolios!!!!!
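The per-patent arithmetic above is easy to check (using the round patent counts from the post; the actual portfolios included pending applications too):

```python
# Per-patent prices for the two big portfolio deals discussed above.
# Patent counts are the post's round figures (6000 and ~17,000).
nortel_price, nortel_patents = 4.5e9, 6000
moto_price, moto_patents = 12.5e9, 17000

nortel_per_patent = nortel_price / nortel_patents   # $750,000 exactly
moto_per_patent = moto_price / moto_patents         # ~$735,000

print(f"Nortel: ${nortel_per_patent:,.0f}/patent")
print(f"MOTO:   ${moto_per_patent:,.0f}/patent")
```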
Specialized portfolios in speech technology are worth even more!
Nuance acquires Loquendo. I’m sure this wasn’t just for patents…it was taking out one of their only competitors for both SR and TTS, and Nuance got a GREAT price for a company with a lot of excellent technology. I have no idea how many patents Loquendo has…I think 7 in the US and probably a lot more in Europe. Let’s estimate that they had 35 patents total. At $75M, that would be around $2M per patent, which isn’t far off of the per-patent price Nuance paid for SVOX, who had 60-80 patents. The revenue multipliers seem pretty consistent too…SVOX was doing around $25M in sales and was bought for around 6x sales…likewise Loquendo was doing about $12.5M in sales and was bought for ABOUT SIX TIMES SALES. What does Nuance trade at? ABOUT SIX TIMES SALES. So what does that mean? Well you could argue that if Nuance pays less or equal to its revenue multiplier (6xsales) for an acquisition, then the patents essentially come free because the acquired revenues should immediately boost Nuance’s valuation by close to the purchase price.
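Sanity-checking the Loquendo numbers above with the same quick arithmetic (remember the ~35-patent count and $75M price are the post’s own estimates, not confirmed figures):

```python
# Rough check of the Loquendo deal math described above.
# All three inputs are the post's estimates, not confirmed numbers.
loquendo_price = 75e6      # estimated purchase price
loquendo_patents = 35      # guessed worldwide patent count
loquendo_sales = 12.5e6    # estimated annual revenue

per_patent_millions = loquendo_price / loquendo_patents / 1e6
revenue_multiple = loquendo_price / loquendo_sales

print(f"per patent: ${per_patent_millions:.1f}M")        # ~$2.1M
print(f"revenue multiple: {revenue_multiple:.0f}x sales")  # 6x
```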
I wonder if that’s how Nuance thinks about it. Then they wouldn’t be paying $2M for a patent or even $750K…they’d essentially get them for free and in the process build the biggest database of speech patents in the world.
Maybe Nuance’s strategy isn’t really about taking out competitors and buying customers through M&A; maybe they want to own the majority of patents in the speech tech space. Nuance certainly hasn’t made money using patents for lawsuits. Dave Grannan, Vlingo’s CEO, was recently quoted as saying, “We are happy to report that with this latest ruling, Nuance’s record remains perfect in patent infringement trials, they haven’t won any.” You go, Dave!
So why would Nuance want so many speech patents if they can’t make money in court? Well I’ve blogged earlier about their use of patent infringement in acquisitions. Maybe they are looking to be bought by a Google, Apple, or Microsoft…that patent portfolio could certainly do a lot in user experience fights. But if cross licensing agreements get worked out between the companies big enough to acquire Nuance, then where does that leave Nuance?
Well…without a lot of competition for sure!
August 5, 2011
I recently learned about 2 awards that Sensory has won over the past year. The contrast is in how we learned about them, and the different nature of these awards. It’s really amusing, so I thought I’d share my take.
Both awards were for our TrulyHandsfree™ Voice Control. One was for the significance of Sensory’s truly hands-free trigger in implementing speech recognition without using buttons, and the other was for Sensory’s chip-based implementation of a truly hands-free interface.
The first award came from Speech Technology Magazine. Sensory won their Star Performer award for 2011, and I didn’t even know we had been nominated. In fact, nobody ever told me that we had won; I found out really by chance (thanks, Bernie!) They only gave out four of these awards this year, and I’m honored and thrilled that Sensory won one of them. It’s really a testament to our team behind TrulyHandsfree… IT’S THE MOST AMAZING TECHNOLOGY. I sent kudos to Speech Tech for having the insight to understand the significance of this technology! Speech Technology Magazine has gotten so independent and non-self-serving in their awards process that they didn’t even take the opportunity to call us and let us know! Now we know, so thanks again, Speech Tech!
In contrast…The second award came from a market research firm I’ll call the Cold Irishman. Why don’t I use their real name? Well I can’t or they might sue me. I received a call from their “Manager of IP and Copyrights” to congratulate me, and to let me know about their thoroughly independent and fair process that looked at the entire speech market and decided that Sensory stood out… blah blah blah…
I knew there was something funny going on by the guy’s title. Yeah, you guessed it. To be able to tell people we won their award costs a certain price; you pay more the more you want to use it, and you can even pay more to go to an awards banquet. He offered me programs for as little as $10K, which went up in price to WAY more than that. One of the more expensive programs was that they’d make a video of us receiving the award, with lots of praise from their esteemed analysts. So, I decided to go onto YouTube and see for myself how many hits last year’s award winners were getting…my memory said low double digits, but that didn’t seem possible (Sensory’s little homemade videos often get thousands of hits.) Just for fun I looked just now at this year’s award winners – one of them had only 10 (yes, TEN) hits. Most of them must have been employees… Pretty hefty price to stroke your own ego and get almost nothing in return! I’ve always wondered who pays to be in Whoever’s Whatever? It’s probably the same CEOs that pay to go to award dinners!
So…Many Thanks to Leonard Klie and Speech Technology Magazine…and Cold Irishman…thanks, but no thanks! Sensory deserves recognition for innovation in speech technologies based on our hard work, not on how much we pay to market it.
June 17, 2011
That’s what America’s most charismatic President used to say! I didn’t necessarily agree with Reagan’s politics, but I sure did like his presentation. Nuance’s Paul Ricci is kind of the inverse of that; a lot of people don’t like him, but it’s hard to argue with his politics (although I will later in this blog…)
I’ve never met Ricci. I’ve known a lot of people who have worked for him, with him, and against him. Everybody agrees he’s a tough guy, and I think most would also use words like ruthless and smart. A lot of people might even call him an asshole, and whether true or not, I don’t think he cares about that. He’s a competitive strategy gameplay kind of guy, and he’s done pretty well. However, he has a HUGE challenge being up against the likes of Google, Microsoft, and eventually Apple (let alone the smart little guys like Vlingo, Yap, Loquendo, etc.). But I digress…
I started this blog thinking about Nuance’s recent acquisition of SVOX. And I wanted to congratulate Nuance and Ricci for ACQUIRING SVOX WITHOUT SUING THEM. If I look back a ways (and I can look back VERY FAR!), Nuance (or the company formerly known as Lernout & Hauspie and then ScanSoft) has at least 4 embedded speech recognition companies wrapped into it over the years. In rough chronological order: Voice Control Systems (VCS was probably the FIRST embedded speech company and the first and only embedded group to go public), the Philips Embedded Speech Division (I think they had acquired VCS for around $50M), Advanced Recognition Technologies, and Voice Signal Technologies. I believe Ricci was at the helm during the Philips embedded acquisition (this was the one closer to 2000, as opposed to the Philips Medical group a few years ago), ART, and VST. Interestingly, 2 of these 3 were lawsuit acquisitions. There are probably some inside stories about SVOX that I don’t know (e.g. threats of lawsuits??), but it appears that Nuance’s acquisitions of embedded companies are now down to 50% lawsuit driven. Thanks, Paul, you’re moving in the right direction! ;-)
OK, so what’s wrong with suing the companies you want to acquire? It probably does lower their price and reduce competitive bidding. Setting aside the legal and moral issues, there is one huge issue that’s clear: if you want to hold onto your star employees and technologists, you need to treat them well. Everyone understands who the “stars” are – they are the 10% of the workforce that produce 90% of the innovation. They are not going to stick around unless they are treated right, and starting off a relationship by calling them thieves is not a good way to court a long term relationship.
For example, there’s been a lot of press lately about the Vlingo/Nuance situation and how Ricci offered the top 3 employee/founders $5M each to sell Vlingo (plus a bundle of money for Vlingo!) Well, Mike Phillips used to be Nuance’s CTO (through the acquisition of SpeechWorks)…so wouldn’t it have been more valuable to KEEP Mike there than BUY him back? The “other” Mike, Mike Cohen, is Google’s head of speech. He FOUNDED Nuance (well, the company formerly known as Nuance!) and left to join Google, and of course this caused a lawsuit…think either of the Mikes (two of the smartest speech technologists in the industry) would ever go back to Nuance? Google has managed to hold onto Cohen, so it’s not just an issue of the best people leaving big companies because “little companies innovate.” I’ve also seen the recent rumor mill about Nuance’s Head of Smart Phone Architecture leaving for Apple…
So it’s the personnel and customer thing that Nuance is missing out on in their competitive gameplay strategy, and my hope is that SVOX’s acquisition represents a significant change in how Nuance does business!
As a point in contrast, Sensory has acquired only one company in our history – Fluent Speech Technologies (and no, we didn’t sue them first.) This was a group that spun out of the former Oregon Graduate Institute back in the 1990s. We saw a demo of theirs back in 1997-1998 and thought the technology was great. They offered to sell us the speech recognition technology (not the company) so they could focus on animation opportunities, but we had NO INTEREST in that. We wanted the people that made the technology, not the technology itself. That’s how our Oregon office was born; we acquired the company with the people. The office is now about as big as our headquarters (and some of our people in Silicon Valley have even moved up there!) By the way, ALL the technologists that came with that acquisition are still with us after 12 years, and we’ve kept a very friendly relationship with the former OGI as well.
Time for a breather…Yeah, I do long blogs….if you see a short one, which might start appearing, it’s probably a “ghostwriter” helping me out…. ;-)
So let’s look at Nuance’s acquisition of SVOX. Why did Nuance acquire them?
Anyways…I suspect the acquisition was a good deal for Nuance and its investors, and probably a GREAT deal for SVOX and its investors. Nuance’s market price didn’t seem to move much, but maybe it will once the price is disclosed. I commend and encourage Nuance to cut the lawsuits…one of them could bite back a lot worse than the pain of losing employees!
May 6, 2011
For far too long, speech recognition just hasn’t worked well enough to be usable for everyday purposes. Even simple command and control by voice had been barely functional and unreliable…but times, they are a-changing! Today speech recognition works quite well and is widely used in computer and smartphone applications…and I believe we are rapidly converging on the Holy Grail of Speech – making a recognition and response system that can be virtually indistinguishable from a human (a really smart human with immaculate spelling skills and fluency in many languages!)
I think there are 4 important components to what I’d call the Holy Grail in Speech:
Anyways, reputable companies are starting to combine and market these kinds of functions today, and I’d guess it’s just a matter of five to ten years until you can have a conversation with a computer or smartphone that’s so good, it is difficult to tell whether it’s a live person or not!