Posts Tagged ‘voice activation’
January 11, 2019
Interview with Karen Webster, one of the best writers and interviewers in tech/fintech.
In 1994 the fastest imaginable connection to the internet was a 28.8 kbps dial-up modem, and email was still mostly a new thing that many people were writing off as a fad. There was no such thing as Amazon.com for the first half of the year, and fewer than a third of American households owned computers. Given that, it’s not much of a surprise that the number of people thinking about voice-activated, artificial intelligence (AI)-enhanced wireless technology was extremely small, roughly the same as the number of people putting serious thought into flying cars.
But the team at Sensory is not quite as surprised by the rapid evolution of the voice-activated technology marketplace as everyone else may be. When the company first opened its doors 25 years ago in 1994, this is exactly the world it had hoped to see developing two-and-a-half decades down the line, even if the progress has been a bit uneven.
“We still have a long way to go,” Sensory CEO Todd Mozer told Karen Webster in a recent conversation. “I am excited about how good speech recognition has gotten, but natural language comprehension still needs a lot of work. And combining the inputs of all the sensors devices have, for vision and speech together to make things really smart and functional in context, we just aren’t there yet.”
But for all there is still to be done, and all the advances that still need to be made, the simple fact that the AI-backboned neural net approach to developing interactive technology has become “more powerful than we ever imagined it would be with deep learning” is a huge accomplishment in and of itself.
And the accomplishments are rolling forward, he noted, as AI’s reach and voice control of devices are expanding and becoming embedded, and the nascent voice ecosystem is quickly growing into its adolescent phase.
“Today these devices do great if I need the weather or a recipe. I think in the future they will be able to do far more than that, but they will increasingly be invisible in the context of what we are otherwise doing.”
Embedding The Intelligence
Webster and Mozer were talking on the eve of the launch of Sensory’s VoiceGenie for Bluetooth speakers, a new product that lets speaker makers add voice controls and functions like wake words without needing any special apps or a Wi-Fi connection. Said simply, Mozer explained, what Sensory is offering Bluetooth speaker makers is embedded voice, instead of voice via connection to the cloud.
And the expansion into embedded AI and voice control, he noted, is necessary, particularly in the era of data breaches, cybercrime and good old-fashioned user error with voice technology due to its relative newness.
“There are a lot of sensors on our products and phones that are gathering a lot of interesting information about what we are doing and who we are,” Mozer said.
Apart from the security problem of sending all of that information to the cloud, embedding in devices the ability to extract useful information and adapt on demand to a particular user is an area of great potential for improving the devices we all use multiple times daily.
This isn’t about abandoning the cloud, or even a great migration away from it, he said; there’s always going to be a cloud and clients for it. The cloud natively has more power, memory and capacity than anything that can be put into a device at this point on a cost-effective basis.
“But there is going to be this back-and-forth and things right now are swinging toward more embedded ability on devices,” he said. “There is more momentum in that direction.”
The cloud, he noted, will always be the home of things like transactions, which will have to flow through it. But things like verification and authentication, he said, might be centered in the devices’ embedded capacity, as opposed to in the cloud itself.
The Power Of Intermediaries
Scanning the headlines of late in the world of voice connection and advancing AI, it is easy to see two powerful players emerging in Amazon and Google. Amazon announced Alexa’s presence on 100 million devices, and Google immediately followed up with an announcement of its own that Google Assistant will soon be available on over a billion devices.
Their sheer size and scale give those intermediaries a tremendous amount of power, as they are increasingly becoming the connectors for these services on the way to critical mass and ubiquity, Webster remarked.
Mozer agreed, and noted that this can look a little “scary” from the outside looking in, particularly given how deeply embedded Amazon and Google otherwise are with their respective mastery of eCommerce and online search.
As in many complex ecosystems, Mozer said, the “giants” (Amazon, Google and, to a lesser extent, Apple) are both partners and competitors. He added that Sensory’s greatest value to the voice ecosystem comes when a product needs highly customized technology with a high level of accuracy and customer service. Sensory’s technology appears in products by Google, Alibaba, Docomo and Amazon, to name a few.
But ultimately, he noted, the marketplace is heading for more consolidation, probably putting more power in the hands of a very few select intermediaries.
“I don’t think we are going to have 10 different branded speakers. There will be some kind of cohesion — someone or maybe two someones will kick butt and dominate, with another player struggling in third place. And then a lot of players who aren’t players but want to be. We’ve seen that in other tech, I think we will see it with voice.”
As for who those winning players will be, Google and Amazon look good today, but, Mozer noted, it’s still early in the race.
The Future of Connectedness
In the long-term future, Mozer said, we may someday look back on all these individual smart devices as a strange sort of clutter from the past, when everyone was making conversation with different appliances. At some point, he ventured, we may just have sensors embedded in our heads that allow us to think about commands and have them go through, with no voice interface necessary.
“That sounds like science fiction, but I would argue it is not as far out there as you think. It won’t be this decade, but it might be in the next 50 years.”
But in the more immediate, and less Space Age, future, he said, the next several years will be about enhancing and refining voice technology’s ability to understand and respond to the human voice and, ultimately, to anticipate the needs of human users.
There won’t be a killer app for voice that sets it on the right path, according to Mozer; it will simply be a lot of capacity unlocked over time that will make voice controls the indispensable tools Sensory has spent the last 25 years hoping they would become.
“When a device is accurate in identifying who you are, and carrying out your desires seamlessly, that will be when it finds its killer function. It is not a thing that someone is going to snap their fingers and come out with,” he said. “It is going to be an ongoing evolution.”
October 15, 2014
A couple of news headlines have appeared recently asserting that voice activation is unsafe. I thought it was time for Sensory to weigh in on a few aspects of this since we are the pioneers in voice activation:
August 5, 2013
I often get the question, “If Android and Qualcomm offer voice activation for free, why would anyone license from Sensory?” While I’m not sure about Android and Qualcomm’s business models, I do know that decisions are based on accuracy, total added cost (royalties plus hardware requirements to run), power consumption, support, and other variables. Sensory seems to be consistently winning the shootouts it enters for embedded voice control. Some approaches that appear lower cost require a lot more memory or MIPS, driving up total cost and power consumption.
It’s interesting to note that companies like Nuance have a similar challenge on the server side, where Google and Microsoft “give it away”. Because Google’s engine is so good, it creates a high hurdle for Nuance. I’d guess Google’s rapid progress helps Nuance with their licensing to Apple, but may have made it more challenging to license to Samsung. Samsung actually licensed Vlingo AND Nuance AND Sensory, then Nuance bought Vlingo.
Why doesn’t Samsung use Google recognition if it’s free? On the server it’s not power consumption affecting decisions, but cost, quality, and in this case CONTROL. On the cost side it could be that Samsung MAKES more money by using Nuance through some sort of ad revenue kickbacks, which I’d guess Google doesn’t allow. This is of course just hypothesizing. I don’t really know, and if I did know I couldn’t say. The control issue is big too, as companies like Sensory and Nuance will sell to everyone and in that sense offer platform independence and more control. Working with a Microsoft or Google engine forces an investment in a specific platform implementation, and therefore leaves less flexibility to have a uniform cross-platform solution.
February 18, 2013
(…and something new from Sensory just around the corner!)
I remember watching the Super Bowl last year and seeing a BMW 3 Series commercial that I thought was interesting.
It was interesting to me because they put a motion/proximity sensor under the trunk so the user could open the trunk in a hands-free manner. The commercial highlights the benefit of hands-free access when a woman walks up with her hands full of luggage and she just wiggles her foot around and the trunk pops open! Cool…except the user has to do a little one-legged dance with their hands full, and as the commercial highlights (which is another reason why I found it interesting), other things can accidentally open the trunk, like a dog wagging its tail. Wouldn’t a hands-free voice trigger do a much better job? Especially an ultra-low-power implementation on a standalone processor with built-in speaker verification for security…sounds like a challenge for Sensory’s TrulyHandsfree approach.
Fast forward to this year’s Super Bowl, and Kia comes out with the “space babies” ad for its Sorento and the Uvo entertainment system. Kid asks dad, “Where do babies come from?” and dad concocts an elaborate and humorous lie.
Then after dad’s tall tale the kid says “But Jake said that babies are made when mommies and daddies…” and dad quickly interrupts the kid by saying “Uvo, play Wheels on the Bus”. The Uvo system hears dad and immediately plays the music drowning out the kid’s question. Cool commercial and nice use of voice activation to control music while driving!
Many of Sensory’s customers have told us that they don’t want to have to say the brand name as a command word, and they would really like to name their products themselves, and even better, have the products know who they are when they talk so that settings and controls can be customized to their use…Another job for Sensory’s TrulyHandsfree!
On February 19th we will announce TrulyHandsfree 3.0, which will enable all of the voice control scenarios I have described, making for better user experiences that are more customized and more secure!