June 11, 2019
I used to blog a lot about wake words and voice triggers. Sensory pioneered this technology for voice assistants, and we evangelized the importance of not hitting buttons to speak to a voice recognizer. Then everybody caught on and the technology went into mainstream use (think Alexa, OK Google, Hey Siri, etc.), and I stopped blogging about it. But I want to reopen the conversation…partly to talk about how important a GREAT wake word is to the consumer experience, and partly to congratulate my team on a recent comparison test that shows how Sensory continues to have the most accurate embedded wake word solutions.
Competitive Test Results. The comparison test was done by Vocalize.ai. Vocalize is an independent test house for voice enabled products. For a while, Sensory would contract out to them for independent testing of our latest technology updates. We have always tested in-house but found that our in-house simulations didn’t always sync up with our customers’ experience. Working with Vocalize allowed us to move from our in-house simulations to more real-world product testing. We liked Vocalize so much that we acquired them. So, now we “contract in” to them but keep their data and testing methodology and reporting uninfluenced by Sensory.
Vocalize compared two Sensory TrulyHandsfree wake word models (1MB size and 250KB size) with two external wake words (Amazon and Kitt.ai’s Snowboy), all using “Alexa” as the trigger. The results are replicable and show that Sensory’s TrulyHandsfree remains the superior solution on the market. TrulyHandsfree was better/lower on BOTH false accepting AND false rejecting. And in many cases our technology was better by a long shot! If you would like to see the full report and more details on the evaluation methods, please send an email request to either Vocalize (firstname.lastname@example.org) or Sensory (email@example.com).
It’s Not Easy. There are over 20 companies today that offer on-device wake words. Probably half of these have no experience in a commercially shipping product and never will; a lot of companies just won’t be taken seriously. The other half can talk a good talk, and in the right environment they can even give a working demo. But this technology is complex: really easy to do badly and really hard to do great. Some demos are carefully planned with the right noise in the right environment with the right person talking. Sensory has been focused on low-power embedded speech for 25 years, and we have 65 of the brightest minds working on the toughest challenges in embedded AI. There’s a reason that companies like Amazon, Google, Microsoft and Samsung have turned to Sensory for our TrulyHandsfree technology. Our stuff works, and they understand how difficult it is to make this kind of technology work on-device! We are happy to provide APKs so you can do your own testing and judge for yourself! OK, enough of the sales pitch…some interesting stuff lies ahead…
It’s Really Important. Getting a wake word to work well is more important than most people realize. It’s like the front door to your house. It might be a small part of your house, but if you aren’t letting the homeowners in, that’s horrible, and if you are letting strangers in by accident, that’s even worse. The name a company gives its wake word is usually the company’s brand name; imagine the sentiment that comes off when I say a brand name and it doesn’t work. Recently I was at a tradeshow that had a Mercedes booth. There were big signs that said “Hey Mercedes”…I walked up to the demo area and said “Hey Mercedes,” but nothing happened…the woman working there informed me that they couldn’t demo it on the show floor because it was really too noisy. I quickly pulled out my mobile phone and showed her that I could use dozens of wake words and command sets without an error in that same environment. Mercedes has spent over 100 years building up one of the best quality brand reputations in the car industry. I wonder what will happen to that reputation if their wake word doesn’t respond in noise? Even worse is when devices accidentally go off. If you have family members who listen to music above volume 7, then you already know the shock that a false alarm causes!
It’s about Privacy. Amazon, like Google and a few others, seems to have a pretty good wake word, but if you go into your Alexa settings you can see all of the voice data that’s been collected, and a lot of it was collected when you weren’t intentionally talking to Alexa! You can see this performance issue in the Vocalize test report. Sensory substantially outperformed Amazon in the false reject area. This is when a person tries to speak to Alexa and she doesn’t respond. The difference is most apparent in babble noise, where Sensory falsely rejected 3% and Amazon falsely rejected 10% on comparably sized models (250KB). However, the false accept difference is nothing short of AMAZING. Amazon false accepted 13 times in 24 hours of random noise. In this same time period Sensory false accepted ZERO times (on comparably sized 250KB models). How is this possible, you may be wondering? Amazon “fixes” its mistakes in the cloud. Even though the device falsely accepts quite frequently, their (larger and more sophisticated) models in the cloud collect the error. Was that a Freudian slip? They correct the error…AND they COLLECT the error. In effect, they are disregarding privacy to save device cost and collect more data.
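To get a feel for what those false-accept numbers mean over longer periods, here is a back-of-the-envelope projection in Python. The 13-in-24-hours and 0-in-24-hours figures come from the Vocalize test described above; the scaling function and the assumption that the rate stays constant over time are mine, not Vocalize’s.

```python
# Rough projection of the reported false-accept results over longer windows.
# Observed in the Vocalize test: 13 false accepts in 24 hours of random
# noise for the 250KB Amazon model, 0 for the 250KB Sensory model.
HOURS_TESTED = 24

def projected_false_accepts(observed_count, hours):
    """Linearly scale the observed false-accept count to a longer window.

    Assumes the false-accept rate is constant over time, which is a
    simplification; real-world rates vary with the acoustic environment.
    """
    return observed_count * hours / HOURS_TESTED

week = 24 * 7
print(projected_false_accepts(13, week))  # Amazon: 91.0 per week
print(projected_false_accepts(0, week))   # Sensory: 0.0 per week
```

Thirteen accidental activations per day compounds to roughly 91 per week, which is why the on-device accuracy gap matters so much for privacy: every one of those activations can send audio to the cloud.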
As the voice revolution continues to grow, you can bet that privacy will continue to be a hot topic. What you now understand is that wake word quality has a direct impact on both the user experience and PRIVACY! While most developers and product engineers in the CE industry are aware of wake words and the difficulty in making them work well on-device, they don’t often consider that competing wake words technologies aren’t created equally – the test results from Vocalize prove it! Sensory is more accurate AND allows more privacy!
March 11, 2019
As more and more devices designed to watch and listen to us flood the market, there is rising concern about how the personal data that is collected gets used. Facebook has admitted to monetizing our personal data and the personal data of our friends. Google has been duplicitous in its tracking of users even when privacy settings are set to not track, and more recently has admitted to placing microphones in products without informing consumers. In no realm is the issue of privacy more relevant than today’s voice assistants. A recent PC Magazine survey of over 2,000 people found that privacy was the top concern for smart home devices (more important than cost!). The issue becomes more complex as personal assistant devices become increasingly better-equipped with IP cameras and various other sensors to watch, track and listen to us.
Admittedly, people get a lot in return for giving up their privacy. We’ve become numb to privacy policies. It is almost expected that we will tap “accept” on End User License Agreements (EULAs) without reading through terms that are intentionally designed to be difficult to read. We get free software, free services, music and audio feeds, discounted rates and other valuable benefits. What we don’t fully realize is what we are giving up, because big data companies don’t openly share that with us. We fall into two main categories of consumers: trust the giants of industry to do what’s right, or don’t play (and lose out on the benefits). For example, I love having smart speakers in my home, but many of my friends won’t get them — precisely because of these very fair privacy concerns. Our legislators are trying to address privacy concerns, but the government needs to fully understand how data is used before writing legislation. They need help from the tech community on how to deal with these issues. Europe has adopted the new General Data Protection Regulation (GDPR) to address concerns related to businesses’ handling and protection of user data. This is a step in the right direction, as it provides clear rules to companies that handle personal data and offers clearly defined monetary penalties for failure to protect the data of citizens. However, is it enough to slap fines on these companies for failing to comply? What happens when they think the market value of the infringement will be greater than the penalty? Or when human errors are made, as happened when a European user requested their data and was accidentally given someone else’s?
In this day and age, a great deal of our daily tasks require us to share personal data. Whether we are talking to our AI assistants, using our smart phone to navigate around traffic or making a credit card transaction at a local retailer, we’re sharing data. There are benefits to sharing this data, and as time goes on people will see even more benefits from data sharing, but there are so many problems too.
AI personal assistants provide examples of how sharing data can benefit or hurt the end user. The more we use these devices the more they learn about us. As they become capable of recognizing who we are by face or voice and get to know and memorize our histories, preferences and needs, these systems will evolve from devices that answer simple questions into proactive, accurate helpers that offer useful recommendations, unprompted reminders, improved home security — assistance we users find truly helpful. But they can also reveal private information that we don’t want others to know, which is what happened when a girl’s shopping habits triggered advertising for baby products before her parents knew she was pregnant.
As digital assistant technologies get smarter, there will be more concern about what private data they collect, store and share over the internet. One solution to the privacy problem would be keeping whatever is learned by the device on the device by moving the AI processing out of the cloud to the edge (on device), ensuring our assistants never take personal data to the cloud so there is no privacy risk. This would be a good solution as embedded AI becomes more powerful. The biggest disadvantage may be that every new device would have to re-learn who we are.
Another possible solution is more collaboration among data companies. Some of the AI assistant companies are starting to work together; for example, Cortana and Alexa are playing friendly. This current approach, however, is about transferring the baton from one assistant to another so that multiple assistants can be accessed from a single device. It’s unlikely this approach will be widely adopted, and even if it were, it would result in inefficiency in the collection and use of our personal data, because each of the AI assistant providers would have to build their own profile of who we are based on the data they collect. However, because companies will want to eventually monetize the data they collect, sharing done right could actually benefit us.
Could more sharing of our data and preferences improve AI assistants?
Sharing data does create the best and most consistent user experience. It allows consumers to switch AI brands or buy new devices without losing any of the benefits in user experience and device knowledge of who we are. Each assistant having its own unique data and profile for its users skirts the privacy issue and misses the advantages of big data. That doesn’t seem to be the right way for the industry to advance. Ideally, we would create a system where there is shared knowledge of who we are, but we, as individuals, need to control and manage that knowledge.
For example, Google knows that the most frequent search I do on Google Maps is for vegetarian restaurants. Google has probably figured out that I’m a vegetarian. Quite ironically, I found “meat” in my shopping cart at Amazon. Somehow, Alexa thought I asked it to add “meat” to my shopping cart. Perhaps if Alexa had the same knowledge of me as Google it would have made a different and better decision. Likewise, Amazon knows a lot about my shopping details. It knows what pills I take, my shoe size and a whole lot of other things that might be helpful to the Google Assistant. A shared knowledge base would benefit me, but I want to be able to control and oversee this shared database or profile.
Improved privacy without losing the advantages of shared data is achievable through a combination of private devices with embedded intelligence and cloud devices restricted and governed by legislation and industry standards.
Legislation should enable user-controlled personal-data sharing systems; then the user can choose their own mix of private devices with embedded intelligence and cloud devices that offer more general and standard protections. Of course the legislation is the hard part, since our government tends to work more for the industries that fund reelections and bills than for the people…nevertheless, here are some thoughts on how legislation for privacy could work:
Ultimately, this concept is a shared format designed to give companies equal access to data while providing users full ownership and control of their personal data and a clear understanding of what is shared or made public. No longer would the user have to deal with individual companies and complex EULAs, and no longer would they need to fear that their devices are carrying out unscrupulous behavior because of the vagueness of our legal system and the less-than-transparent ways big-data companies use our data.
With a little creative thinking and a shift in regulation, we the people can change the future of data collection and take back ownership and control of our personal data.
February 11, 2019
Voice assistants are growing in both popularity and capability. They are arriving in our homes, cars and mobile devices, and now seem to be a standard part of American culture, entering our TV shows, movies, music and Super Bowl ads. However, this popularity is accompanied by a persistent concern over our privacy and the safety of our personal data when these devices are always listening and always watching.
There is a significant distrust of big companies like Facebook, Google, Apple, and Amazon. Facebook and Google have admitted to misusing our private data, and Apple and Amazon have admitted that system failures have led to a loss of private data.
So naturally, there would be an advantage to not sending our voices or videos into the cloud and instead doing the processing on-device. Then no data is at risk of loss. Cloud-based queries could still occur, but through anonymized text only.
COMPUTING AT THE EDGE VERSUS THE CLOUD
Some have argued that we have carried microphones and cameras around with us for years without any issues, but I see this thinking as flawed. Just recently, Apple admitted to a FaceTime bug on mobile phones that enabled “eavesdropping” on others.
Also, if my phone is listening for a wake word, it’s a very different technology model than an IoT device that’s “always on.” Phones are usually designed to listen at arm’s length, 2 or 3 feet. An IoT speaker is designed to listen from 20 feet away! If we assume constant noise across a room that could make an assistant “false fire” and start listening, then we can think of two listening circles, one with a radius of 3 feet and one with a radius of 20 feet, to compare the listening area of the phone with a far-field IoT device such as a smart speaker. The phone has a listening area of πr², or 9π square feet; the IoT device has a listening area of 400π square feet. So, all else equal, the IoT device is about 44 times more likely to false fire and start listening when it wasn’t intended to.
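The area comparison above can be sketched in a few lines of Python. The 3-foot and 20-foot pickup ranges are the assumed figures from the paragraph, not measured values:

```python
import math

# Assumed pickup ranges from the article: ~3 ft for a near-field phone,
# ~20 ft for a far-field smart speaker.
phone_radius_ft = 3
speaker_radius_ft = 20

# Listening area of each circle: A = pi * r^2
phone_area = math.pi * phone_radius_ft ** 2      # 9*pi sq ft
speaker_area = math.pi * speaker_radius_ft ** 2  # 400*pi sq ft

ratio = speaker_area / phone_area  # 400/9, about 44.4
print(f"Far-field speaker covers {ratio:.1f}x the phone's listening area")
```

Since π cancels, the ratio is just (20/3)², about 44.4, which is where the "about 44 times" figure comes from.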
As cloud-based far-field assistants enter the home there is a definite risk of our private data getting intercepted. It’s not just machine errors but human errors too, like the Amazon employee who accidentally sent the wrong data to a person who requested it.
There are also other ways we can lose our cloud-connected private data, like the “dolphin attack” that can allow outsiders to listen in.
ON-DEVICE VOICE ASSISTANTS WILL BECOME MORE COMMON
January 11, 2019
Interview with Karen Webster, one of the best writers and interviewers in tech/fintech.
In 1994 the fastest imaginable connection to the internet was a 28.8 kbps dial-up modem, and email was still mostly a new thing that many people were writing off as a fad. There was no such thing as Amazon.com for the first half of the year, and less than a third of American households owned computers. Given that, it’s not much of a surprise that the number of people thinking about voice-activated, artificial intelligence (AI)-enhanced wireless technology was extremely small — roughly the same as the number of people putting serious thought into flying cars.
But the team at Sensory is not quite as surprised by the rapid onset evolution of the voice-activated technology marketplace as everyone else may be — because when they were first opening their doors 25 years ago in 1994, this is exactly the world they had hoped to see developing two-and-a-half decades down the line, even if the progress has been a bit uneven.
“We still have a long way to go,” Sensory CEO Todd Mozer told Karen Webster in a recent conversation. “I am excited about how good speech recognition has gotten, but natural language comprehension still needs a lot of work. And combining the inputs of all the sensors devices have — for vision and speech together to make things really smart and functional in context — we just aren’t there yet.”
But for all there is still to be done, and the advances that still need to be made, the simple fact that the AI-backboned neural net approach to developing interactive technology has become “more powerful than we ever imagined it would be with deep learning” is a huge accomplishment in and of itself.
And the accomplishments are rolling forward, he noted, as AI’s reach and voice control of devices is expanding — and embedding — and the nascent voice ecosystem is quickly growing into its adolescent phase.
“Today these devices do great if I need the weather or a recipe. I think in the future they will be able to do far more than that — but they will increasingly be invisible in the context of what we are otherwise doing.”
Embedding The Intelligence
Webster and Mozer were talking on the eve of the launch of Sensory’s VoiceGenie for Bluetooth speakers — a new product for speaker makers to add voice controls and functions like wake words, without needing any special apps or a Wi-Fi connection. Said simply, Mozer explained, what Sensory is offering for Bluetooth makers is embedded voice — instead of voice via connection to the cloud.
And the expansion into embedded AI and voice control, he noted, is necessary, particularly in the era of data breaches, cybercrime and good old-fashioned user error, given voice technology’s relative newness.
“There are a lot of sensors on our products and phones that are gathering a lot of interesting information about what we are doing and who we are,” Mozer said.
Apart from the security problem of sending all of that information to the cloud, embedding in devices the ability to extract useful information and adapt on demand to a particular user is an area of great potential for improving the devices we all use multiple times daily.
This isn’t about abandoning the cloud, or even a great migration away from it, he said; there’s always going to be a cloud and clients for it. The cloud natively has more power, memory and capacity than anything that can be put into a device at this point on a cost-effective basis.
“But there is going to be this back-and-forth and things right now are swinging toward more embedded ability on devices,” he said. “There is more momentum in that direction.”
The cloud, he noted, will always be the home of things like transactions, which will have to flow through it. But things like verification and authentication, he said, might be centered in the devices’ embedded capacity, as opposed to in the cloud itself.
The Power Of Intermediaries
Scanning the headlines of late in the world of voice connection and advancing AI, it is easy to see two powerful players emerging in Amazon and Google. Amazon announced Alexa’s presence on 100 million devices, and Google immediately followed up with an announcement of its own that Google Assistant will soon be available on over a billion devices.
Their sheer size and scale gives those intermediaries a tremendous amount of power, as they are increasingly becoming the connectors for these services on the way to critical mass and ubiquity, Webster remarked.
Mozer agreed, and noted that this can look a little “scary” from the outside looking in, particularly given how deeply embedded Amazon and Google otherwise are with their respective mastery of eCommerce and online search.
Like many complex ecosystems, Mozer said that the “giants” — Amazon, Google and Apple to a lesser extent — are both partners and competitors, adding that Sensory’s greatest value to the voice ecosystem is when something that is very customized tech and requires a high level of accuracy and customer service features is needed. Sensory’s technology appears in products by Google, Alibaba, Docomo and Amazon, to name a few.
But ultimately, he noted, the marketplace is heading for more consolidation — and probably putting more power in the hands of very few selected intermediaries.
“I don’t think we are going to have 10 different branded speakers. There will be some kind of cohesion — someone or maybe two someones will kick butt and dominate, with another player struggling in third place. And then a lot of players who aren’t players but want to be. We’ve seen that in other tech, I think we will see it with voice.”
As for who those winning players will be, Google and Amazon look good today, but, Mozer noted, it’s still early in the race.
The Future of Connectedness
In the long-term future, Mozer said, we may someday look back on all these individual smart devices as a strange sort of clutter from the past, when everyone was making conversation with different appliances. At some point, he ventured, we may just have sensors embedded in our heads that allow us to think about commands and have them go through — no voice interface necessary.
“That sounds like science fiction, but I would argue it is not as far out there as you think. It won’t be this decade, but it might be in the next 50 years.”
But in the more immediate — and less Space Age — future, he said, the next several years will be about enhancing and refining voice technology’s ability to understand and respond to the human voice — and, ultimately, to anticipate the needs of human users.
There won’t be a killer app for voice that sets it on the right path, according to Mozer; it will simply be a lot of capacity unlocked over time that will make voice controls the indispensable tools Sensory has spent the last 25 years hoping they would become.
“When a device is accurate in identifying who you are, and carrying out your desires seamlessly, that will be when it finds its killer function. It is not a thing that someone is going to snap their fingers and come out with,” he said, “it is going to be an ongoing evolution.”
October 25, 2018
I just returned from the four-day Money 20/20 event in Las Vegas. The show covers the overlap of money and technology, including fintech, payments, ecommerce and more. It had tens of thousands of attendees, over 3,500 companies, 400 startups and lots of star power, including Richard Branson, Shaquille O’Neal, Akon, and yours truly speaking on a biometrics panel.
I walked the show floor to find the latest news in embedded biometrics and to better understand the choice between embedded and cloud based biometrics in the fintech/money space. I was impressed by how biometrics has moved into the mainstream conversation. Before mentioning the other companies I talked to, I’ll kick off with Sensory, my company.
Sensory’s focus in AI and biometrics has always been on the embedded side. We believe in data privacy, and we think the best way to accomplish that is by keeping data in the hands and control of the user. On a less promotional front, there is also a strategic reason we focus on embedded: the industry giants are really good at cloud-based and unconstrained AI tasks, and they often give them away for free, so we focus on a place where the Googles and Amazons of the world can be our customers and not just our competitors. On the last day of Money 20/20, Sensory introduced TrulySecure 4.0, a fusion of face and voice biometrics with improved accuracy, speed, and support for 3D.
BioConnect sponsored one of the excellent lunches at the show. I spoke to Rob Douglas, Founder and CEO of BioConnect who said, “We are on the quest for rightful identity and what we offer is a market leading mobile biometric authentication solution for the enterprise. We provide a building block like a piece of LEGO that you can apply into all the infrastructure of an Enterprise to upgrade from passwords and key fobs to a world where you have higher assurance when you are conducting digital or physical transactions.” BioConnect has been in business for eight years and has 1,600 customers and at Money 20/20, the Bank of Montreal announced a partnership with BioConnect and IBM.
BioConnect has a strong belief in face authentication, but also works with other biometrics including voice, eye, fingerprint, and behavioral. According to Douglas, “We believe in both cloud and client and we support the FIDO approach, but there are use cases where the transport of the biometrics through a cloud-based infrastructure can make a lot of sense.”
The FIDO Alliance had a large area with alliance members touting their wares. FIDO (fast identity online) is “the World’s Largest Ecosystem for Standards-Based, Interoperable Authentication.” I spoke to Andrew Shikiar, the CMO of the FIDO Alliance. Local authentication with biometrics is key to the FIDO approach. “Whether you are storing passwords or biometrics, a central repository will be targeted, and will be breached to be used in nefarious ways.” When I asked Shikiar about the desire to share biometrics across platforms, he said, “That’s typical of the type of use case that our technical working groups are working to address, while leveraging the FIDO standards.”
Conor White, President Americas at Daon, described Daon as “a human authentication company that provides technologies to allow customers to create and manage digital identities of their users in a way that’s advantageous from a risk and security perspective.” At the show they announced a partnership to expand from their base in mobile into the contact center.
Daon provides support to a wide cross section of biometrics and provides embedded solutions through the FIDO standard but can support cloud based biometrics when desired. Daon is seeing more customers getting comfortable with going from on premise to cloud based implementations but in the vast majority of cases, the biometrics still resides on the device even if the service is run in the cloud. White sits on the board of the FIDO alliance and sees the FIDO standard with embedded biometrics gaining ground.
Veritran is a software company based in Buenos Aires that develops innovative and secure digital banking platforms for Latin American markets. They process over 4 billion banking transactions each year, and they are now expanding from banking into other enterprise markets and geographies beyond Latin America. At Mobile World Congress in February, they announced a new platform for secure application development, and at Money 20/20, they demonstrated some of the apps developed on this platform.
Like other companies, Veritran offers a mix of biometric modalities and in talking with Veritran’s CEO Marcelo Gonzales, I learned a very interesting reason as to why they prefer embedded biometrics instead of processing in the cloud. The Latin American customers buy prepaid plans with limited data. To keep their costs down, they must keep their data usage down, and with the biometrics stored and processed on the device, transactions can occur with minimal data costs.
There were a lot of other companies at Money 20/20. As a quick summary, a few important things stood out. Biometrics are definitely taking off as we all understand the problems with passwords. A variety of biometric modalities are offered, but there does seem to be a preference and movement toward face authentication that can run cross-platform without specialized hardware. Most vendors offer a choice between having the biometric data stored and processed on the device or in the cloud, but with the FIDO Alliance behind embedded and its clear advantages for security and privacy, the embedded use case seems to be winning out.
September 13, 2018
I thought it was funny that Google and Alexa both handed out the neck worn badge holders which nobody seemed to wear.
There are some, but not a lot of, companies that are innovating. There were a ton of smart speakers, thermostats, lights, electrical outlets and various appliances that can be controlled by assistants, but little of that rose to the level of true innovation based on where we are today. However, I did see a few new things too.
INFRASTRUCTURE PLAYERS ARE BETTING ON AMAZON AND GOOGLE
There’s lots of infrastructure developing 3rd-party support of Alexa, Google and custom voice interfaces. For example, a variety of chip companies like DSPG were showing their ability to enable lower power solutions while design houses like Sugr, StreamUnlimited, and Frontier Smart Technologies can assist with hardware and software development.
OVERALL, MORE EVIDENCE OF VOICE ASSISTANT ACCELERATION
IFA showed the continuing growth and accelerating market adoption of voice assistants. It was well organized, and like CES, IFA had separate locations that required transportation to get between them. Berlin, by the way, is a fantastic and unique city: very international, with a liberal feel, friendly people, and the best Turkish food I’ve ever had. Parts of it even reminded me of Berkeley in the 1970s. Of course, there is a lot more presence of voice assistants today!
Todd Mozer is CEO and founder of Sensory.
August 13, 2018
It’s not easy to be a retailer today when more and more people are turning to Amazon for shopping. And why not shop online? Ordering is convenient with features such as ratings. Delivery is fast and cheap, and returns are easy and free – if you are a Prime member! In April 2018 Bezos reported there are more than 100 million Prime members in the world, and the majority of US households are Prime members. Walmart and Google have partnered in an ecommerce play to compete with Amazon, but Walmart is just dancing with the devil. Google will use the partnership to gather data and invest more in its internal ecommerce and shopping experiences. Walmart isn’t relaxing, and is aggressively pursuing ecommerce and AI initiatives through acquisitions and its Store #8, which acts as an incubator for AI companies and internal initiatives. Question: why does Facebook have a Building 8 and Walmart have a Store 8 for skunkworks projects?
It’s not just the retailers that are under pressure, though. If you make consumer electronics it’s getting more challenging too. Google controls the Android ecosystem and is pumping a lot of money into centralizing and hiring around its hardware development efforts. Google is competing against the mobile phones of Samsung, Huawei, LG, Oppo, Vivo, and other users of its Android OS. And Amazon is happy to sell other people’s hardware online (OK, not Google’s, but others’), but it takes a nice commission on those sales, and if it’s a hit product it finds ways to make more money through Amazon’s in-house brands and warehousing, and potentially even making the product itself. The Alexa Fund has financed companies that created Alexa-based hardware products that Amazon ended up competing against with in-house developments, and when Amazon sells Alexa products it doesn’t need to make a big profit (as described in part one). And Apple… well, they have a history of extracting money from anyone that wants to play in their ecosystem too. This is business, and there’s a very good reason that Google, Amazon, Apple, and other giants are giants. They know how to make money on everything they do. They are tough to compete with. The “free” stuff consumers get (and we do get a lot!) isn’t really free. We are trading our data and personal information for it.
So retailers have it tough (and assistants will make it even tougher), service providers have it tough (and assistants with service offerings make it even tougher), and consumer electronics companies have it tough. But the toughest situation is for the speaker companies. The market for speakers is exploding, driven by the demand for “smart” speakers. MarketsandMarkets research puts the current smart speaker market at over $2.6B, growing at over 34% a year. That seems like a sweet market to be in, but a lot of that growth is eating away at the traditional speaker market. So a speaker company faces a few alternatives:
Many are choosing option 1, only to find that their sales are poor because of better-quality, lower-priced offerings from Google and Amazon. A company like Sonos, a leader in high-quality Wi-Fi speakers, has chosen option 1 with a twist: it is trying to support Google, Amazon, and Apple all at once. Its recent IPO filing highlights the challenges well:
“Our current agreement with Amazon allows Amazon to disable the Alexa integration in our Sonos One and Sonos Beam products with limited notice. As such, it is possible that Amazon, which sells products that compete with ours, may on limited notice disable the integration, which would cause our Sonos One or Sonos Beam products to lose their voice-enabled functionality. Amazon could also begin charging us for this integration which would harm our operating results.”
They further highlighted that their lack of service integrations could be a challenge should Google, Amazon, or others offer discounting (which is already happening): “Many of these partners may subsidize these prices and seek to monetize their customers through the sale of additional services rather than the speakers themselves,” the company said. “Our business model, by contrast, is dependent on the sale of our speakers. Should we be forced to lower the price of our products in order to compete on a price basis, our operating results could be harmed.” Looking at Sonos’s financials, you can already see their margins starting to erode.
Some companies have attempted option 2 above by bringing out in-house assistants built on open-source speech recognizers like Kaldi. This might save the cost of deploying third-party solutions, but it requires substantial in-house effort, and it is ultimately fraught with the same challenge as option 3 above: it’s really hard to compete against companies approaching a trillion-dollar market capitalization when those companies see AI and voice assistants as strategically important and are investing accordingly.
Retailers, consumer OEMs, and service providers all face a big challenge. I run a small company called Sensory. We develop AI technologies, and companies like Google, Amazon, Samsung, Microsoft, Apple, Alibaba, Tencent, and Baidu are our customers AND our biggest competitors. My strategy? Move fast, innovate, and move on. I can’t compete head to head with these companies, but when I come out with solutions that they need BEFORE they have them in house, I get a one-to-three-year window to sell to them before they switch to an in-house replacement. That’s not bad for a small company like Sensory. A bigger company like Sonos or Comcast could deploy the same general strategy, setting up fast-moving innovation groups that let them stay ahead of the game. This appears to be exactly the strategy Walmart is taking with Store 8 so as not to be left behind! Without doubt, it’s very tough competing in a world of giants that have no boundaries in their pursuits and ambitions!
August 6, 2018
Apple introduced Siri in 2011 and my world changed. I was running Sensory back then, as I am today, and suddenly every company wanted speech recognition. Sensory was there to sell it! Steve Jobs, a notorious naysayer on speech recognition, had finally given it the thumbs up. Every consumer electronics company noticed and decided the time had come. Sensory’s sales shot up for a few years, driven by this sudden confidence in speech recognition as a user interface for consumer electronics.
Fast forward to today, and Apple has just become the first US company to reach a trillion-dollar market capitalization. One trillion dollars is an arbitrary round number with a lot of zeroes, but it is psychologically very important. It was winning a race – a race between Cook, Bezos, the Google/Alphabet crew, and others, one that most of the contestants would say doesn’t really matter and that they weren’t running. But they were, and they all wanted to win. Without question it was quarterly financial results that pushed Apple to the magic number and beat Amazon, Google, and Microsoft to the trillion-dollar spot. I wouldn’t argue that Siri got them there, but I would argue that Siri didn’t stop them, and this is important.
SIRI WAS FIRST, BUT QUICKLY LOST THE VOICE LEAD TO RIVALS
In 2014, Amazon introduced the Echo smart speaker with Alexa and beat Apple and others into the home with a usable voice assistant. Alexa came out strong and got stronger quickly. Amazon amassed over 5,000 people into what is likely the largest speech recognition team in the world. Google got punched but wasn’t knocked out. Its AI team kept growing, and Google had a very strong reputation in academia for hiring the best and brightest machine learning and AI folks out of PhD programs. By 2016, Google had introduced its own smart speaker, and by CES 2018, Google made a VERY strong marketing statement that it was still in the game.
APPLE FOCUSED ELSEWHERE
AI ASSISTANTS DRIVE CONSUMER LOCK-IN
The assistants themselves aren’t sold, so they don’t directly make money, but they can serve as purchasing agents (where Amazon makes a lot of money), advertising agents (where Google makes its money), gateways to entertainment services (where all the big guys make money), and as a user experience for consumer electronics (where Apple makes a lot of money). The general thinking is that the more an assistant is used, the more it learns about the user, the better it serves the user, and the more the user is locked in! So winning the AI assistant game is HUGELY important, and recent changes at Apple show that Siri is quickly coming up in the rankings and could have more momentum right now than at any point in its history. That’s why Siri didn’t stop Apple from reaching $1T.
SIRI ON THE RISE
It may have taken a while but Apple seems serious. It’s nice to have a pioneer in the space not stay down for the count!
August 6, 2018
Here’s the basic motivation I see in creating voice assistants: build a cross-platform user experience that makes it easy for consumers to interact with, control, and request things through their assistant. This will ease adoption and bring more power to consumers, who will use the products more and in doing so create more data for the cloud providers. This “data” will include all sorts of preferences, requests, searches, and purchases, and will allow the assistants to learn more and more about their users. The more the assistant knows about any given user, the BETTER it can help that user with services such as entertainment and purchasing (e.g., offering special deals on things the consumer might want). Let’s look at each of these in a little more detail:
1. Owning the cross-platform user experience and collecting user data to make a better voice assistant.
Owning the user experience on a single device is not good enough. The goal of each of these voice assistants is to be your personal assistant across devices: on your phone, in your home, in your car, wherever you may go. This is why we see Alexa, Google, and Siri all battling for, as an example, a position in automotive. Your assistant wants to be the place you turn for consistent help. In doing so, it can learn more about your behaviors: where you go, what you buy, what you are interested in, who you talk to, and what your history is. This isn’t just scary big-brother stuff; it’s quite practical. If you have multiple assistants for different things, each may know you differently and incompletely. It’s really best for the consumer to have one assistant that knows you well.
For example, let’s take the simple case of finding food when I’m hungry. I might say “I’m hungry.” The assistant’s response would be much more helpful the more it knows about me. Does it know I’m a vegetarian? Does it know where I’m located, or whether I am walking or driving? Maybe it knows I’m home and what’s in my refrigerator, and can suggest a recipe. Does it know my food and taste preferences? How about cost preferences? Does it have a history of what I have eaten recently, and know how much variety I’d like? Maybe it should tell me something like, “Your wife is at Whole Foods, would you like me to text her a request or call her for you?” It’s easy to see how these voice assistants could be quite helpful the more they know about you. But with multiple assistants in different products and locations, the picture wouldn’t be as complete. In this example, one assistant might know I’m home but NOT know what’s in my fridge; another might know what’s in the fridge and know I’m home but NOT know my wife is currently shopping at Whole Foods, and so on.
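The food example above can be sketched in a few lines of code. This is a toy illustration only – the rules, context keys, and responses are all hypothetical, not how any real assistant works – but it shows the core idea: the same query produces increasingly personal answers as the assistant accumulates context about the user.

```python
# Toy sketch: a rule-based responder whose answers improve with more context.
# All context keys and rules here are hypothetical illustrations.

def respond(query, context):
    """Answer "I'm hungry" using whatever user context is available."""
    if query != "I'm hungry":
        return "Sorry, I can't help with that."
    # Richest context first: knows where a family member is right now.
    if context.get("spouse_location") == "Whole Foods":
        return "Your wife is at Whole Foods - want me to text her a request?"
    # Knows the user is home and what's in the fridge: suggest a recipe.
    if context.get("at_home") and context.get("fridge_contents"):
        items = ", ".join(context["fridge_contents"])
        return f"You're home - you could make something with {items}."
    # Knows dietary preferences: filter recommendations.
    if context.get("diet") == "vegetarian":
        return "There's a well-rated vegetarian restaurant nearby."
    # No context at all: generic fallback.
    return "Here are some restaurants near you."

# Same question, four levels of knowledge about the user:
for ctx in [{},
            {"diet": "vegetarian"},
            {"at_home": True, "fridge_contents": ["tofu", "rice"]},
            {"spouse_location": "Whole Foods"}]:
    print(respond("I'm hungry", ctx))
```

The fragmentation problem in the paragraph above maps directly onto this sketch: if the `at_home` flag and the `spouse_location` live in two different assistants’ context stores, neither one can give the best answer.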
The more I use my assistant – across more devices, in more situations, over more time – the more data it can gather and the better it should get at serving my needs. It’s easy to see that once it knows me well, it will get VERY sticky, and it will be difficult to get me to switch to a new assistant that doesn’t know me as well.
2. Entertainment and other service package sales.
3. Selling and recommending products to consumers
It would be really obnoxious if Alexa or Siri or Cortana or Google Assistant suddenly suggested I buy something I wasn’t interested in, but what if it knew what I needed? For example, it could track my vitamin usage and ask if I want more before they run out, or it could know how frequently I wear out my shoes and point me to a sale on my brand in my size, right when I need them. The more my assistant knows me, the better it can “advertise” and sell to me in a way that’s NOT obnoxious but genuinely helpful – making extra money in the process, of course!
July 25, 2018
I have spoken at a lot of “voice” oriented shows over the years, and it has been disappointing that there hasn’t been more discussion about the competition in the industry and what is driving the huge investments we see today. Because companies like Amazon and Google participate in and sponsor these shows, there is a tendency to avoid the more controversial aspects of the industry. I wrote this blog to share some of my thoughts on what is driving the competition, why the voice assistant space is so strategically important to companies, and some of the challenges resulting from the voice assistant battles.
In September of 2017 it was widely reported that Amazon had over 5,000 employees working on Alexa, with more than 1,000 more to be hired. To use a nice round and conservative number, let’s assume an average Alexa employee’s fully weighted cost to Amazon is $200K. With about 6,000 employees on the Alexa team today, that would mean a $1.2 billion annual investment. Of course, some of this is recouped by the profits that Echos and Dots bring in, but when you consider that Dots sell for $30–$50 and Echos for $80–$100, it’s hard to imagine a high enough profit to justify the investment through hardware sales. For example, if Amazon can sell 30 million Alexa devices and make an average of $30 per unit profit, that only covers 75% of the conservative $1.2 billion investment.
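The back-of-the-envelope math above is easy to check. Note that every number here is the rough estimate from the paragraph – assumed headcount, assumed fully weighted cost, assumed unit sales and margin – not reported Amazon figures:

```python
# Rough estimates from the text above -- assumptions, not Amazon's actual numbers.
employees = 6_000
cost_per_employee = 200_000                    # fully weighted annual cost, USD
investment = employees * cost_per_employee     # total team cost

units_sold = 30_000_000
profit_per_unit = 30                           # average hardware profit, USD
hardware_profit = units_sold * profit_per_unit

print(f"Team investment:  ${investment / 1e9:.1f}B")        # $1.2B
print(f"Hardware profit:  ${hardware_profit / 1e9:.1f}B")   # $0.9B
print(f"Fraction recouped: {hardware_profit / investment:.0%}")  # 75%
```

Even under these generous hardware assumptions, the device margins fall $300M short of the team cost, which is the point: the investment must be justified by something other than hardware sales.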
Other evidence of the huge investments being made in voice assistants is the battle in advertising. Probably the most talked-about thing at the 2018 CES show was the enormous position Google took in advertising the Google Assistant. In fact, if you watch any of the most expensive advertising slots on TV (the Super Bowl, the NBA Finals, the World Cup, etc.), you will see a preponderance of advertisements with well-known actors and athletes saying “Hey Google,” “Alexa,” or “Hey Siri.” (Being in the wake word business, I particularly like the Kevin Durant “Yo Google” ad!)
And it’s not just the US giants that are investing big in assistants: Docomo, Baidu, Tencent, Alibaba, Naver, and other large international players are developing their own or working with third-party assistants.
So what is driving this huge investment? It’s a multitude of factors, including:
In my next blog, I’ll discuss these three factors in more detail, and in a final blog on this topic I will discuss the challenges being faced by consumer OEMs and service providers that must play in the voice assistant game to not lose out to service and hardware competition from Apple, Amazon, Google, and others.