HEAR ME -
Speech Blog


Posts Tagged ‘SIRI’

Voice assistant battles, part three: The challenges

August 13, 2018

It’s not easy to be a retailer today when more and more people are turning to Amazon for shopping. And why not shop online? Ordering is convenient with features such as ratings. Delivery is fast and cheap, and returns are easy and free – if you are a Prime member! In April 2018 Bezos reported there are more than 100 million Prime members in the world, and the majority of US households are Prime members. Walmart and Google have partnered in an ecommerce play to compete with Amazon, but Walmart is just dancing with the devil. Google will use the partnership to gather data and invest more in their internal ecommerce and shopping experiences. Walmart isn’t relaxing, and is aggressively pursuing ecommerce and AI initiatives through acquisitions and through its Store #8, which acts as an incubator for AI companies and internal initiatives. Question: why does Facebook have a Building 8 and Walmart have a Store 8 for skunkworks projects?

It’s not just the retailers that are under pressure, though. If you make consumer electronics it’s getting more challenging too. Google controls the Android eco-system and is pumping a lot of money into centralizing and hiring around their hardware development efforts. Google is competing against the mobile phones of Samsung, Huawei, LG, Oppo, Vivo, and other users of their Android OS. And Amazon is happy to sell other people’s hardware online (OK, not Google, but others), but they take a nice commission on those sales, and if it’s a hit product they find ways to make more money through Amazon’s in-house brands and warehousing, and potentially even by making the product themselves. The Alexa fund has financed companies that created Alexa based hardware products that Amazon ended up competing against with in-house developments, and when Amazon sells Alexa products it doesn’t need to make a big profit (as described in part one). And Apple… well, they have a history of extracting money from anyone that wants to play in their eco-system too. This is business and there’s a very good reason that Google, Amazon, Apple, and other giants are giants. They know how to make money on everything they do. They are tough to compete with. The “free” stuff consumers get (and we do get a lot!) isn’t really free. We are trading our data and personal information for it.

So retailers have it tough (and assistants will make it even tougher), service providers have it tough (and assistants with service offerings make it even tougher), and consumer electronic companies have it tough. But the toughest situation is for the speaker companies. The market for speakers is exploding, driven by the demand for “smart” speakers. A Markets and Markets research report puts the current smart speaker market at over $2.6B, growing at over 34% a year. Seems like that would be a sweet market to be in, but a lot of that growth is eating away at the traditional speaker market. So a speaker company is faced with a few alternatives:

  1. Partner with voice assistants within the eco-system of their biggest competitors (Google, Apple, Amazon, etc.). This would give all the data collected to their competitors and put them at the mercy of their competitors’ systems.
  2. Develop and support an in house solution which could cost WAY too much to maintain, or
  3. Use a 3rd party solution which is likely to cost a lot more and underperform compared to the big guys that are pumping billions of dollars each year into enhancing their AI offerings.

Many are choosing option 1 only to find that their sales are poor because of better quality, lower priced offerings from Google and Amazon. A company like Sonos, a leader in high quality wifi speakers, has chosen option 1 with a twist: they are trying to support Google and Amazon and Apple. Their recent IPO filing highlights the challenges well:

“Our current agreement with Amazon allows Amazon to disable the Alexa integration in our Sonos One and Sonos Beam products with limited notice. As such, it is possible that Amazon, which sells products that compete with ours, may on limited notice disable the integration, which would cause our Sonos One or Sonos Beam products to lose their voice-enabled functionality. Amazon could also begin charging us for this integration which would harm our operating results.”

They further highlighted that their lack of service integrations could be a challenge should Google, Amazon or others offer discounting (which is already happening): “Many of these partners may subsidize these prices and seek to monetize their customers through the sale of additional services rather than the speakers themselves,” the company said. “Our business model, by contrast, is dependent on the sale of our speakers. Should we be forced to lower the price of our products in order to compete on a price basis, our operating results could be harmed.” Looking at Sonos’s financials you can see their margins already starting to erode.

Some companies have attempted #2 above by bringing out in house Assistants using open-source speech recognizers like Kaldi. This might save the cost of deploying third party solutions but it requires substantial in house efforts, and is ultimately fraught with the same challenges as #3 above which is that it’s really hard to compete against companies approaching a trillion dollar market capitalization when these companies see AI and voice assistants as strategically important and are investing that way.

Retailers, Consumer OEMs, and Service providers all have a big challenge. I run a small company called Sensory. We develop AI technologies, and companies like Google, Amazon, Samsung, Microsoft, Apple, Alibaba, Tencent, Baidu, etc. are our customers AND our biggest competitors. My strategy? Move fast, innovate, and move on. I can’t compete head to head with these companies, but when I come out with solutions that they need BEFORE they have it in house, I get a 1-3 year window to sell to them before they switch to an in house replacement. That’s not bad for a small company like Sensory. For a bigger company like a Sonos or a Comcast, they could deploy the same general strategy to set up fast moving innovation pieces that allow them to stay ahead of the game. This appears to be the exact strategy that Walmart is taking on with Store 8 to not be left behind! Without doubt, it’s very tough competing in a world of giants that have no boundaries in their pursuits and ambitions!

Apple is Getting Sirious – $1 Trillion is Not the Endgame

August 6, 2018

Apple introduced Siri in 2011 and my world changed. I was running Sensory back then as I am today and suddenly every company wanted speech recognition. Sensory was there to sell it! Steve Jobs, a notorious nay-sayer on speech recognition, had finally given speech recognition the thumbs up. Every consumer electronics company noticed and decided the time had come. Sensory’s sales shot up for a few years driven by this sudden confidence in speech recognition as a user interface for consumer electronics.

Fast forward to today and Apple has just become the first and only trillion dollar US company in terms of market capitalization. One trillion dollars is an arbitrary round number with a lot of zeroes, but it is psychologically very important. It was winning a race. It was a race between Cook, Bezos, the Google/Alphabet Crew and others that most of the contestants would say doesn’t really matter and that they weren’t in the race. But, they were and they all wanted to win. Without question it was quarterly financial results that caused Apple to reach the magic number and beat Amazon, Google and Microsoft to the trillion dollar value spot. I wouldn’t argue that Siri got them there, but I would argue that Siri didn’t stop them, and this is important.

SIRI WAS FIRST, BUT QUICKLY LOST THE VOICE LEAD TO RIVALS
Siri has had a bit of a mixed history. It was the first voice assistant to come out in mobile phones but in spite of Apple’s superior marketing abilities, the Google Assistant (or whatever naming convention was being used as it never seemed totally clear) quickly surpassed Siri on most key metrics of quality and performance. The Siri team went through turnover and got stuck in a world of rule based natural language understanding when the state of the art turned to deep learning and data-based approaches.

Then in 2014 Amazon introduced the Echo smart speaker with Alexa and beat Apple and others into the home with a usable voice assistant. Alexa came out strong and got stronger quickly. Amazon amassed over 5,000 people into what is likely the largest speech recognition team in the world. Google got punched but wasn’t knocked out. Its AI team kept growing and Google had a very strong reputation in academia for hiring the best and brightest machine learning and AI folks out of PhD programs. By 2016, Google had introduced its own smart speaker, and by CES 2018, Google made a VERY strong marketing statement that it was still in the game.

APPLE FOCUSED ELSEWHERE
All the while Apple stayed relatively quiet. Drifting further behind in accuracy, utility, usability, integration and now smart speakers, Siri took its time. The HomePod speaker had a series of delays and when introduced in Q1 2018 was largely criticized because of the relatively poor performance of Siri and lack of compatibility. The huge investment Bezos made in Alexa might have been hard for Apple to rationalize in a post Jobs era run by a smart operating guy driven by the numbers more than by a passion or vision. Or, perhaps Tim Cook knew that he had time to get it right, as the Apple eco-system was captive and not running away because of poor Siri performance. Maybe they were waiting for their services ecosystem to really kick in before cranking up the power of Siri. For whatever reason, Siri was largely viewed as the first out of the gates but well behind the pack in Q2 2018.

AI ASSISTANTS DRIVE CONSUMER LOCK-IN
Fast forward to now and I’ll say why I think things are changing and why I said that Siri didn’t stop Apple from being first to $1T. But first, let me digress to dwell on the importance of an AI Assistant to Apple and others. First off, it’s pretty easy to see the importance the industry puts on AI assistants. Any time I watch advertising spots, I see some of the most expensive commercials ever produced with the biggest-name stars promoting “Hey Google”, “Hey Siri”, and “Alexa” (and occasionally Bixby or Cortana too!).

The assistants aren’t sold and so they don’t directly make money but they can be used as purchasing agents (where Amazon makes a lot of money), advertising agents (where Google makes its money), access to entertainment services (where all the big guys make money) and as a user experience for consumer electronics (where Apple makes a lot of money). The general thinking is that the more an assistant is used, the more it learns about the user, the better it serves the user, and the more the user is locked in! So winning in the AI Assistant game is HUGELY important and recent changes at Apple show that Siri is quickly coming up in the rankings and could have more momentum right now than in its entire history. That’s why Siri didn’t stop Apple from reaching $1T.

SIRI ON THE RISE
Let me highlight three recent pieces of news that suggest Siri is now headed in the right direction.

  • HomePod Sales: Apple HomePod sales just reached $1B. Not a shabby business given the high margins Apple typically gets. According to Consumer Intelligence Research Partners (CIRP) the HomePod market share doubled over the past quarter. What’s interesting is that the early reviews stated that Siri’s poor performance and lack of compatibility were dragging down HomePod sales. However, CIRP reported the biggest problem today is price and that at $349 it is hundreds of dollars more than competitors.
  • Loup Ventures analysis:
    Loup Ventures does an annual Assistant assessment. Several companies do this sort of thing and the traditional and general rankings have previously shown Google as best, Cortana and Alexa not far behind, and Siri somewhat behind the pack. Loup’s most recent analyses showed something different. Siri is shown to have the most improvement (from April 2017 to July 2018) in both “answered correctly” and “understood query”, and has surpassed Cortana and Alexa in both categories.


Of particular note is the breakdown of correct answers by category. Siri substantially outperformed Google Assistant in the “command” category, which is arguably the most important category for a consumer electronics manufacturer that wants to improve user experience.

 

  • Apple Reorganization: In April 2018 Apple hired John Giannandrea. JG is a Silicon Valley luminary who not only played roles with early pioneers like General Magic and Netscape, but was also a founder of TellMe Networks, which still holds the record for the highest valued acquisition in the speech recognition space. Microsoft paid $800 million in a 2007 acquisition. JG didn’t retire and rest on his laurels. He joined Google as an Engineering VP and in 2016 was promoted to SVP Search (yeah I mean all of search as in “Google that”) including heading up all artificial intelligence and machine learning within Google. Business Insider called him “The most sought after free agent in Silicon Valley.” He reports directly to Tim Cook. In July 2018, a reorg was announced that brings Siri and all machine learning under one roof…under JG. Siri has bounced around under a few top executives. With JG on board and Bill Stasior (VP Siri) staying on and now reporting into JG, Siri has a bright future.

It may have taken a while but Apple seems serious. It’s nice to have a pioneer in the space not stay down for the count!

Voice assistant battles, part two: The strategic importance

August 6, 2018

Here’s the basic motivation that I see in creating Voice Assistants…Build a cross platform user experience that makes it easy for consumers to interact, control and request things through their assistant. This will ease adoption and bring more power to consumers who will use the products more and in doing so create more data for the cloud providers. This “data” will include all sorts of preferences, requests, searches, purchases, and will allow the assistants to learn more and more about the users. The more the assistant knows about any given user, the BETTER the assistant can help the user in providing services such as entertainment and assisting with purchases (e.g. offering special deals on things the consumer might want). Let’s look at each of these in a little more detail:

1. Owning the cross platform user experience and collecting user data to make better Voice Assistants.
For thousands of years consumers interacted with products by touch. Squeezing, pressing, turning, and switching were all the standard means of controlling. The dawn of electronics really didn’t change this and mechanical touch systems became augmented with electrical touch mechanisms. Devices got smarter and had more capabilities but the means to access these capabilities got more confusing with more complicated interfaces and a more difficult user experience. As new sensory technologies began to be deployed (such as gesture, voice, pressure sensors, etc.) companies like Apple emerged as consumer electronics leaders because of their ability to package consumer electronics in a more user friendly manner. With the arrival of Siri on the iPhone and Alexa in the home, voice first user experiences are driving the ease of use and naturalness of interacting with consumer products. Today we find companies like Google and Amazon investing heavily into their hardware businesses and using their Assistants as a means to improve and control the user experience.

Owning the user experience on a single device is not good enough. The goal of each of these voice assistants is to be your personal assistant across devices. On your phone, in your home, in your car, wherever you may go. This is why we see Alexa and Google and Siri all battling for, as an example, a position in automotive. Your assistant wants to be the place you turn for consistent help. In doing so it can learn more about your behaviors…where you go, what you buy, what you are interested in, who you talk to, and what your history is. This isn’t just scary big brother stuff. It’s quite practical. If you have multiple assistants for different things, they may each think of you and know you differently, thereby having a less complete picture. It’s really best for the consumer to have one assistant that knows you best.

For example, let’s take the simple case of finding food when I’m hungry. I might say “I’m hungry.” Then the assistant’s response would be much more helpful the more it knows about me. Does it know I’m a vegetarian? Does it know where I’m located, or whether I am walking or driving? Maybe it knows I’m home and what’s in my refrigerator, and can suggest a recipe…does it know my food/taste preferences? How about cost preferences? Does it have the history of what I have eaten recently, and does it know how much variety I’d like? Maybe it should tell me something like “Your wife is at Whole Foods, would you like me to text her a request or call her for you?” It’s easy to see how a voice assistant could really be quite helpful the more it knows about you. But with multiple assistants in different products and locations, the picture wouldn’t be as complete. In this example one assistant might know I’m home, but NOT know what’s in my fridge. Or it might know what’s in the fridge and know I’m home but NOT know my wife is currently shopping at Whole Foods, etc.

The more I use my assistant across more devices in more situations and over more time, the more data it could gather and the better it should get at servicing my needs and assisting me! It’s easy to see that once it knows me well and is helping me with this knowledge it will get VERY sticky and become difficult to get me to switch to a new assistant that doesn’t know me as well.

2. Entertainment and other service package sales.
Alexa came onto the scene in 2014 with one very special domain – Music. Amazon chose to do one thing really well, and that was make a speaker that could accept voice commands for playing songs, albums, bands, radio. Not long after that Alexa added new domains and moved into new platforms like Fire TV and the Fire stick controller. It’s no coincidence that an Amazon Music service and Amazon TV services both exist and you can wrap even more services into an Amazon Prime membership. When Assistants don’t support Spotify well, there are a lot of complaints. And it’s no surprise that Spotify has been reported to be developing their own assistant and speaker. In fact Comcast has their own voice control remotes. There’s a very close tie between the voice assistants and the services that they bring. Apple is restrictive in what Siri will allow you to listen to. They want to keep you within their eco-system where they make more money. (Maybe it’s this locked in eco-system that has given Apple a more relaxed schedule in improving Siri?). Amazon and Google are really not that different. They may have different means of leading us to the services they want us to use, but they can still influence our choices for media. Spotify has over 70M subscribers (20M paying), over 5 Billion in revenues and recently went public with about a $30B market cap…and Apple Music just overtook Spotify in terms of paying subscribers. Music streaming has turned the music industry into a growth business again. The market for video services is even bigger, and Amazon is one of the top content producers of video! Your assistant will have a lot of influence on the services you choose and how accessible they are. This is one reason why voice assistant providers might be willing to lose money in getting the assistants out to the market, so they can make more money on services. The battle of Voice Assistants is really a battle of who controls your media and your purchases!

3. Selling and recommending products to consumers
The biggest business in the world is selling products. It’s helped make Amazon, Google and Apple the giants that they are today. Google makes its money on advertising, which is an indirect form of selling products. What if your assistant knew what you needed whenever you needed it? It would uproot the entire advertising industry. Amazon has the ability to pull this off. They have the world’s largest online store, they know our purchase histories, they have an awesome rating system that really works, and they have Alexa listening everywhere willing to take our orders. Because assistants use a voice interface, there will be a much more serial approach to making recommendations and selling me things. For example, if I do a text search on a device for nearby vegan restaurants, I see a map with a whole lot of choices and a long list of options. Typically these options could include side bars of advertising or “sponsored” restaurants first in the listing, but I’m supplied a long list. If I do a voice search on a smart speaker with no display, it will be awkward to give me more than a few results…and I’ll bet the results we hear will become the “sponsored” restaurants and products.

It would be really obnoxious if Alexa or Siri or Cortana or Google Assistant suddenly suggested I buy something that I wasn’t interested in, but what if it knew what I needed? For example, it could track vitamin usage and ask if I want more before they run out, or it could know how frequently I wear out my shoes, and recommend a sale for my brand and my size, when I really needed them. The more my assistant knows me the better it can “advertise” and sell me in a way that’s NOT obnoxious but really helpful. And of course making extra money in the process!

Voice Assistant Battles, part one

July 25, 2018

I have spoken on a lot of “voice” oriented shows over the years, and it has been disappointing that there hasn’t been more discussion about the competition in the industry and what is driving the huge investments we see today. Because companies like Amazon and Google participate in and sponsor these shows, there is a tendency to avoid the more controversial aspects of the industry. I wrote this blog to share some of my thoughts on what is driving the competition, why the voice assistant space is so strategically important to companies, and some of the challenges resulting from the voice assistant battles.

In September of 2017 it was widely reported that Amazon had over 5000 employees working on Alexa with more than 1000 more to be hired. To use a nice round and conservative number, let’s assume an average Alexa employee’s fully weighted cost to Amazon is $200K. With about 6,000 employees on the Alexa team today, that would mean a $1.2 billion investment. Of course, some of this is recouped by the Echos and Dots bringing in profits, but when you consider that Dots sell for $30-$50 and Echos at $80-$100, it’s hard to imagine a high enough profit to justify the investment through hardware sales. For example, if Amazon can sell 30 million Alexa devices and make an average of $30 per unit profit, that only covers 75% of the cost of the conservative $1.2 billion investment.
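The arithmetic behind that estimate is easy to check. Here is a minimal sketch in Python using only the round numbers from the paragraph above; the function and variable names are my own, and the break-even figure at the end is simply derived from the same assumptions, not a reported number.

```python
# Round numbers taken from the paragraph above (estimates, not reported financials).
ALEXA_EMPLOYEES = 6_000        # approximate size of the Alexa team
COST_PER_EMPLOYEE = 200_000    # assumed fully weighted cost per employee, in dollars
DEVICES_SOLD = 30_000_000      # hypothetical unit sales used in the example
PROFIT_PER_DEVICE = 30         # hypothetical average profit per unit, in dollars


def team_cost(employees: int, cost_per_employee: int) -> int:
    """Total people cost of the team, in dollars."""
    return employees * cost_per_employee


def hardware_profit(devices: int, profit_per_device: int) -> int:
    """Total profit from device sales, in dollars."""
    return devices * profit_per_device


def breakeven_units(cost: int, profit_per_device: int) -> int:
    """Smallest number of devices whose profit covers the cost (ceiling division)."""
    return -(-cost // profit_per_device)


cost = team_cost(ALEXA_EMPLOYEES, COST_PER_EMPLOYEE)
profit = hardware_profit(DEVICES_SOLD, PROFIT_PER_DEVICE)
coverage = profit / cost
units_needed = breakeven_units(cost, PROFIT_PER_DEVICE)

print(f"Team cost:        ${cost:,}")
print(f"Hardware profit:  ${profit:,}")
print(f"Coverage:         {coverage:.0%}")
print(f"Break-even units: {units_needed:,}")
```

Under these assumptions, hardware profit alone would need roughly 40 million units at $30 each just to cover the team, which is why the investment has to be justified some other way.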

Other evidence supporting the huge investments being made in voice assistants is the battle in advertising. Probably the most talked about thing at 2018’s CES show was the enormous position Google took in advertising the Google Assistant. In fact, if you watch any of the most expensive advertising slots on TV (SuperBowl, NBA finals, World Cup, etc.) you will see a preponderance of advertisements with known actors and athletes saying “Hey Google,” “Alexa,” or, “Hey Siri.” (Being in the wakeword business, I particularly like the Kevin Durant “Yo Google” ad!)

And it’s not just the US giants that are investing big into assistants: Docomo, Baidu, Tencent, Alibaba, Naver, and other large international players are developing their own or working with 3rd party assistants.

So what is driving this huge investment companies are making? It’s a multitude of factors including:

  1. Owning the cross platform user experience and collecting user data
  2. Entertainment and other service package sales
  3. Selling and recommending products to consumers

In my next blog, I’ll discuss these three factors in more detail, and in a final blog on this topic I will discuss the challenges being faced by consumer OEMs and service providers that must play in the voice assistant game to not lose out to service and hardware competition from Apple, Amazon, Google, and others.

Assistant vs Alexa: 8 things not discussed (enough)

October 14, 2016

I watched Sundar and Rick and the team at Google announce all the great new products from Google. I’ve read a few reviews and comparisons with Alexa/Assistant and Echo/Home, but it struck me that there’s quite an overlap in the reports I’m reading and some of the more interesting things aren’t being discussed. Here are a few of them, roughly in increasing order of importance:

  1. John Denver. Did anybody notice that the Google Home advertisement uses John Denver’s Country Roads song? Really? Couldn’t they have found something better? Country Roads didn’t make PlayBuzz’s list of the 15 best “home” songs or Jambase’s top 10 Home Songs. Couldn’t someone have Googled “best home songs” to find something better?
  2. Siri and Cortana. With all the buzz about Amazon vs. Google, I’m wondering what’s up with Siri and Cortana? Didn’t see much commentary on that.
  3. AI acquisitions. Anybody notice that Google acquired API.ai? API.ai always claimed to have the highest rated voice assistant in the playstore. They called it “Assistant.” Hm. Samsung just acquired VIV – that’s Adam, Dag, Marco, and company that were behind the original Siri. Samsung has known for a while that they couldn’t trust Google and they always wanted to keep a distance.
  4. Assistant is a philosophical change. Google’s original positioning for its voice services was that Siri and Cortana could be personal assistants, but Google was just about getting to the information fast, not about personalities or conversations. The name “assistant” implies this might be changing.
  5. Google: a marketing company? Seems like Google used to pride itself on being void of marketing. They had engineers. Who needs marketing? This thinking came through loud and clear in the naming of their voice recognizer. Was it Google Voice, Google Now, OK Google? Nobody knew. This historical lack of marketing and market focus was probably harmful. It would be fatal in an era of moving more heavily into hardware. That’s probably why they brought on Rick Osterloh, who understands hardware and marketing. Rick, did you approve that John Denver song?
  6. Data. Deep learning is all about data. Data that’s representative and labeled is the key. Google has been collecting and classifying all sorts of data for a very long time. Google will have a huge leg up on data for speech recognition, dialogs, pictures, video, searching, etc. Amazon is relatively new to the voice game, and it is at quite a disadvantage in the data game.
  7. Shopping. The point of all these assistants isn’t about making our lives better; it’s about getting our money. Google and Amazon are businesses with a profit motive, right? Google is very good at getting advertising dollars through search. Amazon is, among other things, very good at getting shoppers’ money (and they probably have a good amount of shopping data). If Amazon knows our buying habits and preferences and has the review system to know what’s best, then who wants ads? Just ship me what I need and if you get it wrong, let me return it hassle free. I don’t blame Google for trying to diversify. The ad model is under attack by Amazon through Alexa, Dash, Echo, Dot, Tap, etc.
  8. Personalization, privacy, embedded. Sundar talked a bit about personalization. He’s absolutely right that this is the direction assistants need to move (even if speaker verification isn’t built into the first Home units). Personalization occurs by collecting a lot of data about each individual user – what you sound like, how you say things, what music you listen to, what you control in your house, etc. Sundar didn’t talk much about privacy, but if you read user commentary on these home devices, the top issue by far relates to an invasion of privacy, which directly goes against personalization. The more privacy you give up, the more personalization you get. Unless… What if your data isn’t going to the cloud? What if it’s stored on your device in your home? Then privacy is at less risk, but the benefits of personalization can still exist. Maybe this is why Google briefly hit on the Embedded Assistant! Google gets it. More of the smarts need to move onto the device to ensure more privacy!

Speaking the language of the voice assistant

June 17, 2016

Hey Siri, Cortana, Google, Assistant, Alexa, BlueGenie, Hound, Galaxy, Ivee, Samantha, Jarvis, or any other voice-recognition assistant out there.

Now that Google and Apple have announced that they’ll be following Amazon into the home far-field voice assistant business, I’m wondering how many things in my home will always be on, listening for voice wakeup phrases. In addition, how will they work together (if at all). Let’s look at some possible alternatives:

Co-existence. We’re heading down a path where we as consumers will have multiple devices on and listening in our homes and each device will respond to its name when spoken to. This works well with my family; we just talk to each other, and if we need to, we use each other’s names to differentiate. I can have friends and family over or even a big party, and it doesn’t become problematic calling different people by different names.

The issue for household computer assistants all being on simultaneously is that false fires will grow in direct proportion to the number of devices on and listening. With Amazon’s Echo, I get a false fire about every other week, and Alexa does a great job of listening to what I say after the false fire and ignoring it if it doesn’t seem to be an intended command. It’s actually the best performing system I’ve used and the fact that it starts playing music or talking only every other week is a testament to what a good job they have done. However, interrupting my family every other week is not good enough. And if I have five always-listening devices interrupting us 10 times a month, that becomes unacceptable. And if they don’t do as good a job as Alexa, and interrupt more frequently, it becomes quite problematic.
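To make the scaling concrete, here is a rough sketch in Python. It assumes each device false-fires independently at about the same rate, roughly one every other week (about two a month), which is the rate implied by the five-device, 10-a-month example. The function names and the 30-day month are illustrative assumptions, not measurements.

```python
def monthly_false_fires(devices: int, fires_per_device_per_month: float) -> float:
    """Expected household false fires per month, assuming every device
    misfires independently at the same rate (a simplifying assumption)."""
    return devices * fires_per_device_per_month


def days_between_interruptions(fires_per_month: float, days_in_month: int = 30) -> float:
    """Average number of days between interruptions for a given monthly count."""
    if fires_per_month <= 0:
        raise ValueError("fires_per_month must be positive")
    return days_in_month / fires_per_month


# Roughly one false fire every other week is about two per month per device.
PER_DEVICE_RATE = 2.0

for devices in (1, 3, 5):
    total = monthly_false_fires(devices, PER_DEVICE_RATE)
    gap = days_between_interruptions(total)
    print(f"{devices} device(s): {total:.0f} false fires/month, one every {gap:.0f} days")
```

With five devices the household goes from an interruption every couple of weeks to one every few days, and that is before counting devices with a worse false-fire rate than the one assumed here.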

Functional winners. Maybe each device could own a functional category. For example, all my music systems could use Alexa, my TVs use Hi Galaxy, and all appliances are Bosch. Then I’d have fewer “names” to call out to and there would be some big benefits: 1) The devices using the same trigger phrase could communicate and compare what they heard to improve performance; 2) More relevant data could be collected on the specific usage models, thus further improving performance; and 3) With fewer names to call out, I’d have fewer false fires. Of course, this would force me as a consumer to decide on certain brands to stick to in certain categories.

Winner take all. Amazon is adopting a multi-pronged strategy of developing its own products (Echo, Dot, Tap, etc.) and also letting its products control other products. In addition, Amazon is offering the backend Alexa voice service to independent product developers. It’s unclear whether competitors will follow suit, but one thing is clear—the big guys want to own the home, not share it.

Amazon has a nice lead as it gets other products to be controlled by Echo. The company even launched an investment fund to spur more startups writing to Alexa. Consumers might choose an assistant we like (and we think performs well) and just stick with that across the household. The more we share with that assistant, the better it knows us, and the better it serves us. This knowledge base could carry across products and make our lives easier.

Just Talk. In the “co-existence” case previously mentioned, there are six people in my household, so it can be a busy place. But when I speak to someone, I don’t always start with their name. In fact, I usually don’t. If there’s just one other person in the room, it’s obvious who I’m speaking to. If there are multiple people in the room, I tend to look at or gesture toward the person I’m addressing. This is more natural than speaking their name.

An “always listening” device should have other sensors to know things like how many people are in the room, where they’re standing, where they’re looking, how they’re gesturing, and so on. These are the subconscious cues humans use to know who is talking to us, and our devices would be smarter and more capable if they could pick up on them too.

Google Assistant vs. Amazon’s Alexa

June 15, 2016

“Credit to the team at Amazon for creating a lot of excitement in this space,” said Google CEO Sundar Pichai. He made this comment during his Google I/O speech last week when introducing Google’s new voice-controlled home speaker, Google Home, whose description sounds a lot like Amazon’s Echo. Many interpreted this as a “thanks for getting it started, now we’ll take over” kind of comment.

Google has always been somewhat marketing challenged in naming its voice assistant. Everyone knows Apple has Siri, Microsoft has Cortana, and Amazon has Alexa. But what is Google’s voice assistant called? Is it Google Voice, Google Now, OK Google, Voice Actions? Even those of us in the speech industry have found Google’s branding to be confusing. Maybe they’re clearing that up now by calling their assistant “Google Assistant.” Maybe that’s the Google way of admitting it’s an assistant without admitting they were wrong by not giving it a human sounding name.

The combination of the early announcement of Google Home and Google Assistant has caused some to comment that Amazon has BIG competition at best, and at worst, Amazon’s Alexa is in BIG trouble.

Forbes called Google’s offering the Echo Killer, while Slate said it was smarter than Amazon’s Echo.

I thought I’d point out a few good reasons why Amazon is in pretty good shape:

  1. Google Home is not shipping. Google has a bit of a chicken-and-egg issue in that it needs to roll out a product that has industry support (for controlling third-party products by voice). How do you get industry partners without a product? You announce early! That was a smart move; now they just need to design it and ship it…not always an easy task.
  2. It’s about Voice Commerce. This is REALLY important. Many people think Google will own this home market because it has a better speech recognizer. Speech recognition capabilities are nice but not the end game. The value here is having a device that’s smart and trusted enough to take money out of our bank accounts and deliver us goods and services that we want when we want them. Amazon has a huge infrastructure lead here in products, reviews, shipping, and other key components of Internet commerce. Adding a convenient voice front end isn’t easy, but it’s also NOT the hardest part of enabling big revenue voice commerce systems.
  3. Amazon has far-field working and devices that always “talk back.” I admit the speech recognition is important, and Google has a lot of data, experience, and technologists in machine learning, AI, and speech recognition. But most of the Google experience is through Android and mobile-phone hardware. Where Amazon has made a mark is in far-field or longer distance recognition that really works, which is not easy to do. Speech recognition has always been about signal/noise ratios, and far-field makes the task more difficult: it requires acoustic echo cancellation, multiple microphones, plus various gain control and noise filtering/speech focusing approaches. Also, the Google recognizer was established around finding data through voice queries, with most of that data being displayed on-screen (and often through search). The Google Home and Amazon Echo are no-screen devices. Having them intelligently talk back means more than just reading the text off a search. Google can handle this, of course, but it’s one more technical hurdle that needs to be cleared.
  4. Amazon has a head start and already is an industry standard. Amazon’s done a nice job with the Echo. Its follow-on products, Tap and Dot, were intelligent offshoots. Even its Fire TV took advantage of in-house voice capabilities. The Alexa Voice Services work well and already are acting like a standard for voice control. Roughly three million Amazon devices have already sold, and I’d guess that in the next year, the number of Alexa connected devices will double through both Amazon sales and third parties using AVS. This is not to mention the tens of millions of devices on the market that can be controlled by Echo or other Amazon hardware. Amazon is pretty well entrenched!
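To put a rough number on why far-field recognition (point 3 above) is harder than talking into a phone, here is a minimal sketch of how the signal/noise ratio falls with distance. It assumes idealized free-field conditions, where speech level drops with the inverse of distance (about 6 dB per doubling), and a room noise level that stays the same at the microphone. Real rooms add reverberation and echo, so this is only a first approximation; the function names and the example distances and SNR are illustrative assumptions.

```python
import math


def speech_level_drop_db(near_m: float, far_m: float) -> float:
    """Drop in speech level (dB) when the talker moves from near_m to far_m.

    Assumes free-field inverse-distance spreading with no reverberation.
    """
    if near_m <= 0 or far_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(far_m / near_m)


def far_field_snr_db(near_snr_db: float, near_m: float, far_m: float) -> float:
    """SNR at far_m, assuming the noise level at the microphone is unchanged."""
    return near_snr_db - speech_level_drop_db(near_m, far_m)


if __name__ == "__main__":
    # Hypothetical example: 30 dB SNR with the talker 0.3 m from the microphone.
    for distance_m in (0.3, 1.0, 3.0):
        snr = far_field_snr_db(30.0, 0.3, distance_m)
        print(f"{distance_m:.1f} m: about {snr:.1f} dB SNR")
```

Moving ten times farther away costs about 20 dB of SNR under these assumptions, which is the gap that microphone arrays, echo cancellation, and noise filtering are trying to win back.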

Of course, Amazon has its challenges as well, but I’ll leave that for another blog.

TrulyHandsfree 4.0… Maintaining the big lead!

August 6, 2015

We first came out with TrulyHandsfree about five years ago. I remember talking to speech tech executives at MobileVoice as well as other industry tradeshows, and when talking about always-on hands-free voice control, everybody said it couldn’t be done. Many had attempted it, but their offerings suffered from too many false fires, or not working in noise, or consuming too much power to be always listening. Seems that everyone thought a button was necessary to be usable!

In fact, I remember the irony of being on an automotive panel, and giving a presentation about how we’ve eliminated the need for a trigger button, while the guy from Microsoft presented on the same panel the importance of where to put the trigger button in the car.

Now, five years later, voice activation is the norm… we see it all over the place with OK Google, Hey Siri, Hey Cortana, Alexa, Hey Jibo, and of course if you’ve been watching Sensory’s demos over the years, Hello BlueGenie!

Sensory pioneered the button free, touch free, always-on voice trigger approach with TrulyHandsfree 1.0 using a unique, patented keyword spotting technology we developed in-house, and from its inception it was highly robust to noise and ultra-low power. Over the years we have ported it to dozens of platforms, including DSP/MCU IP cores from ARM, Cadence, CEVA, NXP CoolFlux, Synopsys and Verisilicon, as well as integrated circuits from Audience, Avnera, Cirrus Logic, Conexant, DSPG, Fortemedia, Intel, Invensense, NXP, Qualcomm, QuickLogic, Realtek, STMicroelectronics, TI and Yamaha.

This vast platform compatibility has allowed us to work with numerous OEMs to ship TrulyHandsfree in over a billion products!

Sensory didn’t just innovate a novel keyword spotting approach, we’ve continually improved it by adding features like speaker verification and user defined triggers. Working with partners, we lowered the draw on the battery to less than 1mA, and Sensory introduced hardware and software IP to enable ultra-low-power voice wakeup of TrulyHandsfree. All the while, our accuracy has remained the best in the industry for voice wakeup.

We believe the bigger, more capable companies trying to make voice triggers have been forced to use deep learning speech techniques to try and catch up with Sensory in the accuracy department. They have yet to catch up. Through deep learning they have grown their products to a very usable accuracy level, but they have lost many of the advantages of small footprint and low power in the process.

Sensory has been architecting solutions for neural nets in consumer electronics since we opened the doors more than 20 years ago. With TrulyHandsfree 4.0 we are applying deep learning to improve accuracy even further, pushing the technology even more ahead of all other approaches, yet enabling an architecture that has the ability to remain small and ultra-low power. We are enabling new feature extraction approaches, as well as improved training in reverb and echo. The end result is a 60-80% boost in what was already considered industry-leading accuracy.

I can’t wait for TrulyHandsfree 5.0…we have been working on it in parallel with 4.0, and although it’s still a long ways off, I am confident we will make the same massive improvements in speaker verification with 5.0 that we are doing for speech recognition in 4.0! Once again further advancing the state of the art in embedded speech technologies!

OK, Amazon!

May 4, 2015

I was at the Mobile Voice Conference last week and was on a keynote panel with Adam Cheyer (Siri, Viv, etc.) and Phil Gray (Interactions) with Bill Meisel moderating. One of Bill’s questions was about the best speech products, and of course there was a lot of banter about Siri, Cortana, and Voice Actions (or GoogleNow as it’s often referred to). When it was my turn to chime in I spoke about Amazon’s Echo, and heaped lots of praise on it. I had done a bit of testing on it before the conference but I didn’t own one. I decided to buy one from eBay since Amazon didn’t seem to ever get around to selling me one. It arrived yesterday.

Here are some miscellaneous thoughts:

  • Echo is a fantastic product! Not so much because of what it is today but for the platform it’s creating for tomorrow. I see it as every bit as revolutionary as Siri.
  • The naming is really confusing. You call it Alexa but the product is Echo. I suspect this isn’t the blunder that Google made (VoiceActions, GoogleNow, GoogleVoice, etc.), but more an indication that they are thinking of Echo as the product and Alexa as the personality, and that new products will ship with the same personality over time. This makes sense!
  • Setup was really nice and easy, the music content integration/access is awesome, the music quality could be a bit better but is useable; there’s lots of other stuff that normal reviewers will talk about…But I’m not a “normal” reviewer because I have been working with speech recognition consumer electronics for over 20 years, and my kids have grown up using voice products, so I’ll focus on speech…
  • My 11-year-old son, Sam, is pretty used to me bringing home voice products, and is often enthusiastic (he insisted on taking my Vocca voice controlled light to keep in his room earlier this year). Sam watched me unpack it and immediately got the hang of it and used it to get stats on sports figures and play songs he likes. Sam wants one for his birthday! Amazon must have included some kids’ voice modeling in their data because it worked pretty well with his voice (unlike the Xbox when it first shipped, which I found particularly ironic since Xbox was targeting kids).
  • The Alexa trigger works VERY well. They have implemented state-of-the-art beamforming and echo cancellation. The biggest issue is that it’s a very bandwidth intensive approach and is not low power. Green is in! That could be why it’s plug-in/AC only and not battery powered. Noise near the speaker definitely hurts performance, as does distance, but it absolutely represents a new dimension in voice usability from a distance, and unlike with the Xbox, you can move anywhere around it and aren’t forced to be in a stationary position (thanks to their 7 mics, which surely must be overkill!)
  • The voice recognition in general is good, but like all of the better engines today (Google, Siri, Cortana, and even Sensory’s TrulyNatural) it needs to get better. We did have a number of problems where Alexa got confused. Also, Alexa doesn’t appear to have memory of past events, which I expect will improve with upgrades. I tried playing the band Cake (a short word, making it more difficult) and it took about 4 attempts until it said “Would you like me to play Cake?” Then I made the mistake of trying “uh-huh” instead of “yes” and I had to start all over again!
  • My FAVORITE thing about the recognizer is that it does ignore things very nicely. It’s very hard to know when to respond and when not to. The Voice Assistants (Google, Siri, Cortana) seem to always defer to web searches and say things like “It’s too noisy” no matter what I do, and I thought Echo was good at deciding not to respond sometimes.

OK, Amazon… here’s my free advice (admittedly self-serving but nevertheless accurate):

  • You need to know who is talking and build models of their voices and remember who they are and what their preferences are. Sensory has the BEST embedded speaker identification/verification engine in the world, and it’s embedded so you don’t need to send a bunch of personal data into the cloud. Check out TrulySecure!
  • In fact, if you added a camera to Alexa, it too could be used for many vision features, including face authentication.
  • Make it battery powered and portable! To do this, you’d need an equally good embedded trigger technology that runs at low power – Check out TrulyHandsfree!
  • If it’s going to be portable, then it needs to work even when not connected to the Internet. For this, you’d need an amazing large vocabulary embedded speech engine. Did I tell you about TrulyNatural?
  • Of course, the hope is that the product-line will quickly expand and as a result, you will then add various sensors, microphones, cameras, wheels, etc.; and at the same time, you will also want to develop lower cost versions that don’t have all the mics and expensive processing. You are first to market and that’s a big edge. A lot of companies are trying to follow you. You need to expand the product-line quickly, learning from Alexa. Too many big companies have NIH syndrome… don’t be like them! Look for partnering opportunities with 3rd parties who can help your products succeed – Like Sensory! ;-)

Touch-less Control Wins!

June 9, 2014

I still subscribe to the San Jose Mercury News, as they do a good job of tech business reporting. One of my favorite Mercury News writers is a true critic in the literary sense of the term, Troy Wolverton. Troy rarely raves and is typically critical, but in a smart, logical, and unemotional way.

A few days back he started writing about Microsoft’s Cortana and said “Watch out Siri, someone wants your job.”

I was eager to read his review of Cortana this morning and in particular his comparison with Siri. He ended up giving it a 7/10, and concluding Siri was still ahead. What I thought was most interesting though was that in his final summary, he compared three products and three assistants based on the ease of calling up each of those assistants:

  • Cortana – required two touch steps to activate the personal voice assistant
  • Siri – required one touch step to activate the personal voice assistant
  • MotoX – The best, because you can just start talking with the keyword phrase “OK Google Now” making a TrulyHandsfree experience!!

Motorola is Sensory’s customer, and I am happy to read that Troy gets it and considers this front end activation an important metric in comparing personal assistants!
