Why Major Global Companies Are Adopting Voice AI Strategies

It’s no secret that the largest companies in the world today are the biggest players in voice technology. Amazon, Google, and Microsoft have each invested billions of dollars in speech recognition and natural language understanding (NLU) technology, and they aren’t the only big investors in this space.

What isn’t generally understood, and is rarely discussed by analysts, are the motivations behind these big investments. Those motivations are often misinterpreted, and the big picture is easily lost. For example, the voice assistant in smart speakers is often viewed as a play to grow sales in consumer electronics. That may be true for Apple or Sonos, but it certainly isn’t the motivation for Amazon or Google. Amazon prices its speakers close to cost on Prime Day, and with returns and shipping factored in, it likely loses money on them just to build market share.

I’ve looked at the 10 biggest players in voice AI today, and I have a pretty good idea of the driving force behind most of their major investments in voice technology, NLU, and even AI in general. Let me clearly state that although my company, Sensory, has non-disclosure agreements in place with pretty much every company mentioned here, I am not sharing any confidential information. I am simply sharing my educated guesses based on public information and common industry knowledge. Some of what I say may seem obvious, and some readers will disagree with my opinions. What I think everyone will find interesting is the large variance in the motivations driving investment in voice and speech technology. My methodology is simple: I look at each company’s core businesses and how it is using voice to enhance those basic business models.

1) Amazon. Amazon has done an amazing job with voice tech. They came out of nowhere and beat everyone to the punch with the Echo smart speaker, which was one of the better-performing voice-AI-enabled consumer electronics products right off the bat! Amazon made several smart and cheap acquisitions to get there, but they have also spent a fortune hiring literally thousands of employees for the Alexa/Echo development and commercialization teams; an investment that was likely necessary to overcome their lack of a mobile platform on which to commercialize a voice assistant. Amazon has a lot of businesses driving this strategy, but I’ll discuss the main ones:

  • AWS – Amazon’s cloud offering is at close to an $80B run rate, and it’s growing fast and very profitable… so profitable that it contributes half of Amazon’s overall profits. They offer AI and speech tech to bring people into their cloud!
  • Amazon Shopping – There’s hope for voice shopping; it hasn’t taken off yet, but it’s a big motivator, as shopping is Amazon’s largest revenue source and voice would be an easier way to find things… if it works well!
  • Amazon Prime – With over 200 million members, this is a big business, and a variety of services are or can be embedded inside of it. Voice can be a key benefit for service functions like media requests, and it can serve as a Voice User Interface (VUI).

2) Apple. Apple has often been criticized for letting Siri fall behind Alexa and Google Assistant. But Apple is smart. They have invested wisely in technology, they appear to care more about privacy, and they aren’t as focused on advertising revenue. So the key driver for voice tech at Apple appears to be improving ease of use and communication in its consumer products and services… the Voice User Interface!

3) Baidu. Baidu is often considered the Google of China. Like Google, it leads in search and has invested heavily in voice, and for many years it competed against the leading US firms in announcing ever-lower word error rates and better performance metrics. Because Baidu has a heavily internet-based business, its investment in the Xiaodu Technology group must be driven by maintaining its strong position in Chinese search, i.e., extending it to voice search. It’s also likely that Xiaodu will be heavily used as a VUI for Chinese electronics, as that has already started happening. What’s interesting is the contrast: the US companies view their voice assistants as strategic to their overall business and invest in them internally, whereas Xiaodu Technology was spun off as an independent company (and recently raised money at a $5B valuation), so it appears to be positioning itself as a standalone Voice OS (DuerOS) or voice assistant technology play.

4) Cerence. I almost didn’t include Cerence, the auto-focused company spun off from Nuance before Nuance was acquired by Microsoft. But Cerence is the largest independent provider of voice tech in the US, so they are worth mentioning. Their business model is selling voice tech itself. This is an increasingly tough model, and it’s likely that Cerence will want to be acquired by a larger company willing to pay for a stronger seat in automotive.

5) Facebook/Meta. Meta invests heavily in voice technology and other AI technologies, but their strategy is not very clear. They create a lot of good open source software, they have acquired a few companies in the space, and they have introduced a few voice- or NLU-based products with marginal receptions and minimal commitment. They didn’t put a lot of muscle behind Facebook’s M virtual assistant or even their Portal speaker. They have talked about a voice assistant for Oculus for some time. More recently, they have brought meager voice capabilities into Oculus, they just announced Project CAIRaoke, and they have a nice video demo of a virtual Mark Zuckerberg on a beach adding features by voice using BuilderBot. I suspect Meta knows that voice tech and AI are a critical part of their future, and they are being careful not to overinvest too early. My guess is that we will see increasing capabilities brought into Oculus as Microsoft and Meta vie for voice dominance in AR/VR.

6) Google. Alphabet just had a record year, growing over 40% to $258B in sales. Google search and advertising accounted for about $209B of those sales. Google dominates search, and therefore advertising, and maintaining that dominance as voice search emerges is extremely strategic to Alphabet; outside of Google search and ads, all the other Alphabet revenues lead to losses! Even the successful Google Cloud, which generates over $5B per quarter, is losing money. Of course, Google’s AI for voice, text, and dialog helps bring people into Google Cloud, but the real strategic driver for Alphabet isn’t the cloud, or a VUI for products; it’s maintaining the Google search and advertising juggernaut.

7) Microsoft. Microsoft has made several high-priced acquisitions in the voice/NLU/AI space and has been investing in it for a long time. They hire top talent and have likely made the largest investment in voice/NLU over time. They took a quick stab at the consumer smart speaker space and quickly backed off to focus on PCs and business productivity. With their acquisitions of Nuance in healthcare and Activision in gaming, they seem to have a more distinctive use for voice tech: a play for vertical market dominance. This is a smart strategy, and I’d expect them to make more acquisitions where their skills in voice and AI can lead to market advantage.

8) Nvidia. Intel and Qualcomm have dabbled in voice tech, NLU, and other AI, but Nvidia is going all in. It’s a great way to sell more chips IF you have market-leading software. When it comes to voice, Nvidia’s Riva offering is quite powerful and comprehensive, and Nvidia is investing heavily. Nvidia’s $600B market cap is worth more than TI, Qualcomm, and Intel combined, and driving that valuation are both strong revenues and profits and a belief that AI is a good strategic investment. Nvidia GPUs go into gaming devices, data centers, and a whole lot of other places. AI processing is a growing business for them, and having AI software to go with their chips will only strengthen their position and allow them to extend their reach into more markets and customer segments. Few chip companies have been able to master hardware and software together; this strategy is winning for Nvidia.

9) Oracle. Oracle entered the cloud computing space later than other companies, but like every cloud player, they are seeing nice growth opportunities; cloud growth is even offsetting their declining hardware revenues. To compete with Google, Amazon, and Microsoft in enterprise cloud and software offerings, Oracle needed conversational assistant technology, and they introduced the Oracle Digital Assistant in 2019. Without the consumer focus of Google or Amazon, it’s difficult to see how this is playing out, but as their cloud platform grows, it doesn’t appear to be hindering them.

10) Samsung. Samsung has always been good at hardware and more challenged at software. They typically sit in or near the top sales spot for mobile phones, chips, home appliances, and more. Like Apple, they have the VUI need for voice tech. Like Nvidia, they have the strategic need to add value to their chips and help sell more of them. And like Google, Samsung would like a spot in voice search. Voice has been challenging for Samsung. They have invested heavily in voice tech acquisitions and in building voice teams, and they even went as far as creating their own Tizen OS to avoid dependency on Google and strengthen their negotiating power. Today they have Bixby, but they still use Google on their phones. Their voice team sits in the mobile phone group rather than serving the whole company, and it’s difficult to understand their big-picture strategy, as it changes yearly with major management overhauls.

With so much money flowing into voice AI and VUI development, it’s clear that major technology companies are motivated by a few common trends: growing their cloud AI businesses, tying voice tech to selling more hardware, and bolstering the value of their own products and services with intelligent VUIs.

Now you might be wondering how a company like Sensory fits into the picture. Think of Sensory as a white-label provider of voice recognition and biometric solutions. We are highly flexible, meaning we work with pretty much all the major platforms (Android, iOS, Linux, etc.), and we can even make our tech fit on small chips for deeply embedded applications. Our motivation is to provide the best technology for our clients’ needs while keeping consumers’ data safe and private. We’ve just entered the cloud SaaS space with SensoryCloud.ai, and we’re leveraging decades of experience in voice and computer vision technology to create even more flexible, high-performance solutions. In a nutshell, we have long known that voice recognition technology will only grow in adoption across products and industries, so we plan to keep creating and delivering innovative solutions to meet the demand.