HEAR ME - Speech Blog

Posts Tagged ‘AI’

The Move Towards On-Device Assistants for Performance and Privacy

February 11, 2019

Voice assistants are growing in both popularity and capability. They are arriving in our homes, cars, and mobile devices, and now seem to be a standard part of American culture, showing up in our TV shows, movies, music, and Super Bowl ads. However, this popularity is accompanied by a persistent concern about our privacy and the safety of our personal data when these devices are always listening and always watching.

There is significant distrust of big companies like Facebook, Google, Apple, and Amazon. Facebook and Google have admitted to misusing our private data, and Apple and Amazon have admitted that system failures have led to losses of private data.

So naturally, there would be an advantage to not sending our voices or videos into the cloud and instead doing the processing on-device: no private data would be at risk. Cloud-based queries could still occur, but through anonymized text only.

COMPUTING AT THE EDGE VERSUS THE CLOUD
There are forces pushing us toward edge-based assistants, and other forces keeping data flowing through the cloud. Here are a few ideas to consider.

  • Power and Memory. There is no doubt that cloud-based solutions offer more power and memory, and deep learning approaches can certainly take advantage of those resources. However, access speed and available bandwidth are often issues, giving an edge to working on-device. Current state-of-the-art deep-net modeling allows limited-domain natural language engines that require substantially less memory and fewer MIPS than general-purpose models, making natural language on-device realistic today. Furthermore, powerful on-device voice experiences become increasingly realistic as we pack more and more memory and MIPS into smaller and cheaper packages. New chip architectures targeting deep learning methodologies can also lead to on-device breakthroughs, and these designs are now hitting the market. (A toy sketch of how small a limited-domain engine can be appears after this list.)
  • Accuracy. Although power and memory may be key factors influencing accuracy, an on-device assistant can take advantage of sensor data, usage data, and other embedded information not available to a cloud-based assistant, so it can better adapt to users and their preferences.
  • Privacy. Not sending data to the cloud is more private.

Some have argued that we have carried microphones and cameras around with us for years without any issues, but I see this thinking as flawed. Just recently, Apple admitted to a FaceTime bug on mobile phones enabling “eavesdropping” on others.

Also, a phone listening for a wake word is a very different technology model from an IoT device that’s “always on.” Phones are usually designed to listen at arm’s length, 2 or 3 feet; an IoT speaker is designed to listen from 20 feet! If we assume noise anywhere in a room could make an assistant “false fire” and start listening, we can compare two listening circles: one with a radius of 3 feet for the phone, and one with a radius of 20 feet for a far-field IoT device such as a smart speaker. Using the area formula πr², the phone has a listening area of 9π square feet, while the IoT device has a listening area of 400π square feet. So, all else equal, the IoT device is about 44 times (400/9 ≈ 44.4) as likely to false fire and start listening when it wasn’t intended to.
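For readers who want to verify the arithmetic, here is a quick sketch; the radii are the rough figures from the paragraph above:

```python
# Compare the "false fire" listening areas of a near-field phone
# and a far-field smart speaker, using area = pi * r^2.
import math

phone_radius_ft = 3      # phone listens at roughly arm's length
speaker_radius_ft = 20   # far-field smart speaker

phone_area = math.pi * phone_radius_ft ** 2       # 9*pi   ~ 28.3 sq ft
speaker_area = math.pi * speaker_radius_ft ** 2   # 400*pi ~ 1256.6 sq ft

print(speaker_area / phone_area)  # ~ 44.4
```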

As cloud-based far-field assistants enter the home, there is a definite risk of our private data being intercepted. It’s not just machine errors but human errors too, like the Amazon employee who accidentally sent the wrong data to a person who requested it.

There are also other ways we can lose our cloud-connected private data, like the “dolphin attack,” which uses inaudible ultrasonic commands to trigger a device and can allow outsiders to listen in.

  • The will of Amazon, Google, Apple, governments, and others. We should not underestimate the market power and persuasiveness of these tech giants. They want to open our wallets, and the best way to do that is to present us with things we want to buy, whether food, shelter, gifts, or whatever. Amazon is pretty good at selling us stuff. Google is pretty good at making money connecting people with things they want and showing them ads. User data makes all of these things easier and more effective, and more effective means they make more money showing us ads and selling us stuff. I suspect that most of these giant players will have strong incentives to keep our assistants and our data flowing into the cloud. Of course, tempering this will are the various government agencies trying to protect consumer privacy. Europe has launched GDPR (ironically, the source of the Amazon accident mentioned above!), which could provide some disincentives to using cloud-based services.
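To make the earlier point about limited-domain engines concrete, here is a toy sketch of why such an engine can be tiny enough to run on-device; the intents and phrase patterns are invented for illustration, not taken from any real product:

```python
# With only a handful of intents, "natural language" can be handled with
# simple patterns and a few kilobytes of memory, entirely on-device.
import re

INTENTS = {
    "lights_on":  re.compile(r"\b(turn|switch)\s+on\b.*\blights?\b"),
    "lights_off": re.compile(r"\b(turn|switch)\s+off\b.*\blights?\b"),
    "set_temp":   re.compile(r"\b(set|change)\b.*\b(temperature|thermostat)\b"),
}

def classify(utterance: str) -> str:
    """Map an utterance to one of a few known intents, or 'unknown'."""
    text = utterance.lower()
    for intent, pattern in INTENTS.items():
        if pattern.search(text):
            return intent
    return "unknown"

print(classify("Please turn on the kitchen lights"))  # lights_on
print(classify("Set the thermostat to 68 degrees"))   # set_temp
```

A real limited-domain engine would use a small neural model rather than regular expressions, but the memory argument is the same: restricting the domain shrinks the problem.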

ON-DEVICE VOICE ASSISTANTS WILL BECOME MORE COMMON
My conclusion is that there is a lot of opportunity in bringing assistants onto devices. Doing so can not only protect privacy but, through adaptation and domain limitation, create a better-customized user experience. I predict more and more products will use on-device voice control and assistants! Of course, I also predict more and more devices will use cloud assistants. What wins out in the long run will probably depend more on government legislation and individual privacy concerns than anything else.

Embedded AI is here

February 10, 2017

The wonders of deep learning are well utilized in the area of artificial intelligence, aka AI. Massive amounts of training data can be processed on very powerful platforms to create wonderfully general models, which can be extremely accurate. But this in and of itself is not yet optimal, and there’s a movement afoot to move the intelligence, and part of the learning, onto embedded platforms.

Certainly, the cloud offers the most power and data storage, allowing the most immense and powerful of systems. However, when it comes to agility, responsiveness, privacy, and personalization, the cloud looks less attractive. This is where edge computing and shallow learning through adaptation can become extremely effective. “Little” data can have a big impact on a particular individual; think how accurately, and with how little data, a child learns to recognize its mother.

A good example of specialized learning involves accents or speech impediments. Generalized acoustic models often don’t handle these well, leading to customized models for different markets and accents. However, this customization is difficult to manage, can add to the cost of goods, and may negatively impact the user experience. Even then, the result is still a model generalized for a specific class of people or accents. An alternative approach could begin with a general model built with cloud resources, with the ability to adapt on the device to the distinct voices of the people who use it.
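As a rough illustration of that adapt-on-device idea, the sketch below freezes a stand-in “general” model and fine-tunes only a small output head on local data. The layer sizes and data are placeholders, not a real acoustic model:

```python
# Start from a general model trained in the cloud, freeze its body, and
# fine-tune only a small head on the few utterances a device collects locally.
import torch
import torch.nn as nn

body = nn.Sequential(nn.Linear(40, 128), nn.ReLU())  # stand-in general model
head = nn.Linear(128, 32)                            # small adaptation head

for p in body.parameters():       # the general layers stay fixed on-device
    p.requires_grad = False

opt = torch.optim.SGD(head.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# a tiny batch of "local" data: 16 frames of 40-dim features, 32 classes
x, y = torch.randn(16, 40), torch.randint(0, 32, (16,))
for _ in range(10):               # a few cheap on-device update steps
    opt.zero_grad()
    loss = loss_fn(head(body(x)), y)
    loss.backward()
    opt.step()
```

Only the head’s few thousand parameters are updated, which keeps the memory and compute cost small enough for an embedded platform.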

The challenge with embedded deep learning lies in its limited resources and the need to deal with on-device data collection, which by its nature will be less plentiful and unlabeled, yet more targeted. New approaches are being implemented, such as teacher/student models, where smaller models are built from a wider body of data, essentially turning big, powerful models into small, powerful models that imitate the bigger ones while achieving similar performance.
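A minimal sketch of the teacher/student idea, with illustrative sizes rather than real speech models, might look like this:

```python
# Knowledge distillation: fit a small student to the softened outputs of a
# large teacher, so the student can run on-device while imitating the teacher.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(40, 512), nn.ReLU(), nn.Linear(512, 32))
student = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 32))

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softens the teacher's distribution

x = torch.randn(256, 40)  # unlabeled data suffices: the teacher labels it
with torch.no_grad():
    soft_targets = F.softmax(teacher(x) / T, dim=-1)

for _ in range(20):
    opt.zero_grad()
    log_probs = F.log_softmax(student(x) / T, dim=-1)
    # standard distillation loss, scaled by T^2 to keep gradient magnitudes
    loss = F.kl_div(log_probs, soft_targets, reduction="batchmean") * T * T
    loss.backward()
    opt.step()
```

Note that no human labels appear anywhere: the wider body of data can be raw, unlabeled features that the big model scores.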

Generative data without supervision can also be deployed for on-the-fly learning and adaptation. Along with improvements in software and technology, the chip industry is going through something of a deep learning revolution, adding more parallel processing and specialized vector math functions. For example, GPU vendor NVIDIA has some exciting products that take advantage of deep learning, and smaller private embedded deep learning IP companies like Nervana, Movidius, and Apical are getting snapped up in highly valued acquisitions by larger companies like Intel and ARM.

Embedded deep learning and embedded AI are here.