HEAR ME - Speech Blog

Archive for the ‘consumer electronics’ Category

Sensory CEO On Voice-Activated Technology’s Next Big Wave

January 11, 2019

Interview with Karen Webster, one of the best writers and interviewers in tech/fintech.

In 1994 the fastest imaginable connection to the internet was a 28.8 kbps dial-up modem, and email was still mostly a new thing that many people were writing off as a fad. There was no such thing as Amazon.com for the first half of the year, and less than a third of American households owned computers. Given that, it’s not much of a surprise that the number of people thinking about voice-activated, artificial intelligence (AI)-enhanced wireless technology was extremely small — roughly the same as the number of people putting serious thought into flying cars.

But the team at Sensory is not quite as surprised by the rapid evolution of the voice-activated technology marketplace as everyone else may be — because when they first opened their doors 25 years ago in 1994, this is exactly the world they had hoped to see developing two-and-a-half decades down the line, even if the progress has been a bit uneven.

“We still have a long way to go,” Sensory CEO Todd Mozer told Karen Webster in a recent conversation. “I am excited about how good speech recognition has gotten, but natural language comprehension still needs a lot of work. And combining the inputs of all the sensors devices have — for vision and speech together to make things really smart and functional in context — we just aren’t there yet.”

But for all there is still to be done, and the advances that still need to be made, the simple fact that the AI-backboned neural net approach to interactive technology has become “more powerful than we ever imagined it would be with deep learning” is a huge accomplishment in and of itself.

And the accomplishments are rolling forward, he noted, as AI’s reach and voice control of devices is expanding — and embedding — and the nascent voice ecosystem is quickly growing into its adolescent phase.

“Today these devices do great if I need the weather or a recipe. I think in the future they will be able to do far more than that — but they will increasingly be invisible in the context of what we are otherwise doing.”

Embedding The Intelligence

Webster and Mozer were talking on the eve of the launch of Sensory’s VoiceGenie for Bluetooth speakers — a new product that lets speaker makers add voice controls and functions like wake words without needing any special apps or a Wi-Fi connection. Said simply, Mozer explained, what Sensory is offering Bluetooth speaker makers is embedded voice — instead of voice via a connection to the cloud.

And the expansion into embedded AI and voice control, he noted, is necessary, particularly in an era of data breaches, cybercrime and good old-fashioned user error, the latter owing to voice technology’s relative newness.

“There are a lot of sensors on our products and phones that are gathering a lot of interesting information about what we are doing and who we are,” Mozer said.

Apart from the security problem of sending all of that information to the cloud, embedding in devices the ability to extract useful information and adapt on demand to a particular user is an area of great potential for improving the devices we all use multiple times daily.

This isn’t about abandoning the cloud, or even a great migration away from it, he said; there’s always going to be a cloud and clients for it. The cloud natively has more power, memory and capacity than anything that can be put into a device at this point on a cost-effective basis.

“But there is going to be this back-and-forth and things right now are swinging toward more embedded ability on devices,” he said. “There is more momentum in that direction.”

The cloud, he noted, will always be the home of things like transactions, which will have to flow through it. But things like verification and authentication, he said, might be centered in the devices’ embedded capacity, as opposed to in the cloud itself.

The Power Of Intermediaries

Scanning the headlines of late in the world of voice connection and advancing AI, it is easy to see two powerful players emerging in Amazon and Google. Amazon announced Alexa’s presence on 100 million devices, and Google immediately followed up with an announcement of its own that Google Assistant will soon be available on over a billion devices.

Their sheer size and scale gives those intermediaries a tremendous amount of power, as they are increasingly becoming the connectors for these services on the way to critical mass and ubiquity, Webster remarked.

Mozer agreed, and noted that this can look a little “scary” from the outside looking in, particularly given how deeply embedded Amazon and Google otherwise are with their respective mastery of eCommerce and online search.

Like many complex ecosystems, Mozer said, the “giants” — Amazon, Google and, to a lesser extent, Apple — are both partners and competitors, adding that Sensory brings the greatest value to the voice ecosystem when very customized technology requiring a high level of accuracy and customer service is needed. Sensory’s technology appears in products by Google, Alibaba, Docomo and Amazon, to name a few.

But ultimately, he noted, the marketplace is heading for more consolidation — and probably putting more power in the hands of very few selected intermediaries.

“I don’t think we are going to have 10 different branded speakers. There will be some kind of cohesion — someone or maybe two someones will kick butt and dominate, with another player struggling in third place. And then a lot of players who aren’t players but want to be. We’ve seen that in other tech, I think we will see it with voice.”

As for who those winning players will be, Google and Amazon look good today, but, Mozer noted, it’s still early in the race.

The Future of Connectedness

In the long term, Mozer said, we may someday look back on all these individual smart devices as a strange sort of clutter from the past, when everyone was making conversation with different appliances. At some point, he ventured, we may just have sensors embedded in our heads that allow us to think about commands and have them go through — no voice interface necessary.

“That sounds like science fiction, but I would argue it is not as far out there as you think. It won’t be this decade, but it might be in the next 50 years.”

But in the more immediate — and less Space Age — future, he said, the next several years will be about enhancing and refining voice technology’s ability to understand and respond to the human voice — and, ultimately, to anticipate the needs of human users.

There won’t be a killer app for voice that sets it on the right path, according to Mozer; it will simply be a lot of capacity unlocked over time that will make voice controls the indispensable tools Sensory has spent the last 25 years hoping they would become.

“When a device is accurate in identifying who you are, and carrying out your desires seamlessly, that will be when it finds its killer function. It is not a thing that someone is going to snap their fingers and come out with,” he said, “it is going to be an ongoing evolution.”

Voice assistant battles, part three: The challenges

August 13, 2018

It’s not easy to be a retailer today when more and more people are turning to Amazon for shopping. And why not shop online? Ordering is convenient with features such as ratings. Delivery is fast and cheap, and returns are easy and free – if you are a Prime member! In April 2018 Bezos reported there are more than 100 million Prime members in the world, and the majority of US households are Prime members. Walmart and Google have partnered in an ecommerce play to compete with Amazon, but Walmart is just dancing with the devil. Google will use the partnership to gather data and invest more in its internal ecommerce and shopping experiences. Walmart isn’t relaxing, and is aggressively pursuing ecommerce and AI initiatives through acquisitions and its Store #8, which acts as an incubator for AI companies and internal initiatives. Question: why does Facebook have a Building 8 and Walmart a Store 8 for skunkworks projects?

It’s not just the retailers that are under pressure, though. If you make consumer electronics it’s getting more challenging too. Google controls the Android eco-system and is pumping a lot of money into centralizing and hiring around its hardware development efforts. Google is competing against the mobile phones of Samsung, Huawei, LG, Oppo, Vivo, and other users of its Android OS. And Amazon is happy to sell other people’s hardware online (OK, not Google’s, but others’), but they take a nice commission on those sales, and if it’s a hit product they find ways to make more money through Amazon’s in-house brands and warehousing, and potentially even by making the product themselves. The Alexa fund has financed companies that created Alexa-based hardware products that Amazon ended up competing against with in-house developments, and when Amazon sells Alexa products it doesn’t need to make a big profit (as described in part one). And Apple… well, they have a history of extracting money from anyone that wants to play in their eco-system too. This is business, and there’s a very good reason that Google, Amazon, Apple, and other giants are giants. They know how to make money on everything they do. They are tough to compete with. The “free” stuff consumers get (and we do get a lot!) isn’t really free. We are trading our data and personal information for it.

So retailers have it tough (and assistants will make it even tougher), service providers have it tough (and assistants with service offerings make it even tougher), and consumer electronics companies have it tough. But the toughest situation is for the speaker companies. The market for speakers is exploding, driven by the demand for “smart” speakers. Markets and Markets research puts the current smart speaker market at over $2.6B, growing at over 34% a year. That seems like a sweet market to be in, but a lot of that growth is eating away at the traditional speaker market. So a speaker company is faced with a few alternatives:

  1. Partner with voice assistants within the eco-system of their biggest competitors (Google, Apple, Amazon, etc.). This would give all the data collected to their competitors and put them at the mercy of their competitors’ systems.
  2. Develop and support an in house solution which could cost WAY too much to maintain, or
  3. Use a 3rd party solution which is likely to cost a lot more and underperform compared to the big guys that are pumping billions of dollars each year into enhancing their AI offerings.

Many are choosing option 1, only to find that their sales are poor because of better-quality, lower-priced offerings from Google and Amazon. Sonos, a leader in high-quality Wi-Fi speakers, has chosen option 1 with a twist: it is trying to support Google, Amazon, and Apple. Its recent IPO filing highlights the challenges well:

“Our current agreement with Amazon allows Amazon to disable the Alexa integration in our Sonos One and Sonos Beam products with limited notice. As such, it is possible that Amazon, which sells products that compete with ours, may on limited notice disable the integration, which would cause our Sonos One or Sonos Beam products to lose their voice-enabled functionality. Amazon could also begin charging us for this integration which would harm our operating results.”

They further highlighted that their lack of service integrations could be a challenge should Google, Amazon or others offer discounting (which is already happening): “Many of these partners may subsidize these prices and seek to monetize their customers through the sale of additional services rather than the speakers themselves,” the company said. “Our business model, by contrast, is dependent on the sale of our speakers. Should we be forced to lower the price of our products in order to compete on a price basis, our operating results could be harmed.” Looking at Sonos’s financials, you can see their margins already starting to erode.

Some companies have attempted #2 above by bringing out in-house assistants using open-source speech recognizers like Kaldi. This might save the cost of deploying third-party solutions, but it requires substantial in-house effort and is ultimately fraught with the same challenge as #3 above: it’s really hard to compete against companies approaching a trillion-dollar market capitalization when those companies see AI and voice assistants as strategically important and are investing that way.

Retailers, Consumer OEMs, and Service providers all have a big challenge. I run a small company called Sensory. We develop AI technologies, and companies like Google, Amazon, Samsung, Microsoft, Apple, Alibaba, Tencent, Baidu, etc. are our customers AND our biggest competitors. My strategy? Move fast, innovate, and move on. I can’t compete head to head with these companies, but when I come out with solutions that they need BEFORE they have it in house, I get a 1-3 year window to sell to them before they switch to an in house replacement. That’s not bad for a small company like Sensory. For a bigger company like a Sonos or a Comcast, they could deploy the same general strategy to set up fast moving innovation pieces that allow them to stay ahead of the game. This appears to be the exact strategy that Walmart is taking on with Store 8 to not be left behind! Without doubt, it’s very tough competing in a world of giants that have no boundaries in their pursuits and ambitions!

Voice assistant battles, part two: The strategic importance

August 6, 2018

Here’s the basic motivation that I see in creating Voice Assistants: build a cross-platform user experience that makes it easy for consumers to interact with, control, and request things through their assistant. This will ease adoption and bring more power to consumers, who will use the products more and in doing so create more data for the cloud providers. This “data” will include all sorts of preferences, requests, searches, and purchases, and will allow the assistants to learn more and more about their users. The more the assistant knows about any given user, the BETTER the assistant can help the user by providing services such as entertainment and assisting with purchases (e.g. offering special deals on things the consumer might want). Let’s look at each of these in a little more detail:

1. Owning the cross platform user experience and collecting user data to make a better Voice Assistant.
For thousands of years consumers interacted with products by touch. Squeezing, pressing, turning, and switching were all the standard means of controlling. The dawn of electronics really didn’t change this and mechanical touch systems became augmented with electrical touch mechanisms. Devices got smarter and had more capabilities but the means to access these capabilities got more confusing with more complicated interfaces and a more difficult user experience. As new sensory technologies began to be deployed (such as gesture, voice, pressure sensors, etc.) companies like Apple emerged as consumer electronics leaders because of their ability to package consumer electronics in a more user friendly manner. With the arrival of Siri on the iPhone and Alexa in the home, voice first user experiences are driving the ease of use and naturalness of interacting with consumer products. Today we find companies like Google and Amazon investing heavily into their hardware businesses and using their Assistants as a means to improve and control the user experience.

Owning the user experience on a single device is not good enough. The goal of each of these voice assistants is to be your personal assistant across devices: on your phone, in your home, in your car, wherever you may go. This is why we see Alexa, Google, and Siri all battling for, as an example, a position in automotive. Your assistant wants to be the place you turn for consistent help. In doing so it can learn more about your behaviors… where you go, what you buy, what you are interested in, who you talk to, and what your history is. This isn’t just scary big brother stuff. It’s quite practical. If you have multiple assistants for different things, they may each think of you and know you differently, thereby having a less complete picture. It’s really best for the consumer to have one assistant that knows them best.

For example, let’s take the simple case of finding food when I’m hungry. I might say “I’m hungry.” The assistant’s response would be much more helpful the more it knows about me. Does it know I’m a vegetarian? Does it know where I’m located, or whether I am walking or driving? Maybe it knows I’m home and what’s in my refrigerator, and can suggest a recipe… does it know my food/taste preferences? How about cost preferences? Does it have the history of what I have eaten recently, and know how much variety I’d like? Maybe it should tell me something like “Your wife is at Whole Foods, would you like me to text her a request or call her for you?” It’s easy to see how these voice assistants could really be quite helpful the more they know about you. But with multiple assistants in different products and locations, the picture wouldn’t be as complete. In this example one assistant might know I’m home, but NOT know what’s in my fridge. Or it might know what’s in the fridge and know I’m home but NOT know my wife is currently shopping at Whole Foods, etc.

The more I use my assistant across more devices in more situations and over more time, the more data it could gather and the better it should get at servicing my needs and assisting me! It’s easy to see that once it knows me well and is helping me with this knowledge it will get VERY sticky and become difficult to get me to switch to a new assistant that doesn’t know me as well.

2. Entertainment and other service package sales.
Alexa came onto the scene in 2014 with one very special domain – Music. Amazon chose to do one thing really well, and that was make a speaker that could accept voice commands for playing songs, albums, bands, and radio stations. Not long after that, Alexa added new domains and moved into new platforms like Fire TV and the Fire stick controller. It’s no coincidence that an Amazon Music service and Amazon TV services both exist, and you can wrap even more services into an Amazon Prime membership. When Assistants don’t support Spotify well, there are a lot of complaints. And it’s no surprise that Spotify has been reported to be developing their own assistant and speaker. In fact, Comcast has their own voice control remotes. There’s a very close tie between the voice assistants and the services that they bring. Apple is restrictive in what Siri will allow you to listen to. They want to keep you within their eco-system, where they make more money. (Maybe it’s this locked-in eco-system that has given Apple a more relaxed schedule in improving Siri?) Amazon and Google are really not that different; although they may have different means of leading us to the services they want us to use, they can still influence our choices for media. Spotify has over 70M subscribers (20M paying), over $5 billion in revenues, and recently went public at about a $30B market cap… and Apple Music just overtook Spotify in terms of paying subscribers. Music streaming has turned the music industry into a growth business again. The market for video services is even bigger, and Amazon is one of the top content producers of video! Your assistant will have a lot of influence on the services you choose and how accessible they are. This is one reason why voice assistant providers might be willing to lose money in getting the assistants out to the market, so they can make more money on services. The battle of Voice Assistants is really a battle of who controls your media and your purchases!

3. Selling and recommending products to consumers
The biggest business in the world is selling products. It’s helped make Amazon, Google, and Apple the giants they are today. Google makes its money on advertising, which is an indirect form of selling products. What if your assistant knew what you needed whenever you needed it? It would uproot the entire advertising industry. Amazon has the ability to pull this off. They have the world’s largest online store, they know our purchase histories, they have an awesome rating system that really works, and they have Alexa listening everywhere, willing to take our orders. Because assistants use a voice interface, there will be a much more serial approach to making recommendations and selling me things. For example, if I do a text search on a device for nearby vegan restaurants, I see a map with a whole lot of choices and a long list of options. Typically these options could include sidebars of advertising or “sponsored” restaurants first in the listing, but I’m supplied a long list. If I do a voice search on a smart speaker with no display, it would be awkward for it to give me more than a few results… and I’ll bet the results we hear will become the “sponsored” restaurants and products.

It would be really obnoxious if Alexa or Siri or Cortana or Google Assistant suddenly suggested I buy something that I wasn’t interested in, but what if it knew what I needed? For example, it could track vitamin usage and ask if I want more before they run out, or it could know how frequently I wear out my shoes and recommend a sale on my brand in my size when I really need them. The more my assistant knows me, the better it can “advertise” and sell to me in a way that’s NOT obnoxious but really helpful. And of course it makes extra money in the process!

Voice Assistant Battles, part one

July 25, 2018

I have spoken on a lot of “voice” oriented shows over the years, and it has been disappointing that there hasn’t been more discussion about the competition in the industry and what is driving the huge investments we see today. Because companies like Amazon and Google participate in and sponsor these shows, there is a tendency to avoid the more controversial aspects of the industry. I wrote this blog to share some of my thoughts on what is driving the competition, why the voice assistant space is so strategically important to companies, and some of the challenges resulting from the voice assistant battles.

In September of 2017 it was widely reported that Amazon had over 5,000 employees working on Alexa, with more than 1,000 more to be hired. To use a nice round and conservative number, let’s assume an average Alexa employee’s fully weighted cost to Amazon is $200K. With about 6,000 employees on the Alexa team today, that would mean a $1.2 billion annual investment. Of course, some of this is recouped by the Echos and Dots bringing in profits, but when you consider that Dots sell for $30-$50 and Echos at $80-$100, it’s hard to imagine a high enough profit to justify the investment through hardware sales. For example, if Amazon can sell 30 million Alexa devices and make an average of $30 per unit profit, that only covers 75% of the conservative $1.2 billion investment.
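The back-of-the-envelope math above is easy to check; here is a quick sketch using the post’s own illustrative figures (these are assumptions for the sake of argument, not Amazon disclosures):

```python
# Rough Alexa economics from the paragraph above: assumed headcount cost
# versus hypothetical hardware profit. All figures are illustrative.

employees = 6_000            # approximate Alexa team size
cost_per_employee = 200_000  # assumed fully weighted annual cost, USD

annual_investment = employees * cost_per_employee  # $1.2 billion

units_sold = 30_000_000      # hypothetical device sales
profit_per_unit = 30         # assumed average profit per device, USD

hardware_profit = units_sold * profit_per_unit     # $0.9 billion
coverage = hardware_profit / annual_investment     # share of investment recouped

print(f"investment ${annual_investment / 1e9:.1f}B, "
      f"hardware profit ${hardware_profit / 1e9:.1f}B, "
      f"coverage {coverage:.0%}")
```

Under these assumptions, hardware profit covers only 75% of the annual headcount cost, which is exactly the point: the devices alone don’t pay for the program.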

Other evidence supporting the huge investments being made in voice assistants is the battle in advertising. Probably the most talked-about thing at 2018’s CES show was the enormous position Google took in advertising the Google Assistant. In fact, if you watch any of the most expensive advertising slots on TV (Super Bowl, NBA Finals, World Cup, etc.) you will see a preponderance of advertisements with known actors and athletes saying “Hey Google,” “Alexa,” or “Hey Siri.” (Being in the wakeword business, I particularly like the Kevin Durant “Yo Google” ad!)

And it’s not just the US giants that are investing big into assistants: Docomo, Baidu, Tencent, Alibaba, Naver, and other large international players are developing their own or working with 3rd party assistants.

So what is driving the huge investments these companies are making? It’s a multitude of factors, including:

  1. Owning the cross platform user experience and collecting user data
  2. Entertainment and other service package sales
  3. Selling and recommending products to consumers

In my next blog, I’ll discuss these three factors in more detail, and in a final blog on this topic I will discuss the challenges being faced by consumer OEMs and service providers that must play in the voice assistant game to not lose out to service and hardware competition from Apple, Amazon, Google, and others.

Smart speakers coming from all over

October 12, 2017

Amazon, Google, Sonos, and LINE all introduced smart speakers within a few weeks of each other. Here’s my quick take and commentary on those announcements.

Amazon now has the new Echo, the old Echo, the Echo Plus, Spot, Dot, Show, and Look. The company is improving quality, adding incremental features, lowering cost, and seemingly expanding its leadership position. They make great products for consumers, have a very strong eco-system, and make products that are very tough to compete with, both for their competitors and for the many platform partners that use Alexa. It seems their branding strategy is to use short three- or four-letter names that have Os. The biggest thing that was missing was speaker identification to know who’s talking to it. Interestingly, Amazon just added that capability.

Google execs wore black shirts and jeans in a very ironic-seeming Steve Jobs fashion. They attacked the Amazon Dot with their Mini, and announced the Max to compete with the quality expectations of Sonos and Apple. I didn’t find much innovation in the product line or in their dress, but I’d still rank the Google Assistant as the most capable assistant I’ve used. Of course, Google got caught stealing data, so it makes sense they have more knowledge about us and can make a better assistant.

Sonos invented the Wi-Fi speaker market and has always been known for quality. They announced the Sonos One at a surprisingly aggressive $199 price point. Their unique play is to support Alexa, Assistant, and Siri, starting first with Alexa. This would put price pressure on Apple’s planned $349 HomePod, but my guess is that Apple will aggressively sell it into its captive and demographically wealthy market before allowing Sonos to incorporate Siri. Like Apple, Sonos will have a nice edge in being able to sell into its existing customer base, who will certainly want the added convenience and capability of voice control with their choice of assistant.

American readers might not be familiar with LINE, but the company offers a hugely popular communications app that’s been downloaded by about a billion people. They’re big in Japan and owned by Naver, an even bigger Korean company that’s also working on a smart speaker.

Most notable about LINE (besides the unique looking speaker that resembles a cone with the top cut off) is that it appears that they’re not only beating Amazon, Google, Apple, and Sonos to Japan, but they’re also getting there before the Japanese giants like Docomo, Sony, Sharp, and Softbank. And all of these companies are making smart speakers.

Then, there are the Chinese giants who are all making smart speakers, and the old-school speaker companies who are trying to get into the game. It’s going to be crowded very quickly, and I’m very excited to see quality going up and costs staying low.

Here’s what’s next for always listening devices

August 28, 2017

Ten years ago, I tried to explain to friends and family that my company Sensory was working on a solution that would allow IoT devices to always be “on” and listening for a key wake up word without “false firing” and doing it at ultra-low power and with very little processing power. Generally, the response was “Huh?”

Today, I say, “Just like Hey Siri, OK Google, Alexa, Hey Cortana, and so on.” Now everybody gets it and the technology is mainstream. In fact, next year, Sensory will have technology embedded in IoT devices that listens for all of those things (and more). But that’s not good enough.
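Sensory’s actual TrulyHandsfree implementation is proprietary, but the general shape of a low-power always-listening pipeline is well known: a very cheap first-stage check runs on every audio frame, and only frames that pass it wake the more expensive wake word model. A toy Python sketch of that pattern — the scoring function here is a made-up stand-in, not a real acoustic model:

```python
# Toy sketch of a two-stage always-listening pipeline: a cheap energy gate
# runs on every audio frame, and only frames that pass it are scored by a
# (here, stubbed-out) wake word model. Both stages are illustrative stand-ins.

def energy_gate(frame, threshold=0.1):
    """Cheap first stage: mean absolute amplitude versus a threshold."""
    return sum(abs(s) for s in frame) / len(frame) > threshold

def wake_word_score(frame):
    """Stub for the expensive second stage (a small neural net in practice)."""
    return max(frame)  # placeholder: peak amplitude as a fake confidence score

def always_listening(frames, score_threshold=0.8):
    """Yield indices of frames where the stubbed model 'detects' the wake word."""
    for i, frame in enumerate(frames):
        if not energy_gate(frame):       # near-silence: skip the expensive model
            continue
        if wake_word_score(frame) >= score_threshold:
            yield i

# Simulated audio: mostly near-silence, one loud 'wake word' frame at index 2.
frames = [[0.01] * 160, [0.02] * 160, [0.5, 0.9] * 80, [0.01] * 160]
print(list(always_listening(frames)))  # → [2]
```

In a real system the second stage would run on a DSP or other low-power core, and a detection would in turn wake the main processor or hand off to a cloud recognizer; the energy gate is what keeps the always-on power budget tiny.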

Here are some of the things that will be appearing over the next 10 (or more) years to make always listening better and different:

  1. Assistants that see. I hate it when I say OK Google to my Home and my phone responds. Or worse, when a device false fires and I’ve left the volume up really loud. Many of these devices will be getting vision in the coming years (Amazon’s Echo Look already does), and their ability to see which device I’m talking to will make it easier for them to respond from the correct device.
  2. No wake words. In a room with multiple people, we sometimes direct questions by saying the name of the person we want to talk to first. But we don’t do this when we are having a dialog back and forth, and we certainly don’t do it if there’s just one person in the room. Our Assistants should respond to questions without having their names said.
  3. Multiple assistants on single devices. Why can’t I have a device that I can shop on with Alexa, search info with Google, or control my appliances with Bixby? Amazon should be fine with that, but Google wouldn’t be. Certain cloud assistants will allow it and others won’t, and we’ll start seeing products that combine multiple assistants into one product. This could create some strange and interesting bedfellows.
  4. Portable assistants. I unplug my Echo and move it from room to room when I’m listening to music. I already have two Echos and one Home (and a few other Alexa devices) and I don’t want to buy one for every room. Why can’t I throw Google Home in my backpack for music while biking? What about an always on wearable assistant? This will require ultra-low power wake words that perform great.
  5. Privacy controls. The intelligent assistants’ capabilities are directly proportional to the privacy we’re willing to give up. The better they know us, the better they can get us what we want. Today, we just sign our privacy away. In the future, there likely will be settings that we can control.
  6. Embedded always on assistants. Power consumption should be low enough that assistants can be embedded into our bodies for augmented intelligence, memory, and of course medical checkups. Within 20 years, our bodies will become enhanced with sensors (microphones, cameras, etc.), memory, and processors that augment our personal capabilities and are directly wired to our brains.

Staying Ahead with Advanced AI on Devices

June 8, 2017

Since the beginning, Sensory has been a pioneer in advancing AI technologies for consumer electronics. Not only did Sensory implement the first commercially successful speech recognition chip, but we were also the first to bring biometrics to low-cost chips and speech recognition to Bluetooth devices. Perhaps what I am most proud of, though, is that more than a decade ago Sensory introduced its TrulyHandsfree technology and showed the world that wakeup words could really work in real devices, getting around the false accept, false reject, and power consumption issues that had plagued the industry. No longer did speech recognition devices require button presses… and it caught on quickly!

Let me go on boasting because I think Sensory has a few more claims to fame… Do you think Apple developed the first “Hey Siri” wake word? Did Google develop the first “OK Google” wake word? What about “Hey Cortana”? I believe Sensory developed these initial wake words, some as demos and some shipped in real products (like the Motorola MotoX smartphone and certain glasses). Even third-party Alexa and Cortana products today are running Sensory technology to wake up the Alexa cloud service.

Sensory’s roots are in neural nets and machine learning. I know everyone does that today, but it was quite out of favor when Sensory used machine learning to create a neural net speech recognition system in the 1990s and 2000s. Today everyone and their brother is doing deep learning (yeah, that’s tongue in cheek, because my brother is doing it too: http://www.cs.colorado.edu/~mozer/index.php). And a lot of these deep learning companies are huge multi-billion-dollar businesses or extremely well-funded startups.

So, can Sensory stay ahead and continue pioneering innovation in AI now that everyone is using machine learning? Of course, the answer is yes!

Sensory is now doing computer vision with convolutional neural nets. We are coming out with deep learning noise models to improve speech recognition performance and accuracy, and are working on small TTS systems using deep learning approaches that help them sound lifelike. And of course, we have efforts in biometrics and natural language that also use deep learning.
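To give a feel for what a noise model buys a recognizer, here is a classical spectral-subtraction baseline — deliberately not the deep learning noise models described above, just the textbook predecessor: estimate the noise spectrum from a noise-only stretch of audio, then subtract it from each frame’s magnitude spectrum before recognition. Function names here are illustrative.

```python
# Classical spectral subtraction (a hedged stand-in for learned noise
# models): estimate the noise magnitude spectrum, then subtract it from
# each incoming frame's magnitude spectrum, flooring at zero.
import cmath

def dft(frame):
    """Naive O(n^2) discrete Fourier transform; fine for short frames."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def denoise_magnitudes(frame, noise_mag):
    """Return the frame's magnitude spectrum with the noise estimate
    subtracted; negative results are clipped to zero."""
    spec = dft(frame)
    return [max(abs(x) - m, 0.0) for x, m in zip(spec, noise_mag)]

# Usage: estimating the noise from a noise-only frame and denoising that
# same frame drives every magnitude to (approximately) zero.
noise = [0.1, -0.1] * 8
noise_mag = [abs(x) for x in dft(noise)]
cleaned = denoise_magnitudes(noise, noise_mag)
```

A learned noise model plays the same role as `noise_mag` here, but is predicted per-frame by a network instead of being a fixed estimate, which is why it copes with non-stationary noise.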

We are starting to combine a lot of technologies together to show that embedded systems can be quite powerful. And because we have been around longer and thought through most of these implementations years before others, we have a nice portfolio of over 3 dozen patents covering these embedded AI implementations. Hand in hand with Sensory’s improvements in AI software, companies like ARM, NVidia, Intel, Qualcomm and others are investing and improving upon neural net chips that can perform parallel processing for specialized AI functions, so the world will continue seeing better and better AI offerings on “the edge”.

Curious about the kind of on-device AI we can create when combining a bunch of our technologies together? So were we! That’s why we created this demo that showcases Sensory’s natural language speech recognition, chatbots, text-to-speech, avatar lip-sync and animation technologies. It’s our goal to integrate biometrics and computer vision into this demo in the months ahead:

Let me know what you think of that! If you are a potential customer and we sign an NDA, we would be happy to send you an APK of this demo so you can try it yourself! For more information about this exciting demo, please check out the formal announcement we made: http://www.prnewswire.com/news-releases/sensory-brings-chatbot-and-avatar-technology-to-consumer-devices-and-apps-300470592.html

Assistant vs Alexa: 8 things not discussed (enough)

October 14, 2016

I watched Sundar, Rick, and the team at Google announce all of their great new products. I’ve read a few reviews and comparisons of Alexa/Assistant and Echo/Home, but it struck me that there’s quite a bit of overlap in the reports I’m reading, and some of the more interesting things aren’t being discussed. Here are a few of them, roughly in increasing order of importance:

  1. John Denver. Did anybody notice that the Google Home advertisement used John Denver’s “Country Roads”? Really? Couldn’t they have found something better? “Country Roads” didn’t make PlayBuzz’s list of the 15 best “home” songs or Jambase’s top 10 home songs. Couldn’t someone have Googled “best home songs” to find something better?
  2. Siri and Cortana. With all the buzz about Amazon vs. Google, I’m wondering what’s up with Siri and Cortana? Didn’t see much commentary on that.
  3. AI acquisitions. Anybody notice that Google acquired API.ai? API.ai always claimed to have the highest-rated voice assistant in the Play Store. They called it “Assistant.” Hm. Samsung just acquired VIV – that’s Adam, Dag, Marco, and company, the team behind the original Siri. Samsung has known for a while that it couldn’t trust Google, and it has always wanted to keep its distance.
  4. Assistant is a philosophical change. Google’s original positioning for its voice services was that Siri and Cortana could be personal assistants, but Google was just about getting to the information fast, not about personalities or conversations. The name “Assistant” implies this might be changing.
  5. Google: a marketing company? It seems like Google used to pride itself on being void of marketing. They had engineers. Who needs marketing? This thinking came through loud and clear in the naming of their voice recognizer. Was it Google Voice, Google Now, OK Google? Nobody knew. This historical lack of marketing and market focus was probably harmful. It would be fatal in an era of moving more heavily into hardware. That’s probably why they brought on Rick Osterloh, who understands hardware and marketing. Rick, did you approve that John Denver song?
  6. Data. Deep learning is all about data. Data that’s representative and labeled is the key. Google has been collecting and classifying all sorts of data for a very long time. Google will have a huge leg up on data for speech recognition, dialogs, pictures, video, searching, etc. Amazon is relatively new to the voice game, and it is at quite a disadvantage in the data game.
  7. Shopping. The point of all these assistants isn’t about making our lives better; it’s about getting our money. Google and Amazon are businesses with a profit motive, right? Google is very good at getting advertising dollars through search. Amazon is, among other things, very good at getting shoppers’ money (and it probably has a good amount of shopping data). If Amazon knows our buying habits and preferences and has the review system to know what’s best, then who wants ads? Just ship me what I need, and if you get it wrong, let me return it hassle-free. I don’t blame Google for trying to diversify. The ad model is under attack by Amazon through Alexa, Dash, Echo, Dot, Tap, etc.
  8. Personalization, privacy, embedded. Sundar talked a bit about personalization. He’s absolutely right that this is the direction assistants need to move (even if speaker verification isn’t built into the first Home units). Personalization occurs by collecting a lot of data about each individual user – what you sound like, how you say things, what music you listen to, what you control in your house, etc. Sundar didn’t talk much about privacy, but if you read user commentary on these home devices, the top issue by far relates to an invasion of privacy, which directly goes against personalization. The more privacy you give up, the more personalization you get. Unless… What if your data isn’t going to the cloud? What if it’s stored on your device in your home? Then privacy is at less risk, but the benefits of personalization can still exist. Maybe this is why Google briefly hit on the Embedded Assistant! Google gets it. More of the smarts need to move onto the device to ensure more privacy!

Sensory Winning Awards

October 6, 2016

It’s always nice when Sensory wins an award. 2016 has been a special year for Sensory because we won more awards than in any other year in our 23-year history!!

Check it out:

Sensory Earns Multiple Coveted Awards in 2016
Pioneering embedded speech and machine vision tech company receiving industry accolades

Sensory Inc., a Silicon Valley company that pioneered the hands-free voice wakeup word approach, today announced it has won over half a dozen awards in 2016 across its product line, including awards for products, technologies, and people, covering deep learning, biometric authentication, and voice recognition.

The awards presented to Sensory include the following:
AIconics are the world’s only independently judged awards celebrating drive, innovation, and hard work in the international artificial intelligence community. Sensory was initially a finalist along with six other companies in the category of Best Innovation in Deep Learning, and the judges named Sensory the overall WINNER at an awards ceremony held in September 2016. The judging panel was composed of 12 independent professionals, including leaders in artificial intelligence R&D, academia, and investment, as well as journalists and analysts.

CTIA Super Mobility 2016™, the largest wireless event in America, announced more than 70 finalists for its 10th annual CTIA Emerging Technology (E-Tech) Awards. Sensory was nominated in the category of Mobile Security and Privacy for its TrulySecure™ technology, along with Nokia, Samsung, SAP, and others. Sensory was presented with the First Place award for the category at a ceremony in September 2016 at the CTIA Las Vegas event.

Speech Technology magazine, the leading provider of speech technology news and analysis, held its 10th annual Speech Industry Awards to recognize the creativity and notable achievements of key influencers (Luminaries), major innovators (Star Performers), and impressive deployments (Implementation Awards). The editors of Speech Technology magazine selected the 2016 award winners based on their industry contributions during the past 12 months. Sensory’s CEO, Todd Mozer, received a Luminary Award, his second time winning the prestigious award. Sensory as a company received the Star Performer award along with IBM, Amazon, and others.

Two well-known industry analyst firms issued reports highlighting Sensory’s industry contributions for its TrulyHandsfree product and customer leadership, offering awards for innovations, customer deployment, and strategic leadership.

“Sensory has an incredibly talented team of speech recognition and biometrics experts dedicated to advancing the state-of-the-art of each respective field. We are pleased that our TrulyHandsfree, TrulySecure and TrulyNatural product lines are being recognized in so many categories, across the various industries in which we do business,” said Todd Mozer, CEO of Sensory. “I am also thrilled that Sensory’s research and innovations in the deep learning space have been noticed, generating our company prestigious accolades and management recognition.”

For more information about this announcement, Sensory or its technologies, please contact sales@sensory.com; Press inquiries: press@sensory.com

TrulySecure 2.0 Wins First Place in 2016 CTIA E-Tech Awards

September 9, 2016


We are pleased to announce that Sensory’s TrulySecure technology has earned first place in this year’s CTIA E-Tech Awards. We believe that this recognition serves as a testament to Sensory’s devotion to developing the best embedded speech recognition and biometric security technologies available.

For those of you unfamiliar with TrulySecure – TrulySecure is the result of more than 20 years of Sensory’s industry-leading and award-winning experience in the biometric space. The TrulySecure SDK allows application developers concerned about both security and convenience to quickly and easily deploy a multimodal voice and vision authentication solution for mobile phones, tablets, and PCs. TrulySecure is highly secure, robust to the environment, and user friendly – offering better protection and greater convenience than passwords, PINs, fingerprint readers, and other biometric scanners. TrulySecure offers the industry’s best accuracy at recognizing the right user while keeping unauthorized users out. Sensory’s advanced deep learning neural networks are fine-tuned to provide verified users with instant access to protected apps and services, without the all-too-common false rejections of the right user associated with other biometric authentication methods. TrulySecure features a quick and easy enrollment process – capturing voice and face simultaneously in a few seconds. Authentication is on-device and almost instantaneous.

TrulySecure provides maximum security against attempts by mobile identity thieves to break into a protected mobile device, while ensuring the most accurate verification rates for the actual user. According to data published by Apple, the iPhone’s thumbprint reader offers about a 1-in-50,000 chance of falsely accepting the wrong user, and the probability of the wrong user getting into the device rises when the user enrolls more than one finger. With TrulySecure, face and voice biometrics individually offer a baseline 1:50,000 false accept rate, and each can be made more secure depending on the security needs of the developer. When both face and voice biometrics are required for user authentication, TrulySecure is virtually impenetrable by anybody but the actual user: its face+voice authentication offers a baseline 1:100,000 false accept rate, and can be dialed in to offer as much as a 1:1,000,000 false accept rate depending on security needs.
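To see why requiring two modalities helps, here is a back-of-the-envelope sketch under an independence assumption — not Sensory’s actual fusion algorithm. An impostor must falsely pass both checks, so the combined false accept rate shrinks toward the product of the individual rates, while the genuine user’s false reject rate grows; that tension is why thresholds get tuned per deployment.

```python
# Decision-level AND fusion of two biometric checks, assuming the two
# checks err independently (an idealization, not a vendor's algorithm).

def and_fusion_far(far_face, far_voice):
    """False accept rate when BOTH modalities must accept an impostor."""
    return far_face * far_voice

def and_fusion_frr(frr_face, frr_voice):
    """False reject rate for the genuine user: a rejection by either
    modality rejects the user."""
    return 1 - (1 - frr_face) * (1 - frr_voice)

# With the 1:50,000 per-modality rate quoted above, the independence
# bound is far tighter than the shipped 1:100,000 baseline -- headroom
# that lets per-modality thresholds be relaxed for convenience.
print(and_fusion_far(1 / 50_000, 1 / 50_000))  # 4e-10, i.e. 1 in 2.5 billion
print(and_fusion_frr(0.01, 0.01))              # 0.0199, worse than either alone
```

Real fusion systems usually combine match scores rather than hard accept/reject decisions, which gives a smoother trade-off between the two error rates than this AND rule.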

TrulySecure is robust to environmental challenges such as low light or high noise – it works in real-life situations that render lesser offerings useless. The proprietary speaker verification, face recognition, and biometric fusion algorithms leverage Sensory’s deep strength in speech processing, computer vision, and machine learning to continually make the user experience faster, more accurate, and more secure. The more the user uses TrulySecure, the more secure it gets.

TrulySecure offers ease-of-mind specifications: no special hardware is required – the solution uses standard microphones and cameras universally installed on today’s phones, tablets and PCs. All processing and encryption is done on-device, so personal data remains secure – no personally identifiable data is sent to the cloud. TrulySecure was also the first biometric fusion technology to be FIDO UAF Certified.

While we are truly honored to be the recipient of this prestigious award, we won’t rest on our laurels. Our engineers are already working on the next generation of TrulySecure, further improving accuracy and security, as well as refining the already excellent user experience.

Guest blog by Michael Farino
