A Busy Summer for Speech Recognition on the Edge

In August, Samsung hosted their Galaxy UNPACKED 2021 event. Amid a dizzying array of new gadgets like folding phones, designer collaborations, body-fat-tracking smartwatches and colorful, amped-up earbuds, there was barely enough time for a major Bixby announcement. However, they did manage to squeeze in a Bixby update during the unveiling of the Z Flip 3 and just before the obligatory BTS video. You have to listen carefully, but at right around the 40-minute mark they state, “Bixby is now 35% faster than before because it detects and processes your voice, all on-device, without having to go through the server.” And that’s it. Dedicating one sentence to the Bixby voice assistant during an hour-long presentation doesn’t seem like much, but to Sensory, there is a lot to unpack in that statement.

Sensory has always been a proponent of AI on the edge. Each of our core technologies, TrulyHandsfree, TrulyNatural and TrulySecure, supports edge processing. We frequently evangelize the benefits of edge processing over cloud-based processing for speech recognition: greater speed, lower cost and, most importantly, privacy. Samsung chose to focus their marketing message on the speed boost of not having to process speech in the cloud, but the lower cost and increased privacy are still relevant.

It is interesting to note that Apple made a similar announcement earlier this year. During Apple’s Worldwide Developers Conference 2021, they shared that Siri had a “major update to privacy.” All Siri audio by default is processed right on your iPhone or iPad. Apple went on to say that, “This addresses the biggest privacy concern we hear for voice assistants, which is unwanted audio recording.” Apple’s marketing message was clearly focused on privacy, but they also took some time to demonstrate the speedup that comes with on-device speech recognition.

Also related on the privacy front was the announcement that Zoom launched voice control on the Zoom Rooms platform. Working with Sensory, Zoom enabled the convenience of voice control while maintaining 100% user privacy. This was accomplished by running a combination of Sensory’s TrulyHandsfree wake word detector and TrulyNatural large vocabulary speech recognizer on the edge.

Within the course of just a few months, two of the biggest mobile phone companies and the largest videoconferencing company announced that their respective voice solutions would be processing voice requests on-device. Kudos to Samsung and Apple for making the mobile move to the edge. However, these changes seem to be limited to their mobile devices. There was no mention of either company moving speech recognition to the edge for smart home appliances like the Samsung Family Hub Refrigerator or smart speakers like the Apple HomePod mini. It will be interesting to see in the coming months whether Samsung and Apple move all of their smart home products to speech recognition on the edge. Even more interesting will be whether Amazon and Google make similar moves for their smart speakers. As demonstrated by Samsung, Apple and Sensory, faster processing time and increased user privacy sure seem to make a compelling case for AI and speech recognition on the edge.