Robotic Speech

Last weekend I helped my daughter Samantha create a Halloween costume. Actually, it was two costumes, because she wanted one for her friend too. They wanted to be robots this year. I took a couple of old cardboard boxes, cut out holes for arms and legs, attached old circuit boards and switches to the sides, and put pieces of dryer vent hose into the arm holes. Then I painted the whole thing silver.

The result looked pretty good, so good that my 4-year-old son Sam put one on. His arms didn’t reach the end of the makeshift sleeves and his head barely popped out the top, but he came walking into the kitchen wearing it and said in a monotone ‘robot voice’: “I am a robot. I will destroy you.”

We all had a good laugh over that, but I wondered how he had learned what robots sound like and what they say. I guess that’s the power of the media. Interestingly, though, the media has it all wrong. Speech output technologies, even in their infancy, never sounded like monotone robots.

Speech compression schemes digitize a real waveform and compress the data. The speech sounds increasingly unnatural and distorted as the data rate drops, but it never becomes monotone, because the inflections are still maintained. Likewise, approaches to TTS (text-to-speech) have never been robotic and monotone. The early DECtalk and formant synthesis approaches sounded more like someone with an intoxicated Swedish accent than the traditional bot talk, and today, TTS and speech compression techniques sound close to perfect.

On the other hand, where the media has made robots’ speech output sound worse than it is, it has done the opposite for speech recognition. The media portrays robotic recognition as flawless. The Star Trek computer and the Lost in Space Robot never said, “What did you say? I can’t understand, please repeat. Take me to a quieter environment.”

Speaking of robots, I just spoke at Robo Development 2007 and kicked off my speech by telling the story above. My favorite part of the show, however, wasn’t all the interesting people I met during my talk; it was walking through the exhibit space. I was very impressed with Hanson Robotics’ Zeno robot. As I spoke with David Hanson, he looked over at my name badge and said, “Oh, Sensory, we’re using both your FluentSoft and your FluentChip technologies!”

It’s always fun to come across a cool new application that uses Sensory technology when I’m not expecting it.