HEAR ME - Speech Blog  |  Read more October 1, 2015 - Sensory’s CEO, Todd Mozer, interviewed on FutureTalk


  • Huawei also created this cool feature to help you find your phone more quickly. It’s called voice wake up, and you can ask your phone “Where are you?” or some other phrase, and your phone will respond, saying, “I’m here,” and play music until you find it.

    Malarie Gokey,
  • So if the cloud’s not private, how can your TV respond to voice commands? Simple. Use speech-recognition services that are baked right into the TV – no cloud required.

    Ted Kritsonis, Digital Trends
  • Of particular interest is the fact that TrulySecure is an on-device biometric identification system that does not rely on a connection to the cloud. Many users prefer this approach because they do not wish for their biometric data to be replicated and stored outside of their personal devices.

    Max Maxfield, EE Times
  • TrulySecure works by watching and listening as you repeat a passphrase a couple times. The system tracks the way your lips move and registers the unique attributes of your voice.

    Josh Ong, The Next Web
  • Given Qualcomm’s prominence as a mobile technology developer and the technological advancement on display in its latest offerings, the partnership reflects very well on the confidence the company has in Sensory’s technology.

    Alex Perala, Mobile ID World
  • With touchless control, Motorola and Google upped the ante.

    Eric Mack, CNET
  • Speech recognition company Sensory is expanding into the computer vision space with a new smartphone security client that uses both voice and face recognition to lock down your phone.

    Kevin Fitchard, Gigaom
  • MotoX is a fantastic phone with many great features. My favorite is Touchless Control… Ask it the weather, to call a friend or do a Google search, and it'll just do it, and you never have to touch the phone.

    Pete Pachal, Mashable
  • Sensory is continuing to exhibit leadership in handsfree control by allowing a secure multimodal biometric that doesn’t require touching devices to make them work.

    Dan Miller, Opus Research
  • Touchless the most useful feature [on Moto X].

    David Pogue, NY Times
  • The defining feature of the Moto X is it’s a virtual ear, always straining to hear its owner’s voice say three magic words that will rouse it to action: "Okay, Google Now."

    Steven Levy, WIRED
  • The phone [Moto X] has all the standard features expected of today’s top smartphone, with a twist: the ability to control the phone by talking to it, without lifting a finger.

    The New York Times
  • The voice-response system, called BlueGenie is surprisingly accurate for such a small device. It's better than the voice system in my Blackberry phone.

    US News and World Report
  • [BlueGenie is] an intuitive voice control system...the finest voice recognition user interface we've seen.

    Good Gear Guide
  • Sensory is trying to revolutionize voice and speech recognition by creating TrulyHandsfree, which looks to evolve our interactions with our smart devices.

    Talk Android
  • It may not seem like much, but that little detail of getting the phone to wake up via a voice command - which Sensory calls ‘TrulyHandsfree’ - is one of the trickiest.

  • With its dual biometric factors, AppLock comes closer to the security-and-convenience ideal than I've ever seen.

    Mike Feibus, USA Today


Sensory offers world-class speech and vision technologies that can be embedded into mobile and other consumer electronics products. The technologies listed below can be implemented across a variety of operating systems and DSPs.


Small vocabulary command and control
Multi-biometric authentication
Embedded large vocabulary continuous speech recognition


NLP-5x Natural Language Processor with FluentChip™ 5 Technology
RSC-4x Family with FluentChip™ 3

Audio and Speech Technologies

Speech Recognition
Phrase Spotting

TrulyHandsfree™ Voice Control
Phrase spotting of multiple commands or keywords embedded in speech allows TrulyNatural, BlueGenie and the NLP-5x to listen continuously for triggers or commands, even in high noise. The number of commands depends on the power of the processor. In phrase spotting mode, the word(s) to be recognized may be spoken in the middle of speech. TrulyHandsfree™ triggers can be used to alert the recognizer to listen for the product-control commands that follow.

TrulyHandsfree™ Voice Control 3.0

Speaker Verification

Voice Biometrics – Speaker Identification
Speaker Verification checks whether a password is spoken by the individual who originally enrolled it. The user trains one to four passwords (the more passwords, the better the security) that can provide voice access to any product. Equal error rates (where the probability of an incorrect acceptance equals that of an incorrect rejection) range from 0.01% to 7%, depending on the number of words and whether the passwords are known to the impostor.
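To make the equal error rate concrete, here is a minimal sketch of how it could be estimated from a set of match scores. The function name, the score lists, and the "accept when score is at or above the threshold" rule are illustrative assumptions, not Sensory's implementation.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Return (threshold, eer) at the threshold where the false-accept
    and false-reject rates are closest. Hypothetical helper for illustration."""
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # Assumed decision rule: accept the speaker when score >= t.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    return best[1], best[2]


# Made-up scores: four genuine attempts and four impostor attempts.
threshold, eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
```

With these made-up scores, one of four impostors is accepted and one of four genuine attempts is rejected at the chosen threshold, so both error rates are 25%.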

On the NLP-5x, up to 10 SV templates can be stored on-chip. The RSC-4x can store 5 SV templates on-chip. With external memory, the number of unique sets for both chips is limited only by programmable memory capacity.

TrulyNatural offers Speaker Verification as well as Speaker Identification where a user can be identified from an enrolled group – great for personalizing products!

Language Coverage

Wide and Ever Expanding Coverage!
Sensory’s speech recognition technologies currently support a wide range of languages covering many countries/regions all over the world.

 Language Coverage Map

We are continuously working to expand our language and country support.
Check back frequently for updated coverage or contact our Sales Department at (971)256-0056 for more information about our language offerings.

Natural Language Interface

Flexible Grammars!
Sensory’s TrulyNatural and the NLP-5x provide the unique ability to understand context-specific commands spoken in the way that comes naturally to the user. Order independence allows flexibility in commands, and speech prompts can request any missing information (form filling). Revolutionary flexible grammars allow the user to say multiple commands in a single phrase, and in any order. This results in the most natural use of speech recognition!
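The order independence and form filling described above can be sketched roughly as follows. The grammar, slot names, and `parse_command` function are hypothetical and work on already-recognized text; they are not Sensory's API.

```python
import re

# Hypothetical grammar: each slot lists the phrases that can fill it.
GRAMMAR = {
    "action": ("turn on", "turn off"),
    "device": ("light", "fan"),
    "room": ("kitchen", "bedroom"),
}


def parse_command(utterance):
    """Fill slots from recognized text in any word order.
    Returns (filled_slots, missing_slot_names)."""
    text = utterance.lower()
    slots = {}
    for slot, phrases in GRAMMAR.items():
        for phrase in phrases:
            # Match whole words only, wherever they occur in the utterance.
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                slots[slot] = phrase
                break
    missing = [slot for slot in GRAMMAR if slot not in slots]
    return slots, missing


slots, missing = parse_command("Kitchen light, turn on")
for slot in missing:
    print(f"Which {slot}?")  # form filling: prompt for each missing slot
```

Because each slot is matched independently, "turn on the kitchen light" and "kitchen light, turn on" fill the same slots, and an incomplete command such as "turn off the fan" leaves `room` to be requested by a prompt.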

Speaker Independent w/T2SI™

No Training Required!
Unspotted speaker-independent (SI) recognition works right out of the box and requires no end-user training. SI technology is designed for a specific language and can handle thousands of words in a single set for TrulyNatural, 40 words in a single set for the FluentChip/RSC-4x combo, and 75 words in a set for the FluentChip 5/NLP-5x combo. The number of sets is limited only by the amount of memory in your system. With proper design, Sensory’s SI technology will yield highly accurate recognition. Sensory’s Quick T2SI (text-to-SI) is the first-ever GUI tool that lets product designers create their own speaker-independent set and run recognition on chip within minutes!

Speaker Dependent Speech Recognition

Flexible vocabulary, any language, any accent
Speaker dependent (SD) recognition is desirable where user-specific or language-specific vocabularies are required. Each recognition word is trained just once by the user to create voice “templates”, each of which requires up to 200 bytes of memory (which can be on-chip or external). Vocabularies in excess of 100 words are possible, although there are often practical reasons for keeping recognition sets under 50 words. The NLP-5x can store up to 10 SD templates in on-chip SRAM. The RSC-4128 can store up to 7 SD templates in on-chip SRAM. With proper design, Sensory’s SD technology can yield highly accurate recognition for any user, regardless of language or accent.
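As a quick worked example of the memory budget, the sketch below multiplies the vocabulary size by the 200-byte upper bound per template quoted above. The function names are hypothetical, and real templates may be smaller than the bound.

```python
TEMPLATE_BYTES_MAX = 200  # upper bound per SD template, from the text above


def sd_vocabulary_bytes(num_words, bytes_per_template=TEMPLATE_BYTES_MAX):
    """Worst-case memory needed to store one template per word."""
    if num_words < 0:
        raise ValueError("num_words must be non-negative")
    return num_words * bytes_per_template


def words_that_fit(memory_bytes, bytes_per_template=TEMPLATE_BYTES_MAX):
    """Number of worst-case templates that fit in a given memory budget."""
    return memory_bytes // bytes_per_template


budget_50_words = sd_vocabulary_bytes(50)    # 10000 bytes
budget_100_words = sd_vocabulary_bytes(100)  # 20000 bytes
```

A 50-word set therefore needs at most about 10 KB of template storage, and a 100-word set at most about 20 KB.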

Continuous Digits

For entering phone numbers and digit strings
This technology is ideally suited for voice dialing applications such as mobile phones, handsets and hands-free kits. It can also be used anywhere a string of digits needs to be recognized.

Text to Speech Synthesis (TTS) with Voice Morphing

Text-based Speech Playback
Text-to-speech (TTS) is supported for systems requiring text-based speech playback, and requires as little as 270KB of external memory. TTS works well for names or text/phrase reading and is supported in multiple languages.

Stereo MP3 Decoder

NLP-5x: Hi-fidelity stereo MP3 decoder with all standard bitrates and a 5-band equalizer.

Mono or Stereo Music

Sensory’s music synthesis technology can produce up to 24 stereo voices simultaneously at a sample rate of 32k samples per second on the NLP-5x. The RSC-4x family supports up to 8 mono voices simultaneously at an 8 kHz sample rate. The music can be played through the on-chip stereo DAC or mono PWM (Pulse Width Modulator). Speech or sound effects encoded with SX, PCM, or ADPCM can be mixed in with the music. The music synthesis technology can “play” MIDI files that are stored in on-chip or off-chip memory, such as serial flash. MIDI files are a memory-efficient way to store music. The music synthesizer requires a database of instrument audio samples, which is typically stored in external parallel flash memory. Sensory currently offers a database of instrument samples for a wide variety of common instruments from the General MIDI melodic instrument set, plus the complete General MIDI percussion set.

Speech Synthesis

Perfect for Voice Prompts and/or Speech Output
High-quality speech and sound effects can be played back by Sensory’s ICs and software products. Sensory’s compression technology uses proprietary time- and frequency-domain approaches that can compress speech and sound effects to as little as 1000 bits per second. Speech output creates the opportunity for natural dialog with a product and can reduce reliance on an instruction manual.

Record and Playback

Store messages and play them back: voice messaging capability
Compressed digital sound reproduction
Sensory’s RSC-4x and NLP-5x processors can record audio to off-chip RAM or Flash at data rates of under 30k bits per second for custom greetings, phones and answering machines, voice pitch changers, and hand-held recording devices. On-chip compression levels can be varied depending on the quantity and quality of playback desired. Automatic silence removal can also be done to reduce memory requirements. The NLP-5x offers 8k and 16k bit samples per second while the RSC-4x family offers 8k samples per second. The NLP-5x signal processing provides superior voice quality.
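To get a feel for what these data rates mean for memory sizing, here is a small back-of-the-envelope sketch. The function name and the `silence_fraction` parameter are assumptions for illustration; in particular, it assumes removed silence costs no storage at all.

```python
def recording_seconds(memory_bytes, bits_per_second, silence_fraction=0.0):
    """Approximate wall-clock seconds of audio that fit in memory.

    silence_fraction is the assumed share of the recording that is silence
    and is dropped by silence removal (hypothetical parameter).
    """
    if not 0.0 <= silence_fraction < 1.0:
        raise ValueError("silence_fraction must be in [0, 1)")
    stored_seconds = memory_bytes * 8 / bits_per_second
    return stored_seconds / (1.0 - silence_fraction)


# 1 MiB of flash at the 30k bits-per-second upper bound quoted above
seconds_plain = recording_seconds(1024 * 1024, 30_000)
seconds_trimmed = recording_seconds(1024 * 1024, 30_000, silence_fraction=0.25)
```

Under these assumptions, 1 MiB holds a little over four and a half minutes of audio at 30k bits per second, and proportionally more at higher compression or when silence is removed.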

Interactive / Robotic
Natural TimeSet

Set digital time clocks using natural phrases.

Natural TimeSet Demo

LCD Control

NLP-5x: LCD control logic and drive for up to 104 icons or pixels. SPI for large array driver interfaces.

Silent SonicNet

Silent SonicNet communicates data via encoded sound at 14 kHz or 18 kHz in short bursts on the NLP-5x. These high frequencies make the short bursts essentially inaudible in practical applications. Silent SonicNet can run coincident with SX or T2SI, allowing data transmission during VR dialogues. Products with integrated speech that already include an NLP-5x, microphone and speaker can implement this at no additional cost, and can interact with each other, potentially doubling demand.


SonicNet

SonicNet communicates data at 8 kHz via encoded sound in short bursts on the RSC-4x. SonicNet can run coincident with SX to partially mask the sonic tones. Products with integrated speech that already include an RSC-4x, microphone and speaker can implement this at no additional cost, and can interact with each other, potentially doubling demand.

SonicNet Demo

Interactive Multimedia Windows Media Demo requires RSC-4x Demo/Eval Board

System Communications

NLP-5x: USB 1.1, SPI, UART-Lite, I2S and infrared (IR) interfaces combine with voice user interface capabilities, enabling man-machine interface solutions with an unprecedented combination of power and cost-effectiveness.

Real-Time LipSync

Allows a product to match robotic mouth movements to speech heard in real time, much like a ventriloquist’s dummy.

Real-Time LipSync Demo

Beat Prediction

The chip works out the recurring beat so it knows how to act going forward, which is great for dancing and motion-oriented products.
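One simple way to picture beat prediction is to take the times of recent beats, estimate the typical gap between them, and extrapolate. The sketch below does this with a median interval; the function name and the median-based approach are assumptions for illustration, not a description of the chip's algorithm.

```python
from statistics import median


def predict_beats(onset_times, count=4):
    """Predict the next `count` beat times (seconds) from detected onsets.
    Hypothetical helper: uses the median gap between onsets as the beat period."""
    if len(onset_times) < 2:
        raise ValueError("need at least two onsets to estimate a period")
    intervals = [b - a for a, b in zip(onset_times, onset_times[1:])]
    period = median(intervals)  # the median tolerates an occasional odd gap
    last = onset_times[-1]
    return [last + period * (i + 1) for i in range(count)]


# Onsets every half second (120 beats per minute)
upcoming = predict_beats([0.0, 0.5, 1.0, 1.5], count=2)
```

A product could schedule its motor or display actions at the predicted times rather than reacting after each beat has already been heard.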

Beat Prediction Demo


LipSync

Allows a product to match robotic mouth movements to pre-recorded speech.

LipSync Demo

Peak Detection

Picks up the amplitude of different sounds in the room as they occur and reacts to them with a movement or display function.
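A minimal sketch of the idea, assuming audio arrives as a list of signed sample values: split the signal into short frames, take the largest absolute amplitude in each, and report the frames that exceed a threshold. The function name, frame size, and threshold are hypothetical.

```python
def detect_peaks(samples, frame_size, threshold):
    """Return (frame_index, peak_amplitude) for each frame whose largest
    absolute sample value reaches the threshold. Illustrative only."""
    events = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        peak = max(abs(s) for s in frame)
        if peak >= threshold:
            events.append((start // frame_size, peak))
    return events


# Two frames of four samples; only the first contains a loud sound.
events = detect_peaks([0, 1, -9, 2, 0, 0, 3, -1], frame_size=4, threshold=5)
```

Each reported event could then trigger a movement or a display update sized according to the peak amplitude.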

Pitch Detection

The RSC processor can analyze a human singing voice to determine the pitches being sung.
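Pitch detection is commonly illustrated with autocorrelation: a periodic signal correlates strongly with a copy of itself delayed by one period. The sketch below is a generic textbook version under that assumption, not the RSC's method; the function names and the 80 to 1000 Hz search range are hypothetical choices.

```python
import math


def estimate_pitch_hz(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """Return the estimated fundamental frequency in Hz, or None if no
    positive correlation is found. Generic autocorrelation sketch."""
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself delayed by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


def hz_to_midi_note(freq_hz):
    """Nearest MIDI note number, using A4 = 440 Hz = note 69."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


# Synthetic 200 Hz tone sampled at 8 kHz, standing in for a sung note
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
pitch = estimate_pitch_hz(tone, 8000)
note = hz_to_midi_note(pitch)
```

Converting the estimated frequency to a note number is one way a product could map what it hears onto a musical scale.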

Sing Back

Combining talkback and pitch detection allows a robotic creature or avatar to imitate a person singing.

See the Windows Media Demo

This demo features a dog barking and matching pitches from a human voice.



Sound Sourcing

Adding a second microphone allows the NLP-5x or RSC-4x processor to locate the direction of a human voice.
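The geometry behind two-microphone direction finding can be sketched as follows, assuming a distant source and a known spacing between the microphones. The arrival-time difference between the two microphones (which might be measured by cross-correlating their signals) maps to an angle. The function name and parameters are hypothetical; this is not Sensory's implementation.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at about 20 °C


def arrival_angle_deg(delay_s, mic_spacing_m):
    """Angle of the source relative to straight ahead (0 degrees), given the
    arrival-time difference between two microphones. Far-field assumption."""
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp measurement noise into asin's domain
    return math.degrees(math.asin(ratio))


# Microphones 10 cm apart; sound reaches one about 146 microseconds earlier
angle = arrival_angle_deg(0.05 / 343.0, 0.10)
```

A delay of zero means the voice is straight ahead, and the sign of the delay tells which side it is on. Two microphones alone cannot tell front from back.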

SoundSource Demo

Talk Back

The RSC can respond to your speech or questions with speech that sounds like conversation from a non-human creature.

Natural Radio Tuning

Set radio stations using natural phrases on the NLP-5x.

Sensory Interfacing

NLP-5x: Sensory and third-party developers provide support for presence detection, touch and position sensors, gesture and motion analysis, etc. USB 1.1, SPI, UART-Lite, I2S and infrared (IR) interfaces combine with voice user interface capabilities, enabling man-machine interface solutions with an unprecedented combination of power and cost-effectiveness.

Voice Recognition for Bluetooth Products
BlueGenie™ Voice Interface

Speech Recognition and TTS for Headsets, Music Players, Hands-Free Kits & More
Sensory’s BlueGenie Voice Interface software suite runs on CSR’s BC-5 MM Kalimba DSP, and enables manufacturers of Bluetooth products to integrate full voice control and synthetic speech output without the need for visual displays or complex user interfacing. It frees designers to pack functionality onto small form factor Bluetooth devices and answers consumer demand for a truly hands-free experience. TTS allows Caller ID announcement and SMS message playback with speech.

BlueGenie Voice User Interface Demo