VoiceXML Voice Recognition

As I understand it, the Voicent Gateway is a VoiceXML gateway server. Are there any limitations on the grammar used for recognizing voice responses? We are planning to implement a menu system for outbound calls and then process the spoken responses. In particular, we want to capture key spoken confirmation data from each call and convert it to text. Does the VoiceXML specification address translating audio back to text, and are there limitations or recommendations in this area?

Is there a counterpart to the TTS engine that converts audio to text, or is this built into the Voicent Gateway? Regarding voice recognition, do you offer any voice-to-text conversion utilities?

One of the key advantages of using a VoiceXML gateway is that it handles spoken commands automatically. To implement a menu system driven by spoken words, all you need to do is specify a grammar in the VoiceXML file; the gateway takes care of everything else. The caller's spoken command or response is processed automatically, and the recognized text is returned to your application. The speech recognition engine is embedded in Voicent Gateway, so no separate voice-to-text utility is needed.
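As a rough illustration of what "specify a grammar" can look like, here is a minimal sketch of a spoken menu written against the W3C VoiceXML 2.0 specification with an inline SRGS XML grammar. The form id, field name, and menu words are made up for this example, and it assumes the gateway accepts inline SRGS XML grammars; check the Voicent Gateway documentation for the grammar formats it actually supports.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: ids, field name, and menu words are hypothetical. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="main_menu">
    <field name="choice">
      <prompt>Please say appointments, billing, or operator.</prompt>

      <!-- The grammar lists the only phrases the recognizer listens for. -->
      <grammar type="application/srgs+xml" version="1.0"
               root="menu" mode="voice">
        <rule id="menu">
          <one-of>
            <item>appointments</item>
            <item>billing</item>
            <item>operator</item>
          </one-of>
        </rule>
      </grammar>

      <!-- Caller said nothing, or said something outside the grammar. -->
      <noinput>Sorry, I did not hear you. <reprompt/></noinput>
      <nomatch>Sorry, I did not understand. <reprompt/></nomatch>

      <filled>
        <!-- With no semantic tags in the grammar, the field is normally
             filled with the recognized phrase as text. -->
        <prompt>You said <value expr="choice"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

Once the field is filled, the recognized text is available in the field variable (here "choice") and can be read back, branched on, or submitted to your application.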

The difficulty with spoken commands is recognition accuracy, especially over phone audio, which is compressed. If you have used applications that rely on voice recognition, you know how frustrating false recognition can be. Recognizing the letters of the English alphabet over the phone is notably hard. Even in a phone conversation between people, you have to say “B as in Boy, P as in Peter” to make yourself clear. In addition, unlike a desktop application, the recognition engine cannot be trained for one particular speaker, since it has to handle many different callers with all kinds of accents.

If you do need to use speech recognition, make the spoken commands easy for the engine to tell apart. For example, instead of asking for a “yes/no” response, ask for “go to next step/no”. The longer phrase sounds much less like “no”, so the engine is less likely to confuse the two.
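The suggestion above can be sketched as a confirmation field whose grammar accepts only two clearly different phrases. This again follows the W3C VoiceXML 2.0 specification rather than anything specific to Voicent; the form id, field name, and prompt wording are hypothetical, and inline SRGS XML grammar support is assumed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: ids, field name, and prompts are hypothetical. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="confirm_step">
    <field name="answer">
      <prompt>
        To continue, say go to next step. Otherwise, say no.
      </prompt>

      <!-- Two phrases of very different length and sound,
           instead of the easily confused pair "yes" / "no". -->
      <grammar type="application/srgs+xml" version="1.0"
               root="confirm" mode="voice">
        <rule id="confirm">
          <one-of>
            <item>go to next step</item>
            <item>no</item>
          </one-of>
        </rule>
      </grammar>

      <!-- Re-ask rather than guess when the utterance does not match. -->
      <nomatch>
        Sorry, please say go to next step, or say no. <reprompt/>
      </nomatch>

      <filled>
        <if cond="answer == 'no'">
          <prompt>Okay, stopping here.</prompt>
        <else/>
          <prompt>Going to the next step.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Keeping the grammar this small also helps accuracy, because the engine only has to choose between the phrases you list.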

