VoiceXML - Programming Page


Brainhat VoiceXML

Contact Us

Kevin Dowd
Brainhat
111 Founders Plaza
13th Floor
East Hartford, CT 06108
(860) 291-0851
dowd@brainhat.com

VoiceXML

VoiceXML is a standard proposed by the VoiceXML Forum, with sponsorship from IBM, Motorola, Lucent, AT&T and others. It is an XML-based markup language that does for speech what HTML does for the visual web: it provides "voice markup" so that one may interact with Internet servers via a "voice browser." The idea is, in part, that you could take the web with you, and perhaps check your mail on the road. The potential is, fortunately, much more than that. One could imagine having an open-ended conversation with a computer somewhere across the web, perhaps for technical support or to engage in computer generated prayer (ha!).

A few VoiceXML-based voice browsers are available, along with a handful of voice portals that let you dial into the VoiceXML web.

VoiceXML is aimed at menus, forms and limited-scope dialogs. When you pull down a page of VoiceXML markup, you also pull down the context-free grammars associated with each form or menu on the page. A grammar describes all of the possible combinations of words the voice browser may accept at a given point. Pre-specifying the possibilities is useful because it reduces the speech recognition errors one might get with an anything-goes dictation grammar. A limited grammar, on the other hand, hobbles Brainhat's ability for senseless confabulation. I've tried to put together grammars that cover most of what a reasonable person might really say in a given context. This comes at the expense of being able to correctly field foolishness like "does the princess want to have sex with me?" blurted out in the midst of ordering a pizza. At the same time, the broader grammars I have used can reduce the recognition rate, sometimes frustratingly. I am still tuning this.
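To make the idea of a pre-specified grammar concrete, here is a minimal Python sketch. It is not Brainhat's grammar format or the Nuance grammar syntax. The toy grammar PIZZA_GRAMMAR and the helpers expand and accepts are invented for illustration. Each slot lists the alternatives allowed at that position, and an utterance is accepted only if it matches one of the resulting phrases.

```python
from itertools import product

# Hypothetical toy grammar: each inner list holds the alternatives
# allowed at that position in the utterance.
PIZZA_GRAMMAR = [
    ["i want", "give me"],
    ["a small", "a large"],
    ["cheese", "mushroom"],
    ["pizza"],
]


def expand(grammar):
    """Return the set of every phrase the grammar can produce."""
    return {" ".join(words) for words in product(*grammar)}


def accepts(grammar, utterance):
    """True if the utterance, ignoring case and extra spaces, is in the grammar."""
    normalized = " ".join(utterance.lower().split())
    return normalized in expand(grammar)
```

The trade-off described above shows up directly: adding alternatives to a slot multiplies the number of phrases the recognizer has to tell apart, while anything outside the set is simply rejected.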

Talking to the Brainhat VoiceXML Server

Brainhat generates VoiceXML dynamically. A conversation starts with an initial HTTP GET of index.vxml directed to the Brainhat server on port 8080.

http://hostname:8080/
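A voice browser normally issues this request for you, but it can help to look at the generated markup by hand. The following Python sketch builds the start URL and fetches the page with the standard library. The helper names start_url and fetch_start_page are made up for this example, and the 10-second time-out is an arbitrary assumption.

```python
from urllib.request import urlopen


def start_url(host, port=8080, page=""):
    """Build the URL of the page that opens a conversation."""
    return f"http://{host}:{port}/{page}"


def fetch_start_page(host, port=8080, page="", timeout=10):
    """Issue the initial HTTP GET and return the markup as text."""
    with urlopen(start_url(host, port, page), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling fetch_start_page("hostname") against a running server should return the first page of VoiceXML. Keep in mind that each such request may start a session on the server.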

Brainhat says "hello" to start. The user responds similarly. Subsequent interaction is redirected to another port dedicated to this one particular conversation. The daemon remains active and stateful, awaiting the user's return or, failing that, an eventual time-out. I chose to make the daemon stateful and persistent so that the VoiceXML version of Brainhat could support all of the back-end interfaces supported by the other flavors of Brainhat. In particular, one may interact with robots and processes on the far side of the daemon. The downside is that a dedicated copy of the daemon consumes memory, and so limits the number of sessions Brainhat can handle concurrently. A stateless version of the daemon is possible, but not yet available.
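The bookkeeping behind one-port-per-conversation can be sketched in a few lines of Python. This is an illustration of the general pattern, not Brainhat's implementation. The class SessionTable, the starting port 9000, the session limit and the 300-second time-out are all assumptions chosen for the example.

```python
import time


class SessionTable:
    """Tracks one dedicated port per conversation and expires idle ones."""

    def __init__(self, first_port=9000, max_sessions=4, timeout=300.0,
                 clock=time.monotonic):
        # Ports not currently assigned to a conversation.
        self.free_ports = list(range(first_port, first_port + max_sessions))
        # Maps each active port to the time of its most recent request.
        self.last_seen = {}
        self.timeout = timeout
        self.clock = clock

    def open(self):
        """Reserve a port for a new conversation, or return None if full."""
        self.expire()
        if not self.free_ports:
            return None
        port = self.free_ports.pop(0)
        self.last_seen[port] = self.clock()
        return port

    def touch(self, port):
        """Record activity on a session so its time-out starts over."""
        if port in self.last_seen:
            self.last_seen[port] = self.clock()

    def close(self, port):
        """End a session early, for example when the user says goodbye."""
        if self.last_seen.pop(port, None) is not None:
            self.free_ports.append(port)

    def expire(self):
        """Close every session that has been idle longer than the time-out."""
        now = self.clock()
        for port, seen in list(self.last_seen.items()):
            if now - seen > self.timeout:
                self.close(port)
```

The fixed pool of ports mirrors the concurrency limit mentioned above: once every port is held by a live session, open returns None until someone says goodbye or times out.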

Saying "goodbye" to Brainhat will cause the daemon to exit before the time-out occurs. (Saying goodbye would be a nice gesture on your part.)

Setting up Brainhat as a VoiceXML Server

The operating directory for the Brainhat VoiceXML daemon is /usr/local/etc/brainhat. The file brainhat.init, located in the directory where you invoke the daemon, should include the text of any scenarios with which you wish to prime the daemon. The grammar for the conversation should be located in the file /brainhat.gram, served from the same machine by an HTTP daemon listening on port 80. The contents of the grammar will depend on the utterances you hope the speech engine will recognize. The sample included with the distribution is a starting point that you can modify to meet your needs.

Assuming that the distribution is in /usr/local/etc/brainhat and that /usr/local/etc/brainhat/brainhat.init contains the scenario data with which you wish to prime the daemon, invoke the daemon with:

cd /usr/local/etc/brainhat
./comp ./data/data9 -h &

Note that the data file, data9, needs to be pre-processed again if you make changes to words, input-patterns or other files in the data directory. You can do this by running simplecpp:

./simplcpp < data/data9.in > data/data9

VoiceXML Demo

The following URL will connect you to the Brainhat server for a VoiceXML session. You need a working VoiceXML browser (one that supports the Nuance grammar specification). I ask that you don't use this server to debug your installation, because sessions take up resources. However, you are welcome to use the site to demonstrate VoiceXML to others.

http://vxml.brainhat.com:8080/index.xml

The current scenario is The Statue with the Missing Head.

You may also try reaching Brainhat through a voice portal. At this time, Brainhat is tuned for Voxeo's voice portal. Contact Voxeo for details about their developer program.

The program has also worked with the VoiceGenie and TellMe portals. I have found differences in the way different portals treat grammars. Accordingly, what works on one may not work on another. I have tried to make Brainhat accommodate the differences by adjusting the grammar it returns as a function of the portal it detects, but sometimes we miss.
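Choosing a grammar per portal might look something like the Python sketch below. How Brainhat actually detects the portal is not described here, so this example assumes detection by a marker in the HTTP User-Agent header. The marker strings, the per-portal grammar file names and the function grammar_for are all hypothetical.

```python
# Hypothetical mapping from a marker found in the User-Agent header
# to the grammar file tuned for that portal.
PORTAL_GRAMMARS = {
    "voxeo": "brainhat-voxeo.gram",
    "voicegenie": "brainhat-voicegenie.gram",
    "tellme": "brainhat-tellme.gram",
}

# Grammar returned when the portal is not recognized.
DEFAULT_GRAMMAR = "brainhat.gram"


def grammar_for(user_agent):
    """Pick the grammar file for the portal named in the User-Agent."""
    agent = (user_agent or "").lower()
    for marker, grammar in PORTAL_GRAMMARS.items():
        if marker in agent:
            return grammar
    return DEFAULT_GRAMMAR
```

Falling back to a default grammar for unrecognized browsers is the case where "sometimes we miss" applies: a portal that is not detected gets a grammar that was not tuned for it.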