VoiceXML is a standard proposed by the VoiceXML Forum, with
sponsorship by IBM, Motorola, Lucent, AT&T and others. VoiceXML
is an XML-based language, similar in spirit to HTML, that
provides "voice markup" so that one may interact with Internet
servers via a "voice browser." The idea is, in part, that you
could take the web with you, and perhaps check your mail on the
road.
The potential is, fortunately, much more than that. One could
imagine having a random conversation with a computer somewhere
across the web, perhaps for technical support or to engage in
computer-generated prayer (ha!).
There are a few VoiceXML-based
voice browsers available. And there are a handful of voice portals
that will allow you to dial into the VoiceXML web.
VoiceXML is targeted at menus,
forms and limited-scope dialogs. When you pull down a page of
VoiceXML markup, you also pull down the context-free grammars
associated with each form or menu on the page. A grammar describes
all of the possible combinations of words the voice browser may
accept at a given point. Pre-specifying the possibilities is
useful because it reduces the speech recognition errors one
might get with an anything-goes dictation grammar. A
limited grammar, on the other hand, hobbles Brainhat's ability for
senseless confabulation. I've tried to put together grammars that
cover most of what a reasonable person might really say in a given
context. This comes at the expense of being able to correctly
field foolishness like "does the princess want to have sex
with me?" blurted out in the midst of ordering a pizza. At
the same time, the broader grammars I have used can reduce the
recognition rate, sometimes frustratingly. I am still tuning this.
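To make the trade-off concrete, here is a minimal Python sketch of a tiny grammar and the set of utterances it accepts. The rules, words and function names are invented for illustration; they are not taken from brainhat.gram, and a real voice browser would compile a grammar in its own format rather than enumerate it like this.

```python
from itertools import product

# Hypothetical toy grammar (not Brainhat's): each rule maps to a list of
# alternatives, and each alternative is a sequence of words or <rule> names.
GRAMMAR = {
    "<order>": [["i", "want", "<size>", "<item>"], ["<greeting>"]],
    "<size>": [["a", "small"], ["a", "large"]],
    "<item>": [["pizza"], ["salad"]],
    "<greeting>": [["hello"], ["goodbye"]],
}


def expand(symbol):
    """Return every word sequence that the given symbol can produce."""
    if symbol not in GRAMMAR:
        return [[symbol]]  # a plain word expands to itself
    results = []
    for alternative in GRAMMAR[symbol]:
        # Expand each part, then join every combination of the parts.
        parts = [expand(part) for part in alternative]
        for combo in product(*parts):
            results.append([word for piece in combo for word in piece])
    return results


# The full set of sentences the recognizer would be allowed to hear.
ACCEPTED = {" ".join(words) for words in expand("<order>")}


def accepts(utterance):
    """True if the utterance is one of the pre-specified sentences."""
    normalized = " ".join(utterance.lower().split())
    return normalized in ACCEPTED
```

With this toy grammar, accepts("I want a large pizza") is true, while anything outside the six enumerated sentences is rejected. That is the price of the smaller search space. The sketch only handles non-recursive rules, which is enough to show the idea.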
Talking to the Brainhat
Brainhat generates VoiceXML
dynamically. A conversation starts with an initial HTTP
GET request for index.vxml, directed to the Brainhat
server on port 8080.
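As a rough illustration of what generating VoiceXML dynamically can look like, the following Python sketch builds a one-field page with a prompt, a grammar reference and a submit target. The element names (vxml, form, field, prompt, grammar, filled, submit) come from the VoiceXML standard, but the page layout, the function name build_vxml_page, the field name, the version attribute and the example URLs are assumptions; Brainhat's actual output may look quite different.

```python
import xml.etree.ElementTree as ET


def build_vxml_page(prompt_text, grammar_url, submit_url):
    """Build a minimal VoiceXML page as a string (hypothetical layout)."""
    vxml = ET.Element("vxml", {"version": "2.0"})  # version is an assumption
    form = ET.SubElement(vxml, "form", {"id": "dialog"})
    field = ET.SubElement(form, "field", {"name": "utterance"})
    # What the voice browser speaks to the caller.
    ET.SubElement(field, "prompt").text = prompt_text
    # Where the voice browser fetches the grammar for this field.
    ET.SubElement(field, "grammar", {"src": grammar_url})
    # Once the caller has said something recognizable, post it back.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "submit", {"next": submit_url, "namelist": "utterance"})
    return ET.tostring(vxml, encoding="unicode")


# Example values only: the host, the second port and next.vxml are made up.
page = build_vxml_page(
    "hello",
    "http://localhost/brainhat.gram",
    "http://localhost:8081/next.vxml",
)
```

The sketch is not validated against any VoiceXML schema; it only shows how a server can assemble a fresh page for each turn of the conversation.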
Brainhat says "hello" to
start. The user responds similarly. Subsequent interaction is
redirected to another port dedicated to this one particular
intercourse. The daemon remains active and stateful, awaiting the
user's return or, failing that, an eventual time-out. I chose to
make the daemon stateful and persistent so that the VoiceXML
version of Brainhat could support all of the back-end interfaces
supported by the other flavors of Brainhat. Particularly, one may
interact with robots and processes on the far side of the daemon.
The downside is that a dedicated copy of the daemon consumes
memory resources, and so limits the number of sessions Brainhat
can handle concurrently. A stateless version of the daemon is
possible, but not yet available.
Saying "goodbye" to
Brainhat will cause the daemon to exit before the time-out occurs.
(Saying goodbye would be a nice gesture on your part.)
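The bookkeeping behind one dedicated port per conversation, with a time-out and an early exit on "goodbye", can be sketched in Python as follows. This is not Brainhat's code: the class name SessionTable, the port range, the session limit and the time-out value are all assumptions, and no real sockets or daemons are started.

```python
import time


class SessionTable:
    """Track which dedicated ports are in use (hypothetical sketch)."""

    def __init__(self, first_port=8081, max_sessions=4, timeout=300.0):
        # A fixed pool of ports stands in for the limit that memory imposes.
        self.ports = list(range(first_port, first_port + max_sessions))
        self.timeout = timeout
        self.sessions = {}  # port -> time of the most recent request

    def expire(self, now=None):
        """Drop sessions that have been idle longer than the time-out."""
        now = time.time() if now is None else now
        stale = [p for p, last in self.sessions.items() if now - last > self.timeout]
        for port in stale:
            del self.sessions[port]
        return stale

    def open(self, now=None):
        """Reserve a free port for a new conversation, or return None."""
        now = time.time() if now is None else now
        self.expire(now)
        for port in self.ports:
            if port not in self.sessions:
                self.sessions[port] = now
                return port
        return None  # every dedicated slot is busy

    def touch(self, port, now=None):
        """Record activity so the session's time-out clock restarts."""
        if port in self.sessions:
            self.sessions[port] = time.time() if now is None else now

    def close(self, port):
        """Release the port at once, as when the user says goodbye."""
        self.sessions.pop(port, None)
```

The None returned by open() when the pool is full corresponds to the concurrency limit described above; a stateless design would not need the table at all.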
Setting up Brainhat as a VoiceXML
The operating directory for the
Brainhat VoiceXML daemon is /usr/local/etc/brainhat. The file
brainhat.init, located in the directory where you invoke
the daemon, should include the text of any scenarios with which
you wish to prime the daemon. The grammar for the conversation
should be located in the file /brainhat.gram, served from
the same machine by an HTTP daemon listening on port 80. The
contents of the grammar will depend on the utterances you hope the
speech engine will recognize. You can modify the sample included
with the distribution to meet your needs.
Assuming that the distribution is
in /usr/local/etc/brainhat and that /usr/local/etc/brainhat/brainhat.init
contains the scenario data with which you wish to prime the
daemon, invoke with:
    cd /usr/local/etc/brainhat
    ./comp ./data/data9 -h &
Note that the data file, data9,
needs to be pre-processed if you make changes to words, input-patterns
or other files in the data directory. You can do this by running simplecpp:
    ./simplcpp < data/data9.in
The following URL will connect
you to the Brainhat server for a VoiceXML session. Again, you need a
working VoiceXML browser (supporting the Nuance grammar
specification). I ask that you don't use this server to debug your
installation because sessions take up resources. However, you are
welcome to use the site to demonstrate VoiceXML to others.
The current scenario is The
Statue with the Missing Head.
You may also try reaching
Brainhat through a Voice Portal. At this time, Brainhat is tuned
for Voxeo's voice portal. Contact Voxeo
for details about their developer's program.
The program has also worked with
VoiceGenie and TellMe's portals. I have found differences in the
way different portals treat grammars. Accordingly, what works on
one may not work on another. I have tried to make Brainhat
accommodate the differences by adjusting the grammar it returns as
a function of the portal it detects, but sometimes we miss.
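One way to picture this per-portal adjustment is a lookup keyed on whatever identifies the caller's voice browser. The Python sketch below assumes the portal can be recognized from a substring of the HTTP User-Agent header and that the adjustment boils down to a small settings record. Both are guesses made for illustration; the profile contents are placeholders, not a description of how these portals actually differ or of how Brainhat detects them.

```python
# Placeholder settings per portal; the keys and values are hypothetical.
PORTAL_PROFILES = {
    "voxeo": {"grammar_style": "external"},
    "voicegenie": {"grammar_style": "inline"},
    "tellme": {"grammar_style": "inline"},
}

# Used when the portal is not recognized ("sometimes we miss").
DEFAULT_PROFILE = {"grammar_style": "external"}


def detect_portal(user_agent):
    """Guess the portal from the User-Agent string (assumed mechanism)."""
    agent = user_agent.lower()
    for name in PORTAL_PROFILES:
        if name in agent:
            return name
    return None


def profile_for(user_agent):
    """Return the settings to use when building grammars for this caller."""
    return PORTAL_PROFILES.get(detect_portal(user_agent), DEFAULT_PROFILE)
```

Falling back to a default profile keeps an unrecognized portal working at all, though possibly with a grammar it handles poorly.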