
VoiceXML Tutorial: Part 1
Introduction and User Interaction with DTMF
Presented by Plum Voice
About Plum
Plum offers high-performance, versatile, and scalable IVR systems and hosting that can automate any phone interaction. We pride ourselves on delivering solutions that satisfy customers ranging from small and medium-sized businesses to some of the largest enterprises in the world.
1. VoiceXML Tutorial
VoiceXML 2.0 is the World Wide Web Consortium (W3C) standard for scripting voice applications. In this tutorial, we build a VoiceXML interactive voice response (IVR) application for a customer service center.
Some aspects of this tutorial assume you have your own web server, which is the recommended configuration for a full production-level application. Starting from a simple "Hello World" application, we build a telephony application that includes:
• Dynamic responses driven by touch-tone or speech input
• Advanced text-to-speech (TTS) synthesis and automatic speech recognition (ASR)
• System integration with enterprise databases
1.1 Introduction to VXML
We begin with nearly the simplest complete VoiceXML application. The application here is analogous to an answering machine set to play an announcement only.
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <block>
      <prompt>
        Welcome to Plum Voice.
      </prompt>
    </block>
  </form>
</vxml>
In this example, the user would hear a synthesized voice say, "Welcome to Plum Voice." Then the system would simply hang up. The <form> element defines the basic unit of interaction in VoiceXML. This form contains only a single <block> of executable content, which in turn contains a single <prompt> to the user. By default, any plain text within a prompt is passed to the system's text-to-speech (TTS) synthesis engine to be rendered as audio.
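VoiceXML also accepts plain text placed directly inside a <block>, treating it as an implicit prompt. The minimal sketch below, modeled on the "Hello World" example in the W3C VoiceXML 2.0 specification, plays the same announcement without an explicit <prompt>. It also adds the xmlns attribute with the standard VoiceXML namespace, as the specification's examples do; whether your platform requires that attribute is an assumption worth checking.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <!-- Bare text in a block is played as an implicit prompt (TTS) -->
      Welcome to Plum Voice.
    </block>
  </form>
</vxml>
```

An explicit <prompt> is still the better choice when you need prompt attributes such as bargein.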
Also, as the <?xml?> declaration indicates, every VoiceXML document is an XML document. The basic structure of VoiceXML should be familiar to anyone who has looked at HTML web documents. Tags are set off by angle brackets, as in <form>, and are closed with a forward slash, as in </form>.
VoiceXML documents must adhere strictly to the XML standard. The document must begin with the <?xml?> declaration, and the rest of the document is enclosed within the <vxml></vxml> tags. Unlike HTML, all tags must be closed (or self-closed, as in <audio src="wav/welcome.wav"/>), and certain special characters must be escaped. For example, the less-than sign (<), when it is not used to open a tag, must be written as the entity &lt;.
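As a small sketch of escaping in practice, the document below writes an ampersand in prompt text as &amp; and a less-than comparison inside a cond attribute as &lt;. The variable waitminutes and its value are hypothetical, included only to give the condition something to test; <var> and <if> are standard VoiceXML elements.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <!-- Hypothetical variable, used only to demonstrate an escaped comparison -->
    <var name="waitminutes" expr="2"/>
    <block>
      <!-- A literal ampersand in text must be written as &amp; -->
      <prompt>Welcome to Plum Voice sales &amp; support.</prompt>
      <!-- A literal less-than sign in an attribute value must be written as &lt; -->
      <if cond="waitminutes &lt; 5">
        <prompt>The current wait is under five minutes.</prompt>
      </if>
    </block>
  </form>
</vxml>
```

The same rule applies to the ECMAScript logical AND operator, which must be written as &amp;&amp; inside an attribute value.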
For static prompts such as this welcome message, we'll
probably want to use a human announcer instead of TTS. TTS
has come a long way, but there's still no substitute for the real
thing. For recorded prompts, we use the <audio> tag.
<prompt>
  <audio src="wav/welcome.wav">
    Welcome to Plum Voice.
  </audio>
</prompt>
In this case, the src (source) path is resolved relative to the URL of the VoiceXML document in which it appears. WAV is a generic container format: each WAV file includes a header that indicates the audio sample size, encoding, and sample rate actually used. Supported formats vary by VoiceXML implementation, and not every possible WAV format is supported. The Plum Voice Platform supports 8 kHz audio in 16-bit linear, 8-bit µ-law (u-law), or 8-bit A-law encoding, in either WAV files or headerless files.
The text within the audio tag is not required. We could have omitted the content entirely:
<audio src="wav/welcome.wav"/>
which is equivalent to
<audio src="wav/welcome.wav"></audio>
The text included within the audio tag in the example above works much like the ALT text for images in HTML. If the VoiceXML platform is unable to open or play the file named in the audio tag's src attribute, it falls back on generating TTS from the included text.
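The fallback content of <audio> is not limited to text: the VoiceXML 2.0 specification allows it to contain another <audio> element, so fallbacks can be chained. In the sketch below, welcome_backup.wav is a hypothetical second recording; TTS is used only if neither file can be played.

```xml
<prompt>
  <!-- First choice: the recorded welcome message -->
  <audio src="wav/welcome.wav">
    <!-- Second choice (hypothetical file), tried only if welcome.wav fails -->
    <audio src="wav/welcome_backup.wav">
      <!-- Last resort: TTS of this text if neither file can be played -->
      Welcome to Plum Voice.
    </audio>
  </audio>
</prompt>
```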
It is good practice to store your audio files on the same local server as your application script. For example, here is how the files are organized on our local server:
[Screenshot: contents of the files folder on our local server]
In the files folder of our local server, test.php is the script that contains the reference to the audio file welcome.wav.
The file welcome.wav is stored in our wav folder. Thus, when referencing it in the src attribute of our audio tag, we use a path relative to test.php:
<audio src="wav/welcome.wav">
Welcome to Plum Voice.
</audio>
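To make the relative-path rule concrete, assume (hypothetically) that test.php is served from http://www.example.com/files/test.php. The relative src is then resolved against that URL, so both prompts below refer to the same file; the host name is illustrative only.

```xml
<!-- Assumption: this document is served from the hypothetical URL
     http://www.example.com/files/test.php -->
<!-- Shown side by side for comparison; in one block, both would play -->
<prompt>
  <!-- Relative: resolves to http://www.example.com/files/wav/welcome.wav -->
  <audio src="wav/welcome.wav">Welcome to Plum Voice.</audio>
</prompt>
<prompt>
  <!-- Absolute: names the same file explicitly -->
  <audio src="http://www.example.com/files/wav/welcome.wav">Welcome to Plum Voice.</audio>
</prompt>
```

The relative form keeps working if the whole files folder moves to another server, which is one practical reason to keep the audio next to the script.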
Storing audio files on your own server, rather than in the audio repository, makes file management easier. Suppose you want to rename one of your audio files. If the file is stored on your server, you can simply rename it yourself. The audio repository, however, does not let you manage files this way. For example, if you deleted a recording in the audio repository (say, 12.wav) and uploaded a replacement file, the replacement would not take the deleted recording's old name. Instead, it would be assigned the next highest available number among your recordings (say, 21.wav).
If you are concerned about load times for audio files served from your local server, note that once these files have been cached, they load as quickly as files stored in our audio repository.
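VoiceXML 2.0 also lets a document influence how audio is fetched and cached. The sketch below sets the standard audiofetchhint property to prefetch, asking the interpreter to fetch audio when the document loads, and uses the maxage attribute of <audio> to accept a cached copy up to 86400 seconds (one day) old. How the Plum Voice Platform applies these hints is an assumption here, not something this tutorial specifies.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document-wide default: fetch audio when the document is loaded -->
  <property name="audiofetchhint" value="prefetch"/>
  <form>
    <block>
      <prompt>
        <!-- Accept a cached copy no older than one day (86400 seconds) -->
        <audio src="wav/welcome.wav" maxage="86400">
          Welcome to Plum Voice.
        </audio>
      </prompt>
    </block>
  </form>
</vxml>
```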
1.2 User Interaction with DTMF
Grammars tell a speech recognizer what to listen for; that is, they describe the utterances a user may say. Starting with VoiceXML 2.0, the W3C requires all VoiceXML platforms to support at least one common format, the XML Form of the W3C Speech Recognition Grammar Specification (SRGS). Plum implements the SRGS+XML grammar format for both voice and DTMF grammars, as well as the JSpeech Grammar Format (JSGF).
To control user input, we can explicitly create input fields and specify the allowable grammars for each one. We do this by placing a <grammar> element inside each <field> of a <form>. The <grammar> element provides a speech (or DTMF) grammar that:
• Specifies a set of utterances or DTMF key presses that a user may speak or type to perform an action or supply information.
• Returns a corresponding semantic interpretation for a matching input.
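To illustrate the second point, a grammar can attach a semantic interpretation that differs from the raw key press. The field fragment below (to be placed inside a <form>) uses SRGS <tag> elements with tag-format="semantics/1.0", the W3C Semantic Interpretation for Speech Recognition (SISR) format, so that pressing 1 fills the field with "sales" rather than "1". The field name department and the returned strings are hypothetical, and SISR support on a given platform is an assumption to verify.

```xml
<field name="department">
  <grammar type="application/srgs+xml" root="ROOT" mode="dtmf"
           version="1.0" tag-format="semantics/1.0">
    <rule id="ROOT">
      <one-of>
        <!-- Each key press returns a department name as its interpretation -->
        <item>1 <tag>out = "sales";</tag></item>
        <item>2 <tag>out = "support";</tag></item>
        <item>3 <tag>out = "directory";</tag></item>
      </one-of>
    </rule>
  </grammar>
  <prompt>
    For sales, press 1.
    For tech support, press 2.
    For company directory, press 3.
  </prompt>
  <filled>
    <!-- department holds "sales", "support", or "directory", not the digit -->
    <if cond="department == 'sales'">
      <prompt>Welcome to sales.</prompt>
    <elseif cond="department == 'support'"/>
      <prompt>Welcome to tech support.</prompt>
    <else/>
      <prompt>Welcome to the company directory.</prompt>
    </if>
  </filled>
</field>
```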
The following example shows how to set up a grammar for DTMF input from the user:
<?xml version="1.0"?>
<vxml version="2.0">
  <form id="mainmenu">
    <field name="menuchoice">
      <grammar type="application/x-jsgf" mode="dtmf">
        1|2|3
      </grammar>
      <prompt>
        For sales, press 1.
        For tech support, press 2.
        For company directory, press 3.
      </prompt>
      <filled>
        <if cond="menuchoice==1">
          Welcome to sales.
        <elseif cond="menuchoice==2"/>
          Welcome to tech support.
        <elseif cond="menuchoice==3"/>
          Welcome to the company directory.
        </if>
      </filled>
    </field>
  </form>
</vxml>
Here we specify the field's grammar in JSGF (Java Speech Grammar Format) syntax, which is the default grammar syntax on the Plum Voice Platform. The same example written with an SRGS+XML grammar looks like this:
<?xml version="1.0"?>
<vxml version="2.0">
  <form id="mainmenu">
    <field name="menuchoice">
      <grammar type="application/srgs+xml" root="ROOT" mode="dtmf">
        <rule id="ROOT">
          <one-of>
            <item>1</item>
            <item>2</item>
            <item>3</item>
          </one-of>
        </rule>
      </grammar>
      <prompt>
        For sales, press 1.
        For tech support, press 2.
        For company directory, press 3.
      </prompt>
      <filled>
        <if cond="menuchoice==1">
          Welcome to sales.
        <elseif cond="menuchoice==2"/>
          Welcome to tech support.
        <elseif cond="menuchoice==3"/>
          Welcome to the company directory.
        </if>
      </filled>
    </field>
  </form>
</vxml>
Notice that the SRGS+XML grammar in this example is longer than the equivalent JSGF grammar in the previous example. For numeric input, JSGF is often the more concise alternative.
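SRGS does offer ways to keep longer numeric grammars manageable. As a sketch, a hypothetical field for the company directory option could accept a four-digit extension by defining the ten digits once in a DIGIT rule and repeating it with repeat="4". The field name, the extension length, and the rule names are assumptions for illustration; the exact string the field receives for multi-digit DTMF input (for example, whether the digits are space-separated) may depend on the platform.

```xml
<field name="extension">
  <grammar type="application/srgs+xml" root="EXT" mode="dtmf" version="1.0">
    <!-- Exactly four key presses, each matched by the DIGIT rule -->
    <rule id="EXT">
      <item repeat="4"><ruleref uri="#DIGIT"/></item>
    </rule>
    <!-- The ten digit keys, defined once and reused above -->
    <rule id="DIGIT">
      <one-of>
        <item>0</item><item>1</item><item>2</item><item>3</item><item>4</item>
        <item>5</item><item>6</item><item>7</item><item>8</item><item>9</item>
      </one-of>
    </rule>
  </grammar>
  <prompt>Please enter the four digit extension.</prompt>
  <filled>
    <prompt>You entered <value expr="extension"/>.</prompt>
  </filled>
</field>
```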
About us
Plum Voice was founded in 2000 as The Plum Group Inc. With headquarters in New York and offices in New York City, Boston, Denver, and London, Plum creates technologies for personalized audio communication. Plum provides interactive voice response platforms, systems, and hosting services that help developers and companies automate call center and business processes over the phone. Products and services include:
The Plum VoiceXML Platform
Plum IVR Hosting Suite
Plum Survey
Plum IVR Server
Plum Professional Services
QuickFuse
Up Next:
• User Interaction with Speech, Built-in Grammars, and Standard Events