VoiceXML Introduction

Voice Extensible Markup Language (VoiceXML)

What is VoiceXML?

VoiceXML is a Web-based markup language for representing human-computer dialogs, just like HTML.

HTML assumes a graphical web browser, with display, keyboard and mouse.

VoiceXML assumes a voice browser with audio output (computer-synthesized and/or recorded) and audio input (voice and/or keypad tones).

VoiceXML Infrastructure

VoiceXML documents describe:

spoken prompts (synthetic speech)
output of audio files and streams
recognition of spoken words and phrases
recognition of touch tone (DTMF) key presses
recording of spoken input
control of dialog flow
telephony control (call transfer and hangup)
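
As a rough illustration of several of these capabilities, the sketch below is a minimal VoiceXML 2.0 document that speaks a synthesized prompt, plays a recorded file, and accepts a number by voice or keypad. The file name welcome.wav, the form id, and the field name are placeholders, and the example assumes the platform supports the built-in digits type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="greeting">
    <block>
      <!-- Spoken prompt, rendered with synthetic speech -->
      <prompt>Welcome to the example service.</prompt>
      <!-- Audio file output; the enclosed text is spoken if welcome.wav
           (a placeholder name) cannot be fetched or played -->
      <prompt>
        <audio src="welcome.wav">Thank you for calling.</audio>
      </prompt>
    </block>
    <!-- Recognition of spoken digits or DTMF key presses -->
    <field name="account" type="digits">
      <prompt>Please say or key in your account number.</prompt>
      <filled>
        <prompt>You entered <value expr="account"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```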

Key Concepts

Session

A session begins when the user connects to a voice browser and ends when the user disconnects.

Application

An application is a set of VoiceXML documents that share the same application root document. The root document is automatically loaded whenever one of the application documents is loaded, and it remains loaded until there is a transition to a different application or the call is disconnected. The root document information is available to all documents in the same application.
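
A minimal sketch of this arrangement follows, using two hypothetical files. The names app-root.vxml and main.vxml, the variable company, and its value are all assumptions made for illustration. The root document declares a variable:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- app-root.vxml (hypothetical name): the application root document -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Declared here, so it lives in application scope -->
  <var name="company" expr="'Example Corp'"/>
</vxml>
```

Any document that names the root in its application attribute can then read that variable:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- main.vxml (hypothetical name): one document of the application -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
      application="app-root.vxml">
  <form id="welcome">
    <block>
      <!-- Reads the variable defined in the root document -->
      <prompt>Welcome to <value expr="application.company"/>.</prompt>
    </block>
  </form>
</vxml>
```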

Document

A document is a VoiceXML file (extension .vxml). Documents are also called pages.

Menu

A menu presents the user with a choice of options and then transitions to another dialog state based on the user's selection. Documents contain menus.
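
For instance, a menu with two options might look like the sketch below. The target documents sports.vxml and weather.vxml are hypothetical. Setting dtmf="true" asks the browser to assign keypad digits to the choices in order, so the caller can either speak or press a key.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu id="main" dtmf="true">
    <!-- enumerate reads out the choices defined below -->
    <prompt>Please choose one of: <enumerate/></prompt>
    <!-- Each choice names the phrase to recognize and where to go next -->
    <choice next="sports.vxml">sports</choice>
    <choice next="weather.vxml">weather</choice>
    <noinput>
      <prompt>Please say one of: <enumerate/></prompt>
    </noinput>
  </menu>
</vxml>
```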

Form

A form defines an interaction that collects values for each of the fields in the form. Each field may specify a prompt, the expected input, and evaluation rules. The form can be submitted to a server in much the same way as an HTML form. A document may contain one or more forms. Forms and menus are the two kinds of dialog.
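
The sketch below shows one way such a form could look: two fields, a simple evaluation rule, and a submit. The field names, the prompts, and the URL http://www.example.com/order are placeholders, and the built-in number and boolean types are assumed to be available on the platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <!-- First field: prompt plus expected input (a number) -->
    <field name="quantity" type="number">
      <prompt>How many tickets would you like?</prompt>
    </field>
    <!-- Second field: a yes/no confirmation -->
    <field name="confirm" type="boolean">
      <prompt>You asked for <value expr="quantity"/> tickets. Is that right?</prompt>
      <filled>
        <!-- Evaluation rule: on "no", clear both fields so they are asked again -->
        <if cond="!confirm">
          <clear namelist="quantity confirm"/>
        </if>
      </filled>
    </field>
    <!-- Reached once both fields hold values: send the result to the server -->
    <block>
      <submit next="http://www.example.com/order" namelist="quantity"/>
    </block>
  </form>
</vxml>
```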

Subdialog

A subdialog is like a function call: it lets you call out to a new dialog and then return to the original dialog, retaining the local state information for that dialog. Subdialogs can be used to handle confirmations and to create a library of reusable dialogs for common tasks.
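
A confirmation subdialog could be sketched as follows. The form ids, field names, and the parameter name what are invented for this example. The calling form passes a value in with param, and the called form hands its answer back with return, much like an argument and a return value.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Calling dialog -->
  <form id="booking">
    <field name="seats" type="number">
      <prompt>How many seats?</prompt>
    </field>
    <!-- "Function call": run the confirm form, passing seats as "what" -->
    <subdialog name="check" src="#confirm">
      <param name="what" expr="seats"/>
      <filled>
        <!-- The returned value is available as check.answer -->
        <if cond="!check.answer">
          <clear namelist="seats check"/>
        </if>
      </filled>
    </subdialog>
    <block>
      <prompt>Booking <value expr="seats"/> seats.</prompt>
    </block>
  </form>

  <!-- Reusable confirmation dialog -->
  <form id="confirm">
    <!-- Receives the value passed by the param element of the same name -->
    <var name="what"/>
    <field name="answer" type="boolean">
      <prompt>Did you say <value expr="what"/>?</prompt>
      <filled>
        <return namelist="answer"/>
      </filled>
    </field>
  </form>
</vxml>
```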

Grammar

Each dialog state has one or more grammars associated with it, which describe the expected user input, either spoken input or touch-tone (DTMF) key presses. In the simplest case, only the dialog's own grammars are active in that dialog. In more complex cases, other grammars can be active as well. The possible sources are:

grammars defined within the dialog itself
external grammars referenced by links
grammars defined at the document level and marked as being globally active
grammars defined in the root application document and active throughout the application
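
To make the scoping concrete, here is a small sketch with inline grammars at two levels. The vocabulary, ids, and rule names are made up for illustration. The field's grammars are active only while that field is listening, while the grammar inside the document-level link is active in every dialog of the document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- Document-level link: its grammar is active throughout this document -->
  <link next="#pizza">
    <grammar mode="voice" version="1.0" root="restart">
      <rule id="restart">start over</rule>
    </grammar>
  </link>

  <form id="pizza">
    <field name="size">
      <prompt>Say small, medium, or large, or press 1, 2, or 3.</prompt>
      <!-- Speech grammar for this field -->
      <grammar mode="voice" version="1.0" root="size_words">
        <rule id="size_words">
          <one-of>
            <item>small</item>
            <item>medium</item>
            <item>large</item>
          </one-of>
        </rule>
      </grammar>
      <!-- DTMF grammar for the same field -->
      <grammar mode="dtmf" version="1.0" root="size_keys">
        <rule id="size_keys">
          <one-of>
            <item>1</item>
            <item>2</item>
            <item>3</item>
          </one-of>
        </rule>
      </grammar>
    </field>
  </form>
</vxml>
```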

Variables

VoiceXML allows you to define named variables for holding data. These can be defined at any level, and their scope follows an inheritance model. You can test the values of variables to determine which dialog state to transition to next. Variable expressions can also be used for conditional prompts, grammars, and so on.
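
The following sketch shows variables at two levels, a conditional prompt, and a transition chosen by testing variable values. All names, the hard-coded PIN, and the limit of three attempts are illustrative assumptions only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document scope: visible to every dialog in this document -->
  <var name="attempts" expr="0"/>

  <form id="login">
    <!-- Dialog scope: visible only inside this form -->
    <var name="max_attempts" expr="3"/>
    <field name="pin" type="digits?length=4">
      <!-- Conditional prompts: which one plays depends on a variable -->
      <prompt cond="attempts == 0">Please enter your four digit PIN.</prompt>
      <prompt cond="attempts &gt; 0">Let's try again. Please enter your PIN.</prompt>
      <filled>
        <assign name="attempts" expr="attempts + 1"/>
        <!-- Variable tests decide the next dialog state -->
        <if cond="pin == '1234'">
          <goto next="#account"/>
        <elseif cond="attempts &gt;= max_attempts"/>
          <goto next="#locked"/>
        <else/>
          <clear namelist="pin"/>
        </if>
      </filled>
    </field>
  </form>

  <form id="account">
    <block><prompt>You are logged in.</prompt></block>
  </form>

  <form id="locked">
    <block><prompt>Too many attempts. Goodbye.</prompt></block>
  </form>
</vxml>
```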

Events

Events are thrown when the user fails to respond to a prompt, or when the input can't be understood. VoiceXML allows you to write handlers for catching events. These follow an inheritance model, and events can be caught at a higher level if there is no corresponding handler at the dialog level.
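
As an illustration of handlers at different levels, the sketch below defines noinput handlers on a field and a nomatch handler on the document. The grammar file cities.grxml and the wording of the prompts are placeholders. Because the field has no nomatch handler of its own, the document-level one is used when the input is not understood.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document-level handler: applies to any dialog in this document
       that does not define its own nomatch handler -->
  <nomatch>
    <prompt>Sorry, I did not understand that.</prompt>
    <reprompt/>
  </nomatch>

  <form id="city_form">
    <field name="city">
      <prompt>Which city are you travelling to?</prompt>
      <grammar src="cities.grxml" type="application/srgs+xml"/>
      <!-- Field-level handlers, chosen by how often the event has occurred -->
      <noinput count="1">
        <prompt>I did not hear anything.</prompt>
        <reprompt/>
      </noinput>
      <noinput count="2">
        <prompt>Please say the name of a city, for example Boston.</prompt>
      </noinput>
    </field>
  </form>
</vxml>
```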

Scripting

VoiceXML allows you to use scripting when you need additional control over the application. The scripting language is ECMAScript, the standardized form of JavaScript.

VoiceXML employs a form-filling metaphor. You can define a complex grammar for collecting the values of several fields in a single response. Any unfilled fields can be handled by special subdialogs defined inline within each dialog.
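
Returning to scripting, a small sketch of how a script can be combined with a form is shown below. The function total, the variable unit_price, and the price itself are hypothetical. The script defines an ECMAScript function at document level, and a prompt later calls it inside an expression.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <script>
    <![CDATA[
      // Hypothetical helper: convert the recognized quantity to a number
      // explicitly, then multiply by the unit price.
      function total(quantity, unitPrice) {
        return Number(quantity) * unitPrice;
      }
    ]]>
  </script>

  <form id="quote">
    <var name="unit_price" expr="12"/>
    <field name="quantity" type="number">
      <prompt>How many tickets would you like?</prompt>
      <filled>
        <!-- The expression calls the function defined in the script above -->
        <prompt>That comes to <value expr="total(quantity, unit_price)"/> dollars.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```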

This document was created using content from http://www.w3.org/Voice/Guide/