XHTML+Voice is an
XMLExtensible Markup Language is a set of rules for encoding documents in machine-readable form. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all gratis open standards....
language for describing
multimodal user interfacesMultimodal interaction provides the user with multiple modes of interfacing with a system. A multimodal interface provides several distinct tools for input and output of data.- Multimodal input :...
. The two essential modalities are visual and auditory. Visual interaction is defined like most current web pages via
XHTMLXHTML is a family of XML markup languages that mirror or extend versions of the widely-used Hypertext Markup Language , the language in which web pages are written....
. Auditory components are defined by a subset of Voice XML. Interfacing the voice and visual components of X+V documents is accomplished through a combination of
ECMAScriptECMAScript is the scripting language standardized by Ecma International in the ECMA-262 specification and ISO/IEC 16262. The language is widely used for client-side scripting on the web, in the form of several well-known dialects such as JavaScript, JScript, and ActionScript.- History :JavaScript...
,
JavaScriptJavaScript is a prototype-based scripting language that is dynamic, weakly typed and has first-class functions. It is a multi-paradigm language, supporting object-oriented, imperative, and functional programming styles....
, and
XML EventsIn computer science and web development, XML Events is a W3C standard for handling events that occur in an XML document. These events are typically caused by users interacting with the web page using a device such as a web browser on a personal computer or mobile phone.- Formal Definition :An XML...
.
Voice input
Voice input or
speech recognitionSpeech recognition converts spoken words to text. The term "voice recognition" is sometimes used to refer to recognition systems that must be trained to a particular speaker—as is the case for most desktop recognition software...
is based on grammars that define the set of possible input text. In contrast to a probabilistic approach employed by popular software packages such as Dragon Naturally Speaking, the grammar based approach provides the recognizer with important contextual information that significantly boosts recognition accuracy. The specific formats for grammars include
JSGFJSGF stands for Java Speech Grammar Format or the JSpeech Grammar Format . Developed by Sun Microsystems, it is a textual representation of grammars for use in speech recognition for technologies like XHTML+Voice...
.
Voice output
Voice output or
speech synthesisSpeech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware...
can read any string at virtually any time. Pitch, volume, and other characteristics can be customized using
CSS-Computing:*Cascading Style Sheets, a language used to describe the style of document presentations in web development*Central Structure Store in the PHIGS 3D API*Closed source software, software that is not distributed with source code...
and
Speech Synthesis Markup LanguageSpeech Synthesis Markup Language is an XML-based markup language for speech synthesis applications. It is a recommendation of the W3C's voice browser working group. SSML is often embedded in VoiceXML scripts to drive interactive telephony systems. However, it also may be used alone, such as for...
(SSML) however the
OperaOpera is a web browser and Internet suite developed by Opera Software with over 200 million users worldwide. The browser handles common Internet-related tasks such as displaying web sites, sending and receiving e-mail messages, managing contacts, chatting on IRC, downloading files via BitTorrent,...
web browser doesn't currently support all these features.
MIME types
The previously recommended MIME type for any X+V document is application/xhtml+voice+xml which is what the Opera browser uses. Opera will also interpret X+V documents served as text/xml. The current recommended MIME type for any X+V document is application/xv+xml. Since most web servers associate the .xml extension with text/xml, an xml extension is a fairly safe way of making your static X+V document files browsable.
X+V-enabled browsers
The most commonly used X+V browser is the Opera browser. Users of the Opera browser can enable X+V support through steps described at
http://www.opera.com/voice/. Voice is not yet supported in
Opera MiniOpera Mini is a web browser designed primarily for mobile phones, smartphones and personal digital assistants. Until version 4 it used the Java ME platform, requiring the mobile device to run Java ME applications. From version 5 it is also available as a native application for Android, iOS, Symbian...
or on platforms other than Windows.
Detecting support for X+V is best done from the server by checking the HTTP header "Accept" for the MIME type application/xhtml+voice+xml. Here is some PHP code that returns "true" if and only if the requesting browser supports XHTML+Voice:
/*
The following script echoes "true" if and only if the requesting browser
supports XHTML+Voice.
*/
//
// Determine whether browser is sending Accept header.
//
if (isset($_SERVER['HTTP_ACCEPT'])) {
$accept = $_SERVER['HTTP_ACCEPT'];
// If they omit the MIME type from Accept then assume no support.
if (strpos($accept, 'application/xhtml+voice+xml') false) {
echo 'false';
} else {
echo 'true';
}
} else {
echo 'false';
}
?>
Related Technology
Speech Application Language TagsSpeech Application Language Tags is an XML based markup language that is used in HTML and XHTML pages to add voice recognition capabilities to web based applications.-Description:...
(SALT) is a very similar format developed by
MicrosoftMicrosoft Corporation is an American public multinational corporation headquartered in Redmond, Washington, USA that develops, manufactures, licenses, and supports a wide range of products and services predominantly related to computing through its various product divisions...
in 2001 to compete with
VoiceXMLVoiceXML is the W3C's standard XML format for specifying interactive voice dialogues between a human and a computer. It allows voice applications to be developed and deployed in an analogous way to HTML for visual applications. Just as HTML documents are interpreted by a visual web browser,...
and XHTML+Voice. SALT also provides users with multimodal support including grammar based recognition and speech synthesized output. The main differences are in the providers of support. Many different companies support VoiceXML and XHTML+Voice by providing various development tools and in particular
IBMInternational Business Machines Corporation or IBM is an American multinational technology and consulting corporation headquartered in Armonk, New York, United States. IBM manufactures and sells computer hardware and software, and it offers infrastructure, hosting and consulting services in areas...
and
Opera SoftwareOpera Software ASA is a Norwegian software company, primarily known for its Opera family of web browsers with over 220 million users worldwide. Opera Software is also involved in promoting Web standards through participation in the W3C. The company has its headquarters in Oslo, Norway and is...
. SALT is supported almost exclusively from
MicrosoftMicrosoft Corporation is an American public multinational corporation headquartered in Redmond, Washington, USA that develops, manufactures, licenses, and supports a wide range of products and services predominantly related to computing through its various product divisions...
by products such as the
Microsoft Speech Application SDKThe SASDK is Microsoft's Speech Application SDK. It is used to create telephony applications as well as multimodal web applications. It complies with the SALT XML standard, unlike Microsoft's earlier endeavors. The SASDK is used to create Web-based applications only...
and
Microsoft Speech ServerThe Microsoft Speech Server is a product from Microsoft designed to allow the authoring and deployment of IVR applications incorporating Speech Recognition, Speech Synthesis and DTMF....
.
External links