XHTML+Voice

XHTML+Voice (X+V) is an XML language for describing multimodal user interfaces. The two essential modalities are visual and auditory. Visual interaction is defined, as in most current web pages, with XHTML. Auditory components are defined by a subset of VoiceXML. The voice and visual components of an X+V document are connected through a combination of ECMAScript, JavaScript, and XML Events.

Voice input


Voice input, or speech recognition, is based on grammars that define the set of possible input text. In contrast to the probabilistic approach employed by popular software packages such as Dragon NaturallySpeaking, the grammar-based approach gives the recognizer contextual information about what the user may say, which significantly boosts recognition accuracy. The specific formats for grammars include JSGF.

Voice output


Voice output, or speech synthesis, can read any string at virtually any time. Pitch, volume, and other characteristics can be customized using CSS and Speech Synthesis Markup Language (SSML); however, the Opera web browser does not currently support all of these features.

MIME types


The previously recommended MIME type for X+V documents was application/xhtml+voice+xml, which is what the Opera browser uses. Opera will also interpret X+V documents served as text/xml. The currently recommended MIME type for X+V documents is application/xv+xml. Since most web servers associate the .xml extension with text/xml, giving static X+V files an .xml extension is a fairly safe way of making them browsable.
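As a rough sketch of how a server-side script might act on this, the hypothetical helper below picks a Content-Type for an X+V response from the request's Accept header. It assumes the two X+V MIME types named above are the only candidates, prefers the current recommendation, and falls back to the older type. The function name xv_pick_mime_type and this preference order are illustrative assumptions, not part of any specification.

```php
<?php
// Hypothetical helper: choose which X+V MIME type to send, based on the
// Accept header. Prefers the current recommendation and falls back to the
// older type; returns null when neither is advertised.
function xv_pick_mime_type(string $accept): ?string
{
    $candidates = [
        'application/xv+xml',          // current recommendation
        'application/xhtml+voice+xml', // earlier recommendation
    ];
    foreach ($candidates as $type) {
        // stripos() is a case-insensitive substring search;
        // it returns false only when the type is not found.
        if (stripos($accept, $type) !== false) {
            return $type;
        }
    }
    return null;
}

// Possible use in a request handler (assumes no output has been sent yet):
// $type = xv_pick_mime_type($_SERVER['HTTP_ACCEPT'] ?? '');
// if ($type !== null) { header('Content-Type: ' . $type); }
```

A browser that advertises neither type gets null here, so the caller can decide whether to fall back to text/xml or to a purely visual page.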

X+V-enabled browsers


The most commonly used X+V browser is the Opera browser. Users of the Opera browser can enable X+V support through steps described at http://www.opera.com/voice/. Voice is not yet supported in Opera Mini or on platforms other than Windows.

Detecting support for X+V is best done on the server by checking the HTTP "Accept" header for the MIME type application/xhtml+voice+xml. Here is some PHP code that echoes "true" if and only if the requesting browser advertises support for XHTML+Voice:

<?php
/*
The following script echoes "true" if and only if the requesting browser
supports XHTML+Voice.
*/

// Determine whether the browser is sending an Accept header.
if (isset($_SERVER['HTTP_ACCEPT'])) {
    $accept = $_SERVER['HTTP_ACCEPT'];
    // If the MIME type is missing from Accept, assume no support.
    // strpos() returns 0 for a match at offset 0, so compare with === false.
    if (strpos($accept, 'application/xhtml+voice+xml') === false) {
        echo 'false';
    } else {
        echo 'true';
    }
} else {
    echo 'false';
}
?>
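
A plain substring check also matches a header that lists the type with q=0, which in HTTP content negotiation marks a type as not acceptable. The sketch below is a slightly stricter variant that splits the header into entries and honors that case. The function name accepts_mime is hypothetical, and the parser is deliberately simplified: it ignores wildcards such as */* and does not rank types by quality.

```php
<?php
// Hypothetical helper: true if $accept lists $mime with a non-zero quality value.
function accepts_mime(string $accept, string $mime): bool
{
    foreach (explode(',', $accept) as $entry) {
        // Each entry looks like "type/subtype" or "type/subtype;q=0.8".
        $parts = array_map('trim', explode(';', $entry));
        if (strcasecmp($parts[0], $mime) !== 0) {
            continue; // a different media type; keep looking
        }
        // Look for an explicit q parameter; the default quality is 1.
        foreach (array_slice($parts, 1) as $param) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $param, $m)) {
                return (float) $m[1] > 0;
            }
        }
        return true;
    }
    return false;
}

// Possible use:
// $supported = accepts_mime($_SERVER['HTTP_ACCEPT'] ?? '', 'application/xhtml+voice+xml');
```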

Related Technology


Speech Application Language Tags (SALT) is a very similar format developed by Microsoft in 2001 to compete with VoiceXML and XHTML+Voice. SALT also provides users with multimodal support, including grammar-based recognition and speech-synthesized output. The main difference lies in who supports each format. Many different companies, in particular IBM and Opera Software, support VoiceXML and XHTML+Voice by providing various development tools. SALT is supported almost exclusively by Microsoft, through products such as the Microsoft Speech Application SDK and Microsoft Speech Server.
