Simple API for XML
SAX (Simple API for XML) is an event-based sequential access parser API developed by the XML-DEV mailing list for XML documents. SAX provides a mechanism for reading data from an XML document that is an alternative to that provided by the Document Object Model (DOM). Where the DOM operates on the document as a whole, SAX parsers operate on each piece of the XML document sequentially.

Definition

Unlike DOM, there is no formal specification for SAX. The Java implementation of SAX is considered to be normative. SAX is used for state-independent processing of XML documents, in contrast to StAX, which processes documents state-dependently.

Benefits

SAX parsers have certain benefits over DOM-style parsers. A SAX parser typically needs much less memory than a DOM parser. A DOM parser must hold the entire tree in memory before any processing can begin, so its memory use depends entirely on the size of the input data. The memory footprint of a SAX parser, by contrast, is based only on the maximum depth of the XML tree and the maximum data stored in XML attributes on a single XML element. Both of these are always smaller than the size of the parsed tree itself.
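
The dependence on depth rather than document size can be illustrated with a minimal sketch using Python's standard `xml.sax` module. The `DepthTracker` class and the test document are hypothetical, and the sketch only measures the state the application keeps (a stack of open element names), not the parser's internal memory use.

```python
import xml.sax


class DepthTracker(xml.sax.ContentHandler):
    """Keeps only the names of the currently open elements."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # grows with nesting depth, not document size
        self.max_depth = 0

    def startElement(self, name, attrs):
        self.open_elements.append(name)
        self.max_depth = max(self.max_depth, len(self.open_elements))

    def endElement(self, name):
        self.open_elements.pop()


# A wide but shallow document: many elements, nesting depth of three.
wide = b"<root>" + b"<item><value>1</value></item>" * 10000 + b"</root>"
tracker = DepthTracker()
xml.sax.parseString(wide, tracker)
print(tracker.max_depth, len(tracker.open_elements))
```

However many `item` elements the document contains, the stack in this sketch never holds more than three names at a time.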

Because of the event-driven nature of SAX, processing documents with a SAX parser is often faster than with a DOM-style parser. Memory allocation takes time, so the larger memory footprint of the DOM is also a performance issue.

Due to the nature of DOM, streamed reading from disk is impossible. Processing XML documents larger than main memory is also impossible with DOM parsers, but can be done with SAX parsers. However, DOM parsers may make use of disk space as memory (virtual memory) to sidestep this limitation.
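
As a rough illustration of streamed reading, the following sketch feeds a document to Python's `xml.sax` parser in small chunks, so the whole input never has to be held in memory at once. The `ElementCounter` class, the `count_elements` helper, and the 64-byte chunk size are assumptions made for the example; an in-memory `io.BytesIO` stands in for a file on disk.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Counts element start events and stores nothing else."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_elements(stream, chunk_size=64):
    """Read the stream chunk by chunk and pass each chunk to the parser."""
    parser = xml.sax.make_parser()
    handler = ElementCounter()
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)  # incremental parsing: events fire as data arrives
    parser.close()  # signals end of input and fires endDocument
    return handler.count


document = b"<log>" + b"<entry/>" * 1000 + b"</log>"
print(count_elements(io.BytesIO(document)))
```

The same loop would work with a file opened in binary mode, which is the case the paragraph above has in mind.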

Drawbacks

The event-driven model of SAX is useful for XML parsing, but it does have certain drawbacks.

Certain kinds of XML validation require access to the document in full. For example, a DTD IDREF attribute requires that there be an element in the document that uses the given string as a DTD ID attribute. To validate this in a SAX parser, one would need to keep track of every previously encountered ID attribute and every previously encountered IDREF attribute, to see if any matches are made. Furthermore, if an IDREF does not match an ID, the user only discovers this after the document has been parsed; if this linkage was important to building functioning output, then time has been wasted in processing the entire document only to throw it away.
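
The bookkeeping described above might look like the following sketch in Python's `xml.sax`. No DTD is involved here: the attribute names `id` and `ref` are assumptions standing in for attributes that a DTD would declare as ID and IDREF, and `IdRefChecker` is a hypothetical class name.

```python
import xml.sax


class IdRefChecker(xml.sax.ContentHandler):
    """Remembers every ID and reference seen, then compares them at the end."""

    def __init__(self, id_attr="id", idref_attr="ref"):
        super().__init__()
        self.id_attr = id_attr
        self.idref_attr = idref_attr
        self.seen_ids = set()
        self.seen_refs = set()
        self.dangling = set()

    def startElement(self, name, attrs):
        if self.id_attr in attrs:
            self.seen_ids.add(attrs[self.id_attr])
        if self.idref_attr in attrs:
            self.seen_refs.add(attrs[self.idref_attr])

    def endDocument(self):
        # A reference may point forward, so the check has to wait until here.
        self.dangling = self.seen_refs - self.seen_ids


doc = b'<doc><note id="a"/><link ref="a"/><link ref="c"/><link ref="b"/><note id="c"/></doc>'
checker = IdRefChecker()
xml.sax.parseString(doc, checker)
print(checker.dangling)
```

The reference to `c` appears before the element that defines it, which is why the comparison can only be made once the whole document has been read.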

Additionally, some kinds of XML processing simply require having access to the entire document. XSLT and XPath, for example, need to be able to access any node at any time in the parsed XML tree. While a SAX parser could be used to construct such a tree, the DOM already does so by design.
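
To show what constructing a tree from SAX events involves, here is a minimal sketch in Python's `xml.sax`. The `Node` and `TreeBuilder` classes are hypothetical and far simpler than a real DOM: all text inside an element is concatenated into a single string, so the position of text relative to child elements is lost.

```python
import xml.sax


class Node:
    """A bare-bones tree node: name, attributes, text, and children."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = attrs
        self.children = []
        self.text = ""


class TreeBuilder(xml.sax.ContentHandler):
    """Uses a stack of open nodes to attach each new element to its parent."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []

    def startElement(self, name, attrs):
        node = Node(name, dict(attrs.items()))
        if self._stack:
            self._stack[-1].children.append(node)
        else:
            self.root = node
        self._stack.append(node)

    def characters(self, content):
        if self._stack:
            self._stack[-1].text += content

    def endElement(self, name):
        self._stack.pop()


builder = TreeBuilder()
xml.sax.parseString(b'<a x="1"><b>hi</b><c/></a>', builder)
print(builder.root.name, [child.name for child in builder.root.children])
```

Once the tree exists it consumes memory in proportion to the document, so the memory advantage of SAX is given up in exchange for random access.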

XML processing with SAX

A parser that implements SAX (i.e., a SAX parser) functions as a stream parser, with an event-driven API. The user defines a number of callback methods that will be called when events occur during parsing. The SAX events include:
  • XML Text nodes
  • XML Element nodes
  • XML Processing Instructions
  • XML Comments


Events are fired when each of these XML features is encountered; elements also fire a second event when their end is encountered. XML attributes are provided as part of the data passed to element start events.
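
A minimal sketch of such callbacks, using Python's standard `xml.sax` module rather than the normative Java API, is shown here. The `EventLogger` class and the one-line test document are made up for illustration. This handler class has no callback for comments, so comments are not covered.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records each callback the parser makes as a tuple in self.events."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Attributes arrive with the element start event.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("text", content))

    def processingInstruction(self, target, data):
        self.events.append(("pi", target, data))


handler = EventLogger()
xml.sax.parseString(b'<greeting lang="en">hello</greeting>', handler)
for event in handler.events:
    print(event)
```

The application never asks the parser for data; it only supplies the handler and reacts to the calls it receives.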

SAX parsing is unidirectional; previously parsed data cannot be re-read without starting the parsing operation again.

Example

Given the following XML document:

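    <?xml version="1.0" encoding="UTF-8"?>
    <RootElement param="value">
        <FirstElement>
            Some Text
        </FirstElement>
        <?some_pi some_attr="some_value"?>
        <SecondElement param2="something">
            Pre-Text <Inline>Inlined text</Inline> Post-text.
        </SecondElement>
    </RootElement>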

This XML document, when passed through a SAX parser, will generate a sequence of events like the following:
  • XML Element start, named RootElement, with an attribute param equal to "value"
  • XML Element start, named FirstElement
  • XML Text node, with data equal to "Some Text" (note: text processing, with regard to spaces, can be changed)
  • XML Element end, named FirstElement
  • Processing Instruction event, with the target some_pi and data some_attr="some_value"
  • XML Element start, named SecondElement, with an attribute param2 equal to "something"
  • XML Text node, with data equal to "Pre-Text"
  • XML Element start, named Inline
  • XML Text node, with data equal to "Inlined text"
  • XML Element end, named Inline
  • XML Text node, with data equal to "Post-text."
  • XML Element end, named SecondElement
  • XML Element end, named RootElement

Note that the first line of the sample above is the XML Declaration and not a processing instruction; as such it will not be reported as a processing instruction event.
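
The event sequence can be reproduced approximately with Python's `xml.sax`. In this sketch the sample document is reconstructed from the event list above, `EventRecorder` is a hypothetical class name, and whitespace-only text chunks (indentation and newlines) are dropped to keep the output close to that list.

```python
import xml.sax

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RootElement param="value">
    <FirstElement>
        Some Text
    </FirstElement>
    <?some_pi some_attr="some_value"?>
    <SecondElement param2="something">
        Pre-Text <Inline>Inlined text</Inline> Post-text.
    </SecondElement>
</RootElement>
"""


class EventRecorder(xml.sax.ContentHandler):
    """Collects events in document order, ignoring whitespace-only text."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        if content.strip():
            self.events.append(("text", content.strip()))

    def processingInstruction(self, target, data):
        self.events.append(("pi", target, data))


recorder = EventRecorder()
xml.sax.parseString(SAMPLE, recorder)
for event in recorder.events:
    print(event)
```

No event is recorded for the XML declaration, and the exact number of text events may differ between parsers.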

The result above may vary: SAX deliberately allows a given section of text to be reported as multiple sequential text events. Thus in the example above, a SAX parser may generate a different series of events, part of which might include:
  • XML Element start, named FirstElement
  • XML Text node, with data equal to "Some "
  • XML Text node, with data equal to "Text"
  • XML Element end, named FirstElement
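
Because text may arrive in pieces, a handler usually buffers character data and only acts on it at the next element boundary. The sketch below, written with Python's `xml.sax`, shows one way to do that; `TextCollector` is a hypothetical class, and the input is deliberately fed in two parts that split the text in the middle.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Joins consecutive character chunks into one text run."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self.runs = []  # completed text runs, in document order

    def _flush(self):
        text = "".join(self._chunks).strip()
        self._chunks.clear()
        if text:
            self.runs.append(text)

    def characters(self, content):
        # May be called several times for one stretch of text.
        self._chunks.append(content)

    def startElement(self, name, attrs):
        self._flush()

    def endElement(self, name):
        self._flush()


collector = TextCollector()
parser = xml.sax.make_parser()
parser.setContentHandler(collector)
parser.feed(b"<FirstElement>Some ")  # input split in the middle of the text
parser.feed(b"Text</FirstElement>")
parser.close()
print(collector.runs)
```

However the parser chooses to divide the text between `characters` calls, the collector sees "Some Text" as a single run.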

Further reading

  • David Brownell: SAX2, O'Reilly, ISBN 0-596-00237-8
  • W. Scott Means, Michael A. Bodie: The Book of SAX, No Starch Press, ISBN 1-886411-77-8

See also

  • Document Object Model
  • Expat (XML)
  • Java API for XML Processing
  • LibXML
  • List of XML markup languages
  • List of XML schemas
  • MSXML
  • StAX
  • Streaming XML
  • VTD-XML
  • Xerces
  • XSL Transformations


The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 