XML pipeline
In software, an XML pipeline is formed when XML (Extensible Markup Language) processes, especially XML transformations and XML validations, are connected together.

For instance, given two transformations T1 and T2, the two can be connected together so that an input XML document is transformed by T1 and then the output of T1 is fed as input document to T2. Simple pipelines like the one described above are called linear; a single input document always goes through the same sequence of transformations to produce a single output document.
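
A linear pipeline of this kind can be sketched as plain function composition. The example below is a minimal illustration using Python's standard `xml.etree.ElementTree` module; the two transformations (`t1_rename_items`, `t2_add_count`) and the helper `run_pipeline` are hypothetical names chosen for this sketch, not part of any XML pipeline standard.

```python
import xml.etree.ElementTree as ET
from functools import reduce


def t1_rename_items(root):
    """T1 (hypothetical): rename every <item> element to <entry>."""
    for elem in list(root.iter("item")):
        elem.tag = "entry"
    return root


def t2_add_count(root):
    """T2 (hypothetical): store the number of <entry> children on the root."""
    root.set("count", str(len(root.findall("entry"))))
    return root


def run_pipeline(root, steps):
    """Feed the output of each step to the next step, in order."""
    return reduce(lambda doc, step: step(doc), steps, root)


doc = ET.fromstring("<list><item>a</item><item>b</item></list>")
result = run_pipeline(doc, [t1_rename_items, t2_add_count])
output = ET.tostring(result, encoding="unicode")
print(output)
```

Because the list of steps is fixed, every input document passes through the same sequence, which is what makes the pipeline linear.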

Micro-operations

They operate at the inner document level:
  • Rename - renames elements or attributes without modifying the content
  • Replace - replaces elements or attributes
  • Insert - adds a new data element to the output stream at a specified point
  • Delete - removes an element or attribute (also known as pruning the input tree)
  • Wrap - wraps elements with additional elements
  • Reorder - changes the order of elements
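
Several of the micro-operations listed above can be sketched on an in-memory tree. This is an illustrative sketch with Python's standard `xml.etree.ElementTree`; the function names and the sample document are assumptions made for the example, and a real pipeline engine would typically expose these operations declaratively.

```python
import xml.etree.ElementTree as ET


def rename(root, old, new):
    """Rename: change the element name, leaving content untouched."""
    for elem in list(root.iter(old)):
        elem.tag = new


def delete(root, tag):
    """Delete: prune every element named `tag` from the tree."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)


def wrap(root, tag, wrapper):
    """Wrap: put each `tag` element inside a new `wrapper` element."""
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == tag:
                box = ET.Element(wrapper)
                parent[index] = box  # the wrapper takes the child's position
                box.append(child)


def reorder(parent, key):
    """Reorder: sort the children of `parent` by the given key."""
    parent[:] = sorted(parent, key=key)


doc = ET.fromstring(
    "<person><last>Doe</last><first>Jane</first><ssn>000</ssn></person>"
)
rename(doc, "last", "surname")
delete(doc, "ssn")
wrap(doc, "first", "given")
reorder(doc, key=lambda elem: elem.tag)
micro_output = ET.tostring(doc, encoding="unicode")
print(micro_output)
```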

Document operations

They take the input document as a whole:
  • Identity transform - makes a verbatim copy of its input to the output
  • Compare - takes two documents and compares them
  • Transform - executes a transform on the input file using a specified XSLT file. Version 1.0 or 2.0 should be specified.
  • Split - takes a single XML document and splits it into distinct documents
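
The identity, compare, and split operations can be sketched with the Python standard library. The function names and the sample document below are assumptions for illustration. Comparison here is done on canonical XML (C14N), which is one possible definition of equality among several. The Transform operation is omitted because the standard library has no XSLT processor.

```python
import copy
import xml.etree.ElementTree as ET


def identity(root):
    """Identity transform: return an unchanged copy of the input."""
    return copy.deepcopy(root)


def compare(a, b):
    """Compare: two documents are equal if their canonical forms match."""
    canon_a = ET.canonicalize(ET.tostring(a, encoding="unicode"))
    canon_b = ET.canonicalize(ET.tostring(b, encoding="unicode"))
    return canon_a == canon_b


def split(root):
    """Split: turn each child of the root into a document of its own."""
    return [copy.deepcopy(child) for child in root]


doc = ET.fromstring('<orders><order id="1"/><order id="2"/></orders>')
clone = identity(doc)
same = compare(doc, clone)
parts = split(doc)
print(same, [part.get("id") for part in parts])
```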

Sequence operations

They are mainly introduced in XProc and help to handle a sequence of documents as a whole:
  • Count - takes a sequence of documents and counts them
  • Identity transform - makes a verbatim copy of its input sequence of documents to the output
  • split-sequence - takes a sequence of documents as input and routes them to different outputs depending on matching rules
  • wrap-sequence - takes a sequence of documents as input and wraps them into one or more documents
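
The sequence operations can be approximated by treating a sequence of documents as a Python list. This is a rough sketch, not the XProc semantics: the function names mirror the list above, but the matching rule is assumed to be an ordinary Python predicate rather than an XPath test, and `wrap_sequence` always produces a single wrapper document.

```python
import xml.etree.ElementTree as ET


def count(docs):
    """Count: number of documents in the sequence."""
    return len(docs)


def split_sequence(docs, matches):
    """split-sequence: route documents to two outputs by a matching rule."""
    matched = [doc for doc in docs if matches(doc)]
    not_matched = [doc for doc in docs if not matches(doc)]
    return matched, not_matched


def wrap_sequence(docs, wrapper):
    """wrap-sequence: wrap the whole sequence in one new document."""
    box = ET.Element(wrapper)
    box.extend(docs)
    return box


docs = [ET.fromstring(text) for text in ("<a/>", "<b/>", "<a/>")]
matched, not_matched = split_sequence(docs, lambda doc: doc.tag == "a")
wrapped = wrap_sequence(matched, "group")
print(count(docs), count(matched), count(not_matched), wrapped.tag)
```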

Non-linear

Non-linear operations on pipelines may include:
  • Conditionals - where a given transformation is executed if a condition is met while another transformation is executed otherwise
  • Loops - where a transformation is executed on each node of a node set selected from a document or a transformation is executed until a condition evaluates to false
  • Tees - where a document is fed to multiple transformations potentially happening in parallel
  • Aggregations - where multiple documents are aggregated into a single document
  • Exception Handling - where failures in processing can result in an alternate pipeline being processed
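
The non-linear constructs above can be sketched as higher-order functions that combine pipeline steps. All names here (`choose`, `tee`, `aggregate`, `try_catch`, `mark`, `fail`) are hypothetical and chosen for this example. The tee runs its branches one after another on separate copies, whereas a real engine might run them in parallel.

```python
import copy
import xml.etree.ElementTree as ET


def choose(condition, then_step, else_step):
    """Conditional: pick one of two steps based on the document."""
    def step(doc):
        return then_step(doc) if condition(doc) else else_step(doc)
    return step


def tee(doc, *steps):
    """Tee: feed an independent copy of the document to every step."""
    return [step(copy.deepcopy(doc)) for step in steps]


def aggregate(docs, wrapper):
    """Aggregation: combine several documents into a single one."""
    root = ET.Element(wrapper)
    root.extend(docs)
    return root


def try_catch(step, fallback):
    """Exception handling: run an alternate step if the first one fails."""
    def guarded(doc):
        try:
            return step(doc)
        except Exception:
            return fallback(doc)
    return guarded


def mark(status):
    """Hypothetical transformation: record a status attribute."""
    def step(doc):
        doc.set("status", status)
        return doc
    return step


def fail(doc):
    """Hypothetical transformation that always fails."""
    raise ValueError("simulated failure")


doc = ET.fromstring('<order total="120"/>')
route = choose(lambda d: float(d.get("total")) > 100, mark("review"), mark("auto"))
branches = tee(doc, route, try_catch(fail, mark("recovered")))
report = aggregate(branches, "report")
statuses = [child.get("status") for child in report]
print(statuses)
```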


Some standards also categorize transformations as macro (changes impacting an entire file) or micro (changes impacting only an element or attribute).

XML Pipeline languages

XML pipeline languages are used to define pipelines. A program written with an XML pipeline language is implemented by software known as an XML pipeline engine, which creates processes, connects them together and finally executes the pipeline. Existing XML pipeline languages include:

Standards

  • XProc: An XML Pipeline Language is a W3C Recommendation http://www.w3.org/TR/xproc for defining linear and non-linear XML pipelines.

Product-specific

  • W3C XML Pipeline Definition Language is specified in a W3C Note.
  • W3C XML Pipeline Language (XPL) Version 1.0 (Draft) http://www.w3.org/Submission/xpl/ http://www.w3.org/TR/xml-pipeline/ is specified in a W3C Submission and is a component of Orbeon Presentation Server OPS (now called Orbeon Forms). This specification provides an implementation of an earlier version of the language. XPL allows the declaration of complex pipelines with conditionals, loops, tees, aggregations, and sub-pipelines.
  • Cocoon sitemaps allow, among other functionality, the declaration of XML pipelines. Cocoon sitemaps are one of the earliest implementations of the concept of XML pipeline.
  • smallx XML Pipelines are used by the smallx project.
  • ServingXML defines a vocabulary for expressing flat-XML, XML-flat, flat-flat, and XML-XML transformations in pipelines.
  • PolarLake Circuit Markup Language is used by PolarLake's runtime to define XML pipelines. Circuits are collections of paths through which fragments of XML stream (usually as SAX or DOM events). Components are placed on paths to interact with the stream (and/or the outside world) in a low-latency process.
  • xmlsh is a scripting language based on the Unix shells that natively supports XML and text pipelines http://www.xmlsh.org.

Pipe Granularity

Different XML Pipeline implementations support different granularity of flow.
  • Document: Whole documents flow through the pipe as atomic units. A document can only be in one place at a time, though multiple documents may usually be in the pipe at once.
  • Event: Element and text node events may flow through different paths. A single document may be flowing through many components concurrently.
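
The difference between the two granularities can be illustrated with Python's standard library, as a rough analogy rather than a model of any particular engine. `document_level` parses the whole document before any processing starts. `event_level` uses `iterparse` to hand each element downstream as soon as its end tag is seen. The sample data and function names are assumptions for this sketch.

```python
import io
import xml.etree.ElementTree as ET

XML = b"<log><entry>a</entry><entry>b</entry><entry>c</entry></log>"


def document_level(data):
    """Document granularity: the whole tree exists before processing starts."""
    root = ET.fromstring(data)
    return [entry.text for entry in root.iter("entry")]


def event_level(data):
    """Event granularity: yield each entry as soon as it has been parsed."""
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "entry":
            yield elem.text
            elem.clear()  # release the element's content once it is consumed


whole = document_level(XML)
streamed = list(event_level(XML))
print(whole, streamed)
```

Both functions produce the same values; the event-level version can start emitting results before the end of the input has been read, which matters for large documents or low-latency processing.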

Standardization

Until May 2010, there was no widely used standard for XML pipeline languages. However, with the publication of XProc as a W3C Recommendation in May 2010 http://www.w3.org/TR/xproc/, widespread adoption can be expected.

XML Pipeline History

  • 1972 Douglas McIlroy of Bell Laboratories adds the pipe operator to the UNIX command shell. This allows the output from one shell program to go directly into the input of another shell program without going to disk, which lets programs such as the UNIX awk and sed be specialized yet work together http://www.cs.dartmouth.edu/~doug/ http://cm.bell-labs.com/cm/cs/who/dmr/hist.html. For more details see Pipeline (Unix).
  • 1993 Sean McGrath developed a C++ toolkit for SGML processing http://xpipe.sourceforge.net/Articles/Miscellaneous/fog0000000020.html.
  • 1998 Stefano Mazzocchi releases the first version of Apache Cocoon, one of the first software programs to use XML pipelines.
  • 1998 PolarLake builds XML Operating System, which includes XML pipelining.
  • 2002 Notes submitted by Norman Walsh and Eve Maler from Sun Microsystems, as well as a W3C Submission submitted in 2005 by Erik Bruchez and Alessandro Vernet from Orbeon, were important steps toward spawning an actual standardization effort. While neither submission directly became a W3C recommendation, they were considered key sources of inspiration for the W3C XML Processing Working Group.
  • September 2005 W3C XML Processing Working Group started. The task of this working group was to create a specification for an XML pipelining language.
  • August 2008 xmlsh, an XML pipeline language, was announced at Balisage 2008.

See also

  • Apache Cocoon
  • Identity transform
  • NetKernel
  • Pipeline (Unix)
  • W3C recommendation
  • XSLT

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 