Markup language

A markup language is a set of codes that give instructions regarding the structure of a text or how it is to be displayed. Markup languages have been in use for centuries, and in recent years have been used in computer typesetting and word-processing systems to specify the formatting, layout, structure, and other elements of a document.

A well-known example of a markup language in use today in computing is HyperText Markup Language (HTML), one of the most widely used on the World Wide Web. For example, if an HTML page contains "<b>test</b>", a typical Web browser will display the word "test" in bold face; the "<b>" and "</b>" won't be displayed because, as markup, they are instructions to the browser, not part of the content.
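The split between markup and content can be illustrated with a short Python sketch. It uses the standard library's html.parser module; the class name MarkupSplitter and the function split_markup are hypothetical names chosen for this illustration, and a real browser does far more than this.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Collects tags and text content separately (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.tags = []      # markup: instructions to the renderer
        self.content = []   # text the reader actually sees

    def handle_starttag(self, tag, attrs):
        self.tags.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")

    def handle_data(self, data):
        self.content.append(data)


def split_markup(source):
    """Return (tags, visible_text) for an HTML fragment."""
    splitter = MarkupSplitter()
    splitter.feed(source)
    splitter.close()
    return splitter.tags, "".join(splitter.content)


tags, text = split_markup("<b>test</b>")
print(tags)  # ['<b>', '</b>']
print(text)  # test
```

Only the second value is what a reader would see; the first is what the browser consumes as instructions.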

History

The term markup is derived from the traditional publishing practice of "marking up" a manuscript, which involves adding symbolic printer's instructions in the margins of a paper manuscript. For centuries, this task was done primarily by skilled typographers known as "markup men" who marked up text to indicate what typeface, style, and size should be applied to each part, and then passed the manuscript to others for typesetting by hand. Markup was also commonly applied by editors, proofreaders, publishers, and graphic designers.

GenCode

The idea of markup languages was apparently first publicly presented by publishing executive William W. Tunnicliffe at a conference in 1967, although he preferred to call it "generic coding." In the 1970s, Tunnicliffe led the development of a standard called GenCode for the publishing industry and later was the first chair of the International Organization for Standardization committee that created SGML, the first widely used descriptive markup language. Book designer Stanley Rice published vague speculation along similar lines in the late 1960s. Brian Reid, in his 1980 dissertation at Carnegie Mellon University, developed the theory and a working implementation of descriptive markup in actual use.

However, IBM researcher Charles Goldfarb is more commonly seen today as the "father" of markup languages. Goldfarb hit upon the basic idea while working on a primitive document management system intended for law firms in 1969, and helped invent IBM GML later that same year. GML was first publicly disclosed in 1973.

In 1975, Goldfarb moved from Cambridge, Massachusetts to Silicon Valley and became a product planner at the IBM Almaden Research Center. There, he convinced IBM's executives to deploy GML commercially in 1978 as part of IBM's Document Composition Facility product. Development informally began that year on what ultimately became the SGML standard, and Goldfarb eventually became chair of the SGML committee. SGML was standardized and released by ISO in 1986.

Some early examples of markup languages available outside the publishing industry can be found in typesetting tools on Unix systems such as troff and nroff. In these systems, formatting commands were inserted into the document text so that typesetting software could format the text according to the editor's specifications. Getting a document to print correctly was an iterative process of trial and error. The availability of WYSIWYG ("what you see is what you get") publishing software supplanted much use of these languages among casual users, though serious publishing work still uses markup to specify the non-visual structure of texts.

TeX

Another major publishing standard is TeX, created and continuously refined by Donald Knuth in the 1970s and 1980s. TeX concentrated on detailed layout of text and font descriptions in order to typeset mathematical books to a professional quality. This required Knuth to spend considerable time investigating the art of typesetting. However, TeX has a steep learning curve, so it is mainly used in academia, where it is the de facto standard in many scientific disciplines. A TeX macro package known as LaTeX provides a descriptive markup system on top of TeX, and is widely used.

Scribe, GML and SGML

The first language to make a clear and clean distinction between structure and presentation was Scribe, developed by Brian Reid and described in his doctoral thesis in 1980. Scribe was revolutionary in a number of ways, not least that it introduced the idea of styles separated from the marked up document, and of a grammar controlling the usage of descriptive elements. Scribe influenced the development of Generalized Markup Language (later SGML) and is a direct ancestor to HTML and LaTeX.

In the early 1980s, the idea that markup should be focused on the structural aspects of a document and leave the visual presentation of that structure to the interpreter led to the creation of SGML. The language was developed by a committee chaired by Goldfarb. It incorporated ideas from many different sources, including Tunnicliffe's project, GenCode. Sharon Adler, Anders Berglund, and James A. Marke were also key members of the SGML committee.

SGML specified a syntax for including the markup in documents, as well as one for separately describing what tags were allowed, and where (the Document Type Definition (DTD) or schema). This allowed authors to create and use any markup they wished, selecting tags that made the most sense to them and were named in their own natural languages. Thus, SGML is properly a meta-language, and many particular markup languages are derived from it. From the late 1980s on, most substantial new markup languages have been based on the SGML system, including for example TEI and DocBook. SGML was promulgated as an International Standard, ISO 8879, by the International Organization for Standardization in 1986.
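The idea of describing "what tags are allowed, and where" separately from the document can be sketched in Python. This is only a rough illustration: the CONTENT_MODEL table, the memo vocabulary, and the function find_violations are all hypothetical, a real DTD is far more expressive (ordering, repetition, attributes), and Python's XML parser is used here as a stand-in for an SGML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: parent tag -> set of child tags allowed inside it.
# This only approximates the role a DTD plays; it is not DTD syntax.
CONTENT_MODEL = {
    "memo": {"to", "body"},
    "to": set(),
    "body": {"para"},
    "para": {"emph"},
    "emph": set(),
}


def find_violations(element, model=CONTENT_MODEL):
    """Return messages for tags that appear where the model forbids them."""
    problems = []
    if element.tag not in model:
        problems.append(f"unknown tag <{element.tag}>")
        return problems
    for child in element:
        if child.tag not in model[element.tag]:
            problems.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        else:
            problems.extend(find_violations(child, model))
    return problems


good = ET.fromstring(
    "<memo><to>Staff</to><body><para>Read <emph>this</emph>.</para></body></memo>"
)
bad = ET.fromstring("<memo><para>Misplaced paragraph</para></memo>")

print(find_violations(good))  # []
print(find_violations(bad))   # ['<para> not allowed inside <memo>']
```

Because the rules live in a table rather than in the checking code, a different vocabulary needs only a different table, which is the sense in which such a system is a meta-language.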

SGML found wide acceptance and use in fields with very large-scale documentation requirements. However, it was generally found to be cumbersome and difficult to learn, a side effect of attempting to do too much and be too flexible. For example, SGML made end tags (or start-tags, or even both) optional in certain contexts, because it was thought that markup would be done manually by overworked support staff who would appreciate saving keystrokes.

HTML

By 1991, it appeared to many that SGML would be limited to commercial and data-based applications while WYSIWYG tools (which stored documents in proprietary binary formats) would suffice for other document processing applications.

The situation changed when Sir Tim Berners-Lee, learning of SGML from co-worker Anders Berglund and others at CERN, used SGML syntax to create HTML. HTML resembles other SGML-based tag languages, although it began simpler than most and a formal DTD was not developed until later. Steven DeRose argues that HTML's use of descriptive markup (and SGML in particular) was a major factor in the success of the Web, because of the flexibility and extensibility that it enabled (other factors include the notion of URLs and the free distribution of browsers). HTML is quite likely the most used markup language in the world today.

However, HTML's status as a markup language is disputed by some computer scientists. The argument is that HTML restricts the placement of tags, requiring them to be either fully nested inside other tags, or the root tag of the document. Because of this, these scientists suggest instead that HTML is a container language, following a hierarchical model.

XML


XML (Extensible Markup Language) is a meta markup language that is now widely used. XML was developed by the World Wide Web Consortium, in a committee created and chaired by Jon Bosak. The main purpose of XML was to simplify SGML by focusing on a particular problem: documents on the Internet. XML remains a meta-language like SGML, allowing users to create any tags needed (hence "extensible") and then describing those tags and their permitted uses.

XML adoption was helped because every XML document can be written in such a way that it is also an SGML document, and existing SGML users and software could switch to XML fairly easily. However, XML eliminated many of the more complex and human-oriented features of SGML to simplify implementation (while increasing markup size and reducing readability and editability). Other improvements rectified some SGML problems in international settings, and made it possible to parse and interpret document hierarchy even if no DTD is available.
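That a document's hierarchy can be recovered without any DTD can be seen in a small Python sketch using the standard library's xml.etree.ElementTree. The recipe vocabulary and the outline function are made up for this illustration; no schema of any kind is supplied to the parser.

```python
import xml.etree.ElementTree as ET

# Made-up tags; no DTD or schema is supplied anywhere.
DOCUMENT = """
<recipe>
  <title>Toast</title>
  <steps>
    <step>Slice the bread.</step>
    <step>Toast it.</step>
  </steps>
</recipe>
"""


def outline(element, depth=0):
    """Return the element hierarchy as a list of indented tag names."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(DOCUMENT)
print("\n".join(outline(root)))
```

The parser relies only on the document being well-formed (every tag closed and properly nested), which is why the tree can be built without knowing what the tags mean.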

XML was designed primarily for semi-structured environments such as documents and publications. However, it appeared to strike a happy medium between simplicity and flexibility, and was rapidly adopted for many other uses. XML is now widely used for communicating data between applications. Like HTML, it can be described as a 'container' language.

XHTML

Since January 2000 all W3C Recommendations for HTML have been based on XML rather than SGML, using the abbreviation XHTML (Extensible HyperText Markup Language). The language specification requires that XHTML Web documents must be well-formed XML documents; this allows for more rigorous and robust documents while using tags familiar from HTML.

One of the most noticeable differences between HTML and XHTML is the rule that all tags must be closed: empty HTML tags such as <br> must either be closed with a regular end-tag, or replaced by a special form: <br /> (the space before the '/' on the end tag is optional, but frequently used because it enables some pre-XML Web browsers, and SGML parsers, to accept the tag). Another is that all attribute values in tags must be quoted. Finally, all tag and attribute names must be lowercase in order to be valid; HTML, on the other hand, was case-insensitive.
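Two of these rules (closed tags and quoted attribute values) are matters of XML well-formedness, so they can be checked with any XML parser. The Python sketch below uses xml.etree.ElementTree; the function name is_well_formed and the sample fragments are hypothetical. Note that the lowercase rule is a validity requirement of XHTML rather than of XML itself, so a generic XML parser like this one does not enforce it.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


unclosed = "<p>First line<br>second line</p>"          # HTML-style empty tag
unquoted = "<p class=note>text</p>"                     # unquoted attribute value
xhtml = '<p class="note">First line<br />second line</p>'

print(is_well_formed(unclosed))  # False
print(is_well_formed(unquoted))  # False
print(is_well_formed(xhtml))     # True
```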

Other XML-based applications

Many XML-based applications now exist, including Resource Description Framework (RDF), XForms, DocBook, SOAP and the Web Ontology Language (OWL). For a partial list of these see List of XML markup languages.

Features

A common feature of many markup languages is that they intermix the text of a document with markup instructions in the same data stream or file. This is not necessary; it is possible to isolate markup from text content, using pointers, offsets, IDs, or other methods to co-ordinate the two. Such "standoff markup" is typical for the internal representations programs use to work with marked-up documents. However, embedded or "inline" markup is much more common elsewhere. Here, for example, is a small section of text marked up in HTML:

<h1> Anatidae </h1> <p> The family <i>Anatidae</i> includes ducks, geese, and swans, but <em>not</em> the closely-related screamers. </p>

The codes enclosed in angle-brackets <like this> are markup instructions (known as tags), while the text between these instructions is the actual text of the document. The codes h1, p, and em are examples of structural markup, in that they describe the intended purpose or meaning of the text they include. Specifically, h1 means "this is a first-level heading", p means "this is a paragraph", and em means "this is an emphasized word or phrase". A program interpreting such structural markup may apply its own rules or styles for presenting the various pieces of text, using different typefaces, boldness, font size, indentation, colour, or other styles, as desired. A tag such as "h1" (header level 1) might be presented in a large bold sans-serif typeface, for example, or in a monospaced (typewriter-style) document it might be underscored, or it might not change the presentation at all.
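The point that an interpreting program applies its own rules can be sketched in Python. The example below renders the same marked-up text (with whitespace inside the tags trimmed) under two hypothetical style tables, SHOUTING and PLAIN; the Renderer class and render function are illustrative names, not part of any standard.

```python
from html.parser import HTMLParser

SOURCE = (
    "<h1>Anatidae</h1>"
    "<p>The family <i>Anatidae</i> includes ducks, geese, and swans, "
    "but <em>not</em> the closely-related screamers.</p>"
)

# Two hypothetical style tables: each maps a tag name to a text transformation.
SHOUTING = {"h1": str.upper, "em": lambda s: f"*{s}*"}
PLAIN = {}  # no rules: the markup changes nothing about the presentation

BLOCK_TAGS = {"h1", "p"}  # tags whose content ends with a line break


class Renderer(HTMLParser):
    def __init__(self, styles):
        super().__init__()
        self.styles = styles
        self.open_tags = []
        self.output = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        if tag in BLOCK_TAGS:
            self.output.append("\n")

    def handle_data(self, data):
        # Apply the rule for the innermost open tag, if the table has one.
        innermost = self.open_tags[-1] if self.open_tags else None
        transform = self.styles.get(innermost, lambda s: s)
        self.output.append(transform(data))


def render(source, styles):
    """Render the marked-up source as plain text under the given style table."""
    renderer = Renderer(styles)
    renderer.feed(source)
    renderer.close()
    return "".join(renderer.output)


print(render(SOURCE, SHOUTING))
print(render(SOURCE, PLAIN))
```

The source text is identical in both calls; only the style table differs, which is the separation that structural markup is meant to allow.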

In contrast, the i tag in HTML is an example of presentational markup; it is generally used to specify a particular characteristic of the text (in this case, the use of an italic typeface) without specifying the reason for that appearance.
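The example discussed above embeds its markup inline. The standoff alternative mentioned at the start of this section can be sketched in Python as follows; the shortened sample sentence, the (start, end, tag) tuple layout, and the function to_inline are assumptions made for this illustration, and the sketch handles only annotations that do not overlap.

```python
TEXT = "The family Anatidae includes ducks, but not screamers."

# Standoff annotations: (start offset, end offset, tag), stored apart from TEXT.
ANNOTATIONS = [
    (11, 19, "i"),   # covers "Anatidae"
    (40, 43, "em"),  # covers "not"
]


def to_inline(text, annotations):
    """Merge non-overlapping standoff annotations into inline tags."""
    pieces = []
    position = 0
    for start, end, tag in sorted(annotations):
        pieces.append(text[position:start])
        pieces.append(f"<{tag}>{text[start:end]}</{tag}>")
        position = end
    pieces.append(text[position:])
    return "".join(pieces)


print(to_inline(TEXT, ANNOTATIONS))
# The family <i>Anatidae</i> includes ducks, but <em>not</em> screamers.
```

Here the text itself never contains a tag; the offsets carry the markup, and the inline form is generated only when needed.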

The Text Encoding Initiative (TEI) has published extensive guidelines for how to encode texts of interest in the humanities and social sciences, developed through years of international cooperative work. These guidelines are used by projects encoding historical documents, the works of particular scholars, periods, or genres, and so on.

Alternative usage

While the idea of markup language originated with text documents, there is an increasing usage of markup languages in other areas which involve the presentation of various types of information, including playlists, vector graphics, web services, content syndication, and user interfaces. Most of these are XML applications because XML is a well-defined and extensible language.

The use of XML has also led to the possibility of combining multiple markup languages into a single profile, like XHTML+SMIL and XHTML+MathML+SVG.

See also

  • List of markup languages
  • CSS (Cascading Style Sheets)
  • Lightweight markup language
  • User interface markup language
  • Scalable Vector Graphics
  • Vector graphics markup language
  • ColdFusion Markup Language
  • Programming language (contrast)
  • YAML (YAML is not a markup language, but it's close)
  • Wikitext


Sources

  • by James H. Coombs, Allen H. Renear, and Steven J. DeRose. Originally published in the November 1987 CACM, and reprinted

