Microformat
A microformat is a web-based approach to semantic markup which seeks to re-use existing HTML/XHTML tags to convey metadata and other attributes in web pages and other contexts that support (X)HTML, such as RSS. This approach allows software to process information intended for end-users (such as contact information, geographic coordinates, calendar events, and the like) automatically.

Although the content of web pages is technically already capable of "automated processing", and has been since the inception of the web, such processing is difficult because the traditional markup tags used to display information on the web do not describe what the information means. Microformats can bridge this gap by attaching semantics, and thereby obviate other, more complicated, methods of automated processing, such as natural language processing or screen scraping. The use, adoption and processing of microformats enable data items to be indexed, searched for, saved or cross-referenced, so that information can be reused or combined.

Microformats allow the encoding and extraction of events, contact information, social relationships and so on. More are being developed.

Background

Microformats emerged as part of a grassroots movement to make recognizable data items (such as events, contact details or geographical locations) capable of automated processing by software, as well as directly readable by end-users. Link-based microformats emerged first. These include vote links that express opinions of the linked page, which search engines can tally into instant polls.

As the microformats community grew, CommerceNet, a nonprofit organization that promotes electronic commerce on the Internet, helped sponsor and promote the technology and support the microformats community in various ways. CommerceNet also helped co-found the Microformats.org community site.

Neither CommerceNet nor Microformats.org operates as a standards body. The microformats community functions through an open wiki, a mailing list, and an Internet Relay Chat (IRC) channel. Most of the existing microformats were created at the Microformats.org wiki and the associated mailing list, by a process of gathering examples of web publishing behaviour, then codifying it. Some other microformats (such as rel=nofollow and unAPI) have been proposed, or developed, elsewhere.

Technical overview

XHTML and HTML standards allow for the embedding and encoding of semantics within the attributes of markup tags. Microformats take advantage of these standards by indicating the presence of metadata using the following attributes:
  • class
  • rel
  • rev (in one case, otherwise deprecated in microformats)


For example, the text "The birds roosted at 52.48, -1.89" contains a pair of numbers which may be understood, from their context, to be a set of geographic coordinates. When these are wrapped in spans (or other HTML elements) with specific class names (in this case geo, latitude and longitude, all part of the geo microformat specification):

The birds roosted at
<span class="geo">
  <span class="latitude">52.48</span>,
  <span class="longitude">-1.89</span>
</span>

software agents can recognize exactly what each value represents and can then perform a variety of tasks such as indexing, locating it on a map and exporting it to a GPS device.
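
How a software agent might recognise these values can be sketched in a few lines of Python. This is only an illustration, assuming the span-and-class markup described above; the class name GeoParser and its attributes are invented for the example and are not part of any microformats tool. It uses the standard library's html.parser module and assumes numeric text inside each latitude and longitude element.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they are kept off the element stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class GeoParser(HTMLParser):
    """Hypothetical helper: collect latitude/longitude inside class="geo"."""

    def __init__(self):
        super().__init__()
        self.points = []      # completed {"latitude": ..., "longitude": ...} dicts
        self._stack = []      # one marker per open element
        self._current = None  # the geo dict being filled, if any

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        marker = None
        if self._current is None and "geo" in classes:
            self._current = {}
            marker = "geo"
        elif self._current is not None:
            marker = next((c for c in ("latitude", "longitude") if c in classes), None)
        self._stack.append(marker)

    def handle_endtag(self, tag):
        if tag in VOID or not self._stack:
            return
        if self._stack.pop() == "geo":
            # The geo container has closed: the point is complete.
            self.points.append(self._current)
            self._current = None

    def handle_data(self, data):
        marker = self._stack[-1] if self._stack else None
        text = data.strip()
        if marker in ("latitude", "longitude") and text:
            self._current[marker] = float(text)


parser = GeoParser()
parser.feed(
    'The birds roosted at <span class="geo">'
    '<span class="latitude">52.48</span>, '
    '<span class="longitude">-1.89</span></span>'
)
print(parser.points)
```

The sketch relies only on class names, not on the element type, which mirrors the point that spans "or other HTML elements" may carry the markup. It assumes well-formed input in which every non-void element is closed.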

Example

In this example, the contact information is presented as follows:





With hCard microformat markup, that becomes:





Here, the formatted name (fn), organisation (org), telephone number (tel) and web address (url) have been identified using specific class names and the whole thing is wrapped in class="vcard", which indicates that the other classes form an hCard (short for "HTML vCard") and are not merely coincidentally named. Other, optional, hCard classes also exist. Software, such as browser plug-ins, can now extract the information, and transfer it to other applications, such as an address book.

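The extraction step can be illustrated with a short Python sketch. Everything specific in it is an assumption made for illustration: the sample contact (Jane Example of Example Organisation) is invented, HCardParser is a hypothetical name, and only the four classes named above are handled. It uses the standard library's html.parser module and, for a link carrying the url class, takes the address from the href attribute.

```python
from html.parser import HTMLParser

FIELDS = ("fn", "org", "tel", "url")  # the four hCard classes named above
VOID = {"br", "hr", "img", "input", "link", "meta", "wbr"}  # tags with no closing tag


class HCardParser(HTMLParser):
    """Hypothetical helper: collect fn/org/tel/url inside class="vcard"."""

    def __init__(self):
        super().__init__()
        self.cards = []    # one dict per completed hCard
        self._stack = []   # per open element: None for the vcard root, else a tuple of fields
        self._card = None  # the hCard currently being filled, if any

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._card is None and "vcard" in classes:
            self._card = {}
            self._stack.append(None)
            return
        fields = []
        if self._card is not None:
            fields = [f for f in FIELDS if f in classes]
            if tag == "a" and "url" in fields and attrs.get("href"):
                # For links, the address is taken from href rather than the text.
                self._card["url"] = attrs["href"]
                fields.remove("url")
        self._stack.append(tuple(fields))

    def handle_endtag(self, tag):
        if tag in VOID or not self._stack:
            return
        if self._stack.pop() is None:
            # The vcard root has closed: the card is complete.
            self.cards.append(self._card)
            self._card = None

    def handle_data(self, data):
        text = data.strip()
        if self._card is None or not text:
            return
        for entry in self._stack:
            for field in entry or ():
                self._card[field] = (self._card.get(field, "") + " " + text).strip()


# Invented sample data, not taken from the article.
sample = """
<div class="vcard">
  <div class="fn">Jane Example</div>
  <div class="org">Example Organisation</div>
  <div class="tel">555-0100</div>
  <a class="url" href="http://example.com/">http://example.com/</a>
</div>
"""
parser = HCardParser()
parser.feed(sample)
print(parser.cards)
```

Class names outside a class="vcard" container are ignored, which reflects the role of the wrapper in marking the names as deliberate rather than coincidental.
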
Specific microformats

Several microformats have been developed to enable semantic markup of particular types of information.
  • hAtom – for marking up Atom feeds from within standard HTML
  • hCalendar – for events
  • hCard – for contact information; includes:
      • adr – for postal addresses
      • geo – for geographical coordinates (latitude, longitude)
  • hMedia – for audio/video content
  • hNews – for news content
  • hProduct – for products
  • hRecipe – for recipes and foodstuffs
  • hResume – for resumes or CVs
  • hReview – for reviews
  • rel-directory – for distributed directory creation and inclusion
  • rel-enclosure – for multimedia attachments to web pages
  • rel-license – specification of copyright license
  • rel-nofollow – an attempt to discourage third-party content spam (e.g. spam in blogs)
  • rel-tag – for decentralized tagging (folksonomy)
  • xFolk – for tagged links
  • XHTML Friends Network (XFN) – for social relationships
  • XOXO – for lists and outlines
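
The link-based entries in this list (rel-tag, rel-license, rel-nofollow and so on) all work by putting a keyword in the rel attribute of a link. A rough Python sketch of how a consumer might group links by rel value follows. RelLinkParser and tag_name are hypothetical names, and the URLs are placeholders. The tag_name helper assumes the rel-tag convention that the tag is the last path segment of the link target.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelLinkParser(HTMLParser):
    """Hypothetical helper: group link targets by each keyword in rel."""

    def __init__(self):
        super().__init__()
        self.by_rel = {}  # e.g. {"tag": [href, ...], "license": [href, ...]}

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # rel holds a space-separated list of keywords.
        for rel in (attrs.get("rel") or "").lower().split():
            self.by_rel.setdefault(rel, []).append(href)


def tag_name(href):
    """Return the last path segment of a rel-tag link, percent-decoded."""
    path = urlparse(href).path.rstrip("/")
    return unquote(path.rsplit("/", 1)[-1])


parser = RelLinkParser()
parser.feed(
    '<a rel="tag" href="http://example.com/tags/birds">birds</a> '
    '<a rel="license nofollow" href="http://example.org/licenses/by/">licence</a>'
)
print(parser.by_rel)
print([tag_name(h) for h in parser.by_rel.get("tag", [])])
```

Because one link may carry several keywords, the sketch files the same target under each of them.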

Microformats under development

Among the many proposed microformats, the following are undergoing active development:
  • hAudio – for audio files and references to released recordings
  • citation – for citing references
  • currency – for amounts of money
  • figure – for associating captions with images
  • geo extensions – for places on Mars, the Moon, and other such bodies; for altitude; and for collections of waypoints marking routes or boundaries
  • species – for the names of living things (already used by Wikipedia and the BBC Wildlife Finder)
  • measure – for physical quantities, structured data-values

Uses of microformats

Using microformats within HTML code provides additional formatting and semantic data that applications can use. For example, applications such as web crawlers can collect data about on-line resources, or desktop applications such as e-mail clients or scheduling software can compile details. The use of microformats can also facilitate "mash ups" such as exporting all of the geographical locations on a web page into (for example) Google Maps to visualize them spatially.

Several browser extensions, such as Operator for Firefox and Oomph for Internet Explorer, provide the ability to detect microformats within an HTML document. When hCard or hCalendar is involved, such browser extensions allow the data to be exported into formats compatible with contact management and calendar utilities, such as Microsoft Outlook. When dealing with geographical coordinates, they allow the location to be sent to mapping applications such as Google Maps. Yahoo! Query Language can be used to extract microformats from web pages. On 12 May 2009, Google announced that they would be parsing the hCard, hReview and hProduct microformats, and using them to populate search result pages. They have since extended this to use hCalendar for events and hRecipe for cookery recipes. Similarly, microformats are also consumed by Bing and Yahoo!. Together, these are the world's top three search engines.
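
The export step that such browser extensions perform for contact details might look roughly like the following Python sketch. It assumes vCard is the target format and that the contact has already been extracted into a dictionary keyed by hCard class names. The function names and the sample contact are invented for illustration, and only four properties are handled.

```python
def escape_text(value):
    """Escape the characters that are special in vCard text values."""
    return (value.replace("\\", "\\\\")
                 .replace("\n", "\\n")
                 .replace(",", "\\,")
                 .replace(";", "\\;"))


def to_vcard(card):
    """Serialise a dict with hCard-style keys (fn, org, tel, url) as vCard 4.0 text.

    Assumption: "fn" is always present; the other keys are optional.
    """
    lines = ["BEGIN:VCARD", "VERSION:4.0", "FN:" + escape_text(card["fn"])]
    if "org" in card:
        lines.append("ORG:" + escape_text(card["org"]))
    if "tel" in card:
        lines.append("TEL:" + escape_text(card["tel"]))
    if "url" in card:
        lines.append("URL:" + card["url"])  # a URI value, so not escaped as text
    lines.append("END:VCARD")
    # vCard content lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"


# Invented sample contact.
print(to_vcard({
    "fn": "Jane Example",
    "org": "Example, Inc.",
    "tel": "555-0100",
    "url": "http://example.com/",
}))
```

A real exporter would need to cover many more properties and long-line folding; the point here is only that the class names in an hCard map directly onto vCard property names.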

Microsoft has expressed a desire to incorporate microformats into upcoming projects, as have other software companies.

Alex Faaborg summarizes the arguments for putting the responsibility for microformat user interfaces in the web browser rather than making more complicated HTML:
  • Only the web browser knows what applications are accessible to the user and what the user's preferences are
  • It lowers the barrier to entry for web site developers if they only need to do the markup and not handle "appearance" or "action" issues
  • It retains backwards compatibility with web browsers that don't support microformats
  • The web browser presents a single point of entry from the web to the user's computer, which simplifies security issues

Evaluation of microformats

Various commentators have offered review and discussion on the design principles and practical aspects of microformats. Additionally, microformats have been compared to other approaches that seek to serve the same or similar purpose. From time to time, there is criticism of a single, or all, microformats. Documented efforts to advocate both the spread and use of microformats are known to exist as well. Opera Software CTO and CSS creator Håkon Wium Lie said in 2005 "We will also see a bunch of microformats being developed, and that’s how the semantic web will be built, I believe." However, in August 2008, Toby Inkster, author of the "Swignition" (formerly "Cognition") microformat parsing service, pointed out that no new microformat specifications had been published for over three years.

Design principles

Computer scientist and entrepreneur Rohit Khare stated that reduce, reuse, and recycle is "shorthand for several design principles" that motivated the development and practices behind microformats. These aspects can be summarized as follows:
  • Reduce: favor the simplest solutions and focus attention on specific problems;
  • Reuse: work from experience and favor examples of current practice;
  • Recycle: encourage modularity and the ability to embed; valid XHTML can be reused in blog posts, RSS feeds, and anywhere else you can access the web.

Accessibility

Because some microformats make use of the title attribute of HTML's abbr element to conceal machine-readable data (particularly date-times and geographical coordinates) in the "abbr design pattern", the plain text content of the element is inaccessible to those screen readers that expand abbreviations. In June 2008, the BBC announced that it would be dropping use of microformats using the abbr design pattern because of accessibility concerns.
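
To make the pattern concrete, the following Python sketch reads both halves of an abbr element: the machine-readable value in its title attribute and the visible text intended for people. The date, the dtstart class and the name AbbrPatternParser are illustrative assumptions rather than content from any cited page.

```python
from html.parser import HTMLParser


class AbbrPatternParser(HTMLParser):
    """Hypothetical helper: record the title and visible text of each abbr."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._open = None  # the item whose visible text is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "abbr" and attrs.get("title"):
            self._open = {
                "classes": (attrs.get("class") or "").split(),
                "machine": attrs["title"],  # what a microformat parser reads
                "visible": "",              # what a sighted reader sees
            }

    def handle_data(self, data):
        if self._open is not None:
            self._open["visible"] += data

    def handle_endtag(self, tag):
        if tag == "abbr" and self._open is not None:
            self.items.append(self._open)
            self._open = None


parser = AbbrPatternParser()
parser.feed('<abbr class="dtstart" title="2008-06-23T09:00">23 June, 9am</abbr>')
print(parser.items)
```

A screen reader that expands abbreviations would speak the title value, here a raw timestamp, in place of the friendly text, which is the accessibility concern described above.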

Comparison with alternative approaches

Microformats are not the only solution for providing "more intelligent data" on the web. Alternative approaches exist and are under development as well. For example, the use of XML markup and standards of the Semantic Web are cited as alternative approaches. Some contrast these with microformats in that they do not necessarily coincide with the design principles of "reduce, reuse, and recycle", at least not to the same extent.

One advocate of microformats, Tantek Çelik, characterized a problem with alternative approaches:

For some applications the use of other approaches may be valid. If one wishes to use microformat-style embedding but the type of data one wishes to embed does not map to an existing microformat, one can use RDFa to embed arbitrary vocabularies into HTML, for example to embed domain-specific scientific data, such as zoological or chemical data, where no microformat for such data exists. Furthermore, standards such as W3C's GRDDL allow microformats to be converted into data compatible with the Semantic Web.

Another advocate of microformats, Ryan King, put the compatibility of microformats with other approaches this way:

See also

  • COinS
  • Embedded RDF
  • GRDDL
  • Intelligent agents
  • Microdata (HTML5)
  • RDFa
  • S5 (file format)
  • Schema.org
  • Simple HTML Ontology Extensions
  • Tag (metadata)
  • Web crawlers
  • XMDP

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 