Semantic HTML
Encyclopedia
Semantic HTML is the use of HTML
HTML
HyperText Markup Language is the predominant markup language for web pages. HTML elements are the basic building-blocks of webpages....

 markup to reinforce the semantics
Semantics
Semantics is the study of meaning. It focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for, their denotata....

, or meaning, of the information in webpages rather than merely to define its presentation (look). Semantic HTML is processed by regular web browser
Web browser
A web browser is a software application for retrieving, presenting, and traversing information resources on the World Wide Web. An information resource is identified by a Uniform Resource Identifier and may be a web page, image, video, or other piece of content...

s as well as by many other user agent
User agent
In computing, a user agent is a client application implementing a network protocol used in communications within a client–server distributed computing system...

s. CSS
Cascading Style Sheets
Cascading Style Sheets is a style sheet language used to describe the presentation semantics of a document written in a markup language...

 is used to suggest its presentation to human users.

As an example, recent HTML standards discourage use of the tag <i> (italic
Italic type
In typography, italic type is a cursive typeface based on a stylized form of calligraphic handwriting. Owing to the influence from calligraphy, such typefaces often slant slightly to the right. Different glyph shapes from roman type are also usually used—another influence from calligraphy...

, a typeface
Typeface
In typography, a typeface is the artistic representation or interpretation of characters; it is the way the type looks. Each type is designed and there are thousands of different typefaces in existence, with new ones being developed constantly....

) in preference of more specific tags such as <em> (emphasis); the CSS stylesheet should then specify whether emphasis is denoted by an italic font, a bold font, underlining, slower or louder audible speech etc. This is because italics are used for purposes other than emphasis, such as citing a source; for this, HTML 4 provides the tag <cite>. Another use for italics is foreign phrases or loanwords; web designers may use built-in XHTML language attributes or specify their own semantic markup by choosing appropriate names for the class attribute values of HTML elements (e.g. class="loanword"). Marking emphasis, citations and loanwords in different ways makes it easier for web agents
User agent
In computing, a user agent is a client application implementing a network protocol used in communications within a client–server distributed computing system...

 such as search engine
Search engine
A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information...

s and other software to ascertain the significance of the text.

History

HTML has included semantic markup since its inception. In an HTML document, the author may, among other things, "start with a title; add headings and paragraphs; add emphasis to [the] text; add images; add links to other pages; [and] use various kinds of lists".
At one time, HTML also included presentational markup such as <font>, <i> and <center> tags. There are also the semantically neutral span and div
Span and div
In HTML, the span and div elements are used where parts of a document cannot be semantically described by other HTML elements.Most HTML elements carry semantic meaning – i.e. the element describes, and can be made to function according to, the type of data contained within...

 tags. Since the late 1990s when Cascading Style Sheets
Cascading Style Sheets
Cascading Style Sheets is a style sheet language used to describe the presentation semantics of a document written in a markup language...

 were beginning to work in most browsers, web authors have been encouraged to avoid the use of presentational HTML markup with a view to the separation of presentation and content
Separation of presentation and content
Separation of presentation and content is a common idiom, a design philosophy, and a methodology applied in the context of various publishing technology disciplines, including information retrieval, template processing, web design, web development, word processing, desktop publishing,...

.

In 2001 Tim Berners-Lee
Tim Berners-Lee
Sir Timothy John "Tim" Berners-Lee, , also known as "TimBL", is a British computer scientist, MIT professor and the inventor of the World Wide Web...

 participated in a discussion of the Semantic Web
Semantic Web
The Semantic Web is a collaborative movement led by the World Wide Web Consortium that promotes common formats for data on the World Wide Web. By encouraging the inclusion of semantic content in web pages, the Semantic Web aims at converting the current web of unstructured documents into a "web of...

, where it was presented that intelligent software 'agents' might one day automatically trawl the Web and find, filter and correlate previously unrelated, published facts for the benefit of human users. Such agents are not commonplace even now, but some of the ideas of Web 2.0
Web 2.0
The term Web 2.0 is associated with web applications that facilitate participatory information sharing, interoperability, user-centered design, and collaboration on the World Wide Web...

, mashups
Mashup (web application hybrid)
In Web development, a mashup is a Web page or application that uses and combines data, presentation or functionality from two or more sources to create new services...

 and price comparison websites
Price comparison service
On the internet, a price comparison service allows individuals to see different lists of prices for specific products. Most price comparison services do not sell products themselves, but source prices from retailers from whom users can buy...

 may be coming close. The main difference between these web application hybrids and Berners-Lee's semantic agents lies in the fact that the current aggregation and hybridisation of information is usually designed in by web developers, who already know the web locations and the API semantics
Application programming interface
An application programming interface is a source code based specification intended to be used as an interface by software components to communicate with each other...

 of the specific data they wish to mash, compare and combine.

An important type of web agent that does crawl and read web pages automatically, without prior knowledge of what it might find, is the Web crawler
Web crawler
A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner or in an orderly fashion. Other terms for Web crawlers are ants, automatic indexers, bots, Web spiders, Web robots, or—especially in the FOAF community—Web scutters.This process is called Web...

 or search-engine spider. These software agents are dependent on the semantic clarity of web pages they find as they use various techniques and algorithm
Algorithm
In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...

s to read and index millions of web pages a day and provide web users with search facilities
Web search engine
A web search engine is designed to search for information on the World Wide Web and FTP servers. The search results are generally presented in a list of results often referred to as SERPS, or "search engine results pages". The information may consist of web pages, images, information and other...

 without which the World Wide Web would be only a fraction of its current usefulness.

In order for search-engine spiders to be able to rate the significance of pieces of text they find in HTML documents, and also for those creating mashups and other hybrids, as well as for more automated agents as they are developed, the semantic structures that exist in HTML need to be widely and uniformly applied to bring out the meaning of published text.

While the true semantic web may depend on complex RDF
Resource Description Framework
The Resource Description Framework is a family of World Wide Web Consortium specifications originally designed as a metadata data model...

 ontologies and metadata
Metadata
The term metadata is an ambiguous term which is used for two fundamentally different concepts . Although the expression "data about data" is often used, it does not apply to both in the same way. Structural metadata, the design and specification of data structures, cannot be about data, because at...

, every HTML document makes its contribution to the meaningfulness of the Web by the correct use of headings, lists, titles and other semantic markup wherever possible. The correct use of Web 2.0 'tagging' creates folksonomies
Folksonomy
A folksonomy is a system of classification derived from the practice and method of collaboratively creating and managing tags to annotate and categorize content; this practice is also known as collaborative tagging, social classification, social indexing, and social tagging...

 that may be equally or even more meaningful to many. HTML 5
HTML 5
HTML5 is a language for structuring and presenting content for the World Wide Web, and is a core technology of the Internet originally proposed by Opera Software. It is the fifth revision of the HTML standard and is still under development...

 will introduce several new semantic tags that will become commonplace in web documents of the future, such as section, article, footer, progress, nav etc.

Presentational markup tags are not deprecated
Deprecation
In the process of authoring computer software, its standards or documentation, deprecation is a status applied to software features to indicate that they should be avoided, typically because they have been superseded...

 in current HTML (4.01) and XHTML recommendations, but were recommended against. In HTML 5 some of those elements, such as i and b are still specified as their meaning has been clearly defined "as to be stylistically offset from the normal prose without conveying any extra importance".

Considerations

In cases where a document requires more precise semantics than those expressed in HTML alone, fragments of the document may be enclosed within span or div elements with meaningful class names such as <span class="author"> and <div class="invoice">. Where these class names are also a fragment identifier within a schema or ontology, they may link to a more defined meaning. Microformat
Microformat
A microformat is a web-based approach to semantic markup which seeks to re-use existing HTML/XHTML tags to convey metadata and other attributes in web pages and other contexts that support HTML, such as RSS...

s formalise this approach to semantics in HTML.

One important restriction of this approach is that such markup based on element inclusion must meet the well-formedness conditions. As these documents are broadly tree-structured, this means that only balanced fragments from a sub-tree can be marked up in this way. A means of marking-up any arbitrary section of HTML would require a mechanism independent of the markup structure itself, such as XPointer
XPointer
XPointer is a system for addressing components of XML based internet media.XPointer is divided among four specifications: a "framework" which forms the basis for identifying XML fragments, a positional element addressing scheme, a scheme for namespaces, and a scheme for XPath-based addressing...

.

Good semantic HTML also improves the accessibility
Accessibility
Accessibility is a general term used to describe the degree to which a product, device, service, or environment is available to as many people as possible. Accessibility can be viewed as the "ability to access" and benefit from some system or entity...

 of web documents (see also Web Content Accessibility Guidelines
Web Content Accessibility Guidelines
Web Content Accessibility Guidelines are part of a series of Web accessibility guidelines published by the W3C's Web Accessibility Initiative. They consist of a set of guidelines for making content accessible, primarily for disabled users, but also for all user agents, including highly limited...

). For example, when a screen reader or audio browser can correctly ascertain the structure of a document, it will not waste the visually impaired user's time by reading out repeated or irrelevant information when it has been marked up correctly.

Google 'rich snippets'

In 2010, Google
Google
Google Inc. is an American multinational public corporation invested in Internet search, cloud computing, and advertising technologies. Google hosts and develops a number of Internet-based services and products, and generates profit primarily from advertising through its AdWords program...

 specified three forms of structured metadata that their systems will use to find structured semantic content within webpages. Such information, when related to reviews, people profiles, business listings, and events will be used by Google to enhance the 'snippet', or short piece of quoted text that is shown when the page appears in search listings. Google specifies that that data may be given using microdata
Microdata (HTML5)
Microdata is a WHATWG HTML specification used to nest semantics within existing content on web pages. Search engines, web crawlers, and browsers can extract and process Microdata from a web page and use it to provide a richer browsing experience for users. Microdata use a supporting vocabulary to...

, microformat
Microformat
A microformat is a web-based approach to semantic markup which seeks to re-use existing HTML/XHTML tags to convey metadata and other attributes in web pages and other contexts that support HTML, such as RSS...

s or RDFa
RDFa
RDFa is a W3C Recommendation that adds a set of attribute-level extensions to XHTML for embedding rich metadata within Web documents...

. Microdata is specified inside itemtype and itemprop attributes added to existing HTML elements; microformat keywords are added inside class attributes as discussed above; and RDFa relies on rel, typeof and property attributes added to existing elements.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK