Data: URI scheme
Encyclopedia
The data URI scheme is a URI scheme
URI scheme
In the field of computer networking, a URI scheme is the top level of the Uniform Resource Identifier naming structure. All URIs and absolute URI references are formed with a scheme name, followed by a colon character , and the remainder of the URI called the scheme-specific part...

 (Uniform Resource Identifier
Uniform Resource Identifier
In computing, a uniform resource identifier is a string of characters used to identify a name or a resource on the Internet. Such identification enables interaction with representations of the resource over a network using specific protocols...

 scheme) that provides a way to include data in-line in web page
Web page
A web page or webpage is a document or information resource that is suitable for the World Wide Web and can be accessed through a web browser and displayed on a monitor or mobile device. This information is usually in HTML or XHTML format, and may provide navigation to other web pages via hypertext...

s as if they were external resources. It tends to be simpler than other inclusion methods, such as MIME
MIME
Multipurpose Internet Mail Extensions is an Internet standard that extends the format of email to support:* Text in character sets other than ASCII* Non-text attachments* Message bodies with multiple parts...

 with cid or mid URIs. Data URIs are sometimes called Uniform Resource Locator
Uniform Resource Locator
In computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....

s, although they do not actually locate anything remote. The data URI scheme is defined in RFC 2397 of the Internet Engineering Task Force
Internet Engineering Task Force
The Internet Engineering Task Force develops and promotes Internet standards, cooperating closely with the W3C and ISO/IEC standards bodies and dealing in particular with standards of the TCP/IP and Internet protocol suite...

 (IETF).

The IETF published the data URI specification in 1998 as Proposed Standard on the IETF Standards Track, and hasn't progressed it since. The HTML
HTML
HyperText Markup Language is the predominant markup language for web pages. HTML elements are the basic building-blocks of webpages....

 4.01 specification refers to the data URI scheme, and data URIs have now been implemented in most browsers.

Web browser support

Data URIs are currently supported by the following web browser
Web browser
A web browser is a software application for retrieving, presenting, and traversing information resources on the World Wide Web. An information resource is identified by a Uniform Resource Identifier and may be a web page, image, video, or other piece of content...

s:
  • Gecko
    Gecko (layout engine)
    Gecko is a free and open source layout engine used in many applications developed by Mozilla Foundation and the Mozilla Corporation , as well as in many other open source software projects....

    -based, such as Firefox
    Mozilla Firefox
    Mozilla Firefox is a free and open source web browser descended from the Mozilla Application Suite and managed by Mozilla Corporation. , Firefox is the second most widely used browser, with approximately 25% of worldwide usage share of web browsers...

    , SeaMonkey
    SeaMonkey
    SeaMonkey is a free and open source cross-platform Internet suite. It is the continuation of the former Mozilla Application Suite, based on the same source code...

    , XeroBank, Camino
    Camino
    Camino is a free, open source, GUI-based Web browser based on Mozilla's Gecko layout engine and specifically designed for the Mac OS X operating system...

    , Fennec
    Fennec (browser)
    Firefox for mobile is the name of the build of the Mozilla Firefox web browser for devices such as mobile phones and personal digital assistants ....

     and K-Meleon
    K-Meleon
    K-Meleon is a web browser for the Microsoft Windows platform. Based on the same Gecko layout engine as Mozilla Firefox, K-Meleon uses native Windows application programming interface to create the user interface, instead of using Mozilla's cross-platform XML User Interface Language layer, and as...

  • Konqueror
    Konqueror
    Not to be confused with the Conqueror web browser.Konqueror is a web browser and file manager that provides file-viewer functionality for file systems such as local files, files on a remote ftp server and files in a disk image. It is a core part of the KDE desktop environment...

    , via KDE
    KDE
    KDE is an international free software community producing an integrated set of cross-platform applications designed to run on Linux, FreeBSD, Microsoft Windows, Solaris and Mac OS X systems...

    's KIO
    KIO
    KIO is part of the KDE architecture. It provides access to files, web sites and other resources through a single consistent API. Applications, such as Konqueror which are written using this framework can operate on files stored on remote servers in exactly the same way as they operate on those...

     slaves input/output system
  • Opera
    Opera (web browser)
    Opera is a web browser and Internet suite developed by Opera Software with over 200 million users worldwide. The browser handles common Internet-related tasks such as displaying web sites, sending and receiving e-mail messages, managing contacts, chatting on IRC, downloading files via BitTorrent,...

     (including devices such as the Nintendo DSi or Wii)
  • WebKit
    WebKit
    WebKit is a layout engine designed to allow web browsers to render web pages. WebKit powers Google Chrome and Apple Safari and by October 2011 held over 33% of the browser market share between them. It is also used as the basis for the experimental browser included with the Amazon Kindle ebook...

    -based, such as Safari
    Safari (web browser)
    Safari is a web browser developed by Apple Inc. and included with the Mac OS X and iOS operating systems. First released as a public beta on January 7, 2003 on the company's Mac OS X operating system, it became Apple's default browser beginning with Mac OS X v10.3 "Panther". Safari is also the...

     (including on iOS), Android's browser, Epiphany
    Epiphany (web browser)
    Epiphany is an open source web browser for the GNOME desktop environment. The browser is a descendant of Galeon, and was created after developer disagreements about Galeon's growing complexity...

     and Midori
    Midori (browser)
    is a web browser that aims to be lightweight and fast. It uses the WebKit rendering engine and the GTK+ 2 interface. Midori is part of the Xfce desktop environment's Goodies component...

     (WebKit is a derivative of Konqueror's KHTML
    KHTML
    KHTML is the HTML layout engine developed by the KDE project. It is the engine used by the Konqueror web browser. A forked version of KHTML called WebKit is used by several web browsers, among them Safari and Google Chrome...

     engine, but Mac OS X
    Mac OS X
    Mac OS X is a series of Unix-based operating systems and graphical user interfaces developed, marketed, and sold by Apple Inc. Since 2002, has been included with all new Macintosh computer systems...

     does not share the KIO architecture so the implementations are different), and Webkit/Chromium
    Chromium (web browser)
    Chromium is the open source web browser project from which Google Chrome draws its source code. The project's hourly Chromium snapshots appear essentially similar to the latest builds of Google Chrome aside from the omission of certain Google additions, most noticeable among them: Google's...

    -based, such as Chrome
    Google Chrome
    Google Chrome is a web browser developed by Google that uses the WebKit layout engine. It was first released as a beta version for Microsoft Windows on September 2, 2008, and the public stable release was on December 11, 2008. The name is derived from the graphical user interface frame, or...

  • Trident
    Trident (layout engine)
    Trident is the name of the layout engine for the Microsoft Windows version of Internet Explorer.It was first introduced with the release of Internet Explorer version 4.0 in October 1997; it has been steadily upgraded and remains in use today...

    • Internet Explorer 8
      Internet Explorer 8
      Windows Internet Explorer 8 is a web browser developed by Microsoft in the Internet Explorer browser series. The browser was released on March 19, 2009 for Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, and Windows 7. Both 32-bit and 64-bit builds are available...

      : Microsoft has limited its support to certain "non-navigable" content for security reasons, including concerns that JavaScript
      JavaScript
      JavaScript is a prototype-based scripting language that is dynamic, weakly typed and has first-class functions. It is a multi-paradigm language, supporting object-oriented, imperative, and functional programming styles....

       embedded in a data URI may not be interpretable by script filters such as those used by web-based email clients. Data URIs must be smaller than 32 KiB in Version 8. Data URIs are supported only for the following elements and/or attributes:
      • object (images only)
      • img
      • input type=image
      • link (data URI must be base64 encoded)
      • CSS declarations that accept a URL, such as background-image, background, list-style-type, list-style and similar.
    • Internet Explorer 9
      Internet Explorer 9
      Windows Internet Explorer 9 is the current version of the Internet Explorer web browser from Microsoft. It was released to the public on March 14, 2011 at 21:00 PDT. Internet Explorer 9 supports several CSS 3 properties, embedded ICC v2 or v4 color profiles support via Windows Color System, and...

      : Internet Explorer 9 does not have 32KiB limitation and allowed in broader elements.

If a website designer is using data URIs they can support other browsers by serving different content based on the browsers User-Agent string.

Advantages

  • HTTP
    Hypertext Transfer Protocol
    The Hypertext Transfer Protocol is a networking protocol for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web....

     request and header traffic is not required for embedded data, so data URIs consume less bandwidth whenever the overhead of encoding the inline content as a data URI is smaller than the HTTP overhead. For example, the required base64 encoding for an image 600 bytes long would be 800 bytes, so if an HTTP request required more than 200 bytes of overhead, the data URI would be more efficient.
  • For transferring many small files (less than a few kilobytes each), this can be faster. TCP
    Transmission Control Protocol
    The Transmission Control Protocol is one of the core protocols of the Internet Protocol Suite. TCP is one of the two original components of the suite, complementing the Internet Protocol , and therefore the entire suite is commonly referred to as TCP/IP...

     transfers tend to start slowly
    Slow-start
    Slow-start is part of the congestion control strategy used by TCP, the data transmission protocol used by many Internet applications. Slow-start is used in conjunction with other algorithms to avoid sending more data than the network is capable of transmitting, that is, to avoid causing network...

    . If each file requires a new TCP connection, the transfer speed is limited by the round-trip time rather than the available bandwidth. Using HTTP keep-alive improves the situation, but may not entirely alleviate the bottleneck.
  • When browsing a secure HTTPS web site, web browsers commonly require that all elements of a web page be downloaded over secure connections, or the user will be notified of reduced security due to a mixture of secure and insecure elements. On badly configured servers, HTTPS requests have significant overhead over common HTTP requests, so embedding data in data URIs may improve speed in this case.
  • Web browsers are usually configured to make only a certain number (often two) of concurrent HTTP connections to a domain, so inline data frees up a download connection for other content.
  • Environments with limited or restricted access to external resources may embed content when it is disallowed or impractical to reference it externally. For example, an advanced HTML editing field could accept a pasted or inserted image and convert it to a data URI to hide the complexity of external resources from the user. Alternatively, a browser can convert (encode) image based data from the clipboard to a data URI and paste it in a HTML editing field. Mozilla Firefox 4
    Mozilla Firefox 4
    Mozilla Firefox 4 is a version of the Firefox web browser, released on 22 March 2011. The first beta was made available on 6 July 2010; Release Candidate 2 was released on 18 March 2011. It was codenamed Tumucumaque, and has been confirmed as Firefox's last large release cycle...

     supports this functionality.
  • It is possible to manage a multimedia page as a single file.
  • Email message templates can contain images (for backgrounds or signatures) without the image appearing to be an "attachment".
  • Data URIs make it more difficult for security software to filter content.

Disadvantages

  • Data URIs are not separately cached from their containing documents (e.g. CSS or HTML files) so data is downloaded every time the containing documents are redownloaded.
  • Content must be re-encoded and re-embedded every time a change is made.
  • Internet Explorer
    Internet Explorer
    Windows Internet Explorer is a series of graphical web browsers developed by Microsoft and included as part of the Microsoft Windows line of operating systems, starting in 1995. It was first released as part of the add-on package Plus! for Windows 95 that year...

     through version 7
    Internet Explorer 7
    Windows Internet Explorer 7 is a web browser released by Microsoft in October 2006. Internet Explorer 7 is part of a long line of versions of Internet Explorer and was the first major update to the browser in more than 5 years...

     (approximately 5% of web traffic as of September 2011), lacks support. However this can be overcome by serving browser specific content.
  • Internet Explorer 8 limits data URIs to a maximum length of 32 KB. (Internet Explorer 9 does not have this limitation)
  • Data is included as a simple stream, and many processing environments (such as web browsers) may not support using containers (such as multipart/alternative or message/rfc822) to provide greater complexity such as metadata
    Metadata
    The term metadata is an ambiguous term which is used for two fundamentally different concepts . Although the expression "data about data" is often used, it does not apply to both in the same way. Structural metadata, the design and specification of data structures, cannot be about data, because at...

    , data compression
    Data compression
    In computer science and information theory, data compression, source coding or bit-rate reduction is the process of encoding information using fewer bits than the original representation would use....

    , or content negotiation
    Content negotiation
    Content negotiation is a mechanism defined in the HTTP specification that makes it possible to serve different versions of a document at the same URI, so that user agents can specify which version fit their capabilities the best...

    .
  • Base64
    Base64
    Base64 is a group of similar encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation...

    -encoded data URIs are 1/3 larger in size than their binary equivalent. (However, this overhead is reduced to 2-3% if the HTTP server compresses the response using gzip
    Gzip
    Gzip is any of several software applications used for file compression and decompression. The term usually refers to the GNU Project's implementation, "gzip" standing for GNU zip. It is based on the DEFLATE algorithm, which is a combination of Lempel-Ziv and Huffman coding...

    )
  • Data URIs do not carry a filename as a normal linked file would. When saving, a default filename for the specified MIME
    MIME
    Multipurpose Internet Mail Extensions is an Internet standard that extends the format of email to support:* Text in character sets other than ASCII* Non-text attachments* Message bodies with multiple parts...

     type is generally used.

Format

data:[][;charset=][;base64],

The encoding is indicated by ;base64. If it's present the data is encoded as base64
Base64
Base64 is a group of similar encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation...

. Without it the data (as a sequence of octets
Octet (computing)
An octet is a unit of digital information in computing and telecommunications that consists of eight bits. The term is often used when the term byte might be ambiguous, as there is no standard for the size of the byte.-Overview:...

) is represented using ASCII
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...

 encoding for octets inside the range of safe URL characters and using the standard %xx hex encoding of URLs for octets outside that range. If is omitted, it defaults to text/plain;charset=US-ASCII. (As a shorthand, the type can be omitted but the charset parameter supplied.)

Some browsers (Chrome, Opera, Safari, Firefox) accept a non-standard ordering if both ;base64 and ;charset are supplied, while Internet Explorer requires that the charset's specification must precede the base64 token.

HTML

An HTML
HTML
HyperText Markup Language is the predominant markup language for web pages. HTML elements are the basic building-blocks of webpages....

 fragment embedding a picture of small red dot:

Red dot

As demonstrated above, data URIs encoded with base64
Base64
Base64 is a group of similar encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation...

 may contain whitespace
Whitespace (computer science)
In computer science, whitespace is any single character or series of characters that represents horizontal or vertical space in typography. When rendered, a whitespace character does not correspond to a visual mark, but typically does occupy an area on a page...

 for readability.

CSS

A CSS rule that includes a background image:

ul.checklist li.complete {
margin-left: 20px;
background: white url('data:image/png;base64,iVBORw0KGgoAA
AANSUhEUgAAABAAAAAQAQMAAAAlPW0iAAAABlBMVEUAAAD///+l2Z/dAAAAM0l
EQVR4nGP4/5/h/1+G/58ZDrAz3D/McH8yw83NDDeNGe4Ug9C9zwz3gVLMDA/A6
P9/AFGGFyjOXZtQAAAAAElFTkSuQmCC') no-repeat scroll left top;
}

In Mozilla Firefox 5 (released June, 2011), encoded data must not contain newline
Newline
In computing, a newline, also known as a line break or end-of-line marker, is a special character or sequence of characters signifying the end of a line of text. The name comes from the fact that the next character after the newline will appear on a new line—that is, on the next line below the...

s.

JavaScript

A JavaScript
JavaScript
JavaScript is a prototype-based scripting language that is dynamic, weakly typed and has first-class functions. It is a multi-paradigm language, supporting object-oriented, imperative, and functional programming styles....

 statement that opens an embedded subwindow, as for a footnote link:


window.open('data:text/html;charset=utf-8,%3C%21DOCTYPE%20' +
'html%3E%0D%0A%3Chtml%20lang%3D%22en%22%3E%0D%0A%3Chead%' +
'3E%3Ctitle%3EEmbedded%20Window%3C%2Ftitle%3E%3C%2Fhead%' +
'3E%0D%0A%3Cbody%3E%3Ch1%3E42%3C%2Fh1%3E%3C%2Fbody%3E%0A' +
'%3C%2Fhtml%3E%0A%0D%0A','_blank','height=300,width=400');


This example does not work with Internet Explorer 8 due to its security restrictions that prevent navigable file types from being used.

Inclusion in HTML or CSS using PHP

Because base64-encoded data URIs are not human readable, a website author might prefer the encoded data be included in the page via a scripting language
Scripting language
A scripting language, script language, or extension language is a programming language that allows control of one or more applications. "Scripts" are distinct from the core code of the application, as they are usually written in a different language and are often created or at least modified by the...

 such as PHP. This has the advantage that if the included file changes, no modifications need to be made to the HTML file, and also of keeping a separation between binary data and text based formats. Disadvantages include greater server
Web server
Web server can refer to either the hardware or the software that helps to deliver content that can be accessed through the Internet....

 CPU use unless a server-side cache
Cache
In computer engineering, a cache is a component that transparently stores data so that future requests for that data can be served faster. The data that is stored within a cache might be values that have been computed earlier or duplicates of original values that are stored elsewhere...

 is used.


function data_uri($file, $mime) {
$contents = file_get_contents($file);
$base64 = base64_encode($contents);
return "data:$mime;base64,$base64";
}
?>

An elephant


Similarly, if CSS is processed by PHP, the below function may also be used:




div.menu {
background-image:url(' 'image/png'); ?>');
}


In either case, client or server side features (such as dynamic content generation), client detection and awareness (to support selection of alternative content, such as a different language), or discrimination (content filtering based on some client deficiency) systems (like conditional comment
Conditional comment
Conditional comments are conditional statements interpreted by Microsoft Internet Explorer in HTML source code. Conditional comments can be used to provide and hide code to and from Internet Explorer....

s) may be used to provide a standard http: URL for Internet Explorer and other older browsers.

Python


data_uri = open("sample.png", "rb").read.encode("base64").replace("\n", "")
img_tag = 'sample'.format(data_uri)
print img_tag

See also

  • An alternative for attaching resources to an HTML document is MHTML
    MHTML
    MHTML, short for MIME HTML, is a web page archive format used to combine resources that are typically represented by external links together with HTML code into a single file. The content of an MHTML file is encoded as if it were an HTML e-mail message, using the MIME type multipart/related...

    , usually found in HTML email messages.
  • MIME
    MIME
    Multipurpose Internet Mail Extensions is an Internet standard that extends the format of email to support:* Text in character sets other than ASCII* Non-text attachments* Message bodies with multiple parts...

     for the used mediatypes.
  • The data URI scheme is tested in the Acid2
    Acid2
    Acid2 is a test page published and promoted by the Web Standards Project to expose web page rendering flaws in web browsers and other applications that render HTML. Named after the acid test for gold, it was developed in the spirit of Acid1, a relatively narrow test of compliance with the Cascading...

     and Acid3
    Acid3
    Acid3 test is a web test page from the Web Standards Project that checks a web browser's compliance with elements of various web standards, particularly the Document Object Model and JavaScript....

    tests.

External links
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK