HTTP ETag
Encyclopedia
An ETag, or entity tag, is part of HTTP
Hypertext Transfer Protocol
The Hypertext Transfer Protocol is a networking protocol for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web....

, the protocol for the World Wide Web
World Wide Web
The World Wide Web is a system of interlinked hypertext documents accessed via the Internet...

. It is one of several mechanisms that HTTP provides for cache validation
Web cache
A web cache is a mechanism for the temporary storage of web documents, such as HTML pages and images, to reduce bandwidth usage, server load, and perceived lag...

, and which allows a client to make conditional requests. This allows caches to be more efficient, and saves bandwidth, as a web server does not need to send a full response if the content has not changed. ETags can also be used for optimistic concurrency control
Optimistic concurrency control
In the field of relational database management systems, optimistic concurrency control is a concurrency control method that assumes that multiple transactions can complete without affecting each other, and that therefore transactions can proceed without locking the data resources that they affect...

, as a way to help prevent simultaneous updates of a resource from overwriting each other.

An ETag is an opaque identifier assigned by a web server to a specific version of a resource found at a URL
Uniform Resource Locator
In computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....

. If the resource content at that URL ever changes, a new and different ETag is assigned. Used in this manner ETags are similar to fingerprints
Fingerprint (computing)
In computer science, a fingerprinting algorithm is a procedure that maps an arbitrarily large data item to a much shorter bit string, its fingerprint, that uniquely identifies the original data for all practical purposes just as human fingerprints uniquely identify people for practical purposes...

, and they can be quickly compared to determine if two versions of a resource are the same or are different. Comparing ETags only makes sense with respect to one URL—ETags for resources obtained from different URLs may or may not be equal and no meaning can be inferred from their comparison.

Deployment risks

The use of ETags in the HTTP header is optional (not mandatory as with some other fields of the HTTP 1.1 header). The method by which ETags are generated has never been specified at any time in the HTTP specification.

Common methods of ETag generation include using a collision-resistant
Collision resistance
Collision resistance is a property of cryptographic hash functions: a hash function is collision resistant if it is hard to find two inputs that hash to the same output; that is, two inputs a and b such that H = H, and a ≠ b.Every hash function with more inputs than outputs will necessarily have...

 hash function
Hash function
A hash function is any algorithm or subroutine that maps large data sets to smaller data sets, called keys. For example, a single integer can serve as an index to an array...

 of the resource's content, a hash of the last modification timestamp, or even just a revision number
Revision control
Revision control, also known as version control and source control , is the management of changes to documents, programs, and other information stored as computer files. It is most commonly used in software development, where a team of people may change the same files...

.

Care must be taken to avoid simple methods, such as with many checksum
Checksum
A checksum or hash sum is a fixed-size datum computed from an arbitrary block of digital data for the purpose of detecting accidental errors that may have been introduced during its transmission or storage. The integrity of the data can be checked at any later time by recomputing the checksum and...

s, that are not collision-resistant such as checksums weaker than CRC32
Cyclic redundancy check
A cyclic redundancy check is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to raw data...

 or CRC64. Checksum or hashsum collisions can lead to inadvertent use of stale cached data.

Strong and weak validation

The ETag mechanism supports both strong validation and weak validation. They are distinguished by the presence of an initial "W/" in the ETag identifier, as:

"123456789" -- A strong ETag validator
W/"123456789" -- A weak ETag validator

Two strongly validating ETags that match effectively make a claim that the resource content is byte-for-byte identical, as well as all other entity fields (such as Content-Language) also being unchanged. Strong ETags permit the caching and reassembly of partial responses, as with byte-range requests
Byte serving
Byte serving is the process of sending only a portion of an HTTP/1.1 message from a server to a client. Clients which request byte-serving might do so in cases in which a large file has been only partially delivered and a limited portion of the file is needed in a particular range. Byte Serving is...

.

Weakly validating ETags that match only make a claim that the resources at the URL are semantically equivalent
Semantic equivalence
In computer metadata, semantic equivalence is a declaration that two data elements from different vocabularies contain data that has similar meaning...

; meaning that for practical purposes they are interchangeable and that cached copies can be used. However the resources are not necessarily byte-for-byte identical, and thus weak ETags are not suitable for byte-range requests. Weak ETags may be useful for cases in which strong ETags are impractical for a web server to generate, such as with dynamically-generated content
Dynamic web page
A dynamic web page is a kind of web page that has been prepared with fresh information , for each individual viewing. It is not static because it changes with the time , the user , the user interaction , the context A dynamic web page is a kind of web page that has been prepared with fresh...

.

Typical usage

In typical usage, when a URL is retrieved the web server will return the resource along with its corresponding ETag value, which is placed in an HTTP "ETag" field:

ETag: "686897696a7c876b7e"

The client may then decide to cache the resource, along with its ETag. Later, if the client wants to retrieve the same URL again, it will send its previously saved copy of the ETag along with the request in a "If-None-Match" field.

If-None-Match: "686897696a7c876b7e"

On this subsequent request, the server may now compare the client's ETag with the ETag for the current version of the resource. If the ETag values match, meaning that the resource has not changed, then the server may send back a very short response with an HTTP 304 Not Modified status. The 304 status tells the client that its cached version is still good and that it should use that.

However, if the ETag values do not match, meaning the resource has likely changed, then a full response including the resource's content is returned, just as if ETags were not being used. In this case the client may decide to replace its previously cached version with the newly returned resource and the new ETag.

ETag values can be used in web page monitoring
Change Detection and Notification
Internet change detection and notification refers to automatic detection of changes made to World Wide Web pages and notification to interested users by email or other means. Whereas search engines are designed to find web pages, CDN systems are designed to monitor changes to web pages...

 systems. Efficient web page monitoring is hindered by the fact that most websites do not set the Etag headers for web pages. When a web monitor has no hints whether web content has been changed all content has to be retrieved, and analyzed, using computing resources for both the publisher and subscriber.

Tracking using ETags

Some have pointed out that online advertising networks could use ETags as a substitute for HTTP cookies, as cookies are increasingly deleted by privacy aware users. As of July 2011, researchers report that a number of Websites, including Hulu.com, are using ETags for tracking purposes. Hulu and KISSmetrics have both ceased respawning as of July 29th 2011, as KISSmetrics and over 20 of its clients are facing a class-action lawsuit over the use of "undeletable" tracking cookies partially involving the use of ETags.

Because ETags are cached by the browser, and returned with subsequent requests for the same resource, a tracking server can simply repeat any ETag received from the browser to ensure an assigned ETag persists indefinitely (in a similar way to persistent cookies).

ETags may be flushable by clearing the browser cache (but browser implementations may vary).

In 2007, two Mozilla Firefox
Mozilla Firefox
Mozilla Firefox is a free and open source web browser descended from the Mozilla Application Suite and managed by Mozilla Corporation. , Firefox is the second most widely used browser, with approximately 25% of worldwide usage share of web browsers...

add-ons were made to prevent the usage of ETags for tracking, though as of September 14, 2011, they don't work with the latest version of Firefox.

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK