Web cache
A web cache is a mechanism for the temporary storage (caching) of web documents, such as HTML pages and images, to reduce bandwidth usage, server load, and perceived lag. A web cache stores copies of documents passing through it; subsequent requests may be satisfied from the cache if certain conditions are met.

It should not be confused with a web archive, a site that keeps old versions of web pages.

Systems

Web caches can be used in various systems.
  • A search engine may cache a website.
  • A forward cache is a cache outside the web server's network, e.g. on the client's ISP or company network.
  • A network-aware forward cache is just like a forward cache but only caches heavily accessed items.
  • A reverse cache sits in front of one or more web servers and web applications, accelerating requests from the Internet.
  • A client, such as a web browser, can store web content for reuse. For example, if the back button is pressed, the local cached version of a page may be displayed instead of a new request being sent to the web server.
  • A web proxy sitting between the client and the server can evaluate HTTP headers and choose to store web content.
  • A content delivery network can retain copies of web content at various points throughout a network.
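
The header evaluation that a web proxy performs before storing content can be sketched in Python as follows. This is a minimal illustration, not the behaviour of any particular product: the function names parse_cache_control and may_store are hypothetical, header names are assumed to appear with their canonical capitalisation, and the rule set is deliberately simplified (real caches follow the fuller rules of the HTTP caching specification, including heuristic freshness).

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def may_store(method, status, headers, shared=True):
    """Simplified check: may a cache keep a copy of this response?"""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if method != "GET" or status != 200:
        return False  # assumption: this sketch only caches plain successful GETs
    if "no-store" in cc:
        return False  # the origin forbids storing the response at all
    if shared and "private" in cc:
        return False  # intended for a single user's cache, not a shared proxy
    # Assumption: require explicit freshness information or a validator.
    return ("max-age" in cc or "Expires" in headers
            or "ETag" in headers or "Last-Modified" in headers)
```

With these assumptions, a forward or reverse proxy would call may_store(..., shared=True), while a browser cache would pass shared=False and could therefore keep responses marked private.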

Cache control

HTTP defines three basic mechanisms for controlling caches: freshness, validation, and invalidation.
  • Freshness allows a response to be used without re-checking it on the origin server, and can be controlled by both the server and the client. For example, the Expires response header gives a date when the document becomes stale, and the Cache-Control: max-age directive tells the cache how many seconds the response is fresh for.
  • Validation can be used to check whether a cached response is still good after it becomes stale. For example, if the response has a Last-Modified header, a cache can make a conditional request using the If-Modified-Since header to see if it has changed. The ETag (entity tag) mechanism also allows for both strong and weak validation.
  • Invalidation is usually a side effect of another request that passes through the cache. For example, if the URL associated with a cached response subsequently gets a POST, PUT or DELETE request, the cached response will be invalidated.
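
The three mechanisms can be illustrated with a small Python sketch. The helper names (freshness_lifetime, is_fresh, conditional_headers, invalidate_on_write) and the use of a plain dict keyed by URL as the store are assumptions made for illustration; the sketch also assumes well-formed HTTP dates and ignores many details that a real cache must handle. If-None-Match is the request header that carries a stored ETag back to the origin server.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Seconds a stored response may be reused without contacting the origin."""
    for part in headers.get("Cache-Control", "").split(","):
        name, _, arg = part.strip().partition("=")
        if name.lower() == "max-age" and arg.isdigit():
            return int(arg)  # max-age takes precedence over Expires
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return 0  # no explicit freshness information in this sketch


def is_fresh(headers, age_seconds):
    """Freshness: True while the response is younger than its lifetime."""
    return age_seconds < freshness_lifetime(headers)


def conditional_headers(headers):
    """Validation: request headers for rechecking a stale stored response."""
    out = {}
    if "ETag" in headers:
        out["If-None-Match"] = headers["ETag"]
    if "Last-Modified" in headers:
        out["If-Modified-Since"] = headers["Last-Modified"]
    return out


def invalidate_on_write(store, method, url):
    """Invalidation: drop the stored response when an unsafe request passes."""
    if method in ("POST", "PUT", "DELETE"):
        store.pop(url, None)
```

In this sketch a cache would serve the stored copy while is_fresh returns True, send the headers from conditional_headers once it is stale (a 304 Not Modified reply means the stored copy can be reused), and call invalidate_on_write for every request it forwards.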

Browser cache

Web browsers cache content on the client machine, in memory and on disk.

Legal issues

In 1998, the DMCA added rules to the United States Code (17 U.S.C. § 512) that relieve system operators of copyright liability for the purposes of caching.

Comparison of web caches

Name                                           Type       Operating System      Forward Mode  Reverse Mode  License
ApplianSys CACHEbox                            Appliance  Linux                                             Commercial
Blue Coat ProxySG                              Appliance  SGOS                                              Commercial
Nginx                                          Software   Linux, Unix                                       2-clause BSD-like
Microsoft Forefront Threat Management Gateway  Software   Windows                                           Commercial
Polipo                                         Software   Linux, Unix, Windows                              GNU GPL
Squid                                          Software   Linux, Unix, Windows                              GNU GPL
Traffic Server                                 Software   Linux, Unix                                       Apache License 2.0
Untangle                                       Software   Linux                                             Commercial
Varnish                                        Software   Linux, Unix                                       BSD
WinGate                                        Software   Windows                                           Commercial

See also

  • Harvest project
  • Proxy server
  • Web accelerator
  • Cache manifest in HTML5

Further reading

  • Ari Luotonen, Web Proxy Servers (Prentice Hall, 1997). ISBN 0-13-680612-0
  • Duane Wessels, Web Caching (O'Reilly and Associates, 2001). ISBN 1-56592-536-X
  • Michael Rabinovich and Oliver Spatscheck, Web Caching and Replication (Addison Wesley, 2001). ISBN 0-201-61570-3

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 