Internet Archive


The Internet Archive is a non-profit digital library with the stated mission of "universal access to all knowledge". It offers permanent storage of, and access to, collections of digitized materials, including websites, music, moving images, and nearly 3 million public domain books. The Internet Archive was founded by Brewster Kahle in 1996. It is a member of the International Internet Preservation Consortium (IIPC).

The Archive has offices in San Francisco, California, USA, and data centers in San Francisco, Redwood City, and Mountain View, California. Its largest collection is its web archive, "snapshots of the World Wide Web". To ensure the stability and endurance of the Internet Archive, its collection is mirrored at the Bibliotheca Alexandrina in Egypt.

The Archive allows the public to upload digital material to its data cluster and to download material from it, and provides unrestricted online access to that material at no cost. The Archive also oversees one of the world's largest book digitization projects. It is a member of the American Library Association and is officially recognized by the State of California as a library.

In addition to its archiving function, the Archive is an activist organization, advocating for a free and open Internet.

The Archive is a 501(c)(3) non-profit operating in the United States. It has a staff of 200, most of whom are book scanners in its scanning centers. Its main office in San Francisco houses about 30 employees. The Archive has an annual budget of $10 million, drawn from a variety of sources: revenue from its Web crawling services, various partnerships, grants, donations, and the Kahle-Austin Foundation.

History

Brewster Kahle founded the Archive in 1996, at the same time that he started the for-profit web crawling company Alexa Internet. The Archive began archiving the World Wide Web in 1996, but it did not make this collection available until 2001, when it developed the Wayback Machine. In late 1999, the Archive expanded its collections beyond the Web archive, beginning with the Prelinger Archives. The Internet Archive's collections now include texts, audio, moving images, and software. It also hosts a number of other projects: the NASA Images Archive, the contract crawling service Archive-It, and Open Library, a wiki-editable library catalog and book information site. More recently, the Archive has begun working to provide specialized services for the information access needs of the print-disabled.

According to its website:
Most societies place importance on preserving artifacts of their culture and heritage. Without such artifacts, civilization has no memory and no mechanism to learn from its successes and failures. Our culture now produces more and more artifacts in digital form. The Archive's mission is to help preserve those artifacts and create an Internet library for researchers, historians, and scholars.

Wayback Machine

Capitalizing on the popular term "WABAC Machine" from a segment of the old Rocky and Bullwinkle cartoon, the Internet Archive uses the name "Wayback Machine" for its service that allows archives of the World Wide Web to be searched and accessed. The service lets users see past versions of web pages, in what the Internet Archive calls a "three dimensional index". Millions of websites and their associated data (images, source code, documents, etc.) are saved in a very large database. The service can be used to see what earlier versions of websites looked like, to retrieve original source code from websites that may no longer be directly available, or to visit websites that no longer exist. Not all websites are available, however, because many website owners choose to exclude their sites. As with all sites based on data from web crawlers, the Internet Archive also misses large areas of the web for a variety of other reasons. International biases have been found in its coverage, although these do not seem to be the result of a deliberate policy. Snapshots usually take 6 to 18 months to be added.

The use of the term "Wayback Machine" in the context of the Internet Archive has become so common that "Wayback Machine" and "Internet Archive" are almost synonymous. This usage occurs in popular culture: in the Law & Order: Criminal Intent episode "Legacy" (first aired August 3, 2008), for example, an extra playing a computer technician uses the "Wayback Machine" to find an archive of a student's Facebook-style website.

Open Library

The Open Library is another project of the Internet Archive. The site aims to provide a web page for every book ever published, making it something like an open-source version of WorldCat. It holds 23 million catalog records of books, in addition to the full texts of about 1,600,000 public domain books, which can be read and downloaded in full. Open Library is a free and open-source software project, with its source code freely available on the Open Library site.

Archive-It

First deployed in early 2006, Archive-It is a subscription service that allows institutions and individuals to build and preserve collections of born-digital content. Through a web application, Archive-It partners can harvest, catalog, and manage their archived collections, and can browse them within 24 hours of harvesting. Collections are hosted by the Internet Archive and are available to the public with full-text search. Content collected through Archive-It is stored as a primary and a backup copy and is periodically indexed into the Internet Archive's general archive; a copy of the data can also be sent to the partner institutions.

Archive-It has 125 partner institutions in 42 U.S. states and 11 countries, which have captured over 1.5 billion URLs for 963 public collections.

Archive-It partners are university and college libraries, state archives, federal institutions, museums, and cultural organizations, including the Electronic Literature Organization, the State Archives of North Carolina, the Texas State Library and Archives Commission, Stanford University, the National Library of Australia, the Research Libraries Group (RLG), and many others.

nasaimages.org

NASA Images was created through a Space Act Agreement between the Internet Archive and NASA to provide public access to NASA's image, video, and audio collections in a single, searchable resource. The NASA Images team works closely with all of the NASA centers to keep adding to the growing collection at nasaimages.org. The site launched in July 2008 and now has more than 100,000 items online.

Media collections

In addition to web archives, the Internet Archive maintains extensive collections of digital media that the uploader attests are in the public domain in the United States or licensed under terms that allow redistribution, such as Creative Commons licenses. The media are organized into collections by media type (moving images, audio, text, etc.) and into sub-collections by various criteria. Each of the main collections includes an "Open Source" sub-collection where general contributions from the public are stored.

Moving image collection

Aside from feature films, IA's Moving Image collection includes newsreels; classic cartoons; pro- and anti-war propaganda; Skip Elsheimer's "A.V. Geeks" collection; and ephemeral material from the Prelinger Archives, such as advertising, educational, and industrial films, as well as amateur and home movie collections.

IA's Brick Films collection contains stop-motion animation filmed with Lego bricks, some of which are "remakes" of feature films. The Election 2004 collection is a non-partisan public resource for sharing video materials related to the 2004 United States presidential election. The Independent News collection includes sub-collections such as the Internet Archive's World At War competition from 2001, in which contestants created short films demonstrating "why access to history matters." Among its most-downloaded video files are eyewitness recordings of the devastating 2004 Indian Ocean earthquake. The September 11th Television Archive contains archival footage from the world's major television networks of the terrorist attacks of September 11, 2001, as they unfolded on live television.

Some of the films available on the Internet Archive are:

  • The 39 Steps (1935)
  • Battleship Potemkin
  • The Birth of a Nation
  • Broken Blossoms
  • The Century of the Self
  • Charade (1963)
  • Columbia Revolt
  • D.O.A. (1950)
  • Danger Lights
  • Das Cabinet des Dr. Caligari
  • Dating Do's and Don'ts
  • Detour
  • Duck and Cover
  • Escape from Sobibor
  • Fire Over England
  • The General
  • Greed
  • Hemp for Victory
  • Intolerance
  • The Kid
  • Le voyage dans la Lune
  • Lying Lips
  • M
  • The Man Who Knew Too Much
  • Manos: The Hands of Fate
  • Manufacturing Consent: Noam Chomsky and the Media
  • Night of the Living Dead
  • Nosferatu (not public domain outside the United States)
  • Plan 9 from Outer Space
  • The Power of Nightmares (not public domain)
  • Princess Iron Fan (1941)
  • Reefer Madness
  • Sex Madness
  • She Done Him Wrong (1933)
  • Triumph of the Will
  • All seven episodes of Why We Fight

See also the Wikipedia list of films freely available on the Internet Archive.

Audio collection

The audio collection includes music, audiobooks, news broadcasts, old-time radio shows, and a wide variety of other audio files.

The Live Music Archive sub-collection includes over 50,000 concert recordings from independent artists, as well as from more established artists and musical ensembles with permissive policies on recording their concerts, such as the Grateful Dead and, more recently, The Smashing Pumpkins. Jordan Zevon has also allowed anyone to share concert recordings of his father, Warren Zevon, on the Internet Archive.

Text collection

The texts collection includes digitized books from various libraries around the world as well as many special collections. The Internet Archive operates 23 scanning centers in five countries, which digitize about 1,000 books a day with financial support from libraries and foundations. When the collection held about 1 million texts, it totaled over 0.5 petabytes, including raw camera images, cropped and skewed images, PDFs, and raw OCR data.

From about 2006 to 2008, Microsoft had a special relationship with the Internet Archive's text collection through its Live Search Books project: it scanned over 300,000 books that were contributed to the collection and also provided financial support and scanning equipment. On May 23, 2008, Microsoft announced that it would end the Live Search Books project and stop scanning books. Microsoft made its scanned books available without contractual restriction and donated its scanning equipment to its former partners.

Around October 2007, Archive users began uploading public domain books from Google Book Search. As of May 2011, there were over 900,000 Google-digitized books in the Archive's collection, out of a total of 2.8 million books. The books are identical to the copies found on Google, except that they lack the Google watermarks, and like all Internet Archive materials they are available for unrestricted use and download.

Physical media

Voicing a strong reaction to the idea of books simply being thrown away, and inspired by the Svalbard Global Seed Vault, Kahle now envisions collecting one copy of every book ever published. "We're not going to get there, but that's our goal," he said. Alongside the books, Kahle plans to store the Internet Archive's old servers, which were replaced late last year.

Hate Speech Server for Al-Qaeda

On August 17, 2011, the Middle East Media Research Institute (MEMRI.org) published "Al-Qaeda, Jihadis Infest the San Francisco, California-Based 'Internet Archive' Library", which described how members could post anonymously and enjoy free, uncensored hosting.

National Security Letter

On May 8, 2008, it was revealed that the Internet Archive had successfully challenged an FBI National Security Letter requesting logs on an undisclosed user.

Scientology

In late 2002, the Internet Archive removed various sites critical of Scientology from the Wayback Machine. The error message shown in their place stated that this was in response to a "request by the site owner." It was later clarified that lawyers from the Church of Scientology had demanded the removal and that the actual site owners did not want their material removed.

Healthcare Advocates, Inc.

In 2003, the law firm Harding Earley Follmer & Frailey used the Archive's Wayback Machine to defend a client in a trademark dispute. The lawyers were able to show that the plaintiff's claims were invalid based on the content of its website from several years earlier. The plaintiff, Healthcare Advocates, then amended its complaint to include the Internet Archive, accusing the organization of copyright infringement as well as violations of the Digital Millennium Copyright Act (DMCA) and the Computer Fraud and Abuse Act. Healthcare Advocates claimed that, because it had installed a robots.txt file on its website, the Archive should have removed all previous copies of the site from the Wayback Machine, even though the file was added only after the initial lawsuit was filed. The lawsuit was settled out of court.

The robots.txt file is part of the Robots Exclusion Standard, a voluntary protocol, respected by the Internet Archive, that lets a site's creator mark certain pages as off-limits to bots. As a result, a number of websites are inaccessible through the Wayback Machine. Currently, the Internet Archive applies robots.txt rules retroactively: if a site blocks the Internet Archive, as Healthcare Advocates did, any previously archived pages from the domain are also rendered unavailable. For blocked sites, only the robots.txt file itself is archived.
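The exclusion rule described above can be sketched with Python's standard `urllib.robotparser` module. This is a simplified illustration, not the Archive's actual implementation: it assumes the crawler identifies itself as `ia_archiver` (the token site owners have historically been told to block to exclude the Wayback Machine), and the robots.txt text, snapshot URLs, and the `is_blocked` and `visible_snapshots` helpers are all hypothetical.

```python
from urllib import robotparser

# Hypothetical robots.txt that blocks only the archive's crawler.
# Assumption: "ia_archiver" is the user-agent token being honored.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""

# Hypothetical URLs previously captured from one domain.
SNAPSHOTS = [
    "http://example.com/",
    "http://example.com/about.html",
    "http://example.com/robots.txt",
]


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


def visible_snapshots(snapshot_urls, robots_txt, user_agent="ia_archiver"):
    """Retroactive exclusion: check every earlier capture against the site's
    *current* robots.txt and hide blocked ones, but always keep the
    robots.txt capture itself."""
    visible = []
    for url in snapshot_urls:
        if url.endswith("/robots.txt") or not is_blocked(robots_txt, user_agent, url):
            visible.append(url)
    return visible


print(visible_snapshots(SNAPSHOTS, ROBOTS_TXT))
# ['http://example.com/robots.txt']
```

Because each capture is checked against the robots.txt in force today rather than the one present when the page was captured, adding a blocking rule later hides the whole earlier history of the domain, while other crawlers (for example a `Googlebot` user agent) are unaffected by this particular file.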

Beyond robots.txt, the Internet Archive also states, "Sometimes a web site owner will contact us directly and ask us to stop crawling or archiving a site. We comply with these requests." Its website adds: "The Internet Archive is not interested in preserving or offering access to Web sites or other Internet documents of persons who do not want their materials in the collection."

Suzanne Shell

On December 12, 2005, activist Suzanne Shell demanded that the Internet Archive pay her US$100,000 for archiving her website profane-justice.org between 1999 and 2004. The Internet Archive filed a declaratory judgment action in the United States District Court for the Northern District of California on January 20, 2006, seeking a judicial determination that it had not violated Shell’s copyright. Shell responded with a countersuit against the Internet Archive for archiving her site, which she alleged violated her terms of service. On February 13, 2007, a judge for the United States District Court for the District of Colorado dismissed all of Shell’s counterclaims except breach of contract. The Internet Archive had not moved to dismiss the copyright infringement claims Shell asserted arising out of its copying activities, so those claims also remained in the case.

On April 25, 2007, the Internet Archive and Suzanne Shell jointly announced the settlement of their lawsuit. The Internet Archive said, “Internet Archive has no interest in including materials in the Wayback Machine of persons who do not wish to have their Web content archived. We recognize that Ms. Shell has a valid and enforceable copyright in her Web site and we regret that the inclusion of her Web site in the Wayback Machine resulted in this litigation. We are happy to have this case behind us.” Shell said, “I respect the historical value of Internet Archive’s goal. I never intended to interfere with that goal nor cause it any harm.”

Grateful Dead

In November 2005, free downloads of Grateful Dead concerts were removed from the site. According to a New York Times article, John Perry Barlow identified Bob Weir, Mickey Hart, and Bill Kreutzmann as the instigators of the change. Phil Lesh commented on the change in a November 30, 2005, post on his personal website:
It was brought to my attention that all of the Grateful Dead shows were taken down from Archive.org right before Thanksgiving. I was not part of this decision making process and was not notified that the shows were to be pulled. I do feel that the music is the Grateful Dead's legacy and I hope that one way or another all of it is available for those who want it.

A November 30 forum post from Brewster Kahle summarized what appeared to be the compromise reached among the band members: audience recordings could be downloaded or streamed, but soundboard recordings would be available for streaming only. Concerts have since been re-added.

Opposition to Google Books Settlement

The Internet Archive is a member of the Open Book Alliance, which has been among the most outspoken critics of the Google Book Settlement. The Archive advocates an alternative digital library project.

See also

Similar projects

  • Internet Memory Foundation
  • Library of Congress Digital Library project
  • National Digital Information Infrastructure and Preservation Program
  • Ourmedia
     - Internet Archive project that freely hosts public image, text, audio, and video submissions
  • Project Gutenberg
  • UK Government National Web Archive
  • UK Web Archive provided by the British Library
  • WebCite

Other

  • Digital preservation
  • Heritrix
  • Link rot
  • Memory hole
  • PetaBox
  • Web archiving
  • Web crawler

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.