Archive site

    In web archiving, an archive site is a website that stores information about webpages from the past, or copies of those webpages themselves, for anyone to view.

    Common techniques

    Two common techniques are (1) using a web crawler and (2) relying on user submissions.

    1. By using a web crawler, the service does not depend on an active community for its content, so it can build a larger database faster, which usually helps the community grow as well. However, website developers and system administrators can block these robots from accessing certain web pages with a robots.txt file (the Robots Exclusion Standard).
    2. Services built on user submissions can be difficult to start because submission rates may be low at first, but this approach can yield some of the best results. A crawler can only collect information that people have already posted to the Internet, and they may not have posted it because they assumed no one would be interested, lacked a suitable medium, and so on. When people see that someone wants their information, they may be more willing to submit it.
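    The crawler technique in item 1, including the robots.txt opt-out, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular archive site works: it assumes a hypothetical `fetch(url)` callable that returns a page's HTML (or `None` if the page is missing), and the names `crawl`, `LinkExtractor`, and the user agent `ExampleArchiveBot` are made up for the example. It follows links breadth-first and uses the standard library's `urllib.robotparser` to skip any page a site's robots.txt excludes.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, user_agent="ExampleArchiveBot", max_pages=100):
    """Breadth-first crawl from `seed`, storing a snapshot of each allowed page.

    `fetch` is a hypothetical callable: fetch(url) -> page text, or None.
    Returns a dict mapping URL -> archived page text.
    """
    archive = {}
    robots = {}  # one parsed robots.txt per scheme://host
    queue = deque([seed])
    seen = {seed}
    while queue and len(archive) < max_pages:
        url = queue.popleft()
        parts = urlparse(url)
        root = f"{parts.scheme}://{parts.netloc}"
        if root not in robots:
            parser = RobotFileParser()
            # A missing robots.txt is treated as "no restrictions".
            parser.parse((fetch(root + "/robots.txt") or "").splitlines())
            robots[root] = parser
        if not robots[root].can_fetch(user_agent, url):
            continue  # the site owner has excluded this page
        body = fetch(url)
        if body is None:
            continue
        archive[url] = body
        extractor = LinkExtractor()
        extractor.feed(body)
        extractor.close()
        for href in extractor.links:
            # Resolve relative links and drop #fragments before queueing.
            link, _ = urldefrag(urljoin(url, href))
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return archive
```

    For example, with a small in-memory "web" standing in for real HTTP requests, the page under /private/ is never archived because robots.txt disallows it:

```python
pages = {
    "https://example.org/robots.txt": "User-agent: *\nDisallow: /private/\n",
    "https://example.org/": '<a href="/about.html">About</a> <a href="/private/secret.html">Secret</a>',
    "https://example.org/about.html": '<a href="/">Home</a>',
    "https://example.org/private/secret.html": "not for archiving",
}
print(sorted(crawl("https://example.org/", pages.get)))
# ['https://example.org/', 'https://example.org/about.html']
```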


    Examples


    Google Groups

    On February 12, 2001, Google acquired the Usenet discussion group archives from Deja.com and turned them into its Google Groups service. Google Groups lets users search old discussions with Google's search technology while still allowing them to post to the mailing lists.

    Internet Archive

    The Internet Archive is building a compendium of websites and digital media. Since 1996, the Archive has used a web crawler to build its database. It is one of the best-known archive sites.

    TextFiles.com

    TextFiles.com is a large library of old text files maintained by Jason Scott Sadofsky. Its mission is to archive the old documents that circulated on the bulletin board systems (BBSes) of his youth and to document other people's experiences on BBSes.

    PANDORA Archive

    PANDORA, founded in 1996 by the National Library of Australia, stands for Preserving and Accessing Networked Documentary Resources of Australia, which encapsulates its mission. It provides a long-term catalog of selected online publications and websites authored by Australians or about Australian topics. It uses its PANDAS (PANDORA Digital Archiving System) to build the catalog.

    See also

    • Internet Archive
    • Pandora Archive