Persistent Uniform Resource Locator
Encyclopedia
A persistent uniform resource locator (PURL) is a Uniform Resource Locator
Uniform Resource Locator
In computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....

 (URL) (i.e. location-based Uniform Resource Identifier
Uniform Resource Identifier
In computing, a uniform resource identifier is a string of characters used to identify a name or a resource on the Internet. Such identification enables interaction with representations of the resource over a network using specific protocols...

 or URI) that is used to redirect to the location of the requested Web resource
Resource (Web)
The concept of resource is primitive in the Web architecture, and is used in the definition of its fundamental elements. The term was first introduced to refer to targets of Uniform Resource Locators , but its definition has been further extended to include the referent of any Uniform Resource...

. PURLs redirect HTTP clients using HTTP status codes. PURLs are used to curate the URL resolution process, thus solving the problem of transitory URIs in location-based URI scheme
URI scheme
In the field of computer networking, a URI scheme is the top level of the Uniform Resource Identifier naming structure. All URIs and absolute URI references are formed with a scheme name, followed by a colon character , and the remainder of the URI called the scheme-specific part...

s like HTTP. Technically the string resolution on PURL is like SEF URL resolution.

History

The PURL concept was developed at OCLC
OCLC
OCLC Online Computer Library Center, Inc. is "a nonprofit, membership, computer library service and research organization dedicated to the public purposes of furthering access to the world’s information and reducing information costs"...

 in 1995 and implemented using a forked pre-1.0 release of Apache HTTP Server
Apache HTTP Server
The Apache HTTP Server, commonly referred to as Apache , is web server software notable for playing a key role in the initial growth of the World Wide Web. In 2009 it became the first web server software to surpass the 100 million website milestone...

. The software was modernized and extended in 2007 by Zepheira under contract to OCLC and the official website moved to http://purlz.org (the 'Z' came from the Zepheira name and was used to differentiate the PURL Open-source software
Open-source software
Open-source software is computer software that is available in source code form: the source code and certain other rights normally reserved for copyright holders are provided under a software license that permits users to study, change, improve and at times also to distribute the software.Open...

 site from the PURL resolver operated by OCLC).

PURL version numbers may be considered confusing. OCLC released versions 1 and 2 of the Apache-based source tree, initially in 1999 under the OCLC Research Public License 1.0 License and later under the OCLC Research Public License 2.0 License (http://opensource.org/licenses/oclc2). Zepheira released PURLz 1.0 in 2007 under the Apache License, Version 2.0.

The newest implementation of PURLs, PURLz 2.0, is currently in Beta testing and is also released under the Apache License, Version 2.0.

The oldest PURL HTTP resolver has been operated by OCLC
OCLC
OCLC Online Computer Library Center, Inc. is "a nonprofit, membership, computer library service and research organization dedicated to the public purposes of furthering access to the world’s information and reducing information costs"...

 since 1995 and can be reached as purl.oclc.org as well as purl.org, purl.net, and purl.com.

Other notable PURL resolvers include the US Government Printing Office (http://purl.fdlp.gov), which is operated for the Federal Depository Library Program and has been in operation since 1997.

Current versions of the PURL software and production instances are supported by 3 Round Stones.

Principles of operation

The PURL concept allows for generalized URL curation of HTTP URIs on the World Wide Web
World Wide Web
The World Wide Web is a system of interlinked hypertext documents accessed via the Internet...

. PURLs allow third party control over both URL resolution and resource metadata provision.

A URL is simply an address of a resource on the World Wide Web. A Persistent URL is an address on the World Wide Web that causes a redirection to another Web resource. If a Web resource changes location (and hence URL), a PURL pointing to it can be updated. A user of a PURL always uses the same Web address, even though the resource in question may have moved. PURLs may be used by publishers to manage their own information space or by Web users to manage theirs; a PURL service is independent of the publisher of information. PURL services thus allow the management of hyperlink integrity. Hyperlink integrity is a design trade-off of the World Wide Web, but may be partially restored by allowing resource users or third parties to influence where and how a URL resolves.

A simple PURL works by responding to an HTTP GET request by returning a response of type 302 (equivalent to the HTTP status code 302, meaning "Found"). The response contains an HTTP "Location" header, the value of which is a URL that the client should subsequently retrieve via a new HTTP GET request.

PURLs implement one form of persistent identifier for virtual resources. Other persistent identifier schemes include Digital Object Identifiers
Digital object identifier
A digital object identifier is a character string used to uniquely identify an object such as an electronic document. Metadata about the object is stored in association with the DOI name and this metadata may include a location, such as a URL, where the object can be found...

 (DOIs), Life Sciences Identifiers
LSID
Life Science Identifiers are a way to name and locate pieces of information on the web. Essentially, an LSID is a unique identifier for some data, and the LSID protocol specifies a standard way to locate the data...

 (LSIDs) and INFO URIs
Info:
In computer science, info: is a Uniform Resource Identifier scheme for information assets with identifiers in public namespaces that allows legacy namespaces such as Library of Congress Identifiers and Digital object identifiers to be represented as URIs...

. All persistent identification schemes provide unique identifiers for (possibly changing) virtual resources, but not all schemes provide curation opportunities. Curation of virtual resources has been defined as, "the active involvement of information professionals in the management, including the preservation, of digital data for future use."

PURLs have been criticized for their need to resolve a URL, thus tying a PURL to a network location. Network locations have several vulnerabilities, such as Domain Name System registrations and host dependencies. A failure to resolve a PURL could lead to an ambiguous state: It would not be clear whether the PURL failed to resolve because a network failure prevented it or because it did not exist.

PURLs are themselves valid URLs, so their components must map to the URL specification. The scheme part tells a computer program, such as a Web browser, which protocol to use when resolving the address. The scheme used for PURLs is generally HTTP. The host part tells which PURL server to connect to. The next part, the PURL domain, is analogous to a resource path in a URL. The domain is a hierarchical information space that separates PURLs and allows for PURLs to have different maintainers. One or more designated maintainers may administer each PURL domain. Finally, the PURL name is the name of the PURL itself. The domain and name together constitute the PURL's "id".

Types of PURLs

The most common types of PURLs are named to coincide with the HTTP response code that they return. Not all HTTP response codes have equivalent PURL types. Some HTTP response codes (e.g. 401, Unauthorized) have clear meanings in the context of an HTTP conversation but do not apply to the process of HTTP redirection. Three additional types of PURLs (“chain”, “partial” and “clone”) are given mnemonic names related to their functions.
PURL Types
Type PURL Meaning HTTP Meaning
301 Moved permanently to a target URL Moved permanently
302 Simple redirection to a target URL Found
Chain Redirect to another PURL within the same server Found
Partial Redirect to a target URL with trailing path information appended Found
303 See other URL See Other
307 Temporary redirect to a target URL Temporary Redirect
404 Temporarily gone Not Found
410 Permanently gone Gone
Clone Copy the attributes of an existing PURL N/A


Most PURLs are so-called "simple PURLs", which provide a redirection to the desired resource. The HTTP status code, and hence of the PURL type, of a simple PURL is 302. The intent of a 302 PURL is to inform the Web client and end user that the PURL should always be used to address the requested resource, not the final URI resolved. This is to allow continued resolution of the resource if the PURL changes. Some operators prefer to use PURLs of type 301 (indicating that the final URI should be addressed in future requests).

A PURL of type "chain" allows a PURL to redirect to another PURL in a manner identical to a 301 or 302 redirection, with the difference that a PURL server will handle the redirection internally for greater efficiency. This efficiency is useful when many redirections are possible; since some Web browsers will stop following redirections once a set limit is encountered (in an attempt to avoid loops).

A PURL of type "303" is used to direct a Web client to a resource that provides additional information regarding the resource they requested, without returning the resource itself. This subtlety is useful when the HTTP URI requested is used as an identifier for a physical or conceptual object that cannot be represented as an information resource. PURLs of type 303 are used most often to redirect to metadata in a serialization format of the Resource Description Framework
Resource Description Framework
The Resource Description Framework is a family of World Wide Web Consortium specifications originally designed as a metadata data model...

 and have relevance for Semantic Web
Semantic Web
The Semantic Web is a collaborative movement led by the World Wide Web Consortium that promotes common formats for data on the World Wide Web. By encouraging the inclusion of semantic content in web pages, the Semantic Web aims at converting the current web of unstructured documents into a "web of...

 and Linked Data
Linked Data
In computing, linked data describes a method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a...

 content. This use of the 303 HTTP status code is conforment with the http-range-14 finding of the Technical Architecture Group of the World Wide Web Consortium
World Wide Web Consortium
The World Wide Web Consortium is the main international standards organization for the World Wide Web .Founded and headed by Tim Berners-Lee, the consortium is made up of member organizations which maintain full-time staff for the purpose of working together in the development of standards for the...

.

A PURL of type "307" informs a user that the resource temporarily resides at a different URL from the norm. PURLs of types 404 and 410 note that the requested resource could not be found and suggests some information for why that was so. Support for the HTTP 307 (Temporary Redirect), 404 (Not Found) and 410 (Gone) response codes are provided for completeness.

PURLs of types "404" and "410" are provided to assist administrators in marking PURLs that require repair. PURLs of these types allow for more efficient indications of resource identification failure when target resources have moved and a suitable replacement has not been identified.

PURLs of type "clone" are used solely during PURL administration as a convenient method of copying an existing PURL record into a new PURL.

Redirection of URL Fragments

The PURL service includes a concept known as partial redirection. If a request does not match a PURL exactly, the requested URL is checked to determine if some contiguous front portion of the PURL string matches a registered PURL. If so, a redirection occurs with the remainder of the requested URL appended to the target URL. For example, consider a PURL with a URL of http//purl.org/some/path/ with a target URL of http://example.com/another/path/. An attempt to perform an HTTP GET operation on the URL http//purl.org/some/path/and/some/more/data would result in a partial redirection to http://example.com/another/path/and/some/more/data. The concept of partial redirection allows hierarchies of Web-based resources to be addressed via PURLs without each resource requiring its own PURL. One PURL is sufficient to serve as a top-level node for a hierarchy on a single target server. The new PURL service uses the type "partial" to denote a PURL that performs partial redirection.

Partial redirections at the level of a URL path do not violate common interpretations of the HTTP 1.1 specification. However, the handling of URL fragments across redirections has not been standardized and a consensus has not yet emerged. Fragment identifiers indicate a pointer to more specific information within a resource and are designated as following a # separator in URIs.

Partial redirection in the presence of a fragment identifier is problematic because two conflicting interpretations are possible. If a fragment is attached to a PURL of type "partial", should a PURL service assume that the fragment has meaning on the target URL or should it discard it in the presumption that a resource with a changed location may have also changed content, thus invalidating fragments defined earlier? Bos suggested that fragments should be retained and passed through to target URLs during HTTP redirections resulting in 300 (Multiple Choice), 301 (Moved Permanently), 302 (Found) or 303 (See Other) responses unless a designated target URL already includes a fragment identifier. If a fragment identifier is already present in a target URL, any fragment in the original URL should be abandoned. Unfortunately, Bos’ suggestion failed to navigate the IETF standards track and expired without further work. Dubost et al. resurrected Bos’ suggestions in a W3C Note (not a standard, but guidance in the absence of a standard). Makers of Web clients such as browsers have "generally" [ibid.] failed to follow Bos’ guidance.

Starting with PURLz 1.0 series, the PURL service implements partial redirections inclusive of fragment identifiers by writing fragments onto target URLs in an attempt to comply with and avoid problematic and inconsistent behavior by browser vendors.

Future Directions

PURLz 2.0, currently Beta testing, makes major changes to the PURL concept and implementation. Most notably in terms of implementation, the software runs on a Semantic Web
Semantic Web
The Semantic Web is a collaborative movement led by the World Wide Web Consortium that promotes common formats for data on the World Wide Web. By encouraging the inclusion of semantic content in web pages, the Semantic Web aims at converting the current web of unstructured documents into a "web of...

 framework with information stored in an RDF
Resource Description Framework
The Resource Description Framework is a family of World Wide Web Consortium specifications originally designed as a metadata data model...

 database. The implementation team intends for the software to provide all the functions of a Web server
Web server
Web server can refer to either the hardware or the software that helps to deliver content that can be accessed through the Internet....

, excluding the serving of content. Active PURLs, which act similar to a stored procedure
Stored procedure
A stored procedure is a subroutine available to applications that access a relational database system. A stored procedure is actually stored in the database data dictionary.Typical uses for stored procedures include data validation or access control mechanisms...

 in relational databases, are being experimented with to allow a PURL server to actively participate in the creation of metadata describing a Web resource.

See also

  • Archival Resource Key
    Archival Resource Key
    Archival Resource Key is a Uniform Resource Locator that provides a multi-purpose identifier given to information objects of any type...

     (ARK)
  • Digital Object Identifier
    Digital object identifier
    A digital object identifier is a character string used to uniquely identify an object such as an electronic document. Metadata about the object is stored in association with the DOI name and this metadata may include a location, such as a URL, where the object can be found...

     (DOI)
  • Handle System
    Handle System
    The Handle System is a technology specification for assigning, managing, and resolving persistent identifiers for digital objects and other resources on the Internet...

     identifiers
  • Link rot
    Link rot
    Link rot , also known as link death or link breaking is an informal term for the process by which, either on individual websites or the Internet in general, increasing numbers of links point to web pages, servers or other resources that have become permanently unavailable...

  • OPAC
    OPAC
    An Online Public Access Catalog is an online database of materials held by a library or group of libraries...

  • SEF URLs
  • Uniform Resource Name
    Uniform Resource Name
    A uniform resource name is a uniform resource identifier that uses the urn scheme and does not imply availability of the identified resource. Both URNs and URLs are URIs, and a particular URI may be a name and a locator at the same time.The functional requirements for uniform resource names are...

     (URN)
  • Personalized Uniform Resource Locator
    Personalized Uniform Resource Locator
    A Personalized Uniform Resource Locator is a URL that can be personalized for each specific user of a website.A Personalized URL will usually take this form: http://[UserName].CompanyName.com...

    (PURL)

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK