Uniform Resource Locator
Encyclopedia
In computing
Information technology
Information technology is the acquisition, processing, storage and dissemination of vocal, pictorial, textual and numerical information by a microelectronics-based combination of computing and telecommunications...

, a uniform resource locator or universal resource locator (URL) is a specific character string that constitutes a reference to an Internet
Internet
The Internet is a global system of interconnected computer networks that use the standard Internet protocol suite to serve billions of users worldwide...

 resource.

A URL is technically a type of uniform resource identifier
Uniform Resource Identifier
In computing, a uniform resource identifier is a string of characters used to identify a name or a resource on the Internet. Such identification enables interaction with representations of the resource over a network using specific protocols...

 (URI) but in many technical documents and verbal discussions URL is often used as a synonym for URI.

History

The Uniform Resource Locator was created in 1994 by Tim Berners-Lee
Tim Berners-Lee
Sir Timothy John "Tim" Berners-Lee, , also known as "TimBL", is a British computer scientist, MIT professor and the inventor of the World Wide Web...

 and the URI working group of the Internet Engineering Task Force
Internet Engineering Task Force
The Internet Engineering Task Force develops and promotes Internet standards, cooperating closely with the W3C and ISO/IEC standards bodies and dealing in particular with standards of the TCP/IP and Internet protocol suite...

 (IETF) as an outcome of collaboration started at the IETF Living Documents "Birds of a Feather"
Birds of a Feather (computing)
In computing, BoF can refer to:* An informal discussion group. Unlike special interest groups or working groups, BoFs are informal and often formed in an ad-hoc manner...

 session in 1992. The format combines the pre-existing system of domain name
Domain name
A domain name is an identification string that defines a realm of administrative autonomy, authority, or control in the Internet. Domain names are formed by the rules and procedures of the Domain Name System ....

s (created in 1985) with file path
Path (computing)
A path, the general form of a filename or of a directory name, specifies a unique location in a file system. A path points to a file system location by following the directory tree hierarchy expressed in a string of characters in which path components, separated by a delimiting character, represent...

 syntax, where forward slashes
Slash (punctuation)
The slash is a sign used as a punctuation mark and for various other purposes. It is now often called a forward slash , and many other alternative names.-History:...

 are used to separate folder and file
Filename
The filename is metadata about a file; a string used to uniquely identify a file stored on the file system. Different file systems impose different restrictions on length and allowed characters on filenames.A filename includes one or more of these components:...

 names. Conventions already existed where server names could be prepended to complete file paths, preceded by a double-slash (//).

Syntax

Every URL consists of some of the following: the scheme name
URI scheme
In the field of computer networking, a URI scheme is the top level of the Uniform Resource Identifier naming structure. All URIs and absolute URI references are formed with a scheme name, followed by a colon character , and the remainder of the URI called the scheme-specific part...

 (commonly called protocol), followed by a colon, two slashes, then, depending on scheme, a domain name
Domain name
A domain name is an identification string that defines a realm of administrative autonomy, authority, or control in the Internet. Domain names are formed by the rules and procedures of the Domain Name System ....

(alternatively, IP address
IP address
An Internet Protocol address is a numerical label assigned to each device participating in a computer network that uses the Internet Protocol for communication. An IP address serves two principal functions: host or network interface identification and location addressing...

), a port number, the path of the resource to be fetched or the program to be run, then, for programs such as Common Gateway Interface
Common Gateway Interface
The Common Gateway Interface is a standard method for web servers software to delegate the generation of web pages to executable files...

 (CGI) scripts, a query string
Query string
In World Wide Web, a query string is the part of a Uniform Resource Locator that contains data to be passed to web applications such as CGI programs....

, and an optional fragment identifier
Fragment identifier
In computer hypertext, a fragment identifier is a short string of characters that refers to a resource that is subordinate to another, primary resource...

.

The syntax is
scheme://domain:port/path?query_string#fragment_id
  • The scheme name defines the namespace, purpose, and the syntax of the remaining part of the URL. Software will try to process a URL according to its scheme and context. For example, a web browser
    Web browser
    A web browser is a software application for retrieving, presenting, and traversing information resources on the World Wide Web. An information resource is identified by a Uniform Resource Identifier and may be a web page, image, video, or other piece of content...

     will usually dereference the URL http://example.org:80 by performing an HTTP request to the host at example.org, using port number 80. The URL mailto:bob@example.com may start an e-mail
    E-mail
    Electronic mail, commonly known as email or e-mail, is a method of exchanging digital messages from an author to one or more recipients. Modern email operates across the Internet or other computer networks. Some early email systems required that the author and the recipient both be online at the...

     composer with the address bob@example.com in the To field.


Other examples of scheme names include https
Https
Hypertext Transfer Protocol Secure is a combination of the Hypertext Transfer Protocol with SSL/TLS protocol to provide encrypted communication and secure identification of a network web server...

:, gopher:, wais
Wide area information server
Wide Area Information Servers or WAIS is a client–server text searching system that uses the ANSI Standard Z39.50 Information Retrieval Service Definition and Protocol Specifications for Library Applications" to search index databases on remote computers...

:, ftp:. URLs with https as a scheme (such as https://example.com/) require that requests and responses will be made over a secure connection to the website. Some schemes that require authentication allow a username, and perhaps a password too, to be embedded in the URL, for example ftp://asmith@ftp.example.org. Passwords embedded in this way are not conducive to secure working, but the full possible syntax is
scheme://username:password@domain:port/path?query_string#fragment_id
  • The domain name or IP address gives the destination location for the URL. The domain google.com, or its IP address 72.14.207.99, is the address of Google's website.
  • The domain name portion of a URL is not case sensitive since DNS
    Domain name system
    The Domain Name System is a hierarchical distributed naming system for computers, services, or any resource connected to the Internet or a private network. It associates various information with domain names assigned to each of the participating entities...

     ignores case: http://en.example.org/ and HTTP://EN.EXAMPLE.ORG/ both open the same page.
  • The port number is optional; if omitted, the default for the scheme is used. For example, http://vnc.example.com:5800 connects to port 5800 of vnc.example.com, which may be appropriate for a VNC remote control session. If the port number is omitted for an http: URL, the browser will connect on port 80, the default HTTP port. The default port for an https: request is 443.
  • The path is used to specify and perhaps find the resource requested. It is case-sensitive, though it may be treated as case-insensitive by some servers, especially those based on Microsoft Windows
    Microsoft Windows
    Microsoft Windows is a series of operating systems produced by Microsoft.Microsoft introduced an operating environment named Windows on November 20, 1985 as an add-on to MS-DOS in response to the growing interest in graphical user interfaces . Microsoft Windows came to dominate the world's personal...

    . If the server is case sensitive and http://en.example.org/wiki/URL is correct, http://en.example.org/WIKI/URL or http://en.example.org/wiki/url will display an HTTP 404
    HTTP 404
    The 404 or Not Found error message is a HTTP standard response code indicating that the client was able to communicate with the server, but the server could not find what was requested. A 404 error should not be confused with "server not found" or similar errors, in which a connection to the...

     error page, unless these URLs point to valid resources themselves.
  • The query string
    Query string
    In World Wide Web, a query string is the part of a Uniform Resource Locator that contains data to be passed to web applications such as CGI programs....

     contains data to be passed to software running on the server
    Server-side scripting
    Server-side scripting is a web server technology in which a user's request is verified by running a script directly on the web server to generate dynamic web pages. It is usually used to provide interactive web sites that interface to databases or other data stores. This is different from...

    . It may contain name/value pairs separated by ampersands, for example ?first_name=John&last_name=Doe.
  • The fragment identifier
    Fragment identifier
    In computer hypertext, a fragment identifier is a short string of characters that refers to a resource that is subordinate to another, primary resource...

    , if present, specifies a part or a position within the overall resource or document. When used with HTTP, it usually specifies a section or location within the page, and the browser may scroll to display that part of the page.

Absolute and relative URLs

According to RFC 1738, which defined URLs in 1994, when resources contain references to other resources, they can use relative links to define the location of the second resource as if to say, "in the same place as this one except with the following relative path". It went on to say that such relative URLs are dependent on the original URL containing a hierarchical structure against which the relative link is based, and that the ftp, http, and file URL schemes are examples of some that can be considered hierarchical, with the components of the hierarchy being separated by "/".

URLs as locators

A URL is a URI that, "in addition to identifying a resource
Resource (Web)
The concept of resource is primitive in the Web architecture, and is used in the definition of its fundamental elements. The term was first introduced to refer to targets of Uniform Resource Locators , but its definition has been further extended to include the referent of any Uniform Resource...

, provides a means of locating the resource by describing its primary access mechanism (e.g., its network location)"
.

Internet hostnames

On the Internet
Internet
The Internet is a global system of interconnected computer networks that use the standard Internet protocol suite to serve billions of users worldwide...

, a hostname is a domain name
Domain name
A domain name is an identification string that defines a realm of administrative autonomy, authority, or control in the Internet. Domain names are formed by the rules and procedures of the Domain Name System ....

 assigned to a host computer. This is usually a combination of the host's local name with its parent domain's name. For example, en.example.org consists of a local hostname (en) and the domain name example.org. The hostname is translated into an IP address
IP address
An Internet Protocol address is a numerical label assigned to each device participating in a computer network that uses the Internet Protocol for communication. An IP address serves two principal functions: host or network interface identification and location addressing...

 via the local hosts file
Hosts file
The hosts file is a computer file used in an operating system to map hostnames to IP addresses. The hosts file is a plain-text file and is conventionally named hosts.-Purpose:...

, or the domain name system
Domain name system
The Domain Name System is a hierarchical distributed naming system for computers, services, or any resource connected to the Internet or a private network. It associates various information with domain names assigned to each of the participating entities...

 (DNS) resolver. It is possible for a single host computer to have several hostnames; but generally the operating system
Operating system
An operating system is a set of programs that manage computer hardware resources and provide common services for application software. The operating system is the most important type of system software in a computer system...

 of the host prefers to have one hostname that the host uses for itself.

Any domain name can also be a hostname, as long as the restrictions mentioned below are followed. For example, both "en.example.org" and "example.org" can be hostnames if they both have IP address
IP address
An Internet Protocol address is a numerical label assigned to each device participating in a computer network that uses the Internet Protocol for communication. An IP address serves two principal functions: host or network interface identification and location addressing...

es assigned to them. The domain name "xyz.example.org" may not be a hostname if it does not have an IP address, but "aa.xyz.example.org" may still be a hostname. All hostnames are domain names, but not all domain names are hostnames.

See also

  • CURIE
    CURIE
    A CURIE defines a generic, abbreviated syntax for expressing URIs. It is an abbreviated URI expressed in CURIE syntax, and may be found in both XML and non-XML grammars...

     (Compact URI)
  • Extensible Resource Identifier
    Extensible Resource Identifier
    Extensible Resource Identifier is a scheme and resolution protocol for abstract identifiers compatible with Uniform Resource Identifiers and Internationalized Resource Identifiers, developed by the at OASIS...

     (XRI)
  • Fragment identifier
    Fragment identifier
    In computer hypertext, a fragment identifier is a short string of characters that refers to a resource that is subordinate to another, primary resource...

  • Internationalized Resource Identifier
    Internationalized Resource Identifier
    On the Internet, the Internationalized Resource Identifier is a generalization of the Uniform Resource Identifier . While URIs are limited to a subset of the ASCII character set, IRIs may contain characters from the Universal Character Set , including Chinese or Japanese kanji, Korean, Cyrillic...

     (IRI)
  • URL normalization
    URL normalization
    URL normalization is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URL into a normalized or canonical URL so it is possible to determine if two syntactically different URLs may be equivalent.Search engines...


External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK