In computing and telecommunication, an escape character is a character which invokes an alternative interpretation on subsequent characters in a character sequence. An escape character is a particular case of a metacharacter. Generally, the judgement of whether something is an escape character or not depends on context.
Definition
Escape characters are part of the syntax for many programming languages, data formats and communication protocols. For a given alphabet, an escape character's purpose is to start character sequences (called escape sequences) which have to be interpreted differently from the same characters occurring alone. An escape character has no meaning of its own in this role, so every escape sequence consists of two or more characters.
There are usually two functions of escape sequences. The first is to encode a syntactic entity, such as device commands or special data, which cannot be directly represented by the alphabet. The second, referred to as character quoting, is to represent characters which cannot be typed in the current context, or which would have an undesired interpretation. In the latter case an escape sequence is a digraph consisting of the escape character itself and a "quoted" character.
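The two functions can be illustrated with a minimal decoder sketch in Python. The backslash as escape character and the small `SPECIAL` table are assumptions chosen for illustration, not the rules of any particular language: a listed letter after the escape character encodes a special entity, while any other character is simply quoted.

```python
# Illustrative assumptions: backslash is the escape character, and only
# two "special entity" sequences are defined.
ESCAPE = "\\"
SPECIAL = {"n": "\n", "t": "\t"}


def decode(text: str) -> str:
    """Decode escape sequences: known letters encode an entity, others are quoted."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch != ESCAPE:
            out.append(ch)  # ordinary character, kept as is
            continue
        nxt = next(chars, None)  # the character following the escape character
        if nxt is None:
            raise ValueError("escape character at end of input")
        # First function: encode a special entity (e.g. newline).
        # Second function: quote the character so it is taken literally.
        out.append(SPECIAL.get(nxt, nxt))
    return "".join(out)
```

Because the escape character can quote itself, a doubled backslash in the input decodes to a single literal backslash.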
Escape character vs control character
Generally, an escape character is not a particular case of a (device) control character, nor vice versa. If we define control characters as non-graphic, or as having a special meaning for an output device (e.g. a printer or text terminal), then any escape character for this device is a control character. But escape characters used in programming (see below) are graphic, hence are not control characters. Conversely, most (but not all) of the ASCII "control characters" have some control function in isolation, and therefore are not escape characters.
ASCII escape character
The ASCII "escape" character (octal: \033, or ^[, or, in decimal, 27) is used in many output devices to start a series of characters called a control sequence or escape sequence. Typically, the escape character was sent first in such a sequence to alert the device that the following characters were to be interpreted as a control sequence rather than as plain characters, then one or more characters would follow to specify some detailed action, after which the device would go back to interpreting characters normally. For example, the sequence of ^[, followed by the printable characters [2;10H, would cause a DEC VT102 terminal to move its cursor to the 10th cell of the 2nd line of the screen. This was later developed into the ANSI escape codes covered by the ANSI X3.64 standard. The escape character also starts each command sequence in the Hewlett Packard Printer Command Language.
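As a small sketch of the cursor-positioning sequence described above, the following Python builds the same string. The helper name `cursor_to` is hypothetical; whether a given terminal honours the sequence depends on the terminal.

```python
ESC = "\x1b"  # the ASCII escape character: decimal 27, octal 033


def cursor_to(row: int, col: int) -> str:
    """Build the control sequence ESC [ row ; col H."""
    return f"{ESC}[{row};{col}H"


# The sequence from the text: move to the 10th cell of the 2nd line.
seq = cursor_to(2, 10)
```

Writing `seq` to a terminal that understands such sequences would move the cursor. Written to a plain file, it is just six more characters of data.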
An early reference to the term "escape character" is found in Bob Bemer's IBM technical publications. Apparently, it is he who invented this mechanism, during his work on the ASCII character set.
The Escape key is usually found on standard PC keyboards. However, it is commonly absent from keyboards for PDAs and other devices not designed primarily for ASCII communications, and it is not generally used as part of the common user interface for applications on the Windows operating system. Linux systems, or applications such as Firefox, often use the key as the functional equivalent of clicking a Cancel button with a mouse. The keyboard of the DEC VT220 series was one of the few popular keyboards that did not have a dedicated Esc key, instead using one of the keys above the main keypad. In user interfaces of the 1970s and 1980s it was not uncommon to use this key as an escape character, but on modern desktop computers such use has been dropped. Sometimes the key was identified with AltMode (for alternative mode). Even with no dedicated key, the escape character code could be generated by typing '[' while simultaneously holding down the Control key ('Ctrl').
Programming and data formats
Many modern programming languages specify the doublequote character (") as a delimiter for a string literal. The backslash (\) escape character provides two ways to include doublequotes inside a string literal, either by modifying the meaning of the doublequote character embedded in the string (\" becomes "), or by modifying the meaning of the three characters that give the hexadecimal value of a doublequote character (\x22 becomes ").
In Perl or Python 2:
print "Nancy said "Hello World!" to the crowd.";
produces a syntax error, whereas:
print "Nancy said \"Hello World!\" to the crowd."; ### example of \"
produces the intended output.
Another alternative:
print "Nancy said \x22Hello World!\x22 to them."; ### example of \x22
uses the numeric escape sequence of hexadecimal "x22" for a quotation mark. This would not produce the required text if run on a non-ASCII machine.
C and C++ allow exactly the same two backslash escape styles. Java accepts \" as well but has no \x escape; the numeric equivalent there is the octal escape \42. The PostScript language and Microsoft Rich Text Format also use backslash escapes. The quoted-printable encoding uses the equals sign as an escape character.
URLs and URIs use %-escapes (percent-encoding) to quote characters with a special meaning, as well as non-ASCII characters. The ampersand (&) character may be considered an escape character in SGML and derived formats such as HTML and XML.
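The escape characters of these data formats can be seen at work with Python's standard library. This is only a sketch; the sample strings are made up for illustration.

```python
import html
import quopri
from urllib.parse import quote, unquote

# Percent-encoding: '%' introduces two hex digits for each escaped byte.
# The space, the ampersand and the non-ASCII letter are all escaped.
url_part = quote("a b&é")

# SGML-style formats: '&' introduces a character reference.
markup = html.escape("1 < 2 & 3")

# Quoted-printable: '=' introduces two hex digits for an encoded byte.
qp_decoded = quopri.decodestring(b"caf=C3=A9")

print(url_part)                     # a%20b%26%C3%A9
print(markup)                       # 1 &lt; 2 &amp; 3
print(qp_decoded.decode("utf-8"))   # café
```

In each format the escape character must itself be escaped when meant literally, for example as `%25`, `&amp;` and `=3D` respectively.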
Another similar (and partially overlapping) syntactic trick is stropping.
Some programming languages also provide other ways to represent special characters in literals, without requiring an escape character (see e.g. delimiter collision).
Communication protocols
The Point-to-Point Protocol uses the 0x7D octet (\175, or ASCII: }) as an escape character. The octet immediately following should be XORed with 0x20 before being passed to a higher level protocol. This is applied to both 0x7D itself and the control character 0x7E (which is used in PPP to mark the beginning and end of a frame) when those octets need to be transmitted by a higher level protocol encapsulated by PPP, as well as to other octets negotiated when the link is established. That is, when a higher level protocol wishes to transmit 0x7D, it is transmitted as the sequence 0x7D 0x5D, and 0x7E is transmitted as 0x7D 0x5E.
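The escaping rule above can be sketched in a few lines of Python. The function names and the `extra` parameter (standing in for the additional octets negotiated at link setup) are hypothetical. This is not a full PPP framing implementation: it adds no frame delimiters or checksums.

```python
FLAG = 0x7E   # marks the beginning and end of a frame
ESC = 0x7D    # the escape octet
XOR = 0x20    # value XORed into the escaped octet


def ppp_escape(payload: bytes, extra: frozenset = frozenset()) -> bytes:
    """Replace each octet that must not appear literally by ESC, octet ^ 0x20."""
    out = bytearray()
    for octet in payload:
        if octet in (FLAG, ESC) or octet in extra:
            out += bytes([ESC, octet ^ XOR])
        else:
            out.append(octet)
    return bytes(out)


def ppp_unescape(data: bytes) -> bytes:
    """Reverse ppp_escape: drop each ESC and XOR the following octet with 0x20."""
    out = bytearray()
    octets = iter(data)
    for octet in octets:
        if octet == ESC:
            following = next(octets, None)
            if following is None:
                raise ValueError("escape octet at end of data")
            out.append(following ^ XOR)
        else:
            out.append(octet)
    return bytes(out)
```

For instance, `ppp_escape(b"\x7d\x7e")` yields the octets 0x7D 0x5D 0x7D 0x5E, matching the sequences given in the text, and `ppp_unescape` restores the original payload.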
Bourne shell
In the Bourne shell (sh), the asterisk (*) and question mark (?) characters are wildcard characters expanded via globbing. Without a preceding escape character, an * will expand to the names of all files in the working directory that don't start with a period if and only if there are such files; otherwise * remains unexpanded. So to refer to a file literally called "*", the shell must be told not to interpret it in this way, by preceding it with a backslash (\). This modifies the interpretation of the asterisk (*). Compare:
rm *     # delete all files in the current directory
rm \*    # delete the file named *
Windows Command Prompt
The Windows command-line interpreter uses a caret character (^) to escape reserved characters that have special meanings (in particular: & | < > ^). The DOS command-line interpreter, though it supports similar syntax, does not support this.
For example, on the Windows Command Prompt, this will result in a syntax error:
echo <wiki>
whereas this will output the string <wiki>:
echo ^<wiki^>
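A program that generates such command lines could apply the caret rule mechanically. The Python sketch below covers only the five reserved characters listed above; the helper name `caret_escape` is hypothetical, and the sketch deliberately ignores the interpreter's quoting rules and other special cases.

```python
RESERVED = "&|<>^"  # the reserved characters named in the text


def caret_escape(text: str) -> str:
    """Prefix each reserved character with a caret so it is taken literally."""
    return "".join("^" + ch if ch in RESERVED else ch for ch in text)


print("echo " + caret_escape("<wiki>"))  # echo ^<wiki^>
```

Since the caret is itself reserved, a literal caret comes out doubled.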