Null character
Encyclopedia
The null character abbreviated NUL, is a control character
Control character
In computing and telecommunication, a control character or non-printing character is a code point in a character set, that does not in itself represent a written symbol.It is in-band signaling in the context of character encoding....

 with the value zero.

It is present in many character sets, including ISO/IEC 646
ISO/IEC 646
ISO/IEC 646:1991, Information technology — ISO 7-bit coded character set for information interchange, is an ISO standard that since its first edition in 1972 has specified a 7-bit character code from which several national standards are derived...

 (or ASCII
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...

), the C0 control code
C0 and C1 control codes
Most character encodings, in addition to representing printable characters, may also represent additional information about the text, such as the position of a cursor, an instruction to start a new line, or a message that the text has been received...

, the Universal Character Set
Universal Character Set
The Universal Character Set , defined by the International Standard ISO/IEC 10646, Information technology — Universal multiple-octet coded character set , is a standard set of characters upon which many character encodings are based...

 (or Unicode
Unicode
Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...

), and EBCDIC
EBCDIC
Extended Binary Coded Decimal Interchange Code is an 8-bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems....

. It is available in nearly all mainstream programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....

s.

The original meaning of this character was like NOP
NOP
In computer science, NOP or NOOP is an assembly language instruction, sequence of programming language statements, or computer protocol command that effectively does nothing at all....

—when sent to a printer
Computer printer
In computing, a printer is a peripheral which produces a text or graphics of documents stored in electronic form, usually on physical print media such as paper or transparencies. Many printers are primarily used as local peripherals, and are attached by a printer cable or, in most new printers, a...

 or a terminal
Computer terminal
A computer terminal is an electronic or electromechanical hardware device that is used for entering data into, and displaying data from, a computer or a computing system...

, it does nothing (some terminals, however, incorrectly display it as space
Space (punctuation)
In writing, a space is a blank area devoid of content, serving to separate words, letters, numbers, and punctuation. Conventions for interword and intersentence spaces vary among languages, and in some cases the spacing rules are quite complex....

). When electromechanical teleprinter
Teleprinter
A teleprinter is a electromechanical typewriter that can be used to communicate typed messages from point to point and point to multipoint over a variety of communication channels that range from a simple electrical connection, such as a pair of wires, to the use of radio and microwave as the...

s were used as computer output devices, one or more null characters were sent at the end of each printed line to allow time for the mechanism to return to the first printing position on the next line. On punched tape
Punched tape
Punched tape or paper tape is an obsolete form of data storage, consisting of a long strip of paper in which holes are punched to store data...

, the character is represented with no holes at all, so a new unpunched tape is initially filled with null characters, and often text could be "inserted" at a reserved space of null characters by punching the new characters into the tape over the nulls.

Today the character has much more significance in C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

 and its derivatives and in many data formats, where it serves as a reserved character used to signify the end of a string
String (computer science)
In formal languages, which are used in mathematical logic and theoretical computer science, a string is a finite sequence of symbols that are chosen from a set or alphabet....

, often called a null-terminated string
Null-terminated string
In computer programming, a null-terminated string is a character string stored as an array containing the characters and terminated with a null character...

 or "ASCIIZ" string. This allows the string to be any length with only the overhead of one byte; the alternative of storing a count requires either a string length limit of 255 or an overhead of more than one byte (there are other advantages/disadvantages described under null-terminated string
Null-terminated string
In computer programming, a null-terminated string is a character string stored as an array containing the characters and terminated with a null character...

).

Representation

The null character is often represented as the escape sequence
Escape sequence
An escape sequence is a series of characters used to change the state of computers and their attached peripheral devices. These are also known as control sequences, reflecting their use in device control. Some control sequences are special characters that always have the same meaning...

 \0 in source code
Source code
In computer science, source code is text written using the format and syntax of the programming language that it is being written in. Such a language is specially designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing source...

 string literal
String literal
A string literal is the representation of a string value within the source code of a computer program. There are numerous alternate notations for specifying string literals, and the exact notation depends on the individual programming language in question...

s or character constants. In many languages (such as C, which introduced this notation), this is not a separate escape sequence, but an octal escape sequence with a single octal digit of 0; as a consequence, \0 must not be followed by any of the digits 0 through 7; otherwise it is interpreted as the start of a longer octal escape sequence. Other escape sequences that are found in use in various languages are \000, \x00, the Unicode
Unicode
Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...

 representation \u0000, or \z. A null character can be placed in a URL with %00, which (in case of unchecked user input) creates a vulnerability
Vulnerability (computing)
In computer security, a vulnerability is a weakness which allows an attacker to reduce a system's information assurance.Vulnerability is the intersection of three elements: a system susceptibility or flaw, attacker access to the flaw, and attacker capability to exploit the flaw...

 known as null byte injection and can lead to security exploits.

In caret notation
Caret notation
Caret notation is a notation for unprintable control characters in ASCII encoding. The notation consists of a caret followed by a capital letter; this digraph stands for the ASCII code that has the numerical value equivalent to the letter's numerical value. For example the EOT character with a...

 the null character is ^@. On some keyboards, one can enter a null character by holding down and pressing (which usually requires also holding and pressing another key such as or ). It is also common to be able to type a null with or .

In documentation the null character is sometimes represented as a single-em
Em (typography)
An em is a unit of measurement in the field of typography, equal to the currently specified point size.The name of em is related to M. Originally the unit was derived from the width of the capital "M" in the given typeface....

-width symbol containing the letters "NUL". In Unicode
Unicode
Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...

, there is a character with a corresponding glyph for visual representation of the null character, "symbol for null", U+2400 —not to be confused with the actual null character, U+0000.

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK