Undefined behaviour

In computer programming, undefined behavior is a feature of some programming languages, most famously C. In these languages, to simplify the specification and allow some flexibility in implementation, the specification leaves the results of certain operations specifically undefined.

For example, in C reading an automatic variable before it has been initialized generally yields undefined behavior, as do integer division by zero and indexing an array outside of its defined bounds (see buffer overflow). This specifically frees the compiler to do whatever is easiest or most efficient, should such a program be submitted. In general, any behavior afterwards is also undefined. In particular, the compiler is never required to diagnose undefined behavior; programs invoking undefined behavior may therefore appear to compile and even run without errors at first, only to fail on another system, or even on another date. When an instance of undefined behavior occurs, so far as the language specification is concerned anything could happen, maybe nothing at all.
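
As a minimal sketch of how a program can stay clear of two of the cases above, the following C code checks an index against the array length before reading, and writes its result only on success. The helper name checked_get and its signature are assumptions made for illustration; they are not part of the C standard library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: reads arr[idx] only when idx is within bounds.
   Returns true and stores the element in *out on success; returns false
   and leaves *out untouched otherwise. */
bool checked_get(const int *arr, size_t len, size_t idx, int *out)
{
    assert(arr != NULL && out != NULL); /* precondition on the pointers */
    if (idx >= len) {
        return false; /* the out-of-bounds read is never attempted */
    }
    *out = arr[idx];
    return true;
}
```

A caller would typically write int value = 0; before the call, so that value is initialized even when the index is rejected.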

Under some circumstances there can be specific restrictions on undefined behavior. For example, the instruction set specifications of a CPU might leave the behavior of some forms of an instruction undefined, but if the CPU supports memory protection then the specification will probably include a blanket rule stating that no user-accessible instruction may cause a hole in the operating system's security; so an actual CPU would be permitted to corrupt any or all user registers in response to such an instruction but would not be allowed to, for example, switch into supervisor mode.

Examples in C and C++

Attempting to modify a string literal causes undefined behavior:

char *p = "wikipedia"; // valid C; in C++ this conversion from const char[] to char* was deprecated in C++03 and removed in C++11
p[0] = 'W'; // undefined behavior

One way to avoid this is to define p as an array instead of a pointer, so that the characters are copied into modifiable storage.

char p[] = "wikipedia"; /* RIGHT */
p[0] = 'W';

In C++ one can use std::string from the standard library as follows.

std::string s = "wikipedia"; /* RIGHT */
s[0] = 'W';


Integer division by zero results in undefined behavior:

return x/0; // undefined behavior
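
A hedged sketch of how a caller might guard an integer division follows. The function name checked_div is hypothetical. Besides a zero divisor, it also rejects the other integer division whose behavior is undefined in C: a quotient that is not representable in int, such as INT_MIN / -1 on two's complement implementations.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores num / den in *out and returns true when the
   division is well defined; returns false and leaves *out untouched
   otherwise. */
bool checked_div(int num, int den, int *out)
{
    assert(out != NULL); /* precondition on the output pointer */
    if (den == 0) {
        return false; /* division by zero */
    }
    if (den == -1 && num < -INT_MAX) {
        return false; /* -num would not fit in an int */
    }
    *out = num / den; /* truncates toward zero (C99 and later) */
    return true;
}
```

Returning a status flag instead of the quotient leaves it to the caller to decide what a failed division should mean.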

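Certain pointer operations result in undefined behavior, such as forming a pointer more than one element past the end of an array:

int arr[4] = {0, 1, 2, 3};
int* p = arr + 5; // undefined behavior (arr + 4, one past the end, would be valid)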
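
C does allow a pointer to the position one past the last element, as long as it is not dereferenced. The following sketch relies on that rule: a hypothetical helper, checked_advance, refuses any offset beyond that position, and sum_range uses the one-past-the-end pointer only as a loop bound. Both names are assumptions introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns base + offset when the result points into the
   array or exactly one past its last element; returns NULL otherwise. */
const int *checked_advance(const int *base, size_t len, size_t offset)
{
    assert(base != NULL); /* precondition on the base pointer */
    if (offset > len) {
        return NULL; /* base + offset would itself be undefined */
    }
    return base + offset;
}

/* Sums the half-open range [begin, end); end is compared but never read. */
int sum_range(const int *begin, const int *end)
{
    int total = 0;
    for (const int *p = begin; p != end; ++p) {
        total += *p;
    }
    return total;
}
```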

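Declaring main with a return type other than int is not one of the forms that the C standard requires implementations to accept. Unless the implementation documents such a form, the behavior is undefined (in C++ the program is ill-formed):

void main(void) /* undefined behavior unless the implementation documents this form */
{
}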

In Section 2.12, The C Programming Language by Kernighan and Ritchie cites the following examples of code whose behavior is undefined:

printf("%d %d\n", ++n, power(2, n)); /* WRONG */

and

a[i] = i++;
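
Both statements combine a modification of a variable with another use of it, without the language fixing which happens first. A possible rewrite, sketched here under assumptions, separates the modification into its own statement. The power function below is a stand-in written for this example, and the wrapper names format_incremented and store_then_advance are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for power: raises base to the n-th power for n >= 0.
   No overflow checking is attempted. */
int power(int base, int n)
{
    int result = 1;
    for (int i = 0; i < n; ++i) {
        result *= base;
    }
    return result;
}

/* Increments n first, then formats n and 2 to the power n into buf.
   Returns the value returned by snprintf. */
int format_incremented(char *buf, size_t size, int n)
{
    assert(buf != NULL);
    ++n; /* the increment is complete before either argument is evaluated */
    return snprintf(buf, size, "%d %d\n", n, power(2, n));
}

/* Stores i in a[i], then returns the next index. */
size_t store_then_advance(int *a, size_t i)
{
    assert(a != NULL);
    a[i] = (int)i; /* i is only read in this statement */
    return i + 1;  /* the increment happens separately */
}
```

Whether the element should receive the old or the new value of i is a decision the original statement leaves open; this sketch assumes the old value was intended.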

Risks of undefined behavior

HTML versions 4 and earlier left error handling undefined. Over time, pages started relying on the unspecified error recovery implemented in popular browsers. This caused difficulties for vendors of less popular browsers, who were forced to reverse-engineer and implement bug-compatible error recovery. The result was a de facto standard that was much more complicated than it would have been if this behavior had been specified from the start.

Compiler easter eggs

In some languages (including C), even the compiler is not bound to behave in a sensible manner once undefined behavior has been invoked. One instance of undefined behavior acting as an Easter egg is the behavior of early versions of the GCC C compiler when given a program containing the #pragma directive, which has implementation-defined behavior according to the C standard. ("Implementation-defined" is more restrictive than "undefined", requiring the implementation to document what it does.) In practice, many C implementations recognize, for example, #pragma once as a rough equivalent of #include guards; but GCC 1.21, upon finding a #pragma directive, would instead attempt to launch commonly distributed Unix games such as NetHack and Rogue, or start Emacs running a simulation of the Towers of Hanoi.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 