





 

Buffer overflow



 
 


In computer security and programming, a buffer overflow, or buffer overrun, is an anomalous condition where a process attempts to store data beyond the boundaries of a fixed-length buffer. The result is that the extra data overwrites adjacent memory locations. The overwritten data may include other buffers, variables and program flow data, and may result in erratic program behavior, a memory access exception, program termination (a crash), incorrect results or, especially if deliberately caused by a malicious user, a possible breach of system security.

Buffer overflows can be triggered by inputs specifically designed to execute malicious code or to make the program operate in an unintended way. As such, buffer overflows cause many software vulnerabilities and form the basis of many exploits. Sufficient bounds checking by either the programmer, the compiler or the runtime can prevent buffer overflows.

The programming languages most commonly associated with buffer overflows are C and C++. They provide no built-in protection against accessing or overwriting data in any part of memory and do not check that data written to an array (the built-in buffer type) is within the boundaries of that array.

Technical description


A buffer overflow occurs when data written to a buffer, due to insufficient bounds checking, corrupts data values in memory addresses adjacent to the allocated buffer. Most commonly this occurs when copying strings of characters from one buffer to another.

Basic example


In the following example, a program has defined two data items which are adjacent in memory: an 8-byte-long string buffer, A, and a two-byte integer, B. Initially, A contains nothing but zero bytes, and B contains the number 3. Characters are one byte wide.

Now, the program attempts to store the character string "excessive" in the A buffer, followed by a zero byte to mark the end of the string. Because the program does not check the length of the string, the copy overwrites the value of B: the first eight characters fill A, and the ninth character, "e", and the terminating zero byte land in B.

Although the programmer did not intend to change B at all, B's value has now been replaced by a number formed from part of the character string. In this example, on a big-endian system that uses ASCII, "e" followed by a zero byte would become the number 25856. If B was the only other variable data item defined by the program, writing an even longer string that went past the end of B could cause an error such as a segmentation fault, terminating the process.
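
The example above can be reproduced safely by simulating the two adjacent data items in a single byte array. The following is a minimal sketch, not a real overflow: the names `make_memory`, `unchecked_store` and `read_b` are illustrative assumptions, and the 10-byte `bytearray` stands in for the memory holding A and B.

```python
import struct

# Simulated memory: 8 bytes for buffer A followed by 2 bytes for integer B.
A_OFFSET, A_SIZE = 0, 8
B_OFFSET, B_SIZE = 8, 2


def make_memory():
    """Return memory with A zeroed and B holding 3 (big-endian)."""
    memory = bytearray(A_SIZE + B_SIZE)
    struct.pack_into(">H", memory, B_OFFSET, 3)
    return memory


def unchecked_store(memory, text):
    """Copy text plus a terminating zero byte into A with no length check."""
    data = text.encode("ascii") + b"\x00"
    for i, byte in enumerate(data):
        memory[A_OFFSET + i] = byte  # may run past the end of A into B


def read_b(memory):
    """Read B as a big-endian unsigned 16-bit integer."""
    return struct.unpack_from(">H", memory, B_OFFSET)[0]


memory = make_memory()
print(read_b(memory))            # 3
unchecked_store(memory, "excessive")
print(bytes(memory[:A_SIZE]))    # b'excessiv'
print(read_b(memory))            # 25856
```

The value 25856 is 0x65 ("e" in ASCII) times 256 plus the zero byte. In this simulation a string longer than nine characters would raise Python's own `IndexError`, which plays the role of the segmentation fault described above.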

Exploitation


The techniques to exploit a buffer overflow vulnerability vary per architecture, operating system and memory region. For example, exploitation on the heap (used for dynamically allocated memory) is very different from exploitation on the call stack.

Stack-based exploitation


A technically inclined and malicious user may exploit stack-based buffer overflows to manipulate the program in one of several ways:

  • By overwriting a local variable that is near the buffer in memory on the stack, to change the behaviour of the program in a way that may benefit the attacker.
  • By overwriting the return address in a stack frame. Once the function returns, execution will resume at the return address specified by the attacker, usually the address of a buffer filled with user input.
  • By overwriting a function pointer, or exception handler, which is subsequently executed.
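
The first two cases in the list above can be illustrated with a toy stack frame held in a byte array. This is only a sketch under stated assumptions: the layout (a 16-byte buffer, then a 4-byte flag, then a 4-byte saved return address), the helper names and both addresses are made up for illustration, and real frame layouts depend on the compiler and architecture.

```python
import struct

BUF_SIZE = 16        # hypothetical local buffer size
FLAG_OFFSET = 16     # hypothetical 4-byte local variable right after the buffer
RET_OFFSET = 20      # hypothetical 4-byte saved return address after that


def make_frame(return_address):
    """Build a toy frame: zeroed buffer, flag set to 0, saved return address."""
    frame = bytearray(BUF_SIZE + 8)
    struct.pack_into("<I", frame, FLAG_OFFSET, 0)
    struct.pack_into("<I", frame, RET_OFFSET, return_address)
    return frame


def unchecked_copy(frame, data):
    """Copy data to the start of the buffer without checking BUF_SIZE."""
    for i, byte in enumerate(data):
        frame[i] = byte


def read_u32(frame, offset):
    """Read a little-endian unsigned 32-bit value from the frame."""
    return struct.unpack_from("<I", frame, offset)[0]


frame = make_frame(0x08048ABC)   # made-up original return address
# 16 filler bytes, then a new flag value, then a new return address.
data = b"A" * BUF_SIZE + struct.pack("<I", 1) + struct.pack("<I", 0x41414141)
unchecked_copy(frame, data)
print(read_u32(frame, FLAG_OFFSET))       # 1
print(hex(read_u32(frame, RET_OFFSET)))   # 0x41414141
```

Nothing is executed here; the sketch only shows that bytes written past the end of the buffer land in whatever the frame stores next.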


With a method called "Trampolining", if the address of the user-supplied data is unknown, but the location is stored in a register, then the return address can be overwritten with the address of an opcode which will cause execution to jump to the user-supplied data. If the location is stored in a register R, then a jump to the location containing the opcode for a jump R, call R or similar instruction will cause execution of the user-supplied data. The locations of suitable opcodes, or bytes in memory, can be found in DLLs or the executable itself. However, the address of the opcode typically cannot contain any null characters, and the locations of these opcodes can vary between applications and versions of the operating system. The Metasploit Project maintains one database of suitable opcodes, though it lists only those found in the Windows operating system.

Stack-based buffer overflows are not to be confused with stack overflows.

Heap-based exploitation


A buffer overflow occurring in the heap data area is referred to as a heap overflow and is exploitable in a different manner to that of stack-based overflows. Memory on the heap is dynamically allocated by the application at run-time and typically contains program data. Exploitation is performed by corrupting this data in specific ways to cause the application to overwrite internal structures such as linked list pointers. The canonical heap overflow technique overwrites dynamic memory allocation linkage (such as malloc metadata) and uses the resulting pointer exchange to overwrite a program function pointer.

The Microsoft JPEG GDI+ vulnerability is an example of the danger a heap overflow can represent to a computer user.

Barriers to exploitation


Manipulation of the buffer, which occurs before it is read or executed, may lead to the failure of an exploitation attempt. These manipulations can mitigate the threat of exploitation, but may not make it impossible. Manipulations could include conversion to upper or lower case, removal of metacharacters and filtering out of non-alphanumeric strings. However, techniques exist to bypass these filters and manipulations: alphanumeric code, polymorphic code, self-modifying code and return-to-libc attacks. The same methods can be used to avoid detection by intrusion detection systems. In some cases, including where code is converted into Unicode, the threat of the vulnerability has been misrepresented by the disclosers as only denial of service when in fact the remote execution of arbitrary code is possible.

Practicalities of exploitation


In real-world exploits there are a variety of issues which need to be overcome for exploits to operate reliably. These include null bytes in addresses, variability in the location of shellcode, differences between environments and various countermeasures in operation.

NOP sled technique


A NOP-sled is the oldest and most widely known technique for successfully exploiting a stack buffer overflow. It solves the problem of finding the exact address of the buffer by effectively increasing the size of the target area. To do this, much larger sections of the stack are corrupted with the no-op machine instruction. At the end of the attacker-supplied data, after the no-op instructions, an instruction is placed to perform a relative jump to the top of the buffer where the shellcode is located. This collection of no-ops is referred to as the "NOP-sled" because if the return address is overwritten with any address within the no-op region of the buffer, execution will "slide" down the no-ops until it is redirected to the actual malicious code by the jump at the end. This technique requires the attacker to guess where on the stack the NOP-sled is, rather than the location of the comparatively small shellcode.

Because of the popularity of this technique, many vendors of intrusion prevention systems search for this pattern of no-op machine instructions in an attempt to detect shellcode in use. A NOP-sled does not necessarily contain only traditional no-op machine instructions; any instruction that does not corrupt the machine state to the point where the shellcode will not run can be used in place of the hardware-assisted no-op. As a result, it has become common practice for exploit writers to compose the no-op sled with randomly chosen instructions which have no real effect on the shellcode execution.

While this method greatly improves the chances that an attack will be successful, it is not without problems. Exploits using this technique still must rely on some amount of luck that they will guess offsets on the stack that are within the NOP-sled region. An incorrect guess will usually result in the target program crashing and could alert the system administrator to the attacker's activities. Another problem is that a NOP-sled large enough to be of any use requires a much larger amount of memory. This can be a problem when the allocated size of the affected buffer is too small and the current depth of the stack is shallow (i.e. there is not much space from the end of the current stack frame to the start of the stack). Despite its problems, the NOP-sled is often the only method that will work for a given platform, environment, or situation; as such it is still an important technique.
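
The way a sled enlarges the target area can be shown with a toy model that counts which guessed start addresses end up at the first byte of the payload. This is a simplified sketch, not a description of any real exploit: it places the payload directly after the sled instead of using a jump at the end, nothing is executed, and the marker values and function names are assumptions chosen for illustration.

```python
FILLER = 0x00    # marker for unrelated stack contents
NOP = 0x90       # marker for a sled byte (the x86 single-byte no-op value)
PAYLOAD = 0xCC   # marker standing in for payload bytes


def build_region(total, sled_len, payload_len):
    """Lay out filler, then the sled, then the payload, in a region of `total` bytes."""
    filler_len = total - sled_len - payload_len
    return bytes([FILLER] * filler_len + [NOP] * sled_len + [PAYLOAD] * payload_len)


def reaches_payload_start(region, start):
    """Slide over sled bytes from `start`; succeed only on arriving at the payload's first byte."""
    payload_start = region.index(PAYLOAD)
    pc = start
    while pc < len(region) and region[pc] == NOP:
        pc += 1
    return pc == payload_start


def success_fraction(region):
    """Fraction of possible start addresses in the region that reach the payload start."""
    hits = sum(reaches_payload_start(region, s) for s in range(len(region)))
    return hits / len(region)


print(success_fraction(build_region(128, 0, 32)))    # 0.0078125 (1 of 128 guesses)
print(success_fraction(build_region(128, 96, 32)))   # 0.7578125 (97 of 128 guesses)
```

In this model the sled raises the number of workable guesses from one address to every address in the sled plus the payload start, at the cost of the extra space the paragraph above describes.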

The jump to register technique

The "jump to register" technique allows for reliable exploitation of stack buffer overflows without the need for extra room for a NOP-sled and without having to guess stack offsets. The strategy is to overwrite the return pointer with something that will cause the program to jump to a known pointer stored within a register which points to the controlled buffer and thus the shellcode. For example, if register A contains a pointer to the start of a buffer, then any jump or call taking that register as an operand can be used to gain control of the flow of execution. In practice a program may not intentionally contain instructions to jump to a particular register. The traditional solution is to find an unintentional instance of a suitable opcode at a fixed location somewhere within the program memory. One example of such an unintentional instance involves the i386 jmp esp instruction. The opcode for this instruction is FF E4. This two-byte sequence can be found at a one-byte offset from the start of the instruction call DbgPrint at address 0x7C941EED. If an attacker overwrites the program return address with this address, the program will first jump to 0x7C941EED, interpret the opcode FF E4 as the jmp esp instruction, and will then jump to the top of the stack and execute the attacker's code.

When this technique is possible the severity of the vulnerability increases considerably. This is because exploitation will work reliably enough to automate an attack with a virtual guarantee of success when it is run. For this reason, this is the technique most commonly used in Internet worms that exploit stack buffer overflow vulnerabilities.

This method also allows shellcode to be placed after the overwritten return address on the Windows platform. Since executables are based at address 0x00400000 and x86 is a little-endian architecture, the last byte of a return address pointing into the executable must be a null, which terminates the buffer copy so that nothing is written beyond it. This limits the size of the shellcode to the size of the buffer, which may be overly restrictive. DLLs are located in high memory (above 0x01000000) and so have addresses containing no null bytes, so this method can remove null bytes (or other disallowed characters) from the overwritten return address. Used in this way, the method is often referred to as "DLL Trampolining".
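
The effect of the null byte can be checked by encoding the two kinds of address in little-endian order and passing them through a copy that stops at the first zero byte. This is a small sketch under assumptions: `0x00401234` is a hypothetical address inside an executable based at 0x00400000, `c_string_copy` only mimics how a C string copy terminates, and `b"PAYLOAD"` is a placeholder for whatever bytes would follow the return address.

```python
import struct


def le_bytes(address):
    """Encode a 32-bit address as it is laid out on a little-endian machine."""
    return struct.pack("<I", address)


def c_string_copy(data):
    """Mimic a C string copy: stop after the first zero byte, which acts as the terminator."""
    end = data.find(b"\x00")
    return data if end == -1 else data[:end + 1]


exe_addr = 0x00401234   # hypothetical address inside the executable
dll_addr = 0x7C941EED   # the DLL address quoted in the text
tail = b"PAYLOAD"       # placeholder for bytes placed after the return address

print(c_string_copy(le_bytes(exe_addr) + tail))  # b'4\x12@\x00'
print(c_string_copy(le_bytes(dll_addr) + tail))  # b'\xed\x1e\x94|PAYLOAD'
```

With the executable address the copy ends at the fourth byte, so nothing after the return address is written; with the DLL address the trailing bytes are copied as well.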

Protective countermeasures


Various techniques have been used to detect or prevent buffer overflows, with various tradeoffs. The most reliable way to avoid or prevent buffer overflows is to use automatic protection at the language level. This sort of protection, however, cannot be applied to legacy code, and often technical, business, or cultural constraints call for a vulnerable language. The following sections describe the choices and implementations available.

Choice of programming language


The choice of programming language can have a profound effect on the occurrence of buffer overflows. Among the most popular languages are C and its derivative, C++, with an enormous body of software having been written in these languages. C and C++ provide no built-in protection against accessing or overwriting data in any part of memory; more specifically, they do not check that data written to an array (the implementation of a buffer) is within the boundaries of that array. However, the standard C++ libraries provide many ways of safely buffering data, and technology to avoid buffer overflows also exists for C.

Many other programming languages provide runtime checking, and in some cases even compile-time checking, which might send a warning or raise an exception when C or C++ would overwrite data and continue to execute further instructions until erroneous results are obtained, which might or might not cause the program to crash. Examples of such languages include Ada, Lisp, Modula-2, Smalltalk, OCaml and such C-derivatives as Cyclone and D. The Java and .NET bytecode environments also require bounds checking on all arrays. Nearly every interpreted language will protect against buffer overflows, signalling a well-defined error condition. Often where a language provides enough type information to do bounds checking, an option is provided to enable or disable it. Static code analysis can remove many dynamic bound and type checks, but poor implementations and awkward cases can significantly decrease performance. Software engineers must carefully consider the tradeoffs of safety versus performance costs when deciding which language and compiler setting to use.
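
As an illustration of the runtime checking described above, the following sketch uses Python, an interpreted language that checks every index against the length of the buffer. The function name `store` is an assumption made for this example.

```python
def store(buffer, index, value):
    """Write one byte; the runtime checks the index against the buffer length."""
    buffer[index] = value


buffer = bytearray(8)
store(buffer, 7, 0x41)          # last valid index: succeeds

try:
    store(buffer, 8, 0x41)      # one past the end of the buffer
except IndexError as exc:
    print("rejected:", exc)     # a well-defined error instead of a silent overwrite

print(len(buffer))              # 8: the buffer was not extended or overrun
```

The out-of-range write is refused and reported as an exception, where an unchecked C array write would have modified whatever followed the buffer in memory.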

Use of safe libraries


The problem of buffer overflows is common in the C and C++ languages because they expose low-level representational details of buffers as containers for data types. Buffer overflows must thus be avoided by maintaining a high degree of correctness in code which performs buffer management. It has also long been recommended to avoid standard library functions which are not bounds checked, such as gets, scanf and strcpy. The Morris worm exploited a gets call in fingerd.

Well-written and tested abstract data type libraries which centralize and automatically perform buffer management, including bounds checking, can reduce the occurrence and impact of buffer overflows. The two main building-block data types in these languages in which buffer overflows commonly occur are strings and arrays; thus, libraries preventing buffer overflows in these data types can provide the vast majority of the necessary coverage. Still, failure to use these safe libraries correctly can result in buffer overflows and other vulnerabilities; and naturally, any bug in the library itself is a potential vulnerability. "Safe" library implementations include "The Better String Library", Vstr and Erwin. The OpenBSD operating system's C library provides the strlcpy and strlcat functions, but these are more limited than full safe library implementations.
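The behaviour of a size-aware copy in the style of strlcpy can be sketched as follows. This is an illustrative reimplementation under the hypothetical name sketch_strlcpy, not the OpenBSD source; it assumes the usual contract of always NUL-terminating (when the size is non-zero) and returning the length of the source string.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of strlcpy-style semantics under a hypothetical name, so that it
 * does not clash with a strlcpy provided by the system's C library. */
size_t sketch_strlcpy(char *dst, const char *src, size_t dstsize)
{
    size_t srclen = strlen(src);

    if (dstsize != 0) {
        /* Copy at most dstsize - 1 bytes and always terminate. */
        size_t n = (srclen < dstsize - 1) ? srclen : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    /* The caller detects truncation when the result is >= dstsize. */
    return srclen;
}
```

The function prevents the overflow itself, but it still relies on the caller passing the correct buffer size, which is one reason such functions are more limited than a library that tracks buffer lengths on its own.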

In September 2007, Technical Report 24731, prepared by the C standards committee, was published; it specifies a set of functions which are based on the standard C library's string and I/O functions, with additional buffer-size parameters. However, the efficacy of these functions for the purpose of reducing buffer overflows is disputable: they require programmer intervention on a per-function-call basis, equivalent to the intervention that could make the analogous older standard library functions safe against buffer overflows.

Stack-smashing protection


Stack-smashing protection is used to detect the most common buffer overflows by checking that the stack has not been altered when a function returns. If it has been altered, the program exits with a segmentation fault. Three such systems are Libsafe, and the StackGuard and ProPolice gcc patches.
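The check performed on function return can be illustrated with a simulation. The sketch below does not touch the real call stack; it models a frame as a byte array with an assumed layout of an 8-byte buffer, a 4-byte guard value (commonly called a canary) and an 8-byte slot standing in for the return address. The function name, the layout and the guard value are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 8u
#define CANARY_SIZE sizeof(uint32_t)
#define RET_SIZE 8u

/* Simulated frame: [ buffer | canary | "return address" ].
 * Returns 1 if the canary was altered by the copy, 0 if it is intact,
 * and -1 if the input does not fit in the simulated frame at all. */
int write_and_check(const unsigned char *input, size_t len, uint32_t canary)
{
    unsigned char frame[BUF_SIZE + CANARY_SIZE + RET_SIZE] = {0};

    if (len > sizeof frame)
        return -1;

    /* "Prologue": place the canary between the buffer and the return slot. */
    memcpy(frame + BUF_SIZE, &canary, CANARY_SIZE);

    /* Deliberately not checked against BUF_SIZE: this models the bug. */
    memcpy(frame, input, len);

    /* "Epilogue": compare the stored canary with the expected value. */
    uint32_t seen;
    memcpy(&seen, frame + BUF_SIZE, CANARY_SIZE);
    return seen != canary;
}
```

Any write that reaches the simulated return slot must first pass over the canary, which is why a linear overflow is detected before the function would return.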

Microsoft's Data Execution Prevention mode explicitly protects the pointer to the structured exception handler (SEH) from being overwritten.

Stronger stack protection is possible by splitting the stack in two: one for data and one for function returns. This split is present in the Forth programming language, though it was not a security-based design decision. Regardless, this is not a complete solution to buffer overflows, as sensitive data other than the return address may still be overwritten.

Pointer protection


Buffer overflows work by manipulating pointers (including stored addresses). PointGuard was proposed as a compiler extension to prevent attackers from being able to reliably manipulate pointers and addresses. The approach works by having the compiler add code to automatically XOR-encode pointers before and after they are used. Because the attacker (theoretically) does not know what value will be used to encode and decode the pointer, the attacker cannot predict what it will point to if it is overwritten with a new value. PointGuard was never released, but Microsoft implemented a similar approach beginning in Windows XP SP2 and Windows Server 2003 SP1. Rather than implement pointer protection as an automatic feature, Microsoft added an API routine that can be called at the discretion of the programmer. This allows for better performance (because it is not used all of the time), but places the burden on the programmer to know when it is necessary.
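The XOR encoding idea can be sketched in a few lines of C. The helper names encode_ptr and decode_ptr are hypothetical, and the key is assumed to be a per-process secret chosen at start-up; this is neither PointGuard's generated code nor the Windows API routine.

```c
#include <stdint.h>

/* Hypothetical helper: store a pointer in encoded form. The key is assumed
 * to be a random per-process secret that the attacker cannot read. */
uintptr_t encode_ptr(void *p, uintptr_t key)
{
    return (uintptr_t)p ^ key;
}

/* Hypothetical helper: recover the pointer just before it is used. */
void *decode_ptr(uintptr_t encoded, uintptr_t key)
{
    return (void *)(encoded ^ key);
}
```

If an attacker overwrites the stored value with a raw target address, decoding XORs that address with the unknown key, so the resulting pointer does not land on the intended target.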

Because XOR is linear, an attacker may be able to manipulate an encoded pointer by overwriting only the lower bytes of an address. This can allow an attack to succeed if the attacker is able to attempt the exploit multiple times and/or is able to complete an attack by causing a pointer to point to one of several locations (such as any location within a NOP sled). Microsoft added a random rotation to their encoding scheme to address this weakness to partial overwrites.
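The partial-overwrite weakness, and the effect of adding a rotation, can be illustrated with plain 64-bit integers standing in for pointers. The function names, the 64-bit width and the explicit rotation parameter are assumptions for this sketch; the details of Microsoft's actual scheme are not given here.

```c
#include <stdint.h>

/* Rotate helpers; the amount is reduced modulo 64 to avoid undefined shifts. */
uint64_t rotr64(uint64_t v, unsigned r)
{
    r &= 63u;
    return r ? (v >> r) | (v << (64u - r)) : v;
}

uint64_t rotl64(uint64_t v, unsigned r)
{
    r &= 63u;
    return r ? (v << r) | (v >> (64u - r)) : v;
}

/* Plain XOR encoding: each stored byte maps to the same byte of the pointer. */
uint64_t xor_encode(uint64_t ptr, uint64_t key) { return ptr ^ key; }
uint64_t xor_decode(uint64_t enc, uint64_t key) { return enc ^ key; }

/* XOR followed by a rotation: the position of each pointer byte within the
 * stored value now depends on the (assumed secret) rotation amount. */
uint64_t rot_encode(uint64_t ptr, uint64_t key, unsigned r)
{
    return rotr64(ptr ^ key, r);
}

uint64_t rot_decode(uint64_t enc, uint64_t key, unsigned r)
{
    return rotl64(enc, r) ^ key;
}

/* Models an overflow that reaches only the lowest byte of the stored value. */
uint64_t clobber_low_byte(uint64_t stored, uint8_t b)
{
    return (stored & ~(uint64_t)0xFF) | b;
}
```

With xor_encode, clobbering the low stored byte changes only the low byte of the decoded pointer, so the result stays within 256 bytes of the original target. With rot_encode, the same overwrite changes a different, unpredictable part of the decoded address.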

Executable space protection


Executable space protection is an approach to buffer overflow protection which prevents execution of code on the stack or the heap. An attacker may use buffer overflows to insert arbitrary code into the memory of a program, but with executable space protection, any attempt to execute that code will cause an exception.

Some CPUs support a feature called the NX ("No eXecute") or XD ("eXecute Disabled") bit, which, in conjunction with software, can be used to mark pages of data (such as those containing the stack and the heap) as readable and writable but not executable.

Some Unix operating systems (e.g. OpenBSD, Mac OS X) ship with executable space protection (e.g. W^X). Some optional packages include:

  • PaX
  • Exec Shield
  • Openwall


Newer variants of Microsoft Windows also support executable space protection, called Data Execution Prevention. Proprietary add-ons include:

  • BufferShield
  • StackDefender


Executable space protection does not generally protect against return-to-libc attacks, or any other attack which does not rely on the execution of the attacker's code. However, on 64-bit systems using ASLR, as described below, executable space protection makes it far more difficult to execute such attacks.

Address space layout randomization


Address space layout randomization (ASLR) is a computer security feature which involves arranging the positions of key data areas, usually including the base of the executable and position of libraries, heap, and stack, randomly in a process' address space.

Randomization of the virtual memory addresses at which functions and variables can be found can make exploitation of a buffer overflow more difficult, but not impossible. It also forces the attacker to tailor the exploitation attempt to the individual system, which foils the attempts of internet worms. A similar but less effective method is to rebase processes and libraries in the virtual address space.
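One way to observe the effect is to record where the stack, the heap, the program's own code and a library function are located, and to compare the values across several runs of the same program. The sketch below is only an observation aid with hypothetical names; whether the addresses change between runs depends on the operating system and on how the program was built.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Addresses sampled from the main regions of the process. */
struct layout_sample {
    uintptr_t stack_addr; /* a local variable */
    uintptr_t heap_addr;  /* a malloc'd block */
    uintptr_t code_addr;  /* a function in this program */
    uintptr_t libc_addr;  /* a function in the C library */
};

/* Fills *out with the sampled addresses; returns 0 on success, -1 on failure. */
int sample_layout(struct layout_sample *out)
{
    int local = 0;
    void *heap = malloc(16);

    if (heap == NULL)
        return -1;
    out->stack_addr = (uintptr_t)&local;
    out->heap_addr = (uintptr_t)heap;
    /* Converting a function pointer to an integer is implementation-defined
     * but works on common platforms. */
    out->code_addr = (uintptr_t)&sample_layout;
    out->libc_addr = (uintptr_t)&printf;
    free(heap);
    return 0;
}

/* Prints the sample so that the output of several runs can be compared. */
void print_layout(const struct layout_sample *s)
{
    printf("stack 0x%" PRIxPTR "\n", s->stack_addr);
    printf("heap  0x%" PRIxPTR "\n", s->heap_addr);
    printf("code  0x%" PRIxPTR "\n", s->code_addr);
    printf("libc  0x%" PRIxPTR "\n", s->libc_addr);
}
```

An exploit that hard-codes any of these addresses works only on systems where the value happens to match, which is the property the paragraph above relies on.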

Deep packet inspection


The use of deep packet inspection (DPI) can detect, at the network perimeter, very basic remote attempts to exploit buffer overflows by use of attack signatures and heuristics. These can block packets which carry the signature of a known attack, or which contain a long series of no-operation instructions (known as a NOP sled); such sleds were once used when the location of the exploit's payload was slightly variable.
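A very basic heuristic of this kind can be sketched as a scan for a long run of the single-byte x86 NOP opcode (0x90). The function names and the idea of a caller-chosen threshold are assumptions for illustration; real inspection engines are considerably more elaborate.

```c
#include <stddef.h>

#define X86_NOP 0x90u /* single-byte no-operation opcode on x86 */

/* Returns the length of the longest run of consecutive NOP bytes. */
size_t longest_nop_run(const unsigned char *data, size_t len)
{
    size_t best = 0, run = 0;

    for (size_t i = 0; i < len; i++) {
        if (data[i] == X86_NOP) {
            run++;
            if (run > best)
                best = run;
        } else {
            run = 0;
        }
    }
    return best;
}

/* Flags a payload whose longest NOP run reaches the given threshold. */
int looks_like_nop_sled(const unsigned char *data, size_t len, size_t threshold)
{
    return longest_nop_run(data, len) >= threshold;
}
```

A check like this matches only one literal encoding of a sled, so any sequence of other instructions with no net effect passes unnoticed.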

Packet scanning is not an effective method since it can only prevent known attacks and there are many ways that a NOP sled can be encoded. Attackers have begun to use alphanumeric, metamorphic, and self-modifying shellcodes to evade detection by heuristic packet scanners and intrusion detection systems.

History of exploitation


Buffer overflows were understood as early as 1972, when the Computer Security Technology Planning Study laid out the technique: "The code performing this function does not check the source and destination addresses properly, permitting portions of the monitor to be overlaid by the user. This can be used to inject code into the monitor that will permit the user to seize control of the machine." (Page 61) Today, the monitor would be referred to as the kernel.

The spread of personal computers in the 1980s increased the number of people who were aware of the technique. On the Commodore PET, for instance, it was a common practice to employ a rarely used second tape buffer to store assembly language routines. Some programmers, to save a few bytes of space on a machine with a maximum of 32K RAM, avoided use of the tedious BASIC "POKE" statement by changing the print buffer start to the tape buffer to print the 6502 assembly language code (as strange-looking characters) directly to the desired location. Since the actual print buffer was longer than the tape buffer, the BASIC string could easily overrun byte 1024 and interfere with the Microsoft BASIC interpreter on the PET. The bare-bones boot image loaders of the early personal computers, including the early Mac, Commodore, Atari and all Microsoft operating systems up to Windows 95 and 98, had inadequate buffer protections, and so many programmers became aware of buffer overflows.

The earliest documented hostile exploitation of a buffer overflow was in 1988. It was one of several exploits used by the Morris worm to propagate itself over the Internet. The program exploited was a Unix service called finger. Later, in 1995, Thomas Lopatic independently rediscovered the buffer overflow and published his findings on the Bugtraq security mailing list. A year later, in 1996, Elias Levy (aka Aleph One) published in Phrack magazine the paper "Smashing the Stack for Fun and Profit", a step-by-step introduction to exploiting stack-based buffer overflow vulnerabilities.

Since then, at least two major internet worms have exploited buffer overflows to compromise a large number of systems. In 2001, the Code Red worm exploited a buffer overflow in Microsoft's Internet Information Services (IIS) 5.0, and in 2003 the SQL Slammer worm compromised machines running Microsoft SQL Server 2000.

In 2003, buffer overflows present in licensed Xbox games were exploited to allow unlicensed software, including homebrew games, to run on the console without the need for hardware modifications, known as modchips. The PS2 Independence Exploit also used a buffer overflow to achieve the same for the PlayStation 2. The Twilight Hack accomplished the same with the Wii, using a buffer overflow in The Legend of Zelda: Twilight Princess.

External links

  • CERT Secure Coding Standards (https://www.securecoding.cert.org)
  • "Smashing the Stack for Fun and Profit" by Aleph One
  • by Nomenumbra
  • from Sockets, Shellcode, Porting & Coding: Reverse Engineering Exploits and Tool Coding for Security Professionals by James C. Foster (ISBN 1-59749-005-9). Detailed explanation of how to use Metasploit to develop a buffer overflow exploit from scratch.
  • Computer Security Technology Planning Study, James P. Anderson, ESD-TR-73-51, ESD/AFSC, Hanscom AFB, Bedford, MA 01731 (Oct. 1972) [NTIS AD-758 206]