Format string attack
Uncontrolled format string is a type of software vulnerability, discovered around 1999, that can be used in security exploits. Previously thought harmless, format string exploits can be used to crash a program or to execute harmful code. The problem stems from the use of unchecked user input as the format string parameter in certain C functions that perform formatting, such as printf. A malicious user may use the %s and %x format tokens, among others, to print data from the stack or possibly other locations in memory. One may also write arbitrary data to arbitrary locations using the %n format token, which commands printf and similar functions to write the number of bytes formatted so far to an address stored on the stack.

Details

A typical exploit uses a combination of these techniques to force a program to overwrite the address of a library function or the return address on the stack with a pointer to some malicious shellcode. The padding parameters to format specifiers are used to control the number of bytes output, and the %x token is used to pop bytes from the stack until the beginning of the format string itself is reached. The start of the format string is crafted to contain the address that the %n format token can then overwrite with the address of the malicious code to execute.
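The interaction between padding and %n can be observed harmlessly with a constant format string. The following is a minimal sketch, not an exploit: the helper name count_with_padding is hypothetical, and it assumes a C library that honours %n (some platforms disable it by default).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats `value` right-aligned in a field of `width`
 * columns, then uses %n to record how many bytes had been produced at that
 * point. The format string is a constant, so nothing here is
 * attacker-controlled. */
int count_with_padding(char *out, size_t out_size, int width, int value)
{
    int written = -1;   /* overwritten by %n if the C library supports it */

    assert(out != NULL);
    snprintf(out, out_size, "%*d%n", width, value, &written);
    return written;
}
```

Calling count_with_padding(buf, sizeof buf, 20, 7) with a sufficiently large buffer yields 20, because the field width pads the output to 20 bytes before %n stores the count. An attacker who controls the format string chooses such widths so that the stored count equals the value to be written.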

Format string bugs became a common vulnerability because they were long thought harmless, and they have been found in many common tools. MITRE's CVE project lists roughly 500 vulnerable programs as of June 2007, and a trend analysis ranks it the 9th most-reported vulnerability type between 2001 and 2006.

Format string bugs most commonly appear when a programmer wishes to print a string containing user supplied data. The programmer may mistakenly write printf(buffer) instead of printf("%s", buffer). The first version interprets buffer as a format string, and parses any formatting instructions it may contain. The second version simply prints a string to the screen, as the programmer intended.
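The difference between the two calls can be sketched as follows. The function name format_safely is hypothetical, and snprintf is used instead of printf only so that the result lands in a buffer that can be inspected; the unsafe variant is left in a comment because calling it with hostile input has undefined behaviour.

```c
#include <assert.h>
#include <stdio.h>

/* Safe form: the user data is an argument to the constant format "%s",
 * so any '%' characters it contains are copied literally. */
int format_safely(char *out, size_t out_size, const char *user_input)
{
    assert(out != NULL && user_input != NULL);
    return snprintf(out, out_size, "%s", user_input);
}

/* UNSAFE form, shown only for contrast and deliberately not compiled:
 *
 *     snprintf(out, out_size, user_input);
 *
 * Here user_input itself is parsed as the format string, so input such as
 * "%x %x %n" makes the function read (and, with %n, write) through
 * arguments that were never passed. */
```

With the safe form, the input "%x %x %n" is reproduced verbatim in the output buffer and the return value is its length, 8.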

Format bugs arise because C's argument passing conventions are not type-safe. In particular, the varargs mechanism allows functions to accept any number of arguments (e.g. printf) by "popping" as many arguments off the call stack as they wish, trusting the early arguments to indicate how many additional arguments are to be popped, and of what types.
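The trust placed in the format string can be seen in a toy formatter built on stdarg.h. This is a minimal sketch under stated assumptions: mini_format is a hypothetical function that supports only %d, %s and %%, and it is not how any particular C library implements printf.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical toy formatter supporting only %d, %s and %%.
 * Returns the number of bytes stored in `out`, excluding the NUL. */
int mini_format(char *out, size_t out_size, const char *fmt, ...)
{
    va_list ap;
    size_t used = 0;

    assert(out != NULL && fmt != NULL && out_size > 0);
    out[0] = '\0';

    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0' && used + 1 < out_size; p++) {
        int n;

        if (*p != '%') {            /* ordinary character: copy it */
            out[used++] = *p;
            out[used] = '\0';
            continue;
        }

        p++;                        /* look at the directive letter */
        if (*p == 'd') {
            /* Nothing verifies that the caller really passed an int:
             * the format string alone decides what is fetched. */
            n = snprintf(out + used, out_size - used, "%d", va_arg(ap, int));
        } else if (*p == 's') {
            n = snprintf(out + used, out_size - used, "%s", va_arg(ap, char *));
        } else if (*p == '%') {
            n = snprintf(out + used, out_size - used, "%%");
        } else {
            break;                  /* unsupported directive or end of string */
        }

        if (n < 0)
            break;
        if ((size_t)n >= out_size - used) {   /* output was truncated */
            used = out_size - 1;
            break;
        }
        used += (size_t)n;
    }
    va_end(ap);
    return (int)used;
}
```

For example, mini_format(buf, sizeof buf, "%s=%d%%", "n", 5) stores "n=5%" and returns 4. If the format string asked for more arguments than were passed, va_arg would fetch whatever happens to occupy the next argument slot, which is exactly the behaviour an attacker abuses.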

Format string bugs can occur in programming languages other than C, although they appear less frequently and usually cannot be exploited to execute code of the attacker's choice.

Format bugs were first noted in 1990 in the fuzz testing work done at the University of Wisconsin (see Miller, Fredriksen, So 1990). They called these bugs "interaction effects" and noted their presence when testing the C shell (csh).

The use of format string bugs as an attack vector was discovered by Tymm Twillman during a security audit of the ProFTPd daemon. The audit uncovered an snprintf call that passed user-generated data directly as the format string. Extensive tests with contrived arguments to printf-style functions showed that use of this for privilege escalation was actually possible. This led to the first posting in September 1999 on the Bugtraq mailing list regarding this class of vulnerabilities, including a basic exploit. It was still several months, however, before the security community became aware of the full dangers of format string vulnerabilities as exploits for other software using this method began to surface. The first exploits leading to a successful privilege escalation attack were published simultaneously on the Bugtraq list in June 2000 by Przemysław Frasunek and a person using the nickname tf8. The seminal paper "Format String Attacks" by Tim Newsham was published in September 2000.

Prevention

Many compilers can statically check format strings and produce warnings for dangerous or suspect formats.

In the GNU Compiler Collection, the relevant compiler flags are -Wall, -Wformat, -Wno-format-extra-args, -Wformat-security, -Wformat-nonliteral, and -Wformat=2.
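These checks cover the standard printf family; for a program's own printf-style wrappers, GCC and compatible compilers offer the format function attribute, which is not mentioned above and is shown here as an assumption about the toolchain. In this minimal sketch, format_message and the PRINTF_LIKE macro are hypothetical names.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* GCC/Clang extension: marks a function as taking a printf-style format
 * so that the -Wformat family of warnings checks its call sites too.
 * On other compilers the macro expands to nothing. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define PRINTF_LIKE(fmt_idx, first_arg)
#endif

/* Hypothetical wrapper: parameter 3 is the format, varargs start at 4. */
int format_message(char *out, size_t out_size, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

int format_message(char *out, size_t out_size, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(out != NULL && fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(out, out_size, fmt, ap);
    va_end(ap);
    return n;
}
```

With the attribute in place, a call such as format_message(buf, sizeof buf, user_input) can be flagged by -Wformat-security in the same way as printf(user_input), whereas format_message(buf, sizeof buf, "%s", user_input) compiles cleanly.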

Such static checks can only detect bad format strings that are known at compile time. If the format string may come from the user or from a source external to the application, the application must validate the format string before using it. Care must also be taken if the application generates or selects format strings on the fly.
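One conservative form of such validation is to reject any externally supplied string that contains a conversion directive at all. The following is a minimal sketch of that policy, assuming the application never needs external strings to carry real directives; format_is_literal_only is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical validator: returns true only if `fmt` contains no
 * conversion directives, i.e. every '%' is part of a literal "%%" pair.
 * A string that passes consumes no variadic arguments when used as a
 * printf-style format. */
bool format_is_literal_only(const char *fmt)
{
    assert(fmt != NULL);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        if (p[1] != '%')
            return false;   /* a real directive, or a lone trailing '%' */
        p++;                /* skip the second '%' of the "%%" pair */
    }
    return true;
}
```

Under this policy "100%% done" is accepted, while "%x%x%n" and a string ending in a lone "%" are rejected. Passing untrusted data through a constant "%s" format remains the simpler option whenever it is available.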

See also

  • Cross-application scripting exploits a similar kind of programming error
  • printf
  • scanf
  • syslog
  • Improper input validation

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.