Shellcode
Encyclopedia
In computer security
Computer security
Computer security is a branch of computer technology known as information security as applied to computers and networks. The objective of computer security includes protection of information and property from theft, corruption, or natural disaster, while allowing the information and property to...

, a shellcode is a small piece of code used as the payload
Payload (software)
Payload in computing is the cargo of a data transmission. It is the part of the transmitted data which is the fundamental purpose of the transmission, to the exclusion of information sent with it solely to facilitate delivery.In computer security, payload refers to the...

 in the exploitation of a software vulnerability
Vulnerability (computing)
In computer security, a vulnerability is a weakness which allows an attacker to reduce a system's information assurance.Vulnerability is the intersection of three elements: a system susceptibility or flaw, attacker access to the flaw, and attacker capability to exploit the flaw...

. It is called "shellcode" because it typically starts a command shell
Shell (computing)
A shell is a piece of software that provides an interface for users of an operating system which provides access to the services of a kernel. However, the term is also applied very loosely to applications and may include any software that is "built around" a particular component, such as web...

 from which the attacker can control the compromised machine. Shellcode is commonly written in machine code
Machine code
Machine code or machine language is a system of impartible instructions executed directly by a computer's central processing unit. Each instruction performs a very specific task, typically either an operation on a unit of data Machine code or machine language is a system of impartible instructions...

, but any piece of code that performs a similar task can be called shellcode. Because the function of a payload is not limited to merely spawning a shell, some have suggested that the name shellcode is insufficient. However, attempts at replacing the term have not gained wide acceptance.

Types of shellcode

Shellcode can either be local or remote, depending on whether it gives an attacker control over the machine it runs on (local) or over another machine through a network (remote).

Local

Local shellcode is used by an attacker who has limited access to a machine but can exploit a vulnerability, for example a buffer overflow
Buffer overflow
In computer security and programming, a buffer overflow, or buffer overrun, is an anomaly where a program, while writing data to a buffer, overruns the buffer's boundary and overwrites adjacent memory. This is a special case of violation of memory safety....

, in a higher-privileged process on that machine. If successfully executed, the shellcode will provide the attacker access to the machine with the same higher privileges as the targeted process.

Remote

Remote shellcode is used when an attacker wants to target a vulnerable process running on another machine on a local network
Local area network
A local area network is a computer network that interconnects computers in a limited area such as a home, school, computer laboratory, or office building...

 or intranet
Intranet
An intranet is a computer network that uses Internet Protocol technology to securely share any part of an organization's information or network operating system within that organization. The term is used in contrast to internet, a network between organizations, and instead refers to a network...

. If successfully executed, the shellcode can provide the attacker access to the target machine across the network. Remote shellcodes normally use standard TCP/IP
Internet protocol suite
The Internet protocol suite is the set of communications protocols used for the Internet and other similar networks. It is commonly known as TCP/IP from its most important protocols: Transmission Control Protocol and Internet Protocol , which were the first networking protocols defined in this...

 socket
Stream socket
In computer networking, a stream socket is a type of internet socket which provides a connection-oriented, sequenced, and unduplicated flow of data without record boundaries, with well-defined mechanisms for creating and destroying connections and for detecting errors.This internet socket type...

 connections to allow the attacker access to the shell on the target machine. Such shellcode can be categorised based on how this connection is set up: if the shellcode can establish this connection, it is called connect-back shellcode because the shellcode connects back to the attacker's machine. On the other hand, if the attacker needs to create the connection, the shellcode is called a bindshell because the shellcode binds to a certain port on which the attacker can connect to control it. A third type, much less common, is socket-reuse shellcode. This type of shellcode is sometimes used when an exploit establishes a connection to the vulnerable process that is not closed before the shellcode is run. The shellcode can then re-use this connection to communicate with the attacker. Socket re-using shellcode is harder to create because the shellcode needs to find out which connection to re-use and the machine may have many connections open.

A firewall can be used to detect the outgoing connections made by connect-back shellcodes and the attempt to accept incoming connections made by bindshells. They can therefore offer some protection against an attacker, even if the system is vulnerable, by preventing the attacker from gaining access to the shell created by the shellcode. This is one reason why socket re-using shellcode is sometimes used: because it does not create new connections and therefore is harder to detect and block.

Download and execute

Download and execute is a type of remote shellcode that downloads
Uploading and downloading
In computer networks, to download means to receive data to a local system from a remote system, or to initiate such a data transfer. Examples of a remote system from which a download might be performed include a webserver, FTP server, email server, or other similar systems...

 and executes
Execution (computers)
Execution in computer and software engineering is the process by which a computer or a virtual machine carries out the instructions of a computer program. The instructions in the program trigger sequences of simple actions on the executing machine...

some form of malware on the target system. This type of shellcode does not spawn a shell, but rather instructs the machine to download a certain executable file off the network, save it to disk and execute it. Nowadays, it is commonly used in drive-by download
Drive-by download
Drive-by download means three things, each concerning the unintended download of computer software from the Internet:# Downloads which a person authorized but without understanding the consequences Drive-by download means three things, each concerning the unintended download of computer software...

 attacks, where a victim visits a malicious webpage that in turn attempts to run such a download and execute shellcode in order to install software on the victim's machine. A variation of this type of shellcode downloads and loads
Dynamic loading
Dynamic loading is a mechanism by which a computer program can, at run time, load a library into memory, retrieve the addresses of functions and variables contained in the library, execute those functions or access those variables, and unload the library from memory...

 a library. Advantages of this technique are that the code can be smaller, that it does not require the shellcode to spawn a new process on the target system, and that the shellcode does not need code to clean up the targeted process as this can be done by the library loaded into the process.

Staged

When the amount of data that an attacker can inject into the target process is too limited to execute useful shellcode directly, it may be possible to execute it in stages. First, a small piece of shellcode (stage 1) is executed. This code then downloads a larger piece of shellcode (stage 2) into the process's memory and executes it.

Egg-hunt

This is another form of staged shellcode, which is used if an attacker can inject a larger shellcode into the process but cannot determine where in the process it will end up. Small egg-hunt shellcode is injected into the process at a predictable location and executed. This code then searches the process's address space for the larger shellcode (the egg) and executes it.

Omelette

This type of shellcode is similar to egg-hunt shellcode, but looks for multiple small blocks of data (eggs) and recombines them into one larger block (the omelet) that is subsequently executed. This is used when an attacker can only inject a number of small blocks of data into the process.

Shellcode execution strategy

An exploit will commonly inject a shellcode into the target process before or at the same time as it exploits a vulnerability to gain control over the program counter
Program counter
The program counter , commonly called the instruction pointer in Intel x86 microprocessors, and sometimes called the instruction address register, or just part of the instruction sequencer in some computers, is a processor register that indicates where the computer is in its instruction sequence...

. The program counter is adjusted to point to the shellcode, after which it gets executed and performs its task. Injecting the shellcode is often done by storing the shellcode in data sent over the network to the vulnerable process, by supplying it in a file that is read by the vulnerable process or through the command line or environment in the case of local exploits.

Shellcode encoding

Because most processes filter or restrict the data that can be injected, shellcode often needs to be written to allow for these restrictions. This includes making the code small, null-free or alphanumeric
Alphanumeric code
In general, in computing, an alphanumeric code is a series of letters and numbers which are written in a form that can be processed by a computer....

. Various solutions have been found to get around such restrictions, including:
  • Design and implementation optimizations to decrease the size of the shellcode.
  • Implementation modifications to get around limitations in the range of bytes used in the shellcode.
  • Self-modifying code that modifies a number of the bytes of its own code before executing them to re-create bytes that are normally impossible to inject into the process.


Since intrusion detection
Intrusion detection
In Information Security, intrusion detection is the act of detecting actions that attempt to compromise the confidentiality, integrity or availability of a resource. When Intrusion detection takes a preventive measure without direct human intervention, then it becomes an Intrusion-prevention...

 can detect signatures of simple shellcodes being sent over the network, it is often encoded, made self-decrypting or polymorphic
Polymorphic code
In computer terminology, polymorphic code is code that uses a polymorphic engine to mutate while keeping the original algorithm intact. That is, the code changes itself each time it runs, but the function of the code will not change at all...

 to avoid detection.

Percent encoding

Exploits that target browsers commonly encode shellcode in a JavaScript string using Percent-encoding
Percent-encoding
Percent-encoding, also known as URL encoding, is a mechanism for encoding information in a Uniform Resource Identifier under certain circumstances. Although it is known as URL encoding it is, in fact, used more generally within the main Uniform Resource Identifier set, which includes both Uniform...

, escape sequence encoding "\uXXXX" or entity encoding
Character encodings in HTML
HTML has been in use since 1991, but HTML 4.0 was the first standardized version where international characters were given reasonably complete treatment...

. Some exploits also obfuscate the encoded shellcode string further to prevent detection by IDS
Intrusion detection
In Information Security, intrusion detection is the act of detecting actions that attempt to compromise the confidentiality, integrity or availability of a resource. When Intrusion detection takes a preventive measure without direct human intervention, then it becomes an Intrusion-prevention...

.

For example, on the IA-32
IA-32
IA-32 , also known as x86-32, i386 or x86, is the CISC instruction-set architecture of Intel's most commercially successful microprocessors, and was first implemented in the Intel 80386 as a 32-bit extension of x86 architecture...

 architecture, here's how two NOP
NOP
In computer science, NOP or NOOP is an assembly language instruction, sequence of programming language statements, or computer protocol command that effectively does nothing at all....

(no-operation) instructions would look, first unencoded:
90 NOP
90 NOP
Then encoded into a string using percent-encoding (using the unescape function to decode):
unescape("%u9090");
Next encoded into a string using "\uXXXX"-encoding:
"\u9090";
And finally encoded into a string using entity encoding:
"邐"
or
"邐"

Null-free shellcode

Most shellcodes are written without the use of null
Null character
The null character , abbreviated NUL, is a control character with the value zero.It is present in many character sets, including ISO/IEC 646 , the C0 control code, the Universal Character Set , and EBCDIC...

 bytes because they are intended to be injected into a target process through null-terminated string
Null-terminated string
In computer programming, a null-terminated string is a character string stored as an array containing the characters and terminated with a null character...

s. When a null-terminated string is copied, it will be copied up to and including the first null but subsequent bytes of the shellcode will not be processed. When shellcode that contains nulls is injected in this way, only part of the shellcode would be injected, making it incapable of running successfully.

To produce null-free shellcode from shellcode that contains null
Null character
The null character , abbreviated NUL, is a control character with the value zero.It is present in many character sets, including ISO/IEC 646 , the C0 control code, the Universal Character Set , and EBCDIC...

 bytes, one can substitute machine instructions that contain zeroes with instructions that have the same effect but are free of nulls. For example, on the IA-32
IA-32
IA-32 , also known as x86-32, i386 or x86, is the CISC instruction-set architecture of Intel's most commercially successful microprocessors, and was first implemented in the Intel 80386 as a 32-bit extension of x86 architecture...

 architecture one could replace this instruction:
B8 01000000 MOV
MOV (x86 instruction)
In the x86 assembly language, the MOV instruction is a mnemonic for the copying of data from one location to another. The x86 assembly language has a number of different move instructions...

 EAX,1 // Set the register EAX to 0x000000001
which contains zeroes as part of the literal (1 expands to 0x00000001) with these instructions:
33C0 XOR EAX,EAX // Set the register EAX to 0x000000000
40 INC EAX // Increase EAX to 0x00000001
which have the same effect but take fewer bytes to encode and are free of nulls.

Alphanumeric and printable shellcode

In certain circumstances, a target process will filter any byte from the injected shellcode that is not a printable or alphanumeric
Alphanumeric
Alphanumeric is a combination of alphabetic and numeric characters, and is used to describe the collection of Latin letters and Arabic digits or a text constructed from this collection. There are either 36 or 62 alphanumeric characters. The alphanumeric character set consists of the numbers 0 to...

 character. Under such circumstances, the range of instructions that can be used to write a shellcode becomes very limited. A solution to this problem was published by Rix in Phrack
Phrack
Phrack is an ezine written by and for hackers first published November 17, 1985. Described by Fyodor as "the best, and by far the longest running hacker zine," the magazine is open for contributions by anyone who desires to publish remarkable works or express original ideas on the topics of interest...

 57 in which he showed it was possible to turn any code into alphanumeric code. A technique often used is to create self-modifying code, because this allows the code to modify its own bytes to include bytes outside of the normally allowed range, thereby expanding the range of instructions it can use. Using this trick, a self-modifying decoder can be created that initially uses only bytes in the allowed range. The main code of the shellcode is encoded, also only using bytes in the allowed range. When the output shellcode is run, the decoder can modify its own code to be able to use any instruction it requires to function properly and then continues to decode the original shellcode. After decoding the shellcode the decoder transfers control to it, so it can be executed as normal. It has been shown that it is possible to create arbitrarily complex shellcode that looks like normal text in English.

Unicode proof shellcode

Modern programs use Unicode
Unicode
Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...

 strings to allow internationalization of text. Often, these programs will convert incoming ASCII
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...

 strings to Unicode before processing them. Unicode strings encoded in UTF-16 use two bytes to encode each character (or four bytes for some special characters). When an ASCII
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...

 string is transformed into UTF-16, a zero byte is inserted after each byte in the original string. Obscou proved in Phrack
Phrack
Phrack is an ezine written by and for hackers first published November 17, 1985. Described by Fyodor as "the best, and by far the longest running hacker zine," the magazine is open for contributions by anyone who desires to publish remarkable works or express original ideas on the topics of interest...

 61 that it is possible to write shellcode that can run successfully after this transformation. Programs that can automatically encode any shellcode into alphanumeric UTF-16-proof shellcode exist, based on the same principle of a small self-modifying decoder that decodes the original shellcode.

Platforms

Most shellcode is written in machine code
Machine code
Machine code or machine language is a system of impartible instructions executed directly by a computer's central processing unit. Each instruction performs a very specific task, typically either an operation on a unit of data Machine code or machine language is a system of impartible instructions...

 because of the low level at which the vulnerability being exploited gives an attacker access to the process. Shellcode is therefore often created to target one specific combination of processor
Central processing unit
The central processing unit is the portion of a computer system that carries out the instructions of a computer program, to perform the basic arithmetical, logical, and input/output operations of the system. The CPU plays a role somewhat analogous to the brain in the computer. The term has been in...

, operating system
Operating system
An operating system is a set of programs that manage computer hardware resources and provide common services for application software. The operating system is the most important type of system software in a computer system...

 and service pack
Service pack
A service pack is a collection of updates, fixes or enhancements to a software program delivered in the form of a single installable package. Many companies, such as Microsoft or Autodesk, typically release a service pack when the number of individual patches to a given program reaches a certain ...

, called a platform
Platform (computing)
A computing platform includes some sort of hardware architecture and a software framework , where the combination allows software, particularly application software, to run...

. For some exploits, due to the constraints put on the shellcode by the target process, a very specific shellcode must be created. However, it is not impossible for one shellcode to work for multiple exploits, service packs, operating systems and even processors. Such versatility is commonly achieved by creating multiple versions of the shellcode that target the various platforms and creating a header that branches to the correct version for the platform the code is running on. When executed, the code behaves differently for different platforms and executes the right part of the shellcode for the platform it is running on.

See also

  • Alphanumeric code
    Alphanumeric code
    In general, in computing, an alphanumeric code is a series of letters and numbers which are written in a form that can be processed by a computer....

  • Computer security
    Computer security
    Computer security is a branch of computer technology known as information security as applied to computers and networks. The objective of computer security includes protection of information and property from theft, corruption, or natural disaster, while allowing the information and property to...

  • Buffer overflow
    Buffer overflow
    In computer security and programming, a buffer overflow, or buffer overrun, is an anomaly where a program, while writing data to a buffer, overruns the buffer's boundary and overwrites adjacent memory. This is a special case of violation of memory safety....

  • Exploit (computer security)
    Exploit (computer security)
    An exploit is a piece of software, a chunk of data, or sequence of commands that takes advantage of a bug, glitch or vulnerability in order to cause unintended or unanticipated behavior to occur on computer software, hardware, or something electronic...

  • Heap overflow
    Heap overflow
    A heap overflow is a type of buffer overflow that occurs in the heap data area. Heap overflows are exploitable in a different manner to that of stack-based overflows. Memory on the heap is dynamically allocated by the application at run-time and typically contains program data...

  • Metasploit Project
    Metasploit Project
    The Metasploit Project is an open-source computer security project which provides information about security vulnerabilities and aids in penetration testing and IDS signature development....

  • Shell (computing)
    Shell (computing)
    A shell is a piece of software that provides an interface for users of an operating system which provides access to the services of a kernel. However, the term is also applied very loosely to applications and may include any software that is "built around" a particular component, such as web...

  • Stack buffer overflow
    Stack buffer overflow
    In software, a stack buffer overflow occurs when a program writes to a memory address on the program's call stack outside of the intended data structure; usually a fixed length buffer....

  • Vulnerability (computing)
    Vulnerability (computing)
    In computer security, a vulnerability is a weakness which allows an attacker to reduce a system's information assurance.Vulnerability is the intersection of three elements: a system susceptibility or flaw, attacker access to the flaw, and attacker capability to exploit the flaw...


External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK