X86 assembly language



 
 


x86 assembly language is the family of backwards-compatible assembly languages for the x86 class of processors, which includes Intel's Pentium series and AMD's Athlon series. Like all assembly languages, it uses short mnemonics to represent the fundamental operations that the CPU in a computer can perform. Compilers often produce assembly code as an intermediate step when translating a high-level program into machine code. Regarded as a programming language, assembly coding is machine-specific and low-level. It is therefore mainly used for detailed or time-critical applications such as bootloaders, operating system kernels, and device drivers, as well as for real-time or small embedded systems.

History

The Intel 8088 and 8086 were the first CPUs to have the instruction set that is now commonly referred to as x86. These 16-bit CPUs were an evolutionary step up from the previous generation of 8-bit CPUs such as the 8080 and inherited many characteristics and instructions, which were extended for the 16-bit era. Both CPUs had a 20-bit address bus and a 16-bit internal register width. The 8086 had a 16-bit data bus, while the 8088, intended as a low-cost option targeted at the embedded market, had an 8-bit data bus. The term x86 assembly language also covers the many different CPUs that followed from Intel, such as the 80188, 80186, 80286, 80386, 80486 and Pentium, as well as non-Intel CPUs from AMD and Cyrix. The term x86 refers to all the CPUs that can run the same original assembly language.

The modern x86 instruction set is really a series of extensions of instruction sets that began with the Intel 8008 microprocessor. Nearly full binary backward compatibility is present from the Intel 8086 chip through to the modern Pentium 4, Intel Core, Core i7, AMD Athlon 64, Opteron, etc. processors. (There are certain unusual exceptions, such as the counted shift instructions, corrections to the original PUSHA instruction, some orphaned Intel 80286 semantics, the dropped LOADALL instruction, and the Pentium 4 giving up on precise FPU operation counts.) This is accomplished through its use of two ISAs, something which is commonly criticized. Compatibility of assembly language programs with older processors depends upon whether the program includes instructions only available on later processors.

Mnemonics and opcodes

Each x86 assembly instruction is represented by a mnemonic, which in turn directly translates to a series of bytes which represent that instruction, called an opcode. For example, the NOP instruction translates to 0x90 and the HLT instruction translates to 0xF4. Some opcodes have no mnemonics named after them and are undocumented. Different processors in the x86 family may interpret undocumented opcodes differently, making a program using them behave differently on different processors. Some undocumented opcodes may generate processor exceptions on some processors.
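
As a rough illustration of the mnemonic-to-opcode mapping, the following Python sketch assembles the two single-byte instructions mentioned above. The table OPCODES and the function assemble are made up for this example; a real assembler must also encode operands, prefixes and multi-byte opcodes.

```python
# Toy mnemonic table: only the two single-byte opcodes named in the text.
OPCODES = {
    "nop": 0x90,  # no operation
    "hlt": 0xF4,  # halt until the next external interrupt
}

def assemble(mnemonics):
    """Translate a list of operand-less mnemonics into machine code bytes."""
    code = bytearray()
    for mnemonic in mnemonics:
        key = mnemonic.strip().lower()
        if key not in OPCODES:
            raise ValueError(f"unknown mnemonic: {mnemonic!r}")
        code.append(OPCODES[key])
    return bytes(code)

machine_code = assemble(["NOP", "NOP", "HLT"])
print(machine_code.hex())  # 9090f4
```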

Syntax

x86 assembly language has two main syntax branches: Intel syntax, originally used for documentation of the x86 platform, and AT&T syntax. Intel syntax is dominant in the Windows world. In the Unix/Linux world, both are used, because GCC originally supported only AT&T syntax. Here is a summarized list of the main differences between Intel syntax and AT&T syntax:

  • In AT&T syntax, the source comes before the destination, the opposite of Intel syntax.
  • In AT&T syntax, the mnemonics are suffixed with a letter indicating the size of the operands (e.g. "l" for dword, "w" for word, and "b" for byte).
  • In AT&T syntax, immediate values must be prefixed with a "$", and registers must be prefixed with a "%".
  • In AT&T syntax, effective addresses use the general syntax DISP(BASE,INDEX,SCALE), whereas in Intel syntax, effective addresses use variables and need to be in square brackets; additionally, size keywords like 'byte', 'word' or 'dword' are used where the operand size would otherwise be ambiguous. For example, the following are equivalent:
    • in AT&T syntax: movl mem_location(%ebx,%ecx,4), %eax
    • in Intel syntax: mov eax, dword [ebx + ecx*4 + mem_location]
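
Both notations above name the same address, DISP + BASE + INDEX*SCALE. The Python sketch below renders one operand in each style and computes the address for sample values. The helper names and the numeric values are assumptions chosen for illustration, and the 32-bit wraparound models 32-bit addressing only.

```python
VALID_SCALES = (1, 2, 4, 8)  # the only scale factors the encoding allows

def effective_address(disp, base, index, scale):
    """Compute disp + base + index*scale, wrapped to 32 bits."""
    if scale not in VALID_SCALES:
        raise ValueError(f"scale must be one of {VALID_SCALES}")
    return (disp + base + index * scale) & 0xFFFFFFFF

def att_operand(disp, base, index, scale):
    """Render a memory operand as DISP(BASE,INDEX,SCALE)."""
    return f"{disp}(%{base},%{index},{scale})"

def intel_operand(disp, base, index, scale):
    """Render the same memory operand in bracketed Intel style."""
    return f"[{base} + {index}*{scale} + {disp}]"

print(att_operand("mem_location", "ebx", "ecx", 4))    # mem_location(%ebx,%ecx,4)
print(intel_operand("mem_location", "ebx", "ecx", 4))  # [ebx + ecx*4 + mem_location]

# Sample values: mem_location = 0x1000, ebx = 0x20, ecx = 3
address = effective_address(disp=0x1000, base=0x20, index=3, scale=4)
print(hex(address))  # 0x102c
```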


Most x86 assemblers use Intel syntax, including MASM, TASM, NASM, FASM and YASM. GAS has supported both syntaxes since version 2.10 via the .intel_syntax directive.

Registers

x86 processors have a collection of registers available to be used as stores for binary data. Collectively the data and address registers are called the general registers.

In addition to the general registers, there are:
  • segment registers (CS, DS, ES, FS, GS, SS)
  • other registers (IP instruction pointer, FLAGS)
  • extra extension registers (MMX, 3DNow!, SSE, etc.).


The IP register points to where in the program the processor is currently executing its code. The IP register cannot be accessed by the programmer directly.

The x86 registers can be loaded and copied with the MOV instruction. For example, in Intel syntax:

    mov ax, 1234h
    mov bx, ax

copies the value 1234h into register ax and then copies the value of the ax register into the bx register.

Segmented addressing


The x86 architecture in real and virtual 8086 mode uses a process known as segmentation to address memory, rather than the linear method used in other architectures. Segmentation involves decomposing a linear address into two parts: a segment and an offset. The segment address points to the beginning of a 64K group of addresses, and the offset gives the position relative to the base address of that segment. In real mode, to translate back into a linear address, the segment address is shifted four bits left (i.e. multiplied by 16) and then added to the offset.

Two registers are used for a memory address: one to hold the segment, and one to hold the offset.

For example, in real mode only, if DS contains the hexadecimal number 0xDEAD and DX contains the number 0xCAFE, they together point to the memory address 0xDEAD * 0x10 + 0xCAFE = 0xEB5CE. In real mode the CPU can address up to 1,048,576 bytes, which corresponds to a 20-bit address space: combining the segment and offset values yields a 20-bit address.
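
The real-mode translation can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name real_mode_linear is an assumption for the example, and the second segment:offset pair is added here to show that different pairs can name the same linear address.

```python
def real_mode_linear(segment, offset):
    """Return segment * 16 + offset for two 16-bit values."""
    for name, value in (("segment", segment), ("offset", offset)):
        if not 0 <= value <= 0xFFFF:
            raise ValueError(f"{name} must fit in 16 bits")
    return (segment << 4) + offset

print(hex(real_mode_linear(0xDEAD, 0xCAFE)))  # 0xeb5ce
# A different segment:offset pair that refers to the same byte:
print(hex(real_mode_linear(0xEB5C, 0x000E)))  # 0xeb5ce
```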

In protected mode, the segment selector can be broken down into three parts: a 13-bit index, a TI (Table Indicator) bit that indicates whether the entry is in the GDT or the LDT, and a 2-bit RPL (Requested Privilege Level). When a selector is loaded, the indexed table entry is looked up to obtain the segment base. See x86 memory segmentation.
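
The three selector fields can be pulled apart with shifts and masks. The sketch below assumes the usual layout of a 16-bit selector (index in bits 15 to 3, TI in bit 2 with 0 meaning GDT and 1 meaning LDT, RPL in bits 1 to 0), which the text above does not spell out; the names Selector and decode_selector and the sample values are illustrative.

```python
from collections import namedtuple

Selector = namedtuple("Selector", ["index", "table", "rpl"])

def decode_selector(value):
    """Split a 16-bit protected-mode segment selector into its fields."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("selector must fit in 16 bits")
    index = value >> 3                            # 13-bit descriptor index
    table = "LDT" if (value >> 2) & 1 else "GDT"  # TI bit
    rpl = value & 0b11                            # requested privilege level
    return Selector(index, table, rpl)

print(decode_selector(0x0008))  # Selector(index=1, table='GDT', rpl=0)
print(decode_selector(0x0023))  # Selector(index=4, table='GDT', rpl=3)
print(decode_selector(0x000F))  # Selector(index=1, table='LDT', rpl=3)
```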

An address given as a segment and an offset is written in the notation segment:offset. In the above example (for real mode only), the linear address 0xEB5CE can be written as 0xDEAD:0xCAFE or, if one has a segment and offset register pair, as DS:DX.

There are some special combinations of segment registers and general registers that point to important addresses:
  • CS:IP points to the address where the processor will fetch the next byte of code.
  • SS:SP points to the location of the last item pushed onto the stack.
  • DS:SI is often used to point to data that is about to be copied to ES:DI


Execution modes


The processor supports numerous modes of operation for x86 code, in which some instructions are available and others are not. A 16-bit subset of instructions is available in "real mode" (available in all x86 processors), "16-bit protected mode" (available since the 80286), and "v86 mode" (available since the Intel 80386). In "32-bit protected mode" (available in processors starting with the Intel 80386) or "legacy mode" (available when 64-bit extensions are enabled), 32-bit instructions (plus SIMD instructions) are available. In "long mode" (available since the AMD Opteron processor), 64-bit instructions are available. The instruction set is based on similar ideas in each mode, but involves different ways of accessing memory and thus employs different programming strategies.

The modes in which x86 code can be executed are:
  • Real mode (16-bit)
  • Protected mode (16-bit and 32-bit)
  • Virtual 8086 mode (16-bit)
  • System Management Mode (16-bit)
  • Long mode (64-bit)


Switching modes

By default, the processor starts in real mode; an operating system kernel, or other program, must explicitly switch to protected mode if it is to run in that mode, and, on x86-64 processors, must then switch to long mode if it is to run in that mode. Switching modes can be accomplished by modifying certain bits of the processor's control registers.
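
Control registers cannot be touched from Python, so the following is only a model of the read-modify-write step. It assumes the bit positions CR0.PE (bit 0, protection enable) and CR0.PG (bit 31, paging), which are not given in the text above, and it starts from an arbitrary placeholder value. A real switch also needs descriptor tables to be set up first, which this sketch leaves out.

```python
CR0_PE = 1 << 0   # Protection Enable: set to enter protected mode
CR0_PG = 1 << 31  # Paging

def set_bits(register, mask):
    """Return the 32-bit register value with the bits in mask set."""
    return (register | mask) & 0xFFFFFFFF

def in_protected_mode(cr0):
    """True if the PE bit of a CR0 value is set."""
    return bool(cr0 & CR0_PE)

cr0 = 0x00000010               # placeholder value, not a documented reset state
print(in_protected_mode(cr0))  # False
cr0 = set_bits(cr0, CR0_PE)    # models: read CR0, OR in PE, write CR0 back
print(in_protected_mode(cr0))  # True
print(hex(cr0))                # 0x11
```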

Instruction types

In general, the features of the modern x86 instruction set are:

  • A compact encoding
    • Variable length and alignment independent (encoded as little endian, as is all data in the x86 architecture)
    • Mainly one-address and two-address instructions, that is to say, the first operand is also the destination.
    • Memory operands as both source and destination are supported (frequently used to read/write stack elements addressed using small immediate offsets).
    • Both general and implicit register usage; although all seven (counting ebp) general registers can be freely used as accumulators or for addressing, most of them are also implicitly used by certain (more or less) special instructions; affected registers must therefore be temporarily preserved (normally stacked), if active during such instruction sequences.
  • Produces conditional flags implicitly through most integer ALU instructions.
  • Supports various addressing modes including immediate, offset, and scaled index, but not PC-relative (except jumps) until x86-64.
  • Includes floating point operations that work on a stack of registers.
  • Contains special support for atomic instructions (XCHG, CMPXCHG(8B), XADD, and integer instructions which combine with the LOCK prefix)
  • SIMD instructions (instructions that perform the same operation in parallel on many operands encoded in adjacent cells of wider registers).


Stack instructions

The x86 architecture has hardware support for an execution stack mechanism. Instructions such as push, call, pop, ret, etc. are used with the properly set up stack to pass parameters, to allocate space for local data, and to save and restore call-return points. The ret size instruction is very useful for implementing space-efficient (and thereby fast) calling conventions where the callee is responsible for reclaiming stack space occupied by parameters.
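
The effect of push, call and ret size on the stack pointer can be followed in a toy model. The Python class below is a sketch assuming 16-bit operands and a stack that grows toward lower addresses; the class name, starting SP and the pushed values are invented for the example.

```python
class Stack16:
    """Toy model of a 16-bit x86 stack that grows toward lower addresses."""

    def __init__(self, sp=0x0100):
        self.sp = sp
        self.mem = {}  # address -> 16-bit word

    def push(self, value):
        self.sp = (self.sp - 2) & 0xFFFF   # SP moves down first
        self.mem[self.sp] = value & 0xFFFF

    def pop(self):
        value = self.mem[self.sp]
        self.sp = (self.sp + 2) & 0xFFFF
        return value

    def call(self, return_address):
        self.push(return_address)          # call saves the return point

    def ret(self, size=0):
        return_address = self.pop()
        self.sp = (self.sp + size) & 0xFFFF  # callee discards the parameters
        return return_address

stack = Stack16(sp=0x0100)
stack.push(0x1111)         # caller pushes two 16-bit parameters
stack.push(0x2222)
stack.call(0x0040)         # near call pushes the return address
resume_at = stack.ret(4)   # "ret 4": return and release 4 bytes of parameters
print(hex(resume_at), hex(stack.sp))  # 0x40 0x100
```

With a plain ret, the caller would have to adjust SP itself after every call; letting the callee do it keeps that adjustment in one place.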

When setting up a stack frame to hold local data of a recursive procedure, there are several choices; the high-level enter instruction takes a procedure-nesting-depth argument as well as a local size argument, and may be faster than more explicit manipulations of the registers (such as push bp, mov bp,sp, sub sp,size). Whether it is faster depends on the particular x86 implementation (i.e. chip), as well as on the calling convention and the language compiled; the differences are not great, however.

The full range of addressing modes (including immediate and base+offset), even for instructions such as push and pop, makes direct usage of the stack for integer, floating point, and address quantities simple. This also means that ABI specifications and mechanisms are fairly simple compared to some RISC architectures, which must be more explicit about call stack details.

Integer ALU instructions

x86 assembly has the standard mathematical operations add, sub, mul and idiv; the logical operators and, or, xor and not, plus the arithmetic negation neg; arithmetic and logical bit shifts, sal/sar and shl/shr; rotates with and without carry, rcl/rcr and rol/ror; and a complement of BCD arithmetic instructions, aaa, aad, daa and others.
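
The difference between the logical shift (shr), the arithmetic shift (sar) and the rotates (rol/ror) is easiest to see on concrete values. The Python functions below are a sketch for 16-bit operands with shift counts from 0 to 15; they model only the result value, not the flags, and the function names simply mirror the mnemonics.

```python
BITS = 16
MASK = (1 << BITS) - 1

def shr(value, count):
    """Logical right shift: zeros enter from the left."""
    return (value & MASK) >> count

def sar(value, count):
    """Arithmetic right shift: the sign bit is replicated."""
    value &= MASK
    if value & (1 << (BITS - 1)):   # reinterpret as a negative number
        value -= 1 << BITS
    return (value >> count) & MASK

def rol(value, count):
    """Rotate left: bits leaving on the left re-enter on the right."""
    value &= MASK
    count %= BITS
    return ((value << count) | (value >> (BITS - count))) & MASK

def ror(value, count):
    """Rotate right: bits leaving on the right re-enter on the left."""
    value &= MASK
    count %= BITS
    return ((value >> count) | (value << (BITS - count))) & MASK

print(hex(shr(0x8000, 1)))  # 0x4000
print(hex(sar(0x8000, 1)))  # 0xc000
print(hex(rol(0x8001, 1)))  # 0x3
print(hex(ror(0x0001, 1)))  # 0x8000
```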

Floating point instructions

x86 assembly language includes instructions for a stack-based floating point unit. They include addition, subtraction, negation, multiplication, division, remainder, square roots, integer truncation, fraction truncation, and scale by power of two. The operations also include conversion instructions which can load or store a value from memory in any of the following formats: binary-coded decimal, 32-bit integer, 64-bit integer, 32-bit floating point, 64-bit floating point or 80-bit floating point (upon loading, the value is converted to the currently used floating point mode). The x86 also includes a number of transcendental functions including sine, cosine, tangent, arctangent, exponentiation with the base 2 and logarithms to bases 2, 10, or e.

The stack register to stack register format of the instructions is usually F(OP) st, st(*) or F(OP) st(*), st, where st is equivalent to st(0), and st(*) is one of the 8 stack registers (st(0), st(1), ..., st(7)). As with the integer instructions, the first operand is both the first source operand and the destination operand. FSUBR and FDIVR should be singled out as first swapping the source operands before performing the subtraction or division. The addition, subtraction, multiplication, division, store and comparison instructions include instruction modes that will pop the top of the stack after their operation is complete. So for example FADDP st(1), st performs the calculation st(1) = st(1) + st(0), then removes st(0) from the top of the stack, thus making what was the result in st(1) the top of the stack in st(0).
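
The renumbering of registers after a popping instruction can be traced with a small model. The Python class below is a sketch of the register stack only; the class and method names are assumptions, Python floats stand in for the 80-bit registers, and status flags and exceptions are not modelled.

```python
class X87Stack:
    """Toy model of the x87 register stack; regs[0] plays the role of st(0)."""

    def __init__(self):
        self.regs = []

    def fld(self, value):
        """Push a value; the old st(0) becomes st(1), and so on."""
        if len(self.regs) == 8:
            raise OverflowError("all 8 stack registers are in use")
        self.regs.insert(0, float(value))

    def faddp(self, i=1):
        """FADDP st(i), st: st(i) = st(i) + st(0), then pop st(0)."""
        self.regs[i] += self.regs[0]
        self.regs.pop(0)

    def st(self, i=0):
        return self.regs[i]

fpu = X87Stack()
fpu.fld(1.5)      # st(0) = 1.5
fpu.fld(2.25)     # st(0) = 2.25, st(1) = 1.5
fpu.faddp(1)      # st(1) = 1.5 + 2.25, then the old st(0) is removed
print(fpu.st(0))  # 3.75
```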

SIMD instructions

Modern x86 CPUs contain SIMD instructions, which largely perform the same operation in parallel on many values encoded in a wide SIMD register. Various instruction technologies support different operations on different register sets, but taken as a complete whole (from MMX to SSE4.2) they include general computations on integer or floating point arithmetic (addition, subtraction, multiplication, shift, minimization, maximization, comparison, division or square root). So for example, PADDW MM0, MM1 performs 4 parallel 16-bit (indicated by the W) integer adds (indicated by the PADD) of mm0 values to mm1 and stores the result in mm0. SSE also includes scalar floating point instructions, in which only the very first value of the register is actually modified (expanded in SSE2). Some other unusual instructions have been added, including a sum of absolute differences (used for motion estimation in video compression, such as is done in MPEG) and a 16-bit multiply accumulation instruction (useful for software-based alpha-blending and digital filtering). SSE (since SSE3) and 3DNow! extensions include addition and subtraction instructions for treating paired floating point values like complex numbers.
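As a rough illustration of what a packed add such as PADDW computes, the following Python sketch treats each MMX register as a 64-bit integer holding four independent 16-bit lanes. The function name paddw is chosen to mirror the mnemonic; it models only the wraparound (non-saturating) form of the instruction.

```python
def paddw(mm0, mm1):
    """Add four 16-bit lanes independently; each lane wraps modulo 2**16."""
    result = 0
    for lane in range(4):
        shift = 16 * lane
        a = (mm0 >> shift) & 0xFFFF
        b = (mm1 >> shift) & 0xFFFF
        # A carry out of one lane is discarded, never passed to the next lane.
        result |= ((a + b) & 0xFFFF) << shift
    return result


mm0 = 0x0001_FFFF_0003_0004
mm1 = 0x0001_0001_0001_0001
print(f"{paddw(mm0, mm1):016x}")  # 0002000000040005
```

The second lane from the top shows the key property: 0xFFFF + 1 wraps to 0 without disturbing its neighbours, which an ordinary 64-bit add would not do.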

These instruction sets also include numerous fixed sub-word instructions for shuffling, inserting and extracting values within the registers. In addition there are instructions for moving data between the integer registers and the XMM registers (used in SSE) or the MMX registers (used in MMX, and aliased onto the FPU registers).

Data manipulation instructions

The x86 processor also includes complex addressing modes for addressing memory with an immediate offset, a register, a register with an offset, a scaled register with or without an offset, and a register with an optional offset and another scaled register. So for example, one can encode mov eax, [Table + ebx + esi*4] as a single instruction which loads 32 bits of data from the address computed as (Table + ebx + esi * 4) offset from the DS selector, and stores it to the eax register. In general the x86 processor can load and use memory matched to the size of any register it is operating on. (The SIMD instructions also include half-load instructions.)
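The address calculation in that example can be sketched in Python. This is a simplified model that assumes 32-bit addressing and a segment base of zero; the function names, the memory array and the value chosen for Table are hypothetical and exist only for illustration.

```python
def effective_address(displacement=0, base=0, index=0, scale=1):
    """Compute displacement + base + index*scale, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("the scale factor must be 1, 2, 4 or 8")
    return (displacement + base + index * scale) & 0xFFFFFFFF


def load32(memory, address):
    """Read a 32-bit little-endian value, as a 32-bit mov from memory would."""
    return int.from_bytes(memory[address:address + 4], "little")


memory = bytearray(64)  # stand-in for the data segment
TABLE = 0x10            # hypothetical address of Table
memory[0x24:0x28] = (0xDEADBEEF).to_bytes(4, "little")

ebx, esi = 8, 3
address = effective_address(TABLE, base=ebx, index=esi, scale=4)
eax = load32(memory, address)  # models: mov eax, [Table + ebx + esi*4]
print(hex(address), hex(eax))  # 0x24 0xdeadbeef
```

Scaling the index by 4 is what makes this form convenient for indexing arrays of 32-bit elements.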

The x86 instruction set includes string load, store and move instructions (LODS, STOS, and MOVS) which perform each operation on an element of a specified size (B for 8-bit byte, W for 16-bit word, D for 32-bit double word) and then increment or decrement (depending on DF, the direction flag) the implicit address register (SI for LODS, DI for STOS, and both for MOVS). For the load and store, the implicit target/source register is AL, AX or EAX (depending on size). The implicit segment used is DS for LODS, ES for STOS and both for MOVS (DS for the source, ES for the destination). In modern x86 processors, these complex instructions generally offer no performance advantage over more simply implemented separate load/store and address increment instructions.
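A minimal Python sketch of a single MOVS or STOS step may make the implicit register updates clearer. It assumes a flat address space (the DS and ES segments are ignored), and the function names and the memory buffer are invented for this example.

```python
def movs(memory, si, di, size, df=0):
    """One MOVSB/MOVSW/MOVSD step (size 1, 2 or 4): copy, then step SI and DI."""
    memory[di:di + size] = memory[si:si + size]
    step = -size if df else size  # DF=1 walks downwards through memory
    return si + step, di + step


def stos(memory, di, accumulator, size, df=0):
    """One STOSB/STOSW/STOSD step: store AL/AX/EAX at [DI], then step DI."""
    value = accumulator & ((1 << (8 * size)) - 1)
    memory[di:di + size] = value.to_bytes(size, "little")
    return di + (-size if df else size)


memory = bytearray(b"abcd----")
si, di = 0, 4
for _ in range(4):  # four MOVSB steps with DF clear
    si, di = movs(memory, si, di, 1)
print(memory, si, di)  # bytearray(b'abcdabcd') 4 8
```

Each call corresponds to one execution of the instruction; the caller supplies the loop here.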

The stack is implemented with an implicitly decrementing (push) and incrementing (pop) stack pointer. In 16-bit mode, this implicit stack pointer is addressed as SS:[SP], in 32-bit mode it is SS:[ESP], and in 64-bit mode it is [RSP]. The stack pointer actually points to the last value that was stored, under the assumption that its size will match the operating mode of the processor (i.e., 16, 32, or 64 bits) to match the default width of the PUSH/POP/CALL/RET instructions. Also included are the instructions ENTER and LEAVE, which reserve and remove data from the top of the stack while setting up a stack frame pointer in BP/EBP/RBP. However, directly setting, adding to or subtracting from the SP/ESP/RSP register is also supported, so the ENTER/LEAVE instructions are generally unnecessary. Other instructions for manipulating the stack include PUSHF/POPF for storing and retrieving the (E)FLAGS register. The PUSHA/POPA instructions store and retrieve the entire integer register state to and from the stack.
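The push and pop behaviour in 32-bit mode can be modelled as follows. This is a sketch under stated assumptions: the class Stack32 is hypothetical, the stack lives in a small byte array, the segment base is taken to be zero, and stack-limit checks are left out.

```python
class Stack32:
    """Toy model of PUSH/POP in 32-bit mode; esp indexes into memory."""

    def __init__(self, size=64):
        self.memory = bytearray(size)
        self.esp = size  # an empty stack starts at the top of the region

    def push(self, value):
        self.esp -= 4    # decrement first, so ESP points at the new value
        self.memory[self.esp:self.esp + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    def pop(self):
        value = int.from_bytes(self.memory[self.esp:self.esp + 4], "little")
        self.esp += 4    # increment after reading the value
        return value


stack = Stack32()
stack.push(0x11223344)
print(stack.esp, hex(stack.memory[stack.esp]))  # 60 0x44
stack.push(5)
print(stack.pop(), hex(stack.pop()), stack.esp)  # 5 0x11223344 64
```

The stack grows towards lower addresses, and ESP always addresses the most recently pushed value.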

Values for a SIMD load or store are assumed to be packed in adjacent memory positions and are placed in the SIMD register in sequential little-endian order. Some SSE load and store instructions require 16-byte alignment to function properly. The SIMD instruction sets also include "prefetch" instructions, which perform the load but do not target any register and are used for cache loading. The SSE instruction sets also include non-temporal store instructions, which perform stores straight to memory without performing a cache allocate if the destination is not already cached (otherwise they behave like a regular store).

Most generic integer and floating point instructions can take one memory operand, specified with a complex address, as the second source parameter. Integer instructions can also accept one memory parameter as a destination operand.

Program flow

x86 assembly has an unconditional jump operation, jmp, which can take an immediate address, a register or an indirect address as a parameter. (Note that most RISC processors only support a link register or short immediate displacement for jumping.)

Also supported are several conditional jumps, including je (jump on equality), jne (jump on inequality), jg (jump on greater than, signed), jl (jump on less than, signed), ja (jump on above/greater than, unsigned), jb (jump on below/less than, unsigned). These conditional operations are based on the state of specific bits in the (E)FLAGS register. Many arithmetic and logic operations set, clear or complement these flags depending on their result. The cmp (compare) and test instructions set the flags as if they had performed a subtraction or a bitwise AND operation, respectively, without altering the values of the operands. There are also instructions such as clc (clear carry flag) and cmc (complement carry flag) which work on the flags directly. Floating point comparisons are performed via the FCOM or FICOM instructions, whose results eventually have to be transferred to the integer flags.
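How cmp and the conditional jumps cooperate can be sketched in Python. The model below computes the four flags relevant to these jumps for a 32-bit cmp and then evaluates the standard condition for each mnemonic; the function names cmp32 and taken and the dictionary layout are invented for this illustration, and the parity and adjust flags are omitted.

```python
MASK32 = 0xFFFFFFFF


def cmp32(a, b):
    """Return the flags a 32-bit 'cmp a, b' would produce (a - b, discarded)."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "ZF": result == 0,                               # operands equal
        "CF": a < b,                                     # unsigned borrow
        "SF": bool(result >> 31),                        # sign bit of result
        "OF": bool(((a ^ b) & (a ^ result)) >> 31),      # signed overflow
    }


CONDITIONS = {
    "je":  lambda f: f["ZF"],
    "jne": lambda f: not f["ZF"],
    "jg":  lambda f: not f["ZF"] and f["SF"] == f["OF"],  # signed greater
    "jl":  lambda f: f["SF"] != f["OF"],                   # signed less
    "ja":  lambda f: not f["CF"] and not f["ZF"],          # unsigned above
    "jb":  lambda f: f["CF"],                              # unsigned below
}


def taken(mnemonic, a, b):
    """Would 'cmp a, b' followed by the given jump transfer control?"""
    return CONDITIONS[mnemonic](cmp32(a, b))


# 0xFFFFFFFF is -1 when read as signed but the largest value when unsigned.
print(taken("jl", 0xFFFFFFFF, 1), taken("ja", 0xFFFFFFFF, 1))  # True True
```

The same comparison thus satisfies both "less than" (signed) and "above" (unsigned), which is why separate jump mnemonics exist for the two interpretations.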

Each jump operation has three different forms, depending on the size of the operand. A short jump uses an 8-bit signed operand, which is a relative offset from the address of the instruction that follows the jump. A near jump is similar to a short jump but uses a 16-bit signed operand (in real or protected mode) or a 32-bit signed operand (in 32-bit protected mode only). A far jump is one that uses the full segment base:offset value as an absolute address. There are also indirect and indexed forms of each of these.
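The short form can be illustrated by encoding and decoding it by hand. The sketch below assumes the two-byte encoding of jmp short (opcode byte 0xEB followed by the 8-bit displacement) and counts the displacement from the end of that two-byte instruction; the function names are invented for this example and address wraparound is ignored.

```python
JMP_SHORT_LENGTH = 2  # opcode 0xEB plus one displacement byte


def encode_jmp_short(jump_address, target):
    """Encode a short jump located at jump_address that lands on target."""
    rel8 = target - (jump_address + JMP_SHORT_LENGTH)
    if not -128 <= rel8 <= 127:
        raise ValueError("target is out of range for a short jump")
    return bytes([0xEB, rel8 & 0xFF])


def decode_jmp_short(jump_address, encoded):
    """Recover the target address from an encoded short jump."""
    rel8 = int.from_bytes(encoded[1:2], "little", signed=True)
    return jump_address + JMP_SHORT_LENGTH + rel8


print(encode_jmp_short(0x100, 0x110).hex())  # eb0e
print(encode_jmp_short(0x100, 0x100).hex())  # ebfe, a jump to itself
```

A target further than about 128 bytes away does not fit in the 8-bit operand, which is when the near form is needed.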

In addition to the simple jump operations, there are the call (call a subroutine) and ret (return from subroutine) instructions. Before transferring control to the subroutine, call pushes the offset address of the instruction following the call onto the stack; ret pops this value off the stack and jumps to it, effectively returning the flow of control to that part of the program. In the case of a far call, the segment base is pushed before the offset; far ret pops the offset and then the segment base to return.
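The bookkeeping done by call and ret can be sketched in Python, with a list standing in for the stack. The function names, addresses and instruction lengths below are hypothetical. For the far forms the sketch pushes the code segment first and the offset second, so that the offset is on top of the stack and a far return can pop the offset and then the segment.

```python
def near_call(stack, eip, instruction_length, target):
    """Push the address of the instruction after the call, then jump."""
    stack.append(eip + instruction_length)
    return target


def near_ret(stack):
    """Pop the saved return address and continue there."""
    return stack.pop()


def far_call(stack, cs, eip, instruction_length, target_cs, target_eip):
    """Push the segment, then the return offset, and switch to the target."""
    stack.append(cs)
    stack.append(eip + instruction_length)
    return target_cs, target_eip


def far_ret(stack):
    """Pop the offset first, then the segment."""
    eip = stack.pop()
    cs = stack.pop()
    return cs, eip


stack = []
eip = near_call(stack, 0x1000, 5, 0x2000)  # a 5-byte call at 0x1000
print(hex(eip), [hex(v) for v in stack])   # 0x2000 ['0x1005']
print(hex(near_ret(stack)))                # 0x1005
```

Because the return address lives on the ordinary stack, a subroutine must leave the stack pointer where it found it before executing ret.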

There are also two similar instructions: int (interrupt) and iret (return from interrupt). int saves the current (E)FLAGS register value on the stack and then performs a far call, except that instead of an address it uses an interrupt vector, an index into a table of interrupt handler addresses. Typically, the interrupt handler saves all other CPU registers it uses, unless they are used to return the result of an operation to the calling program (in software-invoked interrupts). The matching return from interrupt instruction, iret, restores the flags after returning. Software interrupts of the type described above are used by some operating systems for system calls, and can also be used in debugging hardware interrupt handlers. Hardware interrupts are triggered by external hardware events, and their handlers must preserve all register values, as the state of the currently executing program is unknown. In protected mode, interrupts may be set up by the OS to trigger a task switch, which will automatically save all registers of the active task.

Examples


Using the flags register

Flags are notably used in the x86 architecture for comparisons. A comparison is made between two registers, for example, and the flags are set according to their difference. A jump instruction then checks the respective flag and jumps if its condition holds, for example:

    cmp eax, ebx
    jne do_something

Flags are also used in the x86 architecture to turn on and off certain features or execution modes. For example, to disable the processing of interrupts you can use the instruction: cli

The flags register can also be directly accessed. The low 8 bits of the flags register can be loaded into AH using the LAHF instruction. The entire flags register can also be moved on and off the stack using the instructions PUSHF, POPF, INT (including INTO) and IRET.
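The effect of LAHF on the accumulator can be modelled in a few lines of Python. The sketch relies on the layout of the low flags byte (bit 0 carry, bit 2 parity, bit 4 adjust, bit 6 zero, bit 7 sign, with bit 1 always read as 1); the function name lahf mirrors the mnemonic and the register values are arbitrary examples.

```python
def lahf(eax, flags):
    """Return EAX after LAHF: AH receives the low byte of the flags register."""
    # Keep SF, ZF, AF, PF and CF; bit 1 is fixed at 1, bits 3 and 5 at 0.
    ah = (flags & 0xD5) | 0x02
    return (eax & 0xFFFF00FF) | (ah << 8)


eax = lahf(0x12345678, 0x0246)  # flags with ZF and PF set
print(hex(eax))                 # 0x12344678
zero_flag = (eax >> 14) & 1     # ZF is bit 6 of AH, i.e. bit 14 of EAX
print(zero_flag)                # 1
```

Only AH changes; the rest of EAX, including AL, is left untouched.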

Using the instruction pointer register

There is also a 32-bit instruction pointer, named EIP. The EIP register points to where in the program the processor is currently executing its code. The EIP register cannot be accessed by the programmer directly. Instead, a sequence like the following can be done to retrieve the address of next_line into EAX:

    call next_line
next_line:
    pop eax

This works even in position-independent code because call takes an EIP-relative immediate operand. Writing to EIP is simple: jmp eax
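Why the call/pop sequence is position-independent can be seen in a small Python model. It assumes the common 5-byte encoding of a near call (opcode 0xE8 plus a 32-bit displacement); the function name and the load addresses are hypothetical. Since next_line immediately follows the call, the encoded displacement is zero wherever the code is loaded.

```python
CALL_REL32_LENGTH = 5  # opcode 0xE8 followed by a 4-byte displacement


def call_then_pop(call_address):
    """Model 'call next_line / next_line: pop eax' placed at call_address."""
    stack = []
    rel32 = 0                          # next_line directly follows the call
    return_address = call_address + CALL_REL32_LENGTH
    stack.append(return_address)       # call pushes the return address
    eip = return_address + rel32       # and jumps relative to that address
    eax = stack.pop()                  # next_line: pop eax
    return eax, eip


for base in (0x00401000, 0x7F000000):  # the same code at two load addresses
    eax, eip = call_then_pop(base)
    print(hex(eax), eax == eip)
```

In both cases EAX ends up holding the run-time address of next_line, even though the instruction bytes are identical.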

See also

  • assembly language
  • X86 instruction listings
  • X86 architecture
  • CPU design
  • List of assemblers
  • self-modifying code
  • DOS



External links