Streaming SIMD Extensions



 
 


In computing, Streaming SIMD Extensions (SSE) is a SIMD instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in its Pentium III series processors as a reply to AMD's 3DNow! (which had debuted a year earlier). SSE contains 70 new instructions.

It was originally known as KNI, for Katmai New Instructions (Katmai being the code name for the first Pentium III core revision). During the Katmai project Intel wanted to distinguish it from its earlier product line, particularly its flagship Pentium II. The extension was later renamed ISSE, for Internet Streaming SIMD Extensions, and then SSE. AMD eventually added support for SSE instructions, starting with its Athlon XP processor.

Intel was generally disappointed with its first IA-32 SIMD effort, MMX. MMX had two main problems: it re-used the existing floating point registers, making the CPU unable to work on both floating point and SIMD data at the same time, and it only worked on integers.

Registers

SSE originally added eight new 128-bit registers known as XMM0 through XMM7. The x86-64 extensions from AMD (originally called AMD64), later duplicated by Intel, add a further eight registers, XMM8 through XMM15. There is also a new 32-bit control/status register, MXCSR. The full set of sixteen 128-bit XMM registers is accessible only in 64-bit operating mode; 32-bit code sees only XMM0 through XMM7.

Each register packs together four 32-bit single-precision floating point numbers, two 64-bit double-precision floating point numbers, four 32-bit integers, eight 16-bit short integers, or sixteen 8-bit bytes or characters. The original SSE instructions use only the single-precision form; the double-precision and integer forms arrived with SSE2. The integer operations have instructions for signed and unsigned variants. Integer SIMD operations may still be performed with the eight 64-bit MMX registers.
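The packing of four single-precision values into one register can be sketched in C with compiler intrinsics. This is a minimal illustration that assumes an x86 compiler providing `<xmmintrin.h>`; the function name `pack_and_unpack` is made up for the example.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_set_ps, _mm_storeu_ps */

/* Pack four floats into one 128-bit value and copy the lanes back out.
   out[0] receives the lowest lane. (Hypothetical helper for illustration.) */
void pack_and_unpack(float x, float y, float z, float w, float out[4])
{
    /* _mm_set_ps takes its arguments from the highest lane down to the lowest. */
    __m128 v = _mm_set_ps(w, z, y, x);

    /* Unaligned store, so out needs no special alignment. */
    _mm_storeu_ps(out, v);
}
```

Calling `pack_and_unpack(1.0f, 2.0f, 3.0f, 4.0f, out)` leaves the four inputs in `out` in the same order, and `sizeof(__m128)` is 16 bytes, matching the 128-bit register width.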

Because these 128-bit registers are additional program state that the operating system must preserve across task switches, they are disabled by default until the operating system explicitly enables them. This means that the OS must know how to use the FXSAVE and FXRSTOR instructions, the extended pair of instructions that save and restore all x87 and SSE register state at once. This support was quickly added to all major IA-32 operating systems.
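From user code, the processor side of this requirement can be checked with the CPUID instruction. The sketch below assumes GCC or Clang, whose `<cpuid.h>` header provides `__get_cpuid`; the function name `cpu_reports_sse` is hypothetical. It reports only what the processor advertises, not whether the operating system has enabled the registers.

```c
#include <cpuid.h>  /* GCC/Clang helper for the CPUID instruction */

/* Returns 1 if the processor reports both FXSAVE/FXRSTOR and SSE, else 0.
   (Hypothetical helper; it does not inspect operating system state.) */
int cpu_reports_sse(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                      /* CPUID leaf 1 is not available */

    int has_fxsr = (edx >> 24) & 1;    /* CPUID.1:EDX bit 24 = FXSR */
    int has_sse  = (edx >> 25) & 1;    /* CPUID.1:EDX bit 25 = SSE  */

    return has_fxsr && has_sse;
}
```

A program would typically call such a check once at start-up and fall back to x87 code paths when it returns 0.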

Because SSE adds floating point support, it sees much more use than MMX. The addition of SSE2's integer support makes SSE even more flexible. While this makes MMX redundant, MMX operations can be executed in parallel with SSE operations, offering further performance increases in some situations.

The first CPU to support SSE, the Pentium III, shared execution resources between SSE and the FPU. While a compiled application can interleave FPU and SSE instructions side by side, the Pentium III will not issue an FPU and an SSE instruction in the same clock cycle. This limitation reduces the effectiveness of pipelining, but the separate XMM registers do allow SIMD and scalar floating point operations to be mixed without the performance hit from explicit MMX/floating point mode switching.

SSE Instructions

SSE introduced both scalar and packed floating point instructions.

Floating point instructions

  • Memory-to-Register / Register-to-Memory / Register-to-Register data movement
    • Scalar – MOVSS
    • Packed – MOVAPS, MOVUPS, MOVLPS, MOVHPS, MOVLHPS, MOVHLPS
  • Arithmetic
    • Scalar – ADDSS, SUBSS, MULSS, DIVSS, RCPSS, SQRTSS, MAXSS, MINSS, RSQRTSS
    • Packed – ADDPS, SUBPS, MULPS, DIVPS, RCPPS, SQRTPS, MAXPS, MINPS, RSQRTPS
  • Compare
    • Scalar – CMPSS, COMISS, UCOMISS
    • Packed – CMPPS
  • Data shuffle and unpacking
    • Packed – SHUFPS, UNPCKHPS, UNPCKLPS
  • Data-type conversion
    • Scalar – CVTSI2SS, CVTSS2SI, CVTTSS2SI
    • Packed – CVTPI2PS, CVTPS2PI, CVTTPS2PI
  • Bitwise logical operations
    • Packed – ANDPS, ORPS, XORPS, ANDNPS
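As an illustration of how the compare and bitwise instructions in this list combine, the following C sketch selects the smaller element in each lane without branching. It assumes a compiler providing `<xmmintrin.h>`; the function name `select_smaller` is made up, and the instruction names in the comments are what each intrinsic typically maps to, which a compiler is free to change.

```c
#include <xmmintrin.h>

/* out[i] = (a[i] < b[i]) ? a[i] : b[i], built from a compare mask and
   bitwise operations. (Hypothetical helper for illustration.) */
void select_smaller(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);               /* MOVUPS */
    __m128 vb = _mm_loadu_ps(b);               /* MOVUPS */

    __m128 mask   = _mm_cmplt_ps(va, vb);      /* CMPPS: all-ones lanes where a < b */
    __m128 from_a = _mm_and_ps(mask, va);      /* ANDPS: keep a where the mask is set */
    __m128 from_b = _mm_andnot_ps(mask, vb);   /* ANDNPS: keep b where the mask is clear */

    _mm_storeu_ps(out, _mm_or_ps(from_a, from_b));  /* ORPS merges the two halves */
}
```

With `a = {1, 5, 3, 8}` and `b = {2, 4, 3, 7}` the result is `{1, 4, 3, 7}`. In practice MINPS does the same job in one instruction; the mask form is shown because it generalises to arbitrary per-lane selection.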


Integer instructions

  • Arithmetic
    • PMULHUW, PSADBW, PAVGB, PAVGW, PMAXUB, PMINUB, PMAXSW, PMINSW
  • Data movement
    • PEXTRW, PINSRW
  • Other
    • PMOVMSKB, PSHUFW
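What two of these integer instructions compute can be modelled in plain C. The sketch below is a scalar model of the arithmetic only, not an SSE implementation: it assumes the 64-bit MMX form operating on eight unsigned bytes, and the function names are invented for the example.

```c
#include <stdint.h>

/* Scalar model of PAVGB: rounded average of each pair of unsigned bytes. */
void pavgb_model(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);  /* the +1 rounds halves up */
}

/* Scalar model of PSADBW: sum of absolute differences of eight unsigned bytes.
   The largest possible sum is 8 * 255 = 2040, so 16 bits are enough. */
uint16_t psadbw_model(const uint8_t a[8], const uint8_t b[8])
{
    uint16_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (uint16_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    return sum;
}
```

The hardware instructions perform all eight byte operations at once; the loops here only spell out the per-element result.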


Other instructions

  • MXCSR management
    • LDMXCSR, STMXCSR
  • Cache and Memory management
    • MOVNTQ, MOVNTPS, MASKMOVQ, PREFETCHT0, PREFETCHT1, PREFETCHT2, PREFETCHNTA, SFENCE
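The MXCSR instructions can be reached from C through intrinsics. The sketch below assumes a compiler providing `<xmmintrin.h>` and uses flush-to-zero mode purely as an example of a control bit; the function name `enable_flush_to_zero` is hypothetical.

```c
#include <xmmintrin.h>

/* Turn on flush-to-zero in MXCSR and return the previous register value
   so the caller can restore it later with _mm_setcsr().
   (Hypothetical helper for illustration.) */
unsigned int enable_flush_to_zero(void)
{
    unsigned int old = _mm_getcsr();       /* STMXCSR: read control/status */
    _mm_setcsr(old | _MM_FLUSH_ZERO_ON);   /* LDMXCSR: set the FZ bit (bit 15) */
    return old;
}
```

MXCSR is per-thread state, so a change like this affects only the calling thread and should normally be undone before returning to code that expects the default settings.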


Example

The following simple example demonstrates the advantage of using SSE. Consider an operation like vector addition, which is used very often in computer graphics applications. Adding two single-precision, 4-component vectors together using x87 requires four floating point addition instructions:

vec_res.x = v1.x + v2.x;
vec_res.y = v1.y + v2.y;
vec_res.z = v1.z + v2.z;
vec_res.w = v1.w + v2.w;

This would correspond to four x87 FADD instructions in the object code. On the other hand, as the following pseudo-code shows, a single 128-bit 'packed-add' instruction can replace the four scalar addition instructions.

movaps xmm0,address-of-v1          ;xmm0=v1.w | v1.z | v1.y | v1.x
addps xmm0,address-of-v2           ;xmm0=v1.w+v2.w | v1.z+v2.z | v1.y+v2.y | v1.x+v2.x
movaps address-of-vec_res,xmm0
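The same packed addition can be written in C with intrinsics instead of assembly. This is a sketch under stated assumptions: the `vec4` type and `vec4_add` function are invented for the example, and unaligned loads and stores are used so that nothing is assumed about alignment (MOVAPS, as in the pseudo-code, requires 16-byte-aligned addresses).

```c
#include <xmmintrin.h>

/* Hypothetical 4-component vector; c[0..3] hold x, y, z, w. */
typedef struct { float c[4]; } vec4;

/* Adds all four components with a single packed add. */
vec4 vec4_add(const vec4 *v1, const vec4 *v2)
{
    vec4 res;
    __m128 a = _mm_loadu_ps(v1->c);            /* load x, y, z, w of v1 */
    __m128 b = _mm_loadu_ps(v2->c);            /* load x, y, z, w of v2 */
    _mm_storeu_ps(res.c, _mm_add_ps(a, b));    /* ADDPS, then store the sums */
    return res;
}
```

Adding `{1, 2, 3, 4}` and `{10, 20, 30, 40}` this way yields `{11, 22, 33, 44}`.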

Later versions


  • SSE2, introduced with the Pentium 4, is a major enhancement to SSE. SSE2 adds new math instructions for double-precision (64-bit) floating point and also extends MMX instructions to operate on 128-bit XMM registers. Until SSE4 (see below), SSE integer instructions introduced with later SSE extensions could still operate on 64-bit MMX registers because the new XMM registers require operating system support. SSE2 enables the programmer to perform SIMD math of virtually any type (from 8-bit integer to 64-bit float) entirely with the XMM vector-register file, without the need to touch the (legacy) MMX/FPU registers. Many programmers consider SSE2 to be "everything SSE should have been", as SSE2 offers an orthogonal set of instructions for dealing with common datatypes.


  • SSE3, also called Prescott New Instructions (PNI), is an incremental upgrade to SSE2, adding a handful of DSP-oriented mathematics instructions and some process (thread) management instructions.


  • SSSE3 is an incremental upgrade to SSE3, adding 16 new opcodes, which include permuting the bytes in a word, multiplying 16-bit fixed-point numbers with correct rounding, and within-word accumulate instructions. SSSE3 is often mistaken for SSE4, as this term was used during the development of the Core microarchitecture.


  • SSE4 is another major enhancement, adding a dot product instruction, many additional integer instructions, a POPCNT instruction, and more. SSE4 ends MMX register support.


  • SSE5 is a new iteration announced by AMD in August 2007.


  • AVX (Advanced Vector Extensions) is an advanced version of SSE announced by Intel, featuring a data path widened from 128 bits to 256 bits and 3-operand instructions (up from 2). Products implementing AVX are slated for 2010.
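The two additions described in the SSE2 entry above, double-precision math and integer operations in XMM registers, might look like this from C. The sketch assumes a compiler providing `<emmintrin.h>` with SSE2 enabled (the default for x86-64 targets; 32-bit builds may need a flag such as `-msse2`); the function names are invented for the example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Add two pairs of doubles with one packed operation (ADDPD). */
void add_two_doubles(const double a[2], const double b[2], double out[2])
{
    _mm_storeu_pd(out, _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}

/* Add four pairs of 32-bit integers in an XMM register (PADDD),
   without touching the MMX registers. */
void add_four_ints(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}
```

Both functions use unaligned loads and stores so that the caller's arrays need no particular alignment.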


See also

  • Assembly language
  • MMX (instruction set)
  • 3DNow!
  • AltiVec
  • Pentium III's SSE implementation