SSE2



 
 
SSE2 (Streaming SIMD Extensions 2) is one of the IA-32 SIMD (Single Instruction, Multiple Data) instruction sets. Intel first introduced SSE2 with the initial version of the Pentium 4 in 2000. It extends the earlier SSE instruction set and is intended to fully supplant MMX. Intel extended SSE2 to create SSE3 in 2004. SSE2 added 144 new instructions to SSE, which has 70 instructions. Rival chip-maker AMD added support for SSE2 with the introduction of its Opteron and Athlon 64 ranges of AMD64 64-bit CPUs in 2003.

Changes


SSE2 extends the MMX instructions to operate on XMM registers, allowing the programmer to completely avoid the eight 64-bit MMX registers, which are aliased onto the original IA-32 floating point register stack. This permits mixing integer SIMD and scalar floating point operations without the mode switching required between MMX and x87 floating point operations. However, this benefit is overshadowed by the value of being able to perform MMX operations on the wider SSE registers.
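As a minimal sketch of the point above, the following C fragment uses compiler intrinsics from `<emmintrin.h>` to perform an MMX-style packed addition on an XMM register and then continues with ordinary scalar floating point. The function name `add_words_then_scale` and its parameters are hypothetical, chosen only for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Adds two arrays of eight 16-bit integers with a single packed addition
   (PADDW on an XMM register), then scales the first sum using scalar
   floating point. No MMX register is touched, so no EMMS is required. */
double add_words_then_scale(const int16_t a[8], const int16_t b[8],
                            int16_t sum[8], double scale)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vs = _mm_add_epi16(va, vb);      /* eight 16-bit additions */
    _mm_storeu_si128((__m128i *)sum, vs);
    return sum[0] * scale;                   /* scalar floating point */
}
```

The equivalent MMX code would handle only four 16-bit values per instruction and would need an EMMS instruction before any x87 arithmetic.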

Other SSE2 extensions include a set of cache-control instructions intended primarily to minimize cache pollution when processing indefinite streams of information, and a sophisticated complement of numeric format conversion instructions.
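Both kinds of instruction are reachable from C through intrinsics. The sketch below is illustrative only: `fill_streaming` and `ints_to_floats` are hypothetical helper names, and the first function assumes a 16-byte-aligned destination whose length is a multiple of four elements.

```c
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Fills dst with a 32-bit value using non-temporal stores (MOVNTDQ), which
   hint that the written data should not displace useful cache contents.
   Assumptions: dst is 16-byte aligned and count is a multiple of 4. */
void fill_streaming(int32_t *dst, size_t count, int32_t value)
{
    __m128i v = _mm_set1_epi32(value);
    for (size_t i = 0; i < count; i += 4)
        _mm_stream_si128((__m128i *)(dst + i), v);
    _mm_sfence();   /* order the streamed stores before later stores */
}

/* Converts four 32-bit integers to four single-precision floats (CVTDQ2PS). */
void ints_to_floats(const int32_t src[4], float dst[4])
{
    __m128i vi = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_ps(dst, _mm_cvtepi32_ps(vi));
}
```

Whether streaming stores actually help depends on the workload; they are generally worthwhile only for data that will not be read again soon.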

AMD's implementation of SSE2 on the AMD64 (x86-64) platform includes an additional eight registers, doubling the total number to 16 (XMM0 through XMM15). These additional registers are visible only when running in 64-bit mode. Intel adopted these additional registers as part of its support for the x86-64 architecture (in Intel's parlance, "Intel 64") in 2004.

Differences between x87 FPU and SSE2

The x87 FPU instructions usually store intermediate results with 80 bits of precision, whereas SSE2 instructions compute directly in 32-bit or 64-bit precision. When legacy FPU software algorithms are ported to SSE2, certain combinations of math operations or input datasets can therefore result in measurable numerical deviation. This is critically important in scientific computation when results must be compared against results generated on a different machine architecture.

A notable problem occurs when a compiler must interpret a mathematical expression consisting of several operations (adding, subtracting, dividing, multiplying). Depending on the compiler and the optimizations used, different intermediate results of a given expression may need to be temporarily saved and later reloaded. On the x87 FPU, this results in a truncation from 80 bits to 64 bits (or to 32 bits for single-precision variables such as those in the example). Depending on when this truncation is executed, the final numerical result may differ. The following Fortran code, compiled with G95, is offered as an example.

      program hi
      real a,b,c,d
      real x,y,z
      a=.013
      b=.027
      c=.0937
      d=.79
      y=-a/b + (a/b+c)*EXP(d)
      print *,y
      z=(-a)/b + (a/b+c)*EXP(d)
      print *,z
      x=y-z
      print *,x
      end

Compiling to 387 floating point instructions and running yields:

    # g95 -o hi -mfpmath=387 -fzero -ftrace=full -fsloppy-char hi.for
    # ./hi
    0.78587145
    0.7858714
    5.9604645E-8

Compiling to SSE2 instructions and running yields:

    # g95 -o hi -mfpmath=sse -msse2 -fzero -ftrace=full -fsloppy-char hi.for
    # ./hi
    0.78587145
    0.78587145
    0.
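The effect can be imitated in C without depending on a particular compiler's register allocation. The sketch below is an emulation, not the G95 behaviour itself: the two function names are hypothetical, the parameter `e` stands in for the value of EXP(d) supplied by the caller, and `long double` is assumed to be the 80-bit x87 format, which holds for GCC and Clang on typical x86 targets but not everywhere.

```c
/* Rounds to single precision after every operation, as scalar SSE/SSE2
   code does. The volatile stores force each intermediate to be rounded
   to float even if the compiler targets the x87 FPU. */
float eval_rounded_each_step(float a, float b, float c, float e)
{
    volatile float q = a / b;
    volatile float s = q + c;
    volatile float p = s * e;
    volatile float r = -q + p;
    return r;
}

/* Keeps every intermediate in long double (assumed to be the 80-bit x87
   format) and rounds to single precision only once, at the end. */
float eval_extended(float a, float b, float c, float e)
{
    long double q = (long double)a / b;
    long double r = -q + (q + c) * e;
    return (float)r;
}
```

Calling both functions with the constants from the Fortran program may give results that differ in the last bit or agree exactly, depending on the inputs; any difference is of the same nature as the 5.9604645E-8 discrepancy shown above.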

Differences between MMX and SSE2

SSE2 extends the MMX instructions to operate on XMM registers, so all existing MMX code can be converted to an SSE2 equivalent. Since an XMM register is twice as wide as an MMX register (128 bits versus 64 bits), loop counters and memory accesses may need to be changed to accommodate this.
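A hedged illustration of such a conversion follows. An MMX loop built around PADDUSB would advance 8 bytes per iteration; the SSE2 form of the same instruction advances 16 bytes, so the loop step doubles and a scalar loop handles whatever remains. The function name `add_saturate_u8` is hypothetical.

```c
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Saturating addition of two unsigned byte arrays of length n. */
void add_saturate_u8(const uint8_t *a, const uint8_t *b, uint8_t *out, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {           /* 16 bytes per XMM register */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_adds_epu8(va, vb));
    }
    for (; i < n; ++i) {                     /* leftover elements */
        unsigned sum = (unsigned)a[i] + b[i];
        out[i] = (uint8_t)(sum > 255 ? 255 : sum);
    }
}
```

Unaligned loads and stores are used here so that the sketch works for any pointers.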

Although one SSE2 instruction can operate on twice as much data as an MMX instruction, performance might not increase significantly. There are two major reasons: accessing SSE2 data in memory that is not aligned to a 16-byte boundary incurs a significant penalty, and the throughput of SSE2 instructions in most x86 implementations is usually lower than that of MMX instructions. Intel addressed the first problem by adding an instruction in SSE3 to reduce the overhead of accessing unaligned data, and the second by widening the execution engine in its Core microarchitecture.
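The alignment issue shows up in code as a choice between two load instructions. The following sketch, with the hypothetical name `sum_i32` and the assumption that `n` is a multiple of 4, uses the aligned load (MOVDQA) when the pointer permits it and falls back to the unaligned load (MOVDQU) otherwise.

```c
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Sums n 32-bit integers; n is assumed to be a multiple of 4. */
int32_t sum_i32(const int32_t *p, size_t n)
{
    __m128i acc = _mm_setzero_si128();
    int aligned = ((uintptr_t)p % 16) == 0;  /* 16-byte boundary? */
    for (size_t i = 0; i < n; i += 4) {
        __m128i v = aligned
            ? _mm_load_si128((const __m128i *)(p + i))    /* MOVDQA */
            : _mm_loadu_si128((const __m128i *)(p + i));  /* MOVDQU */
        acc = _mm_add_epi32(acc, v);
    }
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Passing a misaligned pointer to the aligned load would raise a fault, which is why the check is made before the loop rather than assumed.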

Compiler usage

When first introduced in 2000, SSE2 was not supported by software development tools. For example, to use SSE2 in a Microsoft Developer Studio project, the programmer had to either write inline assembly manually or import object code from an external source. Later, the Visual C++ Processor Pack added SSE2 support to Visual C++ and MASM.

The Intel C++ Compiler can automatically generate SSE4, SSSE3, SSE3, SSE2, and SSE code without the use of hand-coded assembly, letting programmers focus on algorithmic development instead of assembly-level implementation. Since its introduction, the Intel C++ Compiler has greatly increased adoption of SSE2 in Windows application development.

Since GCC 3, GCC can automatically generate SSE/SSE2 scalar code when the target supports those instructions. Automatic vectorization for SSE/SSE2 was added in GCC 4.

The Sun Studio Compiler Suite can also generate SSE2 instructions when the compiler flag -xvector=simd is used.
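As a rough sketch of what compiler support means in practice, the plain C loop below is the kind of code an auto-vectorizing compiler may translate into packed SSE2 instructions, for example when GCC is invoked with options such as `-O3 -msse2`. Whether vectorization actually happens depends on the compiler version and options. The second function reports whether SSE2 code generation was enabled at compile time. It relies on the `__SSE2__` macro predefined by GCC and Clang and on the `_M_X64` and `_M_IX86_FP` macros of Visual C++. Both function names are hypothetical.

```c
/* Scalar loop over doubles; a vectorizing compiler may emit packed
   double-precision SSE2 instructions for it. */
void scale_add(double *restrict y, const double *restrict x, double k, int n)
{
    for (int i = 0; i < n; ++i)
        y[i] += k * x[i];
}

/* Returns 1 if this translation unit was compiled with SSE2 enabled. */
int built_with_sse2(void)
{
#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return 1;
#else
    return 0;
#endif
}
```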

CPUs supporting SSE2


  • AMD K8-based CPUs (Athlon 64, Sempron 64, Turion 64, etc.)
  • AMD Phenom CPUs
  • Intel NetBurst-based CPUs (Pentium 4, Xeon, Celeron, Celeron D, etc.)
  • Intel Pentium M and Celeron M
  • Intel Core-based CPUs (Core Duo, Core Solo, etc.)
  • Intel Core 2-based CPUs (Core 2 Duo, Core 2 Quad, etc.)
  • Intel Atom
  • Transmeta Efficeon
  • VIA C7
  • VIA Nano
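Rather than relying on a list of models, software can ask the processor directly. The sketch below assumes GCC or Clang on an x86 target, whose `<cpuid.h>` header provides the `__get_cpuid` helper. SSE2 support is reported in bit 26 of the EDX register returned by CPUID leaf 1. The function name `cpu_has_sse2` is hypothetical.

```c
#include <cpuid.h>   /* GCC/Clang wrapper for the CPUID instruction */

/* Returns 1 if CPUID leaf 1 reports SSE2 (EDX bit 26), 0 otherwise. */
int cpu_has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                /* CPUID leaf 1 is not available */
    return (int)((edx >> 26) & 1u);
}
```

A program would typically call such a check once at startup and select between an SSE2 code path and a fallback.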


Notable IA-32 CPUs not supporting SSE2


SSE2 is an extension of the IA-32 architecture. Therefore, any architecture that does not support IA-32 does not support SSE2. All x86-64 CPUs implement IA-32, and all known x86-64 CPUs also implement SSE2. Since IA-32 predates SSE2, early IA-32 CPUs did not implement it. SSE2 and the other SIMD instruction sets were intended primarily to improve CPU support for realtime graphics, notably gaming. A CPU that is not marketed for this purpose, or that has an alternative SIMD instruction set, has no need for SSE2.

The following CPUs implemented IA-32 after SSE2 was developed, but did not implement SSE2:

  • AMD CPUs prior to Athlon 64, including all Socket A-based CPUs
  • Intel CPUs prior to Pentium 4
  • VIA C3
  • Transmeta Crusoe


See also

  • Streaming SIMD Extensions (SSE)
  • SSE3
  • SSSE3
  • SSE4
  • SIMD
  • 3DNow! Professional
  • x86 instruction listings