
SIMD



 
 


In computing, SIMD (Single Instruction, Multiple Data) is a technique employed to achieve data-level parallelism.

History

Supercomputers popular in the 1980s, such as the CRAY X-MP, were called "vector processors." The CRAY X-MP had up to four vector processors, which could function independently or work together using a programming model called "autotasking". Autotasking was similar to OpenMP. These machines had very fast scalar processors and also vector processors for long vector computations, for example, adding two vectors of 100 numbers each. The CRAY X-MP vector processors were pipelined and had multiple functional units. Pipelining allowed a single instruction to move a long array of numbers sequentially into a vector register. Multiple registers, compute units, pipelining, and chaining allowed vector computers to compute Z = X*Y+V/W rapidly by streaming data into registers to hide memory latency, overlapping computations, and producing one element of the result (Z) at each clock cycle.
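As a rough illustration of the computation described above, the expression Z = X*Y+V/W can be written as an element-wise loop in C. This is only a sketch: the function name `vector_expr` and the use of `double` are assumptions made for the example, and it says nothing about how the CRAY X-MP was actually programmed. On a vector machine, each operation in the loop body would be issued once for a whole register of elements, with the multiply, divide and add chained together.

```c
#include <stddef.h>

/* Element-wise Z = X*Y + V/W over n elements (hypothetical helper).
   A scalar CPU executes the body once per element; a vector processor
   would stream whole registers of x, y, v and w through chained
   multiply, divide and add units. */
void vector_expr(const double *x, const double *y,
                 const double *v, const double *w,
                 double *z, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        z[i] = x[i] * y[i] + v[i] / w[i];
    }
}
```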

The first era of SIMD machines was characterized by supercomputers such as the Thinking Machines CM-1 and CM-2. These machines had many limited-functionality processors that would work in parallel. For example, each of the 65,536 (64K) processors in a fully configured Thinking Machines CM-2 would execute the same instruction at the same time, so that 65,536 multiplies could be performed on 65,536 pairs of numbers at a time.

Supercomputing moved away from the SIMD approach when inexpensive scalar MIMD approaches based on commodity processors such as the Intel i860 XP became more powerful, and interest in SIMD waned. Later, personal computers became common, and became powerful enough to support real-time gaming. This created a mass demand for a particular type of computing power, and microprocessor vendors turned to SIMD to meet the demand. The first widely deployed SIMD for gaming was Intel's MMX extensions to the x86 architecture. IBM and Motorola then added AltiVec to the POWER architecture, and there have been several extensions to the SIMD instruction sets for both architectures. All of these developments have been oriented toward support for real-time graphics, and are therefore oriented toward vectors of two, three, or four dimensions. When new SIMD architectures need to be distinguished from older ones, the newer architectures are considered "short-vector" architectures. A modern supercomputer is almost always a cluster of MIMD machines, each of which implements (short-vector) SIMD instructions. A modern desktop computer is often a multiprocessor MIMD machine where each processor can execute short-vector SIMD instructions.

DSPs

A separate class of processors exists for this sort of task, commonly referred to as digital signal processors, or DSPs. The main difference between DSPs and other SIMD-capable CPUs is that DSPs are self-contained processors with their own (often difficult to use) instruction set, while SIMD extensions rely on the general-purpose portions of the CPU to handle the program details, and the SIMD instructions handle the data manipulation only. DSPs also tend to include instructions to handle specific types of data, sound or video for instance, while SIMD systems are considerably more general-purpose. DSPs generally operate in scratchpad RAM driven by DMA transfers initiated from the host system and are unable to access external memory.

Some DSPs include SIMD instruction sets. The inclusion of SIMD units in general-purpose processors has supplanted the use of DSP chips in computer systems, though they continue to be used in embedded applications. A sliding scale exists: the Cell's SPUs and the Ageia Physics Processing Unit could be considered halfway between CPUs and DSPs, in that they are optimized for numeric tasks and operate in local store, but they can autonomously control their own transfers and thus are in effect true CPUs.

Advantages

An application that may take advantage of SIMD is one where the same value is being added to (or subtracted from) a large number of data points, a common operation in many multimedia applications. One example would be changing the brightness of an image. Each pixel of an image consists of three values for the brightness of the red, green and blue portions of the color. To change the brightness, the R, G and B values are read from memory, a value is added to (or subtracted from) them, and the resulting values are written back out to memory.

With a SIMD processor there are two improvements to this process. For one, the data is understood to be in blocks, and a number of values can be loaded all at once. Instead of a series of instructions saying "get this pixel, now get the next pixel", a SIMD processor will have a single instruction that effectively says "get lots of pixels" ("lots" is a number that varies from design to design). For a variety of reasons, this can take much less time than "getting" each pixel individually, as with a traditional CPU design.

Another advantage is that SIMD systems typically include only those instructions that can be applied to all of the data in one operation. In other words, if the SIMD system works by loading up eight data points at once, the add operation being applied to the data will happen to all eight values at the same time. Although the same is true for any superscalar processor design, the level of parallelism in a SIMD system is typically much higher.
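The brightness example can be sketched in C with SSE2 intrinsics. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `brighten` is hypothetical, the image is assumed to be a flat array of 8-bit channel values, and saturating addition (clamping at 255) is chosen here so that bright pixels do not wrap around. When SSE2 is available, one load, one add and one store handle 16 channel values at a time; a scalar loop handles any leftover bytes and serves as the fallback on other architectures.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Add `delta` to every 8-bit channel value in `pixels`, clamping at 255.
   `brighten` is a hypothetical name used only for this sketch. */
void brighten(uint8_t *pixels, size_t n, uint8_t delta)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Broadcast delta into all 16 byte lanes of a 128-bit register. */
    const __m128i vdelta = _mm_set1_epi8((char)delta);
    for (; i + 16 <= n; i += 16) {
        /* "Get lots of pixels": one unaligned load fetches 16 bytes. */
        __m128i v = _mm_loadu_si128((const __m128i *)(pixels + i));
        /* One instruction adds delta to all 16 lanes, saturating at 255. */
        v = _mm_adds_epu8(v, vdelta);
        _mm_storeu_si128((__m128i *)(pixels + i), v);
    }
#endif
    /* Scalar tail, and the whole loop when SSE2 is not available. */
    for (; i < n; i++) {
        unsigned sum = (unsigned)pixels[i] + delta;
        pixels[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Calling `brighten(image, width * height * 3, 50)` on an RGB buffer would raise every channel by 50; the unaligned load and store variants are used so the sketch makes no assumption about how the buffer was allocated.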

Disadvantages


  • Not all algorithms can be vectorized. For example, a flow-control-heavy task like code parsing wouldn't benefit from SIMD.


  • Currently, implementing an algorithm with SIMD instructions usually requires human labor; most compilers don't generate SIMD instructions from a typical C program, for instance. Vectorization in compilers is an active area of computer science research. (Compare vector processing.)


  • Programming with particular SIMD instruction sets can involve numerous low-level challenges.
    • SSE2 has restrictions on data alignment; programmers used to the x86 architecture may not expect this.
    • Gathering data into SIMD registers and scattering it to the correct destination locations is tricky and can be inefficient.
    • Specific instructions like rotations or three-operand addition aren't in some SIMD instruction sets.
    • Instruction sets are architecture-specific: old processors and non-x86 processors lack SSE2 entirely, for instance, so programmers must provide non-vectorized implementations (or different vectorized implementations) for them. Similarly, the next-generation instruction sets from Intel and AMD will be incompatible with each other (see SSE5 and AVX).
    • The early MMX instruction set shared a register file with the floating-point stack, which caused inefficiencies when mixing floating-point and MMX code. SSE2 corrects this.
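Two of the points above, the alignment restriction and the need for a non-vectorized fallback, can be illustrated with a short C sketch. The function name `sum_i32` is hypothetical, and the sketch assumes the total fits in a 32-bit integer. With SSE2, `_mm_load_si128` requires a 16-byte-aligned address while `_mm_loadu_si128` accepts any address, so the code checks the pointer once and picks the matching load; when SSE2 is not available at compile time, only the scalar loop is built.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Sum n 32-bit integers (hypothetical helper; assumes no overflow). */
int32_t sum_i32(const int32_t *data, size_t n)
{
    size_t i = 0;
    int32_t total = 0;
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    /* The aligned load faults on addresses that are not multiples of 16,
       so test the base pointer; stepping by 4 ints keeps the alignment. */
    const int aligned = ((uintptr_t)data % 16u) == 0;
    for (; i + 4 <= n; i += 4) {
        const __m128i *p = (const __m128i *)(data + i);
        __m128i v = aligned ? _mm_load_si128(p) : _mm_loadu_si128(p);
        acc = _mm_add_epi32(acc, v);  /* four additions at once */
    }
    /* Combine the four partial sums held in the register. */
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    /* Scalar tail, and the entire computation without SSE2. */
    for (; i < n; i++) {
        total += data[i];
    }
    return total;
}
```

A compile-time check is the simplest form of fallback; software that must run on processors with different instruction sets from a single binary would typically select an implementation at run time instead.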


Chronology

The first use of SIMD instructions was in vector supercomputers of the early 1970s such as the CDC Star-100 and the Texas Instruments ASC. Vector processing was especially popularized by Cray in the 1970s and 1980s.

Later machines used a much larger number of relatively simple processors in a massively parallel processing-style configuration. Some examples of this type of machine included:

  • ILLIAC IV, circa 1974
  • ICL Distributed Array Processor (DAP), circa 1974
  • Burroughs Scientific Processor, circa 1976
  • Geometric-Arithmetic Parallel Processor, from Martin Marietta, starting in 1981, continued at Lockheed Martin, then at Teranex and Silicon Optix
  • Massively Parallel Processor (MPP), from NASA/Goddard Space Flight Center, circa 1983-1991
  • Connection Machine, models 1 and 2 (CM-1 and CM-2), from Thinking Machines Corporation, circa 1985
  • MasPar MP-1 and MP-2, circa 1987-1996
  • Zephyr DTC computer from Wavetracer, circa 1991
  • Xplor, from Pyxsys, Inc., circa 2001


There were many others from that era too.

Hardware

Small-scale (64 or 128 bits) SIMD has become popular on general-purpose CPUs, starting in 1989 with the introduction of the Digital Equipment Corporation VAX Vector instructions in the Rigel chip set, and continuing through 1997 and later with Motion Video Instructions (MVI) for Alpha. SIMD instructions can be found, to one degree or another, on most CPUs, including IBM's AltiVec and SPE for PowerPC, HP's PA-RISC Multimedia Acceleration eXtensions (MAX), Intel's MMX and iwMMXt, SSE, SSE2, SSE3 and SSSE3, AMD's 3DNow!, ARC's ARC Video subsystem, SPARC's VIS, Sun's MAJC, ARM's NEON technology, and MIPS' MDMX (MaDMaX) and MIPS-3D. The instruction set of the SPUs in the Cell processor, co-developed by IBM, Sony and Toshiba, is heavily SIMD-based.

Modern Graphics Processing Units are often wide SIMD implementations, capable of branches, loads, and stores on 128 or 256 bits at a time.

Future processors promise greater SIMD capability: Intel's AVX instructions will process 256 bits of data at once, and Intel's Larrabee GPU promises 512-bit SIMD registers on each of its cores.

Software

SIMD instructions are widely used to process 3D graphics, although modern graphics cards with embedded SIMD have largely taken over this task from the CPU. Some systems also include permute functions that re-pack elements inside vectors, making them particularly useful for data processing and compression. They are also used in cryptography. The trend of general-purpose computing on GPUs (GPGPU) may lead to wider use of SIMD in the future.

Adoption of SIMD systems in personal computer software was at first slow, due to a number of problems. One was that many of the early SIMD instruction sets tended to slow overall performance of the system due to the re-use of existing floating-point registers. Other systems, like MMX and 3DNow!, offered support for data types that were not interesting to a wide audience and had expensive context-switching instructions to switch between using the FPU and MMX registers. Compilers also often lacked support, requiring programmers to resort to assembly language coding.

SIMD on x86 had a slow start. The introduction of 3DNow! by AMD and SSE by Intel confused matters somewhat, but today the system seems to have settled down (after AMD adopted SSE) and newer compilers should result in more SIMD-enabled software. Intel and AMD now both provide optimized math libraries that use SIMD instructions, and open source alternatives like libSIMD and SIMDx86 have started to appear.

Apple Computer had somewhat more success, even though they entered the SIMD market later than the rest. AltiVec offered a rich system and can be programmed using increasingly sophisticated compilers from Motorola, IBM and GNU, so assembly language programming is rarely needed. Additionally, many of the systems that would benefit from SIMD were supplied by Apple itself, for example iTunes and QuickTime. However, in 2006, Apple computers moved to Intel x86 processors. Apple's APIs and development tools (Xcode) were rewritten to use SSE2 and SSE3 instead of AltiVec. Apple was the dominant purchaser of PowerPC chips from IBM and Freescale Semiconductor, and even though they abandoned the platform, development of AltiVec continues in several Power Architecture designs from Freescale, IBM and P.A. Semi.

SIMD within a register, or SWAR, is a range of techniques and tricks used for performing SIMD in general-purpose registers on hardware that doesn't provide any direct support for SIMD instructions. This makes it possible to exploit parallelism in certain algorithms even on such hardware.
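One well-known SWAR trick is adding several small values packed into one ordinary integer register. The sketch below, with the hypothetical name `swar_add_u8x4`, treats a 32-bit word as four independent 8-bit lanes. A plain `a + b` would let a carry out of one lane corrupt the next, so the top bit of each lane is masked off before the addition and then restored with an exclusive-or. Each lane wraps modulo 256 rather than saturating.

```c
#include <stdint.h>

/* Add four packed 8-bit lanes of a and b, each lane wrapping modulo 256,
   without carries spilling into the neighbouring lane.
   `swar_add_u8x4` is a hypothetical name used only for this sketch. */
uint32_t swar_add_u8x4(uint32_t a, uint32_t b)
{
    const uint32_t high = 0x80808080u;  /* top bit of every lane */

    /* Add the low 7 bits of each lane; with the top bits cleared, the
       largest per-lane sum is 0x7F + 0x7F = 0xFE, so nothing can carry
       into the next lane. */
    uint32_t low_sum = (a & ~high) + (b & ~high);

    /* The top bit of each lane is a XOR b XOR carry-in; the carry-in is
       already sitting in low_sum's top bit, so fold in a XOR b. */
    return low_sum ^ ((a ^ b) & high);
}
```

For example, adding `0x01020304` and `0x10203040` yields `0x11223344`, exactly as four separate byte additions would.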

Commercial applications

Though it has generally proven difficult to find sustainable commercial applications for SIMD-only processors, one that has had some measure of success is the GAPP, which was developed by Lockheed Martin and taken to the commercial sector by their spin-off Teranex. The GAPP's recent incarnations have become a powerful tool in real-time video processing applications like conversion between various video standards and frame rates (NTSC to/from PAL, NTSC to/from HDTV formats, etc.), deinterlacing, image noise reduction, adaptive video compression, and image enhancement.

A more ubiquitous application for SIMD is found in video games: nearly every modern video game console since 1998 has incorporated a SIMD processor somewhere in its architecture. The Sony PlayStation 2 was unusual in that its vector-float units could function as autonomous DSPs executing their own instruction streams, or as coprocessors driven by ordinary CPU instructions. 3D graphics applications tend to lend themselves well to SIMD processing as they rely heavily on operations with 4-dimensional vectors. Microsoft's Direct3D 9.0 now chooses at runtime processor-specific implementations of its own math operations, including the use of SIMD-capable instructions.

One of the more recent processors to use vector processing is the Cell Processor, developed by IBM in cooperation with Toshiba and Sony. It uses a number of SIMD processors (each with independent RAM and controlled by a general-purpose CPU) and is geared towards the huge datasets required by 3D and video processing applications.

Larger-scale commercial SIMD processors are available from ClearSpeed and Stream Processors. ClearSpeed's CSX600 (2004) has 96 cores, each with 2 double-precision floating-point units, while the CSX700 (2008) has 192. Stream Processors is headed by computer architect Bill Dally. Their Storm-1 processor (2007) contains 80 SIMD cores controlled by a MIPS CPU.

External links