Vector processor

A vector processor, or array processor, is a CPU design whose instruction set includes operations that perform mathematical operations on multiple data elements simultaneously. This is in contrast to a scalar processor, which handles one element at a time using multiple instructions. The vast majority of CPUs are scalar (or close to it). Vector processors were common in scientific computing, where they formed the basis of most supercomputers through the 1980s and into the 1990s, but general increases in performance and processor design saw the near disappearance of the vector processor as a general-purpose CPU.

Today most commodity CPU designs include single instructions for some vector processing on multiple (vectorised) data sets, typically known as SIMD (Single Instruction, Multiple Data); common examples include SSE and AltiVec. Modern video game consoles and consumer computer-graphics hardware rely heavily on vector processing in their architecture. In 2000, IBM, Toshiba and Sony collaborated to create the Cell processor, consisting of one scalar processor and eight vector processors, which found use in the Sony PlayStation 3 among other applications.

Other CPU designs may include multiple instructions for vector processing on multiple (vectorised) data sets, typically known as MIMD (Multiple Instruction, Multiple Data). Such designs are highly specialised, built for dedicated purposes, and are not commonly marketed for general-purpose applications.

The more advanced approach is to keep multiple instructions active in sequence rather than in parallel, which led to the pipelining concept.

History


Vector processing was first worked on in the early 1960s at Westinghouse in their Solomon project. Solomon's goal was to dramatically increase math performance by using a large number of simple math co-processors (or ALUs) under the control of a single master CPU. The CPU fed a single common instruction to all of the ALUs, one per "cycle", but with a different data point for each one to work on. This allowed the Solomon machine to apply a single algorithm to a large data set, fed in the form of an array. In 1962 Westinghouse cancelled the project, but the effort was restarted at the University of Illinois as the ILLIAC IV. Their version of the design originally called for a 1 GFLOPS machine with 256 ALUs, but when it was finally delivered in 1972 it had only 64 ALUs and could reach only 100 to 150 MFLOPS. Nevertheless it showed that the basic concept was sound, and when used on data-intensive applications, such as computational fluid dynamics, the "failed" ILLIAC was the fastest machine in the world. The ILLIAC approach of using separate ALUs for each data element is not common to later designs, and is often referred to under a separate category, massively parallel computing.

The first successful implementations of vector processing appear to be the CDC STAR-100 and the Texas Instruments Advanced Scientific Computer (ASC). The basic ASC (i.e., "one pipe") ALU used a pipeline architecture which supported both scalar and vector computations, with peak performance reaching approximately 20 MFLOPS, readily achieved when processing long vectors. Expanded ALU configurations supported "two pipes" or "four pipes" with a corresponding 2X or 4X performance gain. Memory bandwidth was sufficient to support these expanded modes. The STAR was otherwise slower than CDC's own supercomputers like the CDC 7600, but at data-related tasks it could keep up while being much smaller and less expensive. However, the machine also took considerable time decoding the vector instructions and getting ready to run the process, so it required very specific data sets to work on before it actually sped anything up.

In the Cray-1, vector instructions were applied between registers, which is much faster than talking to main memory. The Cray design used pipeline parallelism to implement vector instructions rather than multiple ALUs. In addition, the design had completely separate pipelines for different instructions; for example, addition/subtraction was implemented in different hardware than multiplication. This allowed a batch of vector instructions themselves to be pipelined, a technique they called vector chaining. The Cray-1 normally had a performance of about 80 MFLOPS, but with up to three chains running it could peak at 240 MFLOPS, a respectable number even as of 2002.

Other examples followed. CDC tried to re-enter the high-end market with its ETA-10 machine, but it sold poorly and they took that as an opportunity to leave the supercomputing field entirely. Various Japanese companies (Fujitsu, Hitachi and NEC) introduced register-based vector machines similar to the Cray-1, typically being slightly faster and much smaller. Oregon-based Floating Point Systems (FPS) built add-on array processors for minicomputers, later building their own minisupercomputers. However, Cray continued to be the performance leader, continually beating the competition with a series of machines that led to the Cray-2, Cray X-MP and Cray Y-MP. Since then the supercomputer market has focused much more on massively parallel processing rather than better implementations of vector processors. However, recognising the benefits of vector processing, IBM developed Virtual Vector Architecture for use in supercomputers, coupling several scalar processors to act as a vector processor.

Today the average computer at home crunches as much data watching a short video as did all of the supercomputers in the 1970s. Vector processor elements have since been added to almost all modern CPU designs, although they are typically referred to as SIMD. In these implementations the vector processor runs beside the main scalar CPU, and is fed data from programs that know it is there.

Description


In general terms, CPUs are able to manipulate one or two pieces of data at a time. For instance, many CPUs have an instruction that essentially says "add A to B and put the result in C," while others such as the MOS 6502 require two or three instructions to perform these types of operations.

The data for A, B and C could be—in theory at least—encoded directly into the instruction. However things are rarely that simple. In general the data is rarely sent in raw form, and is instead "pointed to" by passing in an address to a memory location that holds the data. Decoding this address and getting the data out of the memory takes some time. As CPU speeds have increased, this memory latency has historically become a large impediment to performance.

In order to reduce the amount of time this takes, most modern CPUs use a technique known as instruction pipelining in which the instructions pass through several sub-units in turn. The first sub-unit reads the address and decodes it, the next "fetches" the values at those addresses, and the next does the math itself. With pipelining the "trick" is to start decoding the next instruction even before the first has left the CPU, in the fashion of an assembly line, so the address decoder is constantly in use. Any particular instruction takes the same amount of time to complete, a time known as the latency, but the CPU can process an entire batch of operations much faster than if it did so one at a time.
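
The batch speed-up can be made concrete with an idealised cycle count. The sketch below assumes a hypothetical three-stage pipeline (decode, fetch, execute) in which every stage takes exactly one cycle and nothing ever stalls; the function names are illustrative and do not describe any particular machine.

```python
def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """The first instruction fills the pipeline; each later one completes one cycle after it."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


STAGES = 3  # assumption: decode the address, fetch the values, do the math

for n in (1, 10, 1000):
    # Latency of a single instruction is unchanged; only batch throughput improves.
    print(n, unpipelined_cycles(n, STAGES), pipelined_cycles(n, STAGES))
```

Under these assumptions a single instruction still takes three cycles either way, while a long batch approaches one completed instruction per cycle.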

Vector processors take this concept one step further. Instead of pipelining just the instructions, they also pipeline the data itself. They are fed instructions that say not just to add A to B, but to add all of the numbers "from here to here" to all of the numbers "from there to there". Instead of constantly having to decode instructions and then fetch the data needed to complete them, the processor reads a single instruction from memory, and "knows" that the next address will be one larger than the last. This allows for significant savings in decoding time.

To illustrate what a difference this can make, consider the simple task of adding two groups of 10 numbers together. In a normal programming language you would write a "loop" that picked up each of the pairs of numbers in turn, and then added them. To the CPU, this would look something like this:

  execute this loop 10 times
    read the next instruction and decode it
    fetch this number
    fetch that number
    add them
    put the result here
  end loop

But to a vector processor, this task looks considerably different:

  read instruction and decode it
  fetch these 10 numbers
  fetch those 10 numbers
  add them
  put the results here

There are several savings inherent in this approach. For one, only two address translations are needed. Depending on the architecture, this can represent a significant saving by itself. Another saving is in fetching and decoding the instruction itself, which only has to be done one time instead of ten. The code itself is also smaller, which can lead to more efficient memory use.
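
The difference in instruction decoding can be illustrated with a small software model. This is only a sketch: the `scalar_add` and `vector_add` functions and their decode counters are hypothetical stand-ins for what the hardware does, and the list comprehension merely represents the processor streaming through the data.

```python
def scalar_add(a, b):
    """Add element by element, counting one instruction fetch/decode per addition."""
    decodes = 0
    result = []
    for x, y in zip(a, b):
        decodes += 1          # the add instruction is fetched and decoded again
        result.append(x + y)  # one pair of operands per instruction
    return result, decodes


def vector_add(a, b):
    """Add whole operands under a single instruction that is decoded once."""
    decodes = 1                              # one vector instruction covers every element
    result = [x + y for x, y in zip(a, b)]   # stands in for the hardware streaming the data
    return result, decodes


first = list(range(10))        # ten numbers "from here to here"
second = list(range(10, 20))   # ten numbers "from there to there"

scalar_result, scalar_decodes = scalar_add(first, second)
vector_result, vector_decodes = vector_add(first, second)
print(scalar_decodes, vector_decodes)
```

Both functions produce the same sums; only the number of decode steps differs, ten for the loop against one for the vector form.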

But more than that, the vector processor typically has some form of superscalar implementation, meaning there is not one part of the CPU adding up those 10 numbers, but perhaps two or four of them. Since the output of a vector command does not rely on the input from any other, those two (for instance) parts can each add five of the numbers, thereby completing the whole operation in half the time.
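
The halving of the time can be sketched by dealing the elements out to independent adders. The model below is an assumption-laden illustration: `split_across_lanes` and `lane_parallel_add` are hypothetical names, each "lane" is assumed to perform one addition per time step, and the lanes are simulated one after another even though real hardware would run them side by side.

```python
import math


def split_across_lanes(values, num_lanes):
    """Deal contiguous chunks of the input out to the available lanes."""
    chunk = max(1, math.ceil(len(values) / num_lanes))
    return [values[i:i + chunk] for i in range(0, len(values), chunk)]


def lane_parallel_add(a, b, num_lanes):
    """Return the sums and the number of time steps, assuming one add per lane per step."""
    sums = []
    steps = 0
    for chunk_a, chunk_b in zip(split_across_lanes(a, num_lanes),
                                split_across_lanes(b, num_lanes)):
        sums.extend(x + y for x, y in zip(chunk_a, chunk_b))
        # Lanes work at the same time, so the longest chunk sets the elapsed time.
        steps = max(steps, len(chunk_a))
    return sums, steps


first = list(range(10))
second = list(range(10, 20))

one_lane_sums, one_lane_steps = lane_parallel_add(first, second, 1)
two_lane_sums, two_lane_steps = lane_parallel_add(first, second, 2)
print(one_lane_steps, two_lane_steps)
```

With two lanes each one handles five of the ten additions, so the modelled step count drops from ten to five while the results stay identical.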

As mentioned earlier, the Cray implementations took this a step further, allowing several different types of operations to be carried out at the same time. Consider code that adds two numbers and then multiplies by a third; in the Cray these would all be fetched at once, and both added and multiplied in a single operation. Using the pseudocode above, the Cray essentially did:

  read instruction and decode it
  fetch these 10 numbers
  fetch those 10 numbers
  fetch another 10 numbers
  add and multiply them
  put the results here

The math operations thus completed far faster overall, the limiting factor being the time required to fetch the data from memory.
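
The effect of chaining an add into a multiply can be modelled in software as fusing two passes over the data into one. This is a loose analogy rather than a description of the Cray hardware: the `unchained` and `chained` function names are hypothetical, and real chaining overlaps separate add and multiply pipelines rather than running a loop.

```python
def unchained(a, b, c):
    """Two separate vector passes: every sum is produced before the multiply begins."""
    sums = [x + y for x, y in zip(a, b)]       # pass 1: vector add
    return [s * z for s, z in zip(sums, c)]    # pass 2: vector multiply


def chained(a, b, c):
    """One pass: each sum is handed to the multiplier as soon as it is produced."""
    return [(x + y) * z for x, y, z in zip(a, b, c)]


a = [1, 2, 3]
b = [4, 5, 6]
c = [2, 2, 2]

unchained_result = unchained(a, b, c)
chained_result = chained(a, b, c)
print(unchained_result == chained_result)
```

The results are the same in both forms; the chained version simply avoids waiting for the whole intermediate vector before starting the second operation.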

Not all problems can be attacked with this sort of solution. Adding these sorts of instructions necessarily adds complexity to the core CPU. That complexity typically makes other instructions run slower, that is, whenever the CPU is not adding up many numbers in a row. The more complex instructions also add to the complexity of the decoders, which might slow down the decoding of the more common instructions such as normal adding.

In fact vector processors work best only when there are large amounts of data to be worked on. For this reason, these sorts of CPUs were found primarily in supercomputers, as the supercomputers themselves were generally found in places such as weather prediction centres and physics labs, where huge amounts of data are "crunched".

See also

  • Convex Computer
  • Stream processing
  • SIMD
  • Vectorization

External links

  • (from 1955 to 1993)