FPS AP-120B
The FPS AP-120B was a 38-bit, pipeline-oriented array processor manufactured by Floating Point Systems. It was designed to be attached to a host computer such as a DEC PDP-11 as a fast number-cruncher. Data transfer was accomplished using direct memory access.

Processor cycle time was 167 nanoseconds, giving a clock rate of about 6 MHz. Since it could present two floating-point results per cycle, one from the adder and the other from the multiplier, a capacity of 12 megaflops was claimed for the processor.
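
The arithmetic behind the claimed figure can be checked with a short calculation. This is only an illustrative sketch in Python; the function names are hypothetical, and the numbers are the cycle time and results-per-cycle figures quoted above.

```python
CYCLE_TIME_NS = 167      # processor cycle time quoted in the article
RESULTS_PER_CYCLE = 2    # one result from the adder, one from the multiplier


def clock_rate_mhz(cycle_time_ns):
    """Convert a cycle time in nanoseconds to a clock rate in MHz."""
    return 1_000.0 / cycle_time_ns


def peak_megaflops(cycle_time_ns, results_per_cycle):
    """Peak floating-point results per microsecond, i.e. megaflops."""
    return clock_rate_mhz(cycle_time_ns) * results_per_cycle


if __name__ == "__main__":
    print(f"clock rate: {clock_rate_mhz(CYCLE_TIME_NS):.3f} MHz")
    print(f"peak rate:  {peak_megaflops(CYCLE_TIME_NS, RESULTS_PER_CYCLE):.3f} megaflops")
```

The results, roughly 5.99 MHz and 11.98 megaflops, round to the 6 MHz and 12 megaflops figures. The rate is a peak that assumes both floating-point units deliver a result in every cycle.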

Architecture

The processor was designed around the concept of multiple parallel processing units operating in synchronization. A single 64-bit instruction word was divided into fields, each of which instructed a particular module under the control of the CPU. The modules were as follows:
  • 16-bit Arithmetic and Logic unit (ALU)
  • 38-bit Floating Point Adder (FADD) (two stages)
  • 38-bit Floating Point Multiplier (FMUL) (three stages)
  • Two Data Pad registers for receiving data from memory.


The processor had access to dual-interleaved core memory in which odd-numbered addresses were stored in one physical bank and even-numbered addresses in the other. This arrangement took advantage of the typically sequential fetching of memory words. Fetching sequentially from one physical bank would result in a latency of two instruction cycles before the data was loaded into the destination data pad. Interleaving allowed a sequential access to begin immediately after the previous one. Both accesses still took two cycles to complete, but the overlap and the dual destination pads maximized the use of the data channel.
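
The effect of interleaving can be illustrated with a small timing model. The sketch below is not a description of the actual memory controller: it assumes that one access can be issued per cycle, that a bank stays busy for the full two-cycle latency, and that the bank is selected by the low-order address bit. The function name `schedule_fetches` is hypothetical.

```python
def schedule_fetches(addresses, latency=2, banks=2):
    """Return (address, bank, issue_cycle, done_cycle) for each fetch.

    Assumptions (not from the hardware documentation): at most one fetch
    is issued per cycle, and a bank cannot start a new fetch until its
    previous one has completed.
    """
    bank_free = [0] * banks   # first cycle at which each bank is idle
    next_issue = 0            # earliest cycle for the next fetch
    timeline = []
    for addr in addresses:
        bank = addr % banks   # odd and even addresses use different banks
        issue = max(next_issue, bank_free[bank])
        done = issue + latency
        bank_free[bank] = done
        next_issue = issue + 1
        timeline.append((addr, bank, issue, done))
    return timeline


if __name__ == "__main__":
    for label, addrs in (("sequential", [0, 1, 2, 3]), ("same bank", [0, 2, 4, 6])):
        last_done = schedule_fetches(addrs)[-1][3]
        print(f"{label}: four words available after {last_done} cycles")
```

Under these assumptions, four sequential addresses complete in 5 cycles because the two banks overlap, while four addresses in the same bank take 8 cycles.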

The floating-point arithmetic modules were both multi-stage units that were driven by explicit instructions. In the two-stage adder, an assembler instruction such as FADD DX,DY would load values from data pads DX and DY into stage one of the adder. A subsequent FADD instruction would be required to present the result at the adder's output. This second FADD could be a dummy with no arguments, or it could be the next calculation in a sequence. In this fashion a stream of FADD operations could be performed in a pipeline, with a new result in every instruction cycle even though each addition required two cycles.
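
The push-through behaviour of the adder can be modelled in a few lines. This is a simplified behavioural sketch in Python, not AP-120B assembler: the class `TwoStageAdder` and its method are hypothetical, and the model performs the whole addition when the operands leave stage one rather than splitting the work between stages as real hardware would.

```python
class TwoStageAdder:
    """Toy model of a two-stage adder advanced only by explicit FADDs."""

    def __init__(self):
        self.stage1 = None   # operands latched by the most recent FADD
        self.output = None   # value currently presented at the output

    def fadd(self, x=None, y=None):
        """Issue one FADD; with no arguments it acts as a dummy push."""
        # The operands already in stage one move on and appear as a result.
        if self.stage1 is None:
            self.output = None
        else:
            self.output = self.stage1[0] + self.stage1[1]
        # New operands (if any) enter stage one.
        self.stage1 = None if x is None else (x, y)
        return self.output


if __name__ == "__main__":
    adder = TwoStageAdder()
    print(adder.fadd(1.0, 2.0))   # nothing at the output yet
    print(adder.fadd(3.0, 4.0))   # first sum appears, second one starts
    print(adder.fadd())           # dummy FADD pushes the second sum out
```

In this model each sum becomes visible one FADD after it was started, so a continuous stream of FADD instructions yields one result per instruction.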

Similarly, the multiplier, a three-stage unit, required one FMUL DX,DY to begin a multiplication, followed by two more FMUL instructions to produce the result. Careful programming of the pipeline allowed the production of one result per cycle, with each calculation taking three cycles in itself.
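
A stream of multiplications through the three-stage unit can be sketched in the same spirit. The names `PipelinedMultiplier` and `run_stream` are hypothetical, and the model only tracks when results appear; it does not reproduce the internal work of each stage.

```python
from collections import deque


class PipelinedMultiplier:
    """Toy model of a multiplier advanced only by explicit FMULs."""

    def __init__(self, stages=3):
        # Operand pairs in flight; a result emerges on the FMUL that
        # pushes its operands out of the last slot.
        self.pipe = deque([None] * (stages - 1))

    def fmul(self, x=None, y=None):
        """Issue one FMUL; with no arguments it acts as a dummy push."""
        finished = self.pipe.popleft()
        self.pipe.append(None if x is None else (x, y))
        return None if finished is None else finished[0] * finished[1]


def run_stream(pairs, stages=3):
    """Feed operand pairs back to back, then drain with dummy FMULs."""
    unit = PipelinedMultiplier(stages)
    outputs = [unit.fmul(x, y) for x, y in pairs]
    outputs += [unit.fmul() for _ in range(stages - 1)]
    return outputs


if __name__ == "__main__":
    print(run_stream([(1, 2), (3, 4), (5, 6), (7, 8)]))
```

Four multiplications need six FMUL instructions in this model: after the initial two-instruction fill, one product emerges on every instruction.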

For maximum efficiency, all calculations were programmed using the assembler language supplied with the hardware. A high-level language resembling Fortran was provided for coordinating tasks and controlling data transfers to and from the host computer.

Lookup tables

To support typical applications in signal processing, the hardware was delivered with a pre-calculated lookup table of sine and cosine values. Sines and cosines for angles from 0 to π/2 radians were stored in alternate addresses to take advantage of the interleaving described above. Values for all other angles could be obtained from one or the other of the table values, negated if necessary, using the standard quadrant symmetry identities.
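
The quadrant rules can be illustrated as follows. The table size, the exact address layout (sine at even addresses, cosine at odd addresses) and the function names are assumptions made for this sketch; the source only states that the two values were stored at alternate addresses. The table here covers the half-open range from 0 up to, but not including, π/2.

```python
import math


def build_table(steps):
    """Interleaved first-quadrant table: sin at even, cos at odd addresses."""
    table = []
    for k in range(steps):
        angle = k * (math.pi / 2) / steps
        table.append(math.sin(angle))   # address 2k
        table.append(math.cos(angle))   # address 2k + 1
    return table


def sine(table, steps, index):
    """sin(index * pi / (2 * steps)) using only first-quadrant entries."""
    quadrant, r = divmod(index, steps)
    s, c = table[2 * r], table[2 * r + 1]
    # sin(x + pi/2) = cos x, sin(x + pi) = -sin x, sin(x + 3pi/2) = -cos x
    return (s, c, -s, -c)[quadrant % 4]


def cosine(table, steps, index):
    """cos(index * pi / (2 * steps)) using only first-quadrant entries."""
    quadrant, r = divmod(index, steps)
    s, c = table[2 * r], table[2 * r + 1]
    # cos(x + pi/2) = -sin x, cos(x + pi) = -cos x, cos(x + 3pi/2) = sin x
    return (c, -s, -c, s)[quadrant % 4]


if __name__ == "__main__":
    steps = 64
    table = build_table(steps)
    index = 100   # an angle in the second quadrant
    angle = index * math.pi / (2 * steps)
    print(sine(table, steps, index), math.sin(angle))
    print(cosine(table, steps, index), math.cos(angle))
```

Because the sine and cosine of the same reduced angle sit at adjacent addresses, both values needed by the quadrant rules fall in different banks of the interleaved memory.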

Typical programming style

The programming style was unusual, being driven by the synchronous parallel processing architecture. The basic philosophy can be summarized as follows:
  • Lay out the shortest sequence of instructions for performing one instance of the desired calculation, allowing for two-cycle memory latency, and the driving of the floating-point modules with explicit FADD and FMUL instructions.
  • Inspect the sequence to determine the minimum number of instructions forming a loop which will perform the calculation repetitively. This requires attention to resource conflicts. For instance the data bus for moving results around can only move one data word per cycle. Likewise the ALU, used mostly for counting loops and memory addressing, can only be used for one purpose per cycle. This step is typically trial-and-error.
  • Conceptually "wrap" the full sequence of instructions around the loop, using FADD and FMUL instructions to drive calculations through the pipelines.
  • Before the loop begins, add parallel process initiations as required.
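
The search for the minimum loop size in the second step can be sketched as a resource-conflict check. This is an illustrative model only: the resource names are hypothetical labels, each operation is assumed to occupy its resource for exactly one cycle, and data dependencies between iterations are ignored. In practice the step was done by hand.

```python
def min_loop_length(ops):
    """Smallest loop length with no resource used twice in one loop cycle.

    ops is a list of (cycle, resource) pairs describing one instance of
    the calculation. When the sequence is wrapped around a loop of a
    given length, an operation at cycle t lands in loop slot t % length.
    """
    longest = max(cycle for cycle, _ in ops) + 1
    for length in range(1, longest + 1):
        used = set()
        for cycle, resource in ops:
            slot = (cycle % length, resource)
            if slot in used:
                break            # two operations collide in this slot
            used.add(slot)
        else:
            return length        # no collisions at this length
    raise ValueError("sequence uses one resource twice in the same cycle")


if __name__ == "__main__":
    # Hypothetical one-iteration schedule: two memory fetches, one
    # multiply, one add, then a bus move and an ALU update.
    one_iteration = [
        (0, "memory"), (1, "memory"),
        (3, "fmul"),
        (6, "fadd"),
        (8, "bus"), (8, "alu"),
    ]
    print(min_loop_length(one_iteration))
```

For this example the two memory fetches rule out a one-cycle loop, and a two-cycle loop is the shortest that keeps every resource to one use per cycle.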


The final item was accomplished as follows: assume that the entire calculation requires 15 cycles and the minimum loop size is 5 cycles. The first 5 instruction words begin iteration 1 of the calculation. The second 5 words contain the middle steps of iteration 1 and, in parallel, the beginning of iteration 2, which would usually be a copy of the operations that began iteration 1. The next 5 words contain the final steps of iteration 1, the middle of iteration 2, and the beginning of iteration 3. These last five words form the body of the loop, which repeats until the desired number of data points has been processed.
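
The overlap described above can be tabulated with a short sketch. The function name `overlap_schedule` is hypothetical, and the draining blocks at the end, in which no new iteration starts, are an assumption of this model; the text above describes only the start-up blocks and the loop body.

```python
def overlap_schedule(total_cycles, loop_size, iterations):
    """List, per block of loop_size words, the active (iteration, segment) pairs.

    Iterations and segments are numbered from 1. Segment s of an
    iteration is the s-th group of loop_size cycles of the calculation.
    """
    segments = -(-total_cycles // loop_size)   # ceiling division
    blocks = []
    for block in range(iterations + segments - 1):
        active = [
            (block - s + 1, s + 1)
            for s in range(segments)
            if 0 <= block - s < iterations
        ]
        blocks.append(active)
    return blocks


if __name__ == "__main__":
    for number, active in enumerate(overlap_schedule(15, 5, 4), start=1):
        parts = ", ".join(f"iteration {i} part {s}" for i, s in active)
        print(f"block {number}: {parts}")
```

With a 15-cycle calculation and a 5-cycle loop, the third block is the first to hold all three parts at once, which corresponds to the repeating loop body. Four iterations occupy 6 blocks of 5 words, or 30 cycles, instead of the 60 cycles that four back-to-back 15-cycle calculations would take.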

Application

As an attached processor, the AP-120B was typically used as a cost-effective adjunct to systems such as diagnostic medical imaging systems.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 