Cray-1
The Cray-1 was a supercomputer designed by a team including Seymour Cray for Cray Research. The first Cray-1 system was installed at Los Alamos National Laboratory in 1976, and it went on to become one of the best known and most successful supercomputers in history.

History


In the years 1968 to 1972 Cray was working at Control Data on a new machine known as the CDC 8600, the logical successor to his earlier CDC 6600 and CDC 7600 designs. The 8600 was essentially made up of four 7600s in a box, with an additional special mode that allowed them to operate lock-step in a SIMD fashion.

Jim Thornton, formerly Cray's engineering partner on earlier designs, had started a more radical project known as the CDC STAR-100. Unlike the 8600's brute-force approach to performance, the STAR took an entirely different route. In fact the main processor of the STAR was slower than that of the 7600, but the machine added hardware and instructions to speed up particularly common supercomputer tasks.

By 1972 the 8600 had reached a dead end. The machine was so complex that it was impossible to get one working properly; even a single faulty component would render the machine non-operational. Cray went to William Norris, Control Data's CEO, saying that a redesign from scratch was needed. At the time the company was in serious financial trouble, and with the STAR in the pipeline as well, Norris simply could not invest the money.

Cray left. He set up a new company headquarters only yards from the CDC lab, both of them on land he had purchased in Chippewa Falls, WI, and he and a group of former CDC employees started looking for ideas. At first the concept of building another supercomputer seemed impossible; if CDC couldn't afford it, how could a tiny company with no funding? But after the CTO traveled to Wall Street and found a lineup of investors more than willing to back Cray, all that was needed was a design.

In 1975 the 80 MHz Cray-1 was announced. Excitement was so high that a bidding war for the first machine broke out between Lawrence Livermore National Laboratory and Los Alamos National Laboratory, the latter eventually winning and receiving serial number 001 in 1976 for a six-month trial. The National Center for Atmospheric Research (NCAR) was Cray Research's first official customer in July 1977, paying US$8.86 million ($7.9 million plus $1 million for the disks). The NCAR machine was decommissioned in January 1989. The company expected to sell perhaps a dozen of the machines, but over eighty Cray-1s of all types were sold, priced from $5M to $8M. The machine made Cray a celebrity and the company a success, lasting until the supercomputer crash in the early 1990s.

The Cray-1 was succeeded in 1982 by the 800 MFLOPS Cray X-MP, the first Cray multi-processing computer. In 1985 the very advanced Cray-2, capable of 1.9 GFLOPS peak performance, succeeded the first two models but met with only limited commercial success because of problems producing sustained performance in real-world applications. A more conservatively designed evolutionary successor to the Cray-1 and X-MP, the Cray Y-MP, was therefore developed and launched in 1988.

Background


Typical scientific workloads consist of reading in large data sets, transforming them in some way, and then writing them back out again. Normally the transformations being applied are identical across all of the data points in the set. For instance, the program might add 5 to every number in a set of a million numbers. In traditional computers the program would loop over all million numbers, adding five, thereby generating a million instructions saying a = add b, c. Internally the computer solves this instruction in several steps. First it reads the instruction from memory and decodes it, then it collects any additional information it needs, in this case the numbers b and c, and then finally runs the operation and stores the results.

Vector machines


In the STAR, new instructions essentially wrote the loops for the user. The user told the machine where in memory the "big list of numbers" was stored, and then fed in a single instruction a(1..1000000) = addv b(1..1000000), c(1..1000000). At first glance it appears the savings are limited; in this case the machine fetches and decodes only a single instruction instead of 1000000, thereby saving nearly 1000000 fetches and decodes, perhaps ¼ of the overall time.
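The difference in instruction traffic can be sketched in Python. This is only an illustrative model: the function names `scalar_add` and `vector_add` and the fetch counters are hypothetical and do not correspond to actual STAR instructions, and a shorter array of 1000 elements is used for brevity.

```python
def scalar_add(b, c):
    """Traditional loop: one add instruction is fetched and decoded per element."""
    a = []
    fetches = 0
    for x, y in zip(b, c):
        fetches += 1              # one fetch and decode for each add
        a.append(x + y)
    return a, fetches


def vector_add(b, c):
    """Vector style: a single instruction describes the whole array operation."""
    fetches = 1                   # one fetch and decode for the entire vector
    a = [x + y for x, y in zip(b, c)]
    return a, fetches


b = list(range(1000))
c = [5] * 1000
scalar_result, scalar_fetches = scalar_add(b, c)
vector_result, vector_fetches = vector_add(b, c)
```

Both functions produce the same sums; only the number of modelled instruction fetches differs.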

But the real savings are not so obvious. Internally the computer's CPU is built up from a number of parts, each dedicated to a single task, for instance adding a number or fetching from memory. Normally as the instruction flows through the machine only one part is active at any time, meaning the whole process has to complete before a value is written out. The addition of an instruction pipeline changes this: in such machines the CPU will "look ahead" and start fetching the next instructions while the first is still being worked on. In this assembly line fashion any one instruction still takes as long to process, but as soon as one completes the next is already almost done.
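The benefit of the assembly line can be estimated with a simple cycle count. The sketch below assumes an idealized pipeline in which every stage takes one clock cycle and nothing ever stalls; the function names and the five-stage example are assumptions made for illustration, not figures for any particular machine.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """The first instruction fills the pipeline; each later one completes one cycle after its predecessor."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


# Hypothetical example: 1000 instructions through a 5-stage pipeline.
slow = unpipelined_cycles(1000, 5)   # 5000 cycles
fast = pipelined_cycles(1000, 5)     # 1004 cycles
```

A single instruction still needs all five cycles in either case, which matches the observation that pipelining improves throughput but not the latency of any one instruction.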

Vector processors use this technique with one additional "trick". Because the data layout is "known", basically a set of numbers arranged linearly in memory, the pipelines can be tuned to improve the performance of fetches. On the receipt of a vector instruction, special hardware sets up the memory access for the arrays and stuffs the data into the processor as fast as possible.

CDC's approach in the STAR used what is today known as a memory-memory architecture. This referred to the way the machine gathered data, setting up its pipeline to read and write to memory directly. This allowed it to use vectors of any length or width (or stride as it is known), making it highly flexible. Unfortunately the pipeline had to be very "deep" in order to allow it to have enough instructions in flight to make up for the slow memory, and that meant the machine had a very high cost when switching from processing vectors to normal data. Additionally the slow "normal" performance of the machine meant that after the switch had taken place and the machine was running typical logic instructions, the performance was quite poor. The result was rather disappointing real-world performance, something that might have been obvious had the designers considered Amdahl's Law.
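The effect of Amdahl's Law on such a design can be worked through numerically. The sketch below uses the usual form of the law, with one extension added here for illustration: the non-vector part of the program is given its own speed factor so that a slow scalar unit can be modelled. The function name and all of the numbers are hypothetical and are not measurements of the STAR.

```python
def overall_speedup(vector_fraction, vector_speedup, scalar_speedup=1.0):
    """Overall speedup when only part of the run time benefits from vector hardware.

    vector_fraction: share of the original run time that can be vectorized (0..1)
    vector_speedup:  how much faster that share runs on the vector hardware
    scalar_speedup:  speed factor for the remaining share (below 1.0 means slower)
    """
    scalar_fraction = 1.0 - vector_fraction
    new_time = scalar_fraction / scalar_speedup + vector_fraction / vector_speedup
    return 1.0 / new_time


# 90% of the work vectorized and made 10 times faster: about 5.3x overall, not 10x.
mostly_vector = overall_speedup(0.9, 10.0)

# Half the work vectorized 10x, but the scalar half running at half speed:
# the program as a whole ends up slower than before.
slow_scalar = overall_speedup(0.5, 10.0, scalar_speedup=0.5)
```

With these illustrative values, `slow_scalar` comes out below 1.0, which is the kind of outcome the paragraph above describes.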

Cray's approach


Cray was able to look at the failure of the STAR and learn from it. He decided that in addition to fast vector processing, his design would also require excellent all-round logic performance. That way when the machine switched modes, it would still be the fastest out there. Cray's team also noticed that the workloads could be dramatically improved in most cases through the use of registers.

Just as earlier machines had ignored the fact that most operations were being applied to many data points, the STAR ignored the fact that those same data points would be repeatedly operated on. Whereas the STAR would read and process the same memory five times to apply five vector operations on a set of data, it would be much faster to read the data in once and apply all five operations while it sat in the CPU's registers. Registers are extremely expensive in terms of circuitry, so only a limited number could be provided, meaning that Cray's design would have less flexibility in terms of vector sizes. Instead of reading a vector of any size several times as in the STAR, the Cray design would have to read small parts of the data at a time, but could then run several operations on each part. Given typical workloads, Cray felt that the small loss due to having to access memory in steps was a cost well worth paying.
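The trade-off between the two approaches can be sketched by counting how many words are read from memory. This is a simplified model with hypothetical function names; the strip size of 64 matches the 64-word vector registers described later in the article, and the three operations are arbitrary examples.

```python
VECTOR_LENGTH = 64  # words per vector register, as given in the Description section


def memory_to_memory(data, operations):
    """STAR-style model: every operation makes a full pass over memory."""
    current = list(data)
    words_read = 0
    for op in operations:
        words_read += len(current)          # the whole vector is read again
        current = [op(x) for x in current]
    return current, words_read


def register_strips(data, operations, strip=VECTOR_LENGTH):
    """Cray-style model: load a strip once, apply every operation, then move on."""
    result = []
    words_read = 0
    for start in range(0, len(data), strip):
        register = data[start:start + strip]    # one load per strip
        words_read += len(register)
        for op in operations:
            register = [op(x) for x in register]
        result.extend(register)
    return result, words_read


ops = [lambda x: x + 5, lambda x: x * 2, lambda x: x - 1]
data = list(range(200))
mm_result, mm_reads = memory_to_memory(data, ops)
reg_result, reg_reads = register_strips(data, ops)
```

Both models compute the same values, but the register version reads each word once (200 reads) while the memory-to-memory version reads it once per operation (600 reads).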

Since the typical vector operation would involve loading a small set of data into the vector registers and then running several operations on it, the vector system of the new design had its own separate pipeline. For instance, the multiplication and addition units were implemented as separate hardware, so the results of one could be internally pipelined into the next, the instruction decode having already been handled in the machine's main pipeline. Cray referred to this concept as chaining, as it allowed programmers to "chain together" several instructions and extract higher performance.
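A rough timing model shows why chaining pays off. The sketch assumes that a vector operation on n elements takes the unit's start-up latency plus n cycles, a common textbook simplification; the function names and the latencies of 7 and 6 cycles are values assumed for illustration.

```python
def unchained_cycles(n, multiply_latency, add_latency):
    """The add waits until the multiply has finished the whole vector."""
    return (multiply_latency + n) + (add_latency + n)


def chained_cycles(n, multiply_latency, add_latency):
    """Each product is forwarded to the adder as soon as it leaves the multiplier."""
    return multiply_latency + add_latency + n


# Hypothetical example: a 64-element multiply followed by a dependent add.
separate = unchained_cycles(64, 7, 6)   # 141 cycles
chained = chained_cycles(64, 7, 6)      # 77 cycles
```

Under these assumptions the chained pair finishes in a little over half the time of the two operations run back to back.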

Description


The new machine was the first Cray design to use integrated circuits (ICs). Although ICs had been available since the 1960s, it was only in the early 1970s that they reached the performance necessary for high-speed applications. The Cray-1 used only 4 different IC types: an ECL dual NAND gate (4 input + 5 input, each with differential output), another slower MECL 4/5 NAND gate used for address fanout, a 16x1 high speed (6 ns) static RAM used for registers, and a 1k x 1 static 50 ns RAM used for main memory. In all, the Cray-1 contained about 200,000 gates, roughly the same as the Intel 386 of the 1980s.

ICs were mounted on large five-layer printed circuit boards, with up to 144 ICs per board. Boards were then mounted back to back for cooling (see below) and placed in twenty-four 28-inch high racks containing 72 double-boards. The typical module (distinct processing unit) required one or two boards. In all the machine contained 1,662 modules in 113 varieties.

One problem discovered during initial design was that the operating speed of the machine was close enough to the signal times on the boards (a deliberate design feature) that standing waves could be set up in some of the electrical circuits. This meant that the signal might arrive at its destination IC at a "low point" in the standing wave, making it difficult to detect. This problem was solved by adding slight delays into the signal path, either by placing foil beside the traces to add a small amount of capacitance, or by adding additional ICs at the signal's high points. Some estimates state that about 40% of the gates in the machine were there simply to add delay and clean up the signal.

As always, Cray spent a considerable time on the mechanical and electrical design of the system, improving performance through shortened cycle times. Modules were wired to each other with high-power circuits to reduce the effects of noise, allowing the receivers to "settle" faster. Each cable between the modules was made of twisted-pair, and cut to very specific lengths in order to avoid electrical reflections. Every amplifier was balanced, so if one wire was put to high power another was lowered, thereby keeping the demand on the power supply constant and avoiding switching noise.

All of this high-power circuitry generates considerable heat, and as always Cray's designers spent as much effort on the refrigeration system as the rest of the mechanical design. In this case each circuit board was paired with a second, placed back to back with a sheet of copper between them. The copper sheet conducted heat to the edges of the cage, where liquid freon running in pipes drew it away to the cooling unit below the machine. The first Cray-1 was delayed six months due to problems in the cooling system; lubricant that is normally mixed with the freon to keep the compressor running would leak through the seals and eventually coat the boards with oil until they shorted out. New welding techniques had to be used to properly seal the tubing.

In order to wring every possible ounce of speed out of the machine, the entire chassis was bent into a large C-shape. Speed-dependent portions of the system were placed on the "inside edge" of the chassis where the wire-lengths were shorter. This allowed the cycle time to be decreased to 12.5 ns (80 MHz), not as fast as the 8 ns 8600 Cray had given up on, but fast enough to beat his earlier CDC 7600 and the STAR. NCAR estimated that the overall throughput on the system was 4.5 times that of the CDC 7600.

The Cray-1 was built as a 64-bit system, a departure from the 7600/6600 which were 60-bit machines (a change also planned for the 8600). Addressing was 24-bit, and the machine supported a maximum of 1 megaword (8 MB) of main memory. Memory was spread across 16 banks, each with a 50 ns cycle time (four clock cycles), allowing up to four words to be read per cycle.
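The memory figures above follow from simple arithmetic, worked through here in Python. The constant names are made up for this sketch; the values are the ones quoted in the text, including the 12.5 ns clock cycle.

```python
WORD_BITS = 64
MAX_WORDS = 2 ** 20        # 1 megaword
CLOCK_NS = 12.5            # clock cycle time (80 MHz)
BANKS = 16
BANK_CYCLE_NS = 50         # time before a bank can be accessed again

# Memory capacity in bytes and in units of 2**20 bytes.
max_bytes = MAX_WORDS * WORD_BITS // 8
max_megabytes = max_bytes // 2 ** 20                 # 8

# A bank is busy for 50 / 12.5 = 4 clock cycles after each access,
# so 16 interleaved banks can supply 16 / 4 = 4 words per clock cycle.
clocks_per_bank_cycle = BANK_CYCLE_NS / CLOCK_NS     # 4.0
words_per_clock = BANKS / clocks_per_bank_cycle      # 4.0
```

The four-words-per-cycle figure is a peak that assumes consecutive accesses fall on different banks.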

The main register set consisted of eight 64-bit scalar (S) registers and eight 24-bit address (A) registers. These were backed by a set of sixty-four registers each for S and A temporary storage, known as T and B respectively, which could not be seen by the functional units. The vector system added another eight 64-word by 64-bit vector (V) registers, as well as a vector length (VL) and vector mask (VM). Finally the system also included a 64-bit clock and four instruction buffers that each held sixty-four 16-bit instructions (sixteen 64-bit words). The hardware was set up to allow the vector registers to be fed at one word per cycle, while the address and scalar registers required two. In contrast, an entire sixteen-word instruction buffer could be filled in four cycles.

The system used twelve functional units, but had limited parallelism. It could fetch one instruction per clock cycle into the units, operate on them in parallel, and retire two per cycle. Its theoretical performance was thus 160 MIPS (80 MHz × 2 instructions), although a few limitations kept floating point performance to about 136 megaflops in general. However, by using vector instructions carefully and building useful chains, the system could peak at 250 megaflops.
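The quoted rates can be related to the clock with a few lines of arithmetic. The variable names below are illustrative; the inputs are the figures given in the text.

```python
CLOCK_MHZ = 80
INSTRUCTIONS_PER_CYCLE = 2

cycle_ns = 1000 / CLOCK_MHZ                        # 12.5 ns per clock cycle
peak_mips = CLOCK_MHZ * INSTRUCTIONS_PER_CYCLE     # 160 MIPS

# The typical and peak floating point figures, expressed per clock cycle.
typical_flops_per_cycle = 136 / CLOCK_MHZ          # 1.7
peak_flops_per_cycle = 250 / CLOCK_MHZ             # 3.125
```

Seen this way, the 250 megaflops peak corresponds to slightly more than three floating point results per clock cycle.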

Since the machine was designed to operate on large data sets, the design also dedicated considerable circuitry to I/O. Earlier Cray designs at CDC had included separate computers dedicated to this task, but this was no longer needed. Instead the Cray-1 included four 6-channel controllers, each of which was given access to main memory once every four cycles. The channels were 16 bits wide, and included 3 control bits and four for error correction, so the maximum transfer speed was 1 word per 100 ns, or 500K words per second for the entire machine.

The initial model, the Cray-1A, weighed 5.5 tons including the freon refrigeration system. Configured with 1 million words of RAM, the machine and its power supplies consumed about 115 kW of power; cooling and storage likely more than doubled this figure. A Data General SuperNova S/200 was generally used as a "front end" to feed control instructions into the machine, later replaced by the Eclipse.

The later Cray-1S had a slightly shorter clock cycle of 12.0 ns, and main memory in sizes of 1, 2 and 4 million words. The Data General machines were replaced with an in-house 16-bit design running at 80 MIPS. The I/O system was separated from the main machine, connected to the main system via a 6 MB per second control channel and a 100 MB per second High Speed Data Channel. This separation made the 1S look like two "half Crays" separated by a few feet, which allowed the I/O system to be expanded as needed. Systems could be bought in a variety of configurations from the S/500 with no I/O and ½ M-word of memory, to the S/4400 with 4M-words and 4 I/O processors.

The Cray-1S was itself replaced by the Cray-1M, the M indicating that it used less expensive MOS-based RAM in the I/O system. The 1M was supplied in only three versions, the M/1200 with 1M-word in 8 banks, or the M/2200 and M/4200 with 2 or 4M-words in 16 banks. All of these machines included two, three or four I/O processors, and the system added an optional second High Speed Data Channel. Users could also add a Solid-state Storage Device, a RAM disk, with 8 to 32M-words of MOS-RAM.

Software


In 1978, the first standard software package for the Cray-1 was released, consisting of three main products:
  • Cray Operating System (COS) (later machines would run UNICOS, Cray's UNIX flavour),
  • Cray Assembler Language (CAL), and
  • Cray FORTRAN (CFT), the first automatically vectorizing FORTRAN compiler


Comparison with modern PC-processors

As of 2007, the fastest PC processors perform over 40 GFLOPS, over 130 times faster than a Cray-1.
