Alpha 21064
The Alpha 21064, introduced as the DECchip 21064 and known also by its code name, EV4, is a microprocessor developed and fabricated by Digital Equipment Corporation that implemented the Alpha (introduced as the Alpha AXP) instruction set architecture (ISA). It was announced in February 1992 and was introduced in November 1992. It was succeeded by the Alpha 21164 in January 1995, but remained available.
History
The first Alpha processor was a test chip codenamed EV3. This test chip was fabricated using DEC's 1.0 µm CMOS-3 process. It lacked a floating-point unit and had only 1 KB caches. The test chip was used to confirm the operation of the aggressive circuit design techniques and, along with simulators and emulators, to bring up firmware and the various operating systems that the company supported. The production chip, codenamed EV4, was fabricated using DEC's 0.75 µm CMOS-4 process. Dirk Meyer and Edward McLellan were the micro-architects: McLellan designed the issue logic while Meyer designed the other major blocks. Jim Montanaro led the circuit implementation. The EV3 was used in the Alpha Development Unit (ADU), a computer used by DEC to develop software for the Alpha platform before the availability of EV4 parts.
The Alpha 21064 was unveiled at the 39th International Solid-State Circuits Conference in mid-February 1992. It was announced on 25 February 1992, with a 150 MHz sample introduced on the same day. It was priced at $3,375 in quantities of 100, $1,650 in quantities between 100 and 1,000, and $1,560 for quantities over 1,000. Volume shipments began in September 1992.
In early February 1993, the price of the 150 MHz version was reduced to $1,096 from $1,559 in quantities greater than 1,000.
On 25 February 1993, the 200 MHz Alpha 21064 was introduced, with sample kits available, priced at $3,495. In volume, it was priced at $1,231 per unit in quantities greater than 10,000. Volume orders were accepted in June 1993, with shipments in August 1993. The price of the 150 MHz version was reduced on 25 February 1993. The sample kit was reduced to $1,690 from $3,375, effective in April 1993; and in volume, it was reduced to $853 from $1,355 per unit in quantities greater than 10,000, effective in July 1993.
With the introduction of the Alpha 21066 and the Alpha 21068 on 10 September 1993, Digital adjusted the Alpha microprocessor family and introduced a 166 MHz version priced at $499 per unit in quantities of 5,000. The price of the 150 MHz version was also reduced to $455 per unit in quantities of 5,000.
On 6 June 1994, Digital reduced the price of the Alpha 21064, with the 200 MHz version reduced by 31% to $544 and the 166 MHz version by 19% to $404 per unit in quantities of 5,000, effective on 3 July 1994.
The Alpha 21064 was fabricated at Digital's Hudson, Massachusetts and South Queensferry, Scotland facilities.
Users
The Alpha 21064 was mostly used in high-end computers such as workstations and servers. Digital used the Alpha 21064 in their DECpc AXP 150 entry-level workstations, DEC 2000 AXP entry-level servers, DEC 3000 AXP workstations and entry-level servers, DEC 4000 AXP mid-range servers and DEC 7000/10000 AXP high-end servers. Cray Research, an Alpha AXP partner, used a 150 MHz Alpha 21064 in their Cray T3D supercomputers. The Alpha 21064 was also sold on the open market.
Performance
The Alpha 21064 was the highest performing microprocessor from its introduction until 1993, when International Business Machines (IBM) introduced the multi-chip POWER2. The Alpha 21064 then remained the highest performing single-chip microprocessor, a position it held until the Alpha 21064A was introduced.
Description
The Alpha 21064 is a dual-issue superscalar, in-order microprocessor capable of issuing a maximum of two instructions every clock cycle to four functional units: an integer unit, a floating-point unit (FPU), an address unit and a branch unit. It uses a 43-bit virtual address and a 34-bit physical address, and is therefore capable of addressing 8 TB of virtual memory and 16 GB of physical memory.
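The address-space figures follow directly from the address widths. The short sketch below reproduces the arithmetic; the function and constant names are illustrative only and are not taken from any Digital documentation.

```python
# Address-space sizes implied by the address widths quoted above.
VIRTUAL_ADDRESS_BITS = 43
PHYSICAL_ADDRESS_BITS = 34


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 1 << address_bits


virtual_bytes = addressable_bytes(VIRTUAL_ADDRESS_BITS)
physical_bytes = addressable_bytes(PHYSICAL_ADDRESS_BITS)

# Binary units are used here: 1 TB = 2**40 bytes, 1 GB = 2**30 bytes.
print(virtual_bytes // 2**40, "TB of virtual memory")    # 8 TB of virtual memory
print(physical_bytes // 2**30, "GB of physical memory")  # 16 GB of physical memory
```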
The Alpha 21064's pipelines have a total of 45 bypasses. Up to 22 instructions can be in various stages of execution simultaneously: 14 in pipeline stages zero to six, 3 in the extended floating-point pipeline stages, 3 outstanding load misses, a floating-point divide and an integer multiply.
Integer unit
The integer unit is responsible for executing integer instructions. It consists of the integer register file (IRF) and the E-box. The IRF contains thirty-two 64-bit registers. The IRF has four read ports and two write ports which are equally divided between the integer unit and the branch unit. The E-box contains an adder, logic units, a barrel shifter and a multiplier.
The integer pipeline is seven stages deep. The first four stages cover instruction fetch, decode and scoreboard checking of operands. These first four stages can be stalled, but afterwards the pipeline must advance every cycle.
Most integer instructions are completed in the fourth cycle of the pipeline, for a latency of one cycle. The barrel shifter is pipelined, and shift instructions have a latency of two cycles. The multiplier is not pipelined, to save die area, so multiply instructions have a latency of 19 to 23 cycles. Byte instructions also have a latency of two cycles.
Address unit
The address unit, also known as the "A-box", executes load and store instructions. To enable the address unit and integer unit to operate in parallel, the address unit has its own displacement adder, which it uses to calculate virtual addresses, instead of using the adder in the integer unit. A 32-entry, fully associative data translation lookaside buffer (TLB) is used to translate virtual addresses into physical addresses.
Data from store instructions is buffered in a 4-entry by 32-byte write buffer. The write buffer improves performance by merging data from adjacent stores, which reduces the number of writes on the system bus, and by temporarily delaying stores, which lets loads be serviced sooner because the system bus is used less often.
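The merging behaviour of the write buffer can be illustrated with a small model. This is a minimal sketch, not a description of the actual hardware: the 4-entry by 32-byte geometry comes from the text, while the class name, the byte-granular bookkeeping and the oldest-first drain policy are assumptions made for illustration.

```python
from collections import OrderedDict

ENTRIES = 4        # from the text: 4 entries
BLOCK_BYTES = 32   # from the text: 32 bytes per entry


class WriteBuffer:
    """Toy write-merging buffer (hypothetical model, oldest-first drain assumed)."""

    def __init__(self) -> None:
        # Maps an aligned block address to the bytes written within that block.
        self.entries: "OrderedDict[int, dict[int, int]]" = OrderedDict()
        self.bus_writes = 0

    def store(self, address: int, data: bytes) -> None:
        for i, byte in enumerate(data):
            addr = address + i
            block = addr - addr % BLOCK_BYTES
            if block not in self.entries:
                if len(self.entries) == ENTRIES:
                    self._drain_oldest()  # make room for a new block
                self.entries[block] = {}
            # Stores to a block that is already buffered merge into its entry.
            self.entries[block][addr % BLOCK_BYTES] = byte

    def _drain_oldest(self) -> None:
        self.entries.popitem(last=False)
        self.bus_writes += 1  # one bus write per drained entry

    def flush(self) -> None:
        while self.entries:
            self._drain_oldest()


# Four adjacent 8-byte stores fall in one 32-byte block and share one entry.
buffer = WriteBuffer()
for i in range(4):
    buffer.store(0x1000 + 8 * i, bytes(8))
buffer.flush()
print(buffer.bus_writes)  # 1
```

In this model, four adjacent quadword stores cost one bus write instead of four, which is the effect the paragraph above describes.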
Floating-point unit
The floating-point unit consists of the floating-point register file (FRF) and the F-box. The FRF contains thirty-two 64-bit registers and has three read ports and two write ports. The F-box contains a floating-point pipeline and a non-pipelined divide unit which retires one bit per cycle.
The floating-point unit has a ten-stage pipeline. The first four stages of the pipeline are identical to those of the integer pipeline and are mostly shared. Instructions begin execution in stage four, where data read from the floating-point register file is formatted into fraction, exponent and sign. For add instructions, the adder calculates the exponent difference, and a predictive leading one or zero detector, which uses the input operands to normalize the result, is initiated. For multiply instructions, a 3× multiplicand is generated.
In stages five and six, alignment or a normalization shift and sticky-bit calculations are performed for adds and subtracts. Multiplies are performed in a pipelined, two-way interleaved array which uses a radix-8 Booth algorithm. In stage eight, final addition is performed in parallel with rounding, and the result is written to the FRF in stage nine.
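The need for a 3× multiplicand follows from radix-8 Booth recoding, which rewrites the multiplier as digits in the range -4 to 4. Multiples of 1, 2 and 4 are plain shifts, but 3 needs an addition, so it is computed ahead of the array. The sketch below shows generic radix-8 Booth recoding on unsigned integers. It is not a model of the 21064's interleaved array, and the function names are illustrative.

```python
def booth_radix8_digits(multiplier: int, bits: int) -> list[int]:
    """Recode an unsigned `bits`-wide multiplier into radix-8 Booth digits (-4..4)."""
    if not 0 <= multiplier < (1 << bits):
        raise ValueError("multiplier must fit in `bits` unsigned bits")
    groups = bits // 3 + 1   # enough groups to leave a zero bit on top
    padded = multiplier << 1  # implicit 0 below the least significant bit
    digits = []
    for g in range(groups):
        # Overlapping 4-bit window: three new bits plus the top bit of the group below.
        window = (padded >> (3 * g)) & 0xF
        below = window & 1
        b0 = (window >> 1) & 1
        b1 = (window >> 2) & 1
        b2 = (window >> 3) & 1
        digits.append(-4 * b2 + 2 * b1 + b0 + below)
    return digits


def booth_radix8_multiply(multiplicand: int, multiplier: int, bits: int) -> int:
    """Multiply using radix-8 Booth digits and precomputed multiples."""
    multiples = {
        0: 0,
        1: multiplicand,
        2: multiplicand << 1,
        3: (multiplicand << 1) + multiplicand,  # the only multiple needing an add
        4: multiplicand << 2,
    }
    product = 0
    for position, digit in enumerate(booth_radix8_digits(multiplier, bits)):
        partial = multiples[abs(digit)]
        product += (-partial if digit < 0 else partial) << (3 * position)
    return product


print(booth_radix8_digits(7, 3))  # [-1, 1], i.e. 7 = -1 + 1 * 8
```

Each digit selects one partial product, so an n-bit multiplier needs roughly n/3 partial products rather than n.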
Instructions executed in the pipeline have a latency of 6 cycles. Single-precision (32-bit) and double-precision (64-bit) divides, which are executed in the non-pipelined divide unit, have a latency of 31 and 61 cycles, respectively.
I-box
The I-box is the control unit; it fetches, decodes and issues instructions and controls the pipeline. Two instructions are fetched from the I-cache and decoded every clock cycle. The I-box then checks whether the resources required by the two instructions are available. If so, the instructions are issued, provided they can be paired. Which instructions can be paired is determined by the number of read and write ports in the integer register file. The Alpha 21064 can issue an integer operate with a floating-point operate, any load/store instruction with any operate instruction, an integer operate with an integer branch, and a floating-point operate with a floating-point branch. Two combinations are not permitted: an integer operate with a floating-point store, and a floating-point operate with an integer store.
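The pairing rules listed above can be written as a small predicate. This is a sketch of the rules as stated in this article only: the instruction classes, the `can_pair` name and the assumption that the order of the two instructions does not matter are illustrative choices, and real issue logic also depends on resource availability.

```python
from enum import Enum, auto


class Kind(Enum):
    INT_OP = auto()
    FP_OP = auto()
    INT_LOAD = auto()
    INT_STORE = auto()
    FP_LOAD = auto()
    FP_STORE = auto()
    INT_BRANCH = auto()
    FP_BRANCH = auto()


OPERATES = frozenset({Kind.INT_OP, Kind.FP_OP})
MEMORY = frozenset({Kind.INT_LOAD, Kind.INT_STORE, Kind.FP_LOAD, Kind.FP_STORE})

# The two combinations the text says are not permitted.
FORBIDDEN = {
    frozenset({Kind.INT_OP, Kind.FP_STORE}),
    frozenset({Kind.FP_OP, Kind.INT_STORE}),
}

# Pairs that are allowed outright.
ALLOWED = {
    frozenset({Kind.INT_OP, Kind.FP_OP}),
    frozenset({Kind.INT_OP, Kind.INT_BRANCH}),
    frozenset({Kind.FP_OP, Kind.FP_BRANCH}),
}


def can_pair(first: Kind, second: Kind) -> bool:
    """Return True if the two instruction classes may issue together (order ignored)."""
    pair = frozenset({first, second})
    if pair in FORBIDDEN:
        return False
    if pair in ALLOWED:
        return True
    # Any load/store with any operate (exceptions handled above).
    return len(pair & MEMORY) == 1 and len(pair & OPERATES) == 1


print(can_pair(Kind.FP_LOAD, Kind.INT_OP))   # True
print(can_pair(Kind.INT_OP, Kind.FP_STORE))  # False
```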
The I-box contains two translation lookaside buffers (TLBs) for translating virtual addresses so the microprocessor can fetch instructions from memory. These TLBs are referred to as instruction translation buffers (ITBs). The ITBs cache recently used page table entries for the instruction stream. An 8-entry ITB is used for 8 KB pages and a 4-entry ITB for 4 MB pages. Both ITBs are fully associative and use a not-last-used replacement algorithm.
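A not-last-used policy only guarantees that the most recently used entry is never the one replaced. The sketch below is one possible reading of that policy for a small fully associative buffer. The rotating victim pointer, the class name and the method names are assumptions for illustration; the text does not say how the 21064 chooses among the remaining entries.

```python
class TranslationBuffer:
    """Toy fully associative translation buffer with not-last-used replacement."""

    def __init__(self, entries: int) -> None:
        self.capacity = entries
        self.slots = [None] * entries  # each slot holds (virtual_page, physical_page)
        self.last_used = None          # index of the most recently used slot
        self._next_victim = 0          # assumed rotating pointer for victim choice

    def lookup(self, virtual_page: int):
        """Return the physical page on a hit, or None on a miss."""
        for i, slot in enumerate(self.slots):
            if slot is not None and slot[0] == virtual_page:
                self.last_used = i
                return slot[1]
        return None

    def fill(self, virtual_page: int, physical_page: int) -> None:
        """Install a translation, never evicting the last-used entry."""
        victim = None
        for i, slot in enumerate(self.slots):
            if slot is None:  # prefer an empty slot
                victim = i
                break
        if victim is None:
            victim = self._next_victim
            if victim == self.last_used:
                victim = (victim + 1) % self.capacity  # skip the last-used entry
            self._next_victim = (victim + 1) % self.capacity
        self.slots[victim] = (virtual_page, physical_page)
        self.last_used = victim


# An 8-entry buffer, matching the size quoted for the 8 KB-page ITB.
itb = TranslationBuffer(8)
for page in range(8):
    itb.fill(page, 0x100 + page)
itb.lookup(0)       # slot 0 becomes the last-used entry
itb.fill(8, 0x108)  # slot 0 is skipped; another entry is replaced
print(itb.lookup(0) is not None)  # True
```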
Branch prediction is performed by logic in the I-box, using either static or dynamic prediction. Static prediction examines the sign bit of the displacement field of a branch instruction and predicts the branch as taken if the sign bit is 1, which indicates a backward branch. Dynamic prediction examines an entry in the 2,048-entry by 1-bit branch history table; if the entry contains 1, the branch is predicted as taken. With dynamic prediction, accuracy is approximately 80% for most programs. The branch misprediction penalty is four cycles.
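Both prediction schemes are simple enough to sketch. The table size and the meaning of a 1 bit come from the text. The indexing by instruction address and the rule of recording the last outcome are assumptions made for illustration, since the text does not describe how entries are selected or updated.

```python
HISTORY_ENTRIES = 2048  # from the text: 2,048 one-bit entries


def static_predict_taken(displacement: int) -> bool:
    """Backward branches (negative displacement, sign bit 1) are predicted taken."""
    return displacement < 0


class OneBitPredictor:
    """Toy 1-bit dynamic predictor (indexing and update rule are assumptions)."""

    def __init__(self) -> None:
        self.table = [0] * HISTORY_ENTRIES

    def _index(self, pc: int) -> int:
        # Alpha instructions are 4 bytes wide, so drop the two low address bits.
        return (pc >> 2) % HISTORY_ENTRIES

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] == 1

    def update(self, pc: int, taken: bool) -> None:
        self.table[self._index(pc)] = 1 if taken else 0


# A loop branch taken nine times, then falling through once.
predictor = OneBitPredictor()
outcomes = [True] * 9 + [False]
mispredictions = 0
for taken in outcomes:
    if predictor.predict(0x2000) != taken:
        mispredictions += 1
    predictor.update(0x2000, taken)
print(mispredictions)  # 2: the first iteration and the loop exit
```

With the four-cycle penalty quoted above, the two mispredictions in this toy loop would cost eight cycles under these assumptions.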
Cache
The Alpha 21064 has two on-die first level caches: an 8 KB data cache (known as the D-cache) using a write-through write policy and an 8 KB instruction cache (known as the I-cache). Both caches are direct-mapped for single-cycle access and have a cache block size of 32 bytes. The caches are built with six-transistor static random access memory (SRAM) cells that have an area of 98 µm². The caches are 1,024 cells wide by 66 cells tall, with the top two rows used for redundancy.
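For an 8 KB direct-mapped cache with 32-byte blocks, an address splits into a 5-bit block offset, an 8-bit line index and a tag. The sketch below shows that split and also checks the cell-array dimensions against the 8 KB capacity. The function name is illustrative, and the code does not model whether the hardware uses virtual or physical address bits.

```python
CACHE_BYTES = 8 * 1024
BLOCK_BYTES = 32

NUM_LINES = CACHE_BYTES // BLOCK_BYTES      # 256 lines
OFFSET_BITS = BLOCK_BYTES.bit_length() - 1  # 5
INDEX_BITS = NUM_LINES.bit_length() - 1     # 8


def split_address(address: int) -> tuple[int, int, int]:
    """Split an address into (tag, line index, byte offset) for a direct-mapped cache."""
    offset = address & (BLOCK_BYTES - 1)
    index = (address >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(split_address(0x12345))  # (9, 26, 5)

# Addresses 8 KB apart share a line index and differ only in tag, so they conflict.
print(split_address(0x0000)[1] == split_address(0x2000)[1])  # True

# Cell array cross-check: 1,024 columns by 64 non-redundant rows, one bit per cell.
print(1024 * (66 - 2) // 8 == CACHE_BYTES)  # True
```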
An optional external secondary cache, known as the B-cache, with capacities of 128 KB to 8 MB was supported. The cache operated at one-third to one-sixteenth of the internal clock frequency, or 12.5 to 66.67 MHz at 200 MHz.
External interface
The external interface is a 128-bit bus which operates at half to one-eighth of the internal clock frequency, or 25 to 100 MHz at 200 MHz. The width of the bus is configurable; systems using the Alpha 21064 can use a 64-bit external interface instead.
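The frequency ranges quoted for the B-cache and the external bus are the 200 MHz core clock divided by the extreme ratios. The snippet below reproduces that arithmetic; it uses only the end points named in the text and makes no claim about which intermediate ratios the chip supports.

```python
CORE_MHZ = 200.0


def divided_clock_mhz(core_mhz: float, divisor: int) -> float:
    """Frequency of an interface clocked at core_mhz / divisor."""
    return core_mhz / divisor


# B-cache: one-sixteenth to one-third of the core clock.
bcache_range = (divided_clock_mhz(CORE_MHZ, 16), divided_clock_mhz(CORE_MHZ, 3))
# External bus: one-eighth to one-half of the core clock.
bus_range = (divided_clock_mhz(CORE_MHZ, 8), divided_clock_mhz(CORE_MHZ, 2))

print([round(f, 2) for f in bcache_range])  # [12.5, 66.67]
print([round(f, 2) for f in bus_range])     # [25.0, 100.0]
```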
Fabrication
The Alpha 21064 contained 1.68 million transistors. It was first fabricated in Digital's fourth-generation complementary metal–oxide–semiconductor (CMOS) process, CMOS-4, with a feature size of 0.75 µm and three levels of aluminium interconnect. Fabricated in CMOS-4, the die measured 16.8 mm by 13.9 mm, for an area of 233.52 mm². It was later fabricated in CMOS-4S, a 10% optical shrink of CMOS-4 with a feature size of 0.675 µm.
The Alpha 21064 used a 3.3 V power supply. Power dissipation was 21.0 W at 150 MHz, 22.5 W at 166 MHz and 27.0 W at 200 MHz.
Package
The Alpha 21064 is packaged in a 431-pin alumina-ceramic pin grid array (PGA) measuring 61.72 by 61.72 mm. Of the 431 pins, 291 are signal pins; the remaining 140 are for Vdd (power supply voltage) and Vss (ground). The heatsink is attached directly to the package, secured by nuts on two studs protruding from the tungsten heat spreader.
Derivatives
Alpha 21064A
The Alpha 21064A, introduced as the DECchip 21064A, code-named EV45, is a further development of the Alpha 21064 introduced in October 1993. It operated at clock frequencies of 200, 225, 233, 275 and 300 MHz. The 225 MHz model was replaced by the 233 MHz model on 6 July 1994, which at introduction was priced at US$788 in quantities of 5,000, 10% less than the 225 MHz model it replaced. On the same day, the price of the 275 MHz model was also reduced by 25% to US$1,083 in quantities of 5,000. The 300 MHz model was announced and sampled on 2 October 1995 and was shipped in December 1995.
One model, the 21064A-275-PC, was restricted to running Windows NT or operating systems that use the Windows NT memory management model.
The Alpha 21064A had a number of microarchitectural improvements over the Alpha 21064. The caches were improved in two ways: the capacity of the I-cache and D-cache was doubled from 8 KB to 16 KB, and the cache tags and cache data were protected with parity. The floating-point divider was updated for improved performance, as were the branch predictor and the branch history table, which now contained 2,048 two-bit entries.
The Alpha 21064A contained 2.8 million transistors on a die measuring 14.5 by 10.5 mm, for an area of 152.25 mm². It was fabricated by Digital in their fifth-generation CMOS process, CMOS-5, a 0.50 µm process with four levels of aluminium interconnect.
The Alpha 21064A was used by Digital in some models of their DEC 3000 AXP, DEC 4000 AXP and DEC 7000/10000 AXP systems. Third parties who used the Alpha 21064A include BTG, who used a 275 MHz model in their Action AXP275 RISC PC; NekoTech, who used a 275 MHz model overclocked by 5% to 289 MHz in their Mach 2-289-T; and Network Appliance (now NetApp), who used a 275 MHz model in their storage systems.
Alpha 21066
The Alpha 21066, introduced as the DECchip 21066, code-named LCA4 (Low Cost Alpha), is a low-cost variant of the Alpha 21064. Samples were introduced on 10 September 1993, with volume shipments in early 1994. At the time of introduction, the 166 MHz Alpha 21066 was priced at US$385 in quantities of 5,000. A 100 MHz model, intended for embedded systems, also existed. Mitsubishi Electric was a second source of the Alpha 21066 and fabricated a 200 MHz model. Sampling began in late 1994, with volume shipments in the third quarter of 1995. Microprocessor Report recognized the Alpha 21066 as the first microprocessor with an integrated PCI controller.
The Alpha 21066 was intended for use in low-cost applications, specifically Alpha-based personal computers running Windows NT. Digital used various models of the Alpha 21066 in their Multia clients, AXPpci 33 original equipment manufacturer (OEM) motherboards and AXPvme single board computers. Outside of Digital, NekoTech used a 166 MHz model in their Mach 1-166 personal computer.
Due to the process shrink, it was able to include features that were desirable in cost-sensitive embedded systems. These features include an on-die B-cache and memory controller with ECC support, a functionally limited graphics accelerator supporting up to 8 MB of VRAM for implementing a framebuffer, a PCI controller and a phase locked loop (PLL) clock generator for multiplying a 33 MHz external clock signal to the desired internal clock frequency.
The memory controller supported 64 KB to 2 MB of B-cache and 2 to 512 MB of memory. The ECC implementation was capable of detecting 1-, 2- and 4-bit errors and correcting 1-bit errors. To reduce cost, the Alpha 21066 has a 64-bit system bus, which reduced the number of pins and thus the size of the package. The reduced width of the system bus also reduced bandwidth and thus performance by 20%, which was deemed acceptable.
The Alpha 21066 contained 1.75 million transistors. It was fabricated by Digital and was second sourced by Mitsubishi Electric. The Digital-fabricated model has a die measuring 17.0 by 12.3 mm, for an area of 209.1 mm², and was fabricated in their fourth-generation CMOS process, CMOS-4S, a 0.675 µm process with three levels of interconnect. The Mitsubishi-fabricated model, made in a 0.50 µm process, has a die 26% smaller than that of the Digital-fabricated model.
The Alpha 21066 was packaged in a 287-pin CPGA measuring 57.404 by 57.404 mm.
Alpha 21066A
The Alpha 21066A, code-named LCA45, is a low-cost variant of the Alpha 21064A. It was announced on 14 November 1994, with samples of 100 and 233 MHz models introduced on the same day. Both models were shipped in March 1995. When announced, the 100 and 233 MHz models were priced at $175 and $360, respectively, in quantities of 5,000. A 266 MHz model was later made available.
Although based on the Alpha 21064A, the Alpha 21066A did not have the 16 KB instruction and data caches. A feature specific to the Alpha 21066A was power management: the microprocessor's internal clock frequency could be adjusted by software.
Digital used various models of Alpha 21066A in their products which had previously used the Alpha 21066. Outside of Digital, Tadpole Technology used a 233 MHz model in their ALPHAbook 1 notebook.
The Alpha 21066A contained 1.8 million transistors on a die measuring 14.8 by 10.9 mm, for an area of 161.32 mm². It was fabricated in Digital's fifth-generation CMOS process, CMOS-5, a 0.50 µm process with three levels of interconnect.
Alpha 21068
The Alpha 21068, introduced as the DECchip 21068, is a variant of the Alpha 21066 designed for embedded systems. It was identical in microarchitecture to the Alpha 21066. Samples were introduced on 10 September 1993 with volume shipments in early 1994. It operated at a clock frequency of 66 MHz and had a power dissipation of 9 W maximum. At the time of introduction, the Alpha 21068 was priced at US$221 each in quantities of 5,000. On 6 June 1994, Digital announced that it was cutting the price by 16% to US$186, effective on 3 July 1994.
The Alpha 21068 was used by Digital in their AXPpci 33 motherboard and the AXPvme 64 and AXPvme 64LC single-board computers.
Alpha 21068A
The Alpha 21068A, introduced as the DECchip 21068A, is a variant of the Alpha 21066A for embedded systems. It operated at a clock frequency of 100 MHz.
Chipsets
Initially, there was no standard chipset for the Alpha 21064 and Alpha 21064A. Digital's computers used custom application-specific integrated circuits (ASICs) to interface the microprocessor to the system. As this raised development costs for third parties who wished to develop Alpha-based products, Digital developed a standard chipset, the DECchip 21070 Apecs, for original equipment manufacturers (OEMs).
There were two models of the DECchip 21070, the DECchip 21071 and the DECchip 21072. They differed in the width of the memory bus: the DECchip 21071 had a 64-bit bus and the DECchip 21072 had a 128-bit bus. The 21072 was therefore the higher performing and more expensive model. The chipset consisted of three chip designs: the COMANCHE B-cache and memory controller, the DECADE data slice and the EPIC PCI controller. The DECADE chips implemented the data paths in 32-bit slices, so the DECchip 21071 has two such chips and the DECchip 21072 has four. The Industry Standard Architecture (ISA) and Extended Industry Standard Architecture (EISA) buses were supported through the use of a standard PCI to ISA or EISA bridge.
The chipsets were introduced on 10 January 1994, with samples available. Volume shipments began in mid-1994. The DECchip 21071 was priced at $90 in quantities of 5,000 and the DECchip 21072 was priced at $120 in quantities of 5,000.
The DECchip 21070 was used by Digital in Alpha 21064- and Alpha 21064A-based AlphaStations and uniprocessor AlphaServers and by third-party manufacturers in their own products.