All Topics  
Alpha 21264

 

   Email Print
   Bookmark   Link






 

Alpha 21264



 
 
The Alpha 21264 is a microprocessor
Microprocessor

A microprocessor incorporates most or all of the functions of a central processing unit on a single integrated circuit . The first microprocessors emerged in the early 1970s and were used for electronic calculators, using Binary-coded decimal arithmetic on 4-bit Word ....
 developed and fabricated by Digital Equipment Corporation
Digital Equipment Corporation

Digital Equipment Corporation was a pioneering United States company in the computer industry. It is often referred to within the computing industry as DEC ....
 that implemented the Alpha
DEC Alpha

Alpha, originally known as Alpha AXP, was a 64-bit reduced instruction set computer instruction set architecture developed by Digital Equipment Corporation , designed to replace the 32-bit VAX complex instruction set computer ISA and its implementations....
 instruction set architecture (ISA).

Out of order execution
At any given stage, the microprocessor could have up to 80 instructions in various stages of execution, surpassing every other contemporary microprocessor.

Decoded instructions are queued in instruction queues and are issued when their operands are available.






Discussion
Ask a question about 'Alpha 21264'
Start a new discussion about 'Alpha 21264'
Answer questions from other users
Full Discussion Forum



Encyclopedia


The Alpha 21264 is a microprocessor
Microprocessor

A microprocessor incorporates most or all of the functions of a central processing unit on a single integrated circuit . The first microprocessors emerged in the early 1970s and were used for electronic calculators, using Binary-coded decimal arithmetic on 4-bit Word ....
 developed and fabricated by Digital Equipment Corporation
Digital Equipment Corporation

Digital Equipment Corporation was a pioneering United States company in the computer industry. It is often referred to within the computing industry as DEC ....
 that implemented the Alpha
DEC Alpha

Alpha, originally known as Alpha AXP, was a 64-bit reduced instruction set computer instruction set architecture developed by Digital Equipment Corporation , designed to replace the 32-bit VAX complex instruction set computer ISA and its implementations....
 instruction set architecture (ISA).

Description


The Alpha 21264 is a four-issue superscalar
Superscalar

A superscalar Central processing unit architecture implements a form of parallel computer called instruction level parallelism within a single processor....
 microprocessor with out-of-order execution
Out-of-order execution

In computer engineering, out-of-order execution, OoOE, is a paradigm used in most high-performance microprocessors to make use of Instruction cycle that would otherwise be wasted by a certain type of costly delay....
 and speculative execution
Speculative execution

In computer science, speculative execution is the execution of Code , the result of which may not be needed. In the context of functional programming, the term "speculative evaluation" is used instead....
. It has a peak execution rate of six instructions per cycle and can sustain four instructions per cycle. It has a seven-stage instruction pipeline
Instruction pipeline

File:5 Stage Pipeline.svgAn instruction pipeline is a technique used in the design of computers and other digital electronic devices to increase their instruction throughput ....
.

Out of order execution


At any given stage, the microprocessor could have up to 80 instructions in various stages of execution, surpassing every other contemporary microprocessor.

Decoded instructions are queued in instruction queues and are issued when their operands are available. The integer queue contained 20 entries and the floating-point queue 15. Each queue could issue as many instructions as there were pipelines.

Ebox


The Ebox is responsible for the execution of integer and load store instructions. It has two integer units, two load store units and two integer register files. Each register file served an integer unit and a load store unit, and the register file and its two units are referred to as a "cluster". This scheme was used as it reduced the number of write and read ports required to serve operands and receive results, thus reducing the physical size of the register file, enabling the microprocessor to operate at higher clock frequencies. Writes to any of the register files thus have to be synchronized, which required a clock cycle to complete, negatively impacting performance by one percent. Loss of performance was compensated by the higher clock frequency achievable with this scheme and was avoided where possible by the issue logic, which tried to reduce the number of operations requiring data to synchronized.

The clusters are near identical. However, U1 has a seven-cycle pipelined multiplier while U0 has a three-cycle pipeline for executing Motion Video Instructions (MVI), an extension to the Alpha Architecture defining single instruction multiple data (SIMD) instructions for multimedia.

The load store units are simple arithmetic logic units used to calculate virtual addresses for memory access. They are also capable of executing simple arithmetic and logic instructions and the instruction issue logic in the Alpha 21264 utilized this capability, issuing instructions to these units when they were available.

Each integer register file contains 80 entries, of which 31 are architectural registers, 41 are rename registers and 8 are PALshadow registers. There is no entry for register R31 as it is hardwired to zero and can only be read from.

The Ebox therefore has four 64-bit adders, four logic units, two barrel shifters, byte logic, two sets of conditional branch logic equally divided between U1 and U0.

Fbox


The Fbox is responsible for executing floating-point instructions. It consists of two floating-point pipelines and a floating-point register file. The pipelines are not identical, one executes the majority of instructions and the other only multiply instructions. The adder pipeline has two non-pipelined units connected to it, a divide unit and a square root unit. Adds, multiplies and most other instructions have a 4-cycle latency, a double-precision divide has 16-cycle latency and a double-precision square root has a 33-cycle latency. The floating point register file contains 72 entries, of which 32 are architectural registers and 40 are rename registers.

Cache


The Alpha 21264 has two levels of cache
CPU cache

A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access computer storage. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations....
, a primary cache and secondary cache. The three-level cache of the Alpha 21164
Alpha 21164

The Alpha 21164, also known by its code name, EV5, is a microprocessor developed and fabricated by Digital Equipment Corporation that implemented the DEC Alpha instruction set architecture ....
 was not used due to problems with bandwidth.

Primary caches

The primary cache is split into separate caches for instructions and data, the I-cache and D-cache respectively. The I-cache and D-caches are 64 KB.

The D-cache is dual-ported by transferring data on both the rising and falling edges of the clock signal. This method of dual-porting enabled any combination of reads or writes to the cache every processor cycle. It also avoided the having to duplicate the cache so there are two as in the Alpha 21164. Duplicating the cache restricted the capacity of the cache, as it required more transistors to provide the same amount of capacity, and in turn increased the area required and power consumed.

B-cache

The secondary cache, termed the B-cache, is an external cache with a capacity of 1 to 16 MB. It is controlled by the microprocessor and is implemented by synchronous static random access memory
Static random access memory

Static random access memory is a type of semiconductor memory where the word static indicates that, unlike dynamic random access memory, it does not need to be periodically memory refresh, as SRAM uses bistable latch to store each bit....
 (SSRAM) chips that operate at two thirds, half, one-third or one-fourth the internal clock frequency, or 133 to 333 MHz at 500 MHz. The B-cache was accessed with a dedicated 128-bit bus that operates at the same clock frequency as the SSRAM or at twice the clock frequency if double data rate
Double data rate

In computing, a computer bus operating with double data rate transfers data on both the rising and falling edges of the clock signal. This is also known as double pumped, dual-pumped, and double transition....
 SSRAM is used. The B-cache is direct-mapped.

Branch prediction


Branch prediction is performed by a tournament branch prediction algorithm. The algorithm was developed by Scott McFarling at Digital's Western Research Laboratory (WRL) and was described in a 1993 paper. This predictor was used as the Alpha 21264 has a minimum branch misprediction penalty of seven cycles. Due to the instruction cache's two cycle latency and the instruction queues, the average branch misprediction penalty is 11 cycles. The algorithm maintains two history tables, Local and Global, and the table used to predict the outcome of a branch is determined by a Choice predictor.

The local predictor is a two-level table which records the history of individual branches. It consists of a 1,024-entry by 10-bit branch history table. A two-level table was used as the prediction accuracy is similar to that of a larger single-level table while requiring fewer bits of storage. It has a 1,024-entry branch history table. Each entry is a 3-bit saturating counter. The value of the counter determines whether the current branch is taken or not taken.

The choice predictor records the history of the local and global predictors to determine which predictor is the best for a particular branch. It has a 4,096-entry branch history table. Each entry is a 2-bit saturating counter. The value of the counter determines if the local or global predictor is used.

External interface


The external interface consisted of a bidirectional 64-bit double data rate
Double data rate

In computing, a computer bus operating with double data rate transfers data on both the rising and falling edges of the clock signal. This is also known as double pumped, dual-pumped, and double transition....
 (DDR) data bus and two 15-bit unidirectional time-multiplexed address
Address bus

An address bus is a computer bus that is used to specify a memory address. When a central processing unit or direct memory access-enabled device needs to read or write to a memory location, it specifies that memory location on the address bus ....
 and control
Control bus

A control bus is a computer bus, used by central processing units for communicating with other devices within the computer. While the address bus carries the information on which device the CPU is communicating with and the data bus carries the actual data being processed, the control bus carries commands from the CPU and returns status sign...
 buses, one for signals originating from the Alpha 21264 and one for signals originating from the system. Digital licensed the bus to Advanced Micro Devices
Advanced Micro Devices

Advanced Micro Devices, Inc. is an United States multinational corporation semiconductor industry company based in Sunnyvale, California, that develops Central processing unit and related technologies for commercial and consumer markets....
 (AMD), and it was subsequently used in their Athlon
Athlon

Athlon is the brand name applied to a series of different x86 Central processing unit designed and manufactured by Advanced Micro Devices. The original Athlon was the first seventh-generation x86 processor and, in a first, retained the initial performance lead it had over Intel Corporation's competing processors for a significant period of t...
 microprocessors, where it was known as the EV6 bus.

Fabrication


The Alpha 21264 contained 15.2 million transistors. The logic consisted of approximately six million transistors, with the rest contained in the caches and branch history tables. It was fabricated in a 0.35 µm complementary metal–oxide–semiconductor
CMOS

Complementary metal?oxide?semiconductor , is a major class of integrated circuits. CMOS technology is used in microprocessors, microcontrollers, Static Random Access Memory, and other digital logic circuits....
 (CMOS) process with six levels of interconnect.

Packaging


The Alpha 21264 was packaged in a 587-pin ceramic interstitial
Interstitial

Interstitial may refer to:* Interstitial program, short television programming which is often shown between movies or other events* Interstitial defect, a crystallographic defect that may be occupied by another atom...
 pin grid array
Pin grid array

A pin grid array, often abbreviated PGA, refers to the arrangement of pins on the integrated circuit packaging. In a PGA, the pins are arranged in a square array that may or may not cover the bottom of the package....
 (IPGA).

Alpha Processor, Inc. later sold the Alpha 21264 in a Slot B package containing the microprocessor mounted on a printed circuit board with the B-cache and voltage regulators. The design was intended to use the success of slot-based microprocessors from Intel and AMD. Slot B was originally developed to be used by AMD's Athlon as well, so that API could obtain materials for the Slot B at commodity prices in order to reduce the cost of the Alpha 21264 to gain a wider market share. This never materialized as AMD chose to use Slot A for their slot-based Athlons.

Derivatives


Alpha 21264A


The Alpha 21264A, code-named EV67 was a shrink of the Alpha 21264 which was introduced in late 1999. The microarchitecture did not change, although the circuit required modification to be fabricated in the new process. It was fabricated by Samsung in a 0.25 µm CMOS process for a die with an area of 210 mm2. Power supply voltage was reduced to 2.0V and TDP to 70 to 100 W for 600 to 833 MHz.

Alpha 21264B


The Alpha 21264B is a further development for increased clock frequencies. There were two models, one fabricated by IBM, code-named EV68C, and one by Samsung, code-named EV68A

The model fabricated by IBM was done in a 0.18 µm CMOS process with copper interconnects. It was sampled in early 2000 and achieved a maximum clock frequency of 1.25 GHz.

The model fabricated by Samsung was done in a 0.18 µm CMOS process with aluminium interconnects. It had a die size of 125 mm2, a third smaller than the Alpha 21264A, and required a 1.7V power supply. It was available in volume in 2001 and achieved clock frequencies in the range of 750 to 940 MHz with a TDP between 60 to 75W.

In September 1998, Samsung announced they would fabricate a variant of the Alpha 21264B in a 0.18 µm fully depleted silicon-on-insulator (SOI) process with copper interconnects
Copper-based chips

Copper-based chips are semiconductor integrated circuits, usually microprocessors, which use copper for interconnections. Since copper is a better conductor than aluminium, chips using this technology can have smaller metal components, and use less energy to pass electricity through them....
 that was capable of achieving a clock frequency of 1.5 GHz. This version never materialized.

Alpha 21264C


The Alpha 21264C, code-named EV68CB is a faster derivative. It was available at clock frequencies of 1.0, 1.25 and 1.33 GHz and was fabricated by IBM. The packaging was changed to a 675-pad ceramic land grid array
Land grid array

The land grid array is a type of surface-mount packaging used for integrated circuits. It can be electrically connected to a Printed circuit board either by the use of a CPU socket or by soldering directly to the PCB....
 (CLGA) measuring 49.53 by 49.53 mm.

Alpha 21264D


The Alpha 21264D, code-named EV68CD is a faster derivative fabricated by IBM.

Chipsets


Digital and Advanced Micro Devices
Advanced Micro Devices

Advanced Micro Devices, Inc. is an United States multinational corporation semiconductor industry company based in Sunnyvale, California, that develops Central processing unit and related technologies for commercial and consumer markets....
 (AMD) both developed chipsets for the Alpha 21264.

21272


The Digital 21272, also known as the Tsunami and Typhoon was the first chipset for the Alpha 21264. The 21272 chipset supported two-, three- or four-way multiprocessing and one or two 33 MHz PCI-X
PCI-X

PCI-X is a computer bus and expansion card standard that enhanced the PCI Local Bus for higher bandwidth demanded by Server . It is a double-wide version of PCI, running at up to four times the clock speed, but is otherwise similar in electrical implementation and uses the same protocol....
 buses. It had 128- to 512-bit memory bus which operated at 83 MHz, yielding a maximum bandwidth of 5,312 MB/s. The chipset supported 100 MHz registered ECC SDRAM.

The chipset consisted of three devices, a C-chip, a D-chip and a P-chip. The number of devices which made up the chipset varied as it was determined by the configuration of the chipset. The C-chip the control chip containing the memory controller. One C-chip was required for every microprocessor.

The P-chip is the PCI controller, implementing a 33 MHz PCI-X bus. The 21272 could have one or two P-chips.

The 21272 was used extensively by Digital, Compaq and Hewlett Packard in their entry-level to mid-range AlphaServers and in all models of the AlphaStation. It was also used in third-party products from Alpha Processor, Inc. (later known as API NetWorks) such as their UP2000+ motherboard.

Irongate


AMD developed two Alpha 21264-compatible chipsets, the Irongate, also known as the AMD-751, and its successor, Irongate-2, also known as the AMD-761. These chipsets were developed for their Athlon microprocessors but due to AMD licensing the EV6 bus used in the Alpha from Digital, the Athlon and Alpha 21264 were compatible in terms of bus protocol. The Irongate was used by Samsung in their UP1000 and UP1100 motherboards. The Irongate-2 was used by Samsung in their UP1500 motherboard.