Alpha 21064
The Alpha 21064 is a microprocessor developed and fabricated by Digital Equipment Corporation that implemented the Alpha (introduced as the Alpha AXP) instruction set architecture (ISA). It was introduced as the DECchip 21064 before it was renamed in 1994. The 21064 is also known by its code name, EV4. It was announced in February 1992 with volume availability in September 1992. The 21064 was the first commercial implementation of the Alpha ISA, and the first microprocessor from Digital to be available commercially. It was succeeded by a derivative, the Alpha 21064A, in October 1993.

History

The first Alpha processor was a test chip codenamed EV3. This test chip was fabricated using Digital's 1.0-micrometre (µm) CMOS-3 process. The test chip lacked a floating-point unit and had only 1 KB caches. The test chip was used to confirm the operation of the aggressive circuit design techniques. The test chip (along with simulators and emulators) was also used to bring up firmware and the various operating systems that the company supported. The production chip, codenamed EV4, was fabricated using Digital's 0.75 µm CMOS-4 process. Dirk Meyer and Edward McLellan were the micro-architects. McLellan designed the issue logic while Meyer designed the other major blocks. Jim Montanaro led the circuit implementation. The EV3 was used in the Alpha Development Unit (ADU), a computer used by Digital to develop software for the Alpha platform before the availability of EV4 parts.

The 21064 was unveiled at the 39th International Solid-State Circuits Conference (ISSCC) in mid-February 1992. It was announced on 25 February 1992, with a 150 MHz sample introduced on the same day. It was priced at $3,375 in quantities of 100, $1,650 in quantities between 100 and 1,000, and $1,560 for quantities over 1,000. Volume shipments began in September 1992.

In early February 1993, the price of the 150 MHz version was reduced to $1,096 from $1,559 in quantities greater than 1,000.

On 25 February 1993, a 200 MHz version was introduced, with sample kits available, priced at $3,495. In volume, it was priced at $1,231 per unit in quantities greater than 10,000. Volume orders were accepted in June 1993, with shipments in August 1993. The price of the 150 MHz version was reduced in response. The price of its sample kit was reduced to $1,690 from $3,375, effective in April 1993; and in volume, the price was reduced to $853 from $1,355 per unit in quantities greater than 10,000, effective in July 1993.

With the introduction of the Alpha 21066 and the Alpha 21068 on 10 September 1993, Digital adjusted the positioning of the existing 21064s and introduced a 166 MHz version priced at $499 per unit in quantities of 5,000. The price of the 150 MHz version was reduced to $455 per unit in quantities of 5,000.

On 6 June 1994, the price of the 200 MHz version was reduced by 31% to $544 to position it against the 60 MHz Pentium; and the 166 MHz version by 19% to $404 per unit in quantities of 5,000, effective on 3 July 1994.

The Alpha 21064 was fabricated at Digital's Hudson, Massachusetts and South Queensferry, Scotland facilities.

Users

The 21064 was mostly used in high-end computers such as workstations and servers. Users included:
  • Aspen Systems in its Alpine workstations
  • Carrera Computers in its Hercules 150, Hercules 200, and Pantera II workstations
  • Cray Research, which used the 150 MHz 21064 in its Cray T3D supercomputers
  • Digital, in its DECpc AXP 150 entry-level workstations, DEC 2000 AXP entry-level servers, DEC 3000 AXP workstations and entry-level servers, DEC 4000 AXP mid-range servers and DEC 7000/10000 AXP high-end servers
  • Encore Computer, in its Infinity R/T high-end real-time computer

Performance

The 21064 was the highest performing microprocessor from its introduction until 1993, when International Business Machines (IBM) introduced the multi-chip POWER2. It then remained the highest performing single-chip microprocessor, a position it held until the 275 MHz 21064A was introduced in October 1993.

Description

The Alpha 21064 is a superpipelined dual-issue superscalar microprocessor that executes instructions in-order. It is capable of issuing up to two instructions every clock cycle to four functional units: an integer unit, a floating-point unit (FPU), an address unit, and a branch unit. The integer pipeline is seven stages long, and the floating-point pipeline ten stages. The first four stages of both pipelines are identical and are implemented by the I-box.

I-box

The I-box is the control unit; it fetches, decodes and issues instructions, and controls the pipeline. During stage one, two instructions are fetched from the I-cache. Branch prediction is performed by logic in the I-box during stage two. Either static prediction or dynamic prediction is used. Static prediction examines the sign bit of the displacement field of a branch instruction and predicts the branch as taken if the sign bit indicates a backwards branch (that is, if the sign bit contains 1). Dynamic prediction examines an entry in the 2,048-entry by 1-bit branch history table. If the entry contains 1, the branch is predicted as taken. When dynamic prediction is used, branch prediction is approximately 80% accurate for most programs. The branch misprediction penalty is four cycles.
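The two prediction schemes described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual I-box logic: the way the table is indexed and the rule that the history bit records the most recent outcome are assumptions, and all names (`static_predict`, `OneBitBHT`, `count_mispredictions`) are hypothetical.

```python
BHT_ENTRIES = 2048  # 2,048-entry by 1-bit table, as described above


def static_predict(displacement: int) -> bool:
    """Static scheme: a negative displacement (sign bit 1) is a backwards branch, predicted taken."""
    return displacement < 0


class OneBitBHT:
    """Dynamic scheme: one history bit per entry; 1 means predict taken."""

    def __init__(self, entries: int = BHT_ENTRIES) -> None:
        self.bits = [0] * entries

    def _index(self, pc: int) -> int:
        # Assumption: index by the low-order bits of the 4-byte instruction address.
        return (pc >> 2) % len(self.bits)

    def predict(self, pc: int) -> bool:
        return self.bits[self._index(pc)] == 1

    def update(self, pc: int, taken: bool) -> None:
        # Assumption: the bit simply records the most recent outcome.
        self.bits[self._index(pc)] = 1 if taken else 0


def count_mispredictions(bht: OneBitBHT, pc: int, outcomes: list[bool]) -> int:
    """Replay a sequence of branch outcomes and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if bht.predict(pc) != taken:
            misses += 1
        bht.update(pc, taken)
    return misses


# A loop branch taken nine times and then not taken, executed twice.
loop = ([True] * 9 + [False]) * 2
print(count_mispredictions(OneBitBHT(), 0x1000, loop))  # prints 4
```

Under these assumptions a one-bit entry mispredicts twice per loop: once on entry and once on exit.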

The two fetched instructions are decoded during stage three. The I-box then checks during stage four whether the resources required by the two instructions are available. If so, the instructions are issued, provided they can be paired. Which instructions can be paired is determined by the number of read and write ports in the integer register file. The 21064 can issue: an integer operate with a floating-point operate, any load/store instruction with any operate instruction, an integer operate with an integer branch, or a floating-point operate with a floating-point branch. Two combinations are not permitted: an integer operate and a floating-point store, and a floating-point operate and an integer store. If the two instructions cannot be issued together, the first four stages are stalled until the remaining instruction is issued. The first four stages are also stalled in the event that no instruction can be issued due to resource unavailability, dependencies, or similar conditions.
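The pairing rules listed above can be written down as a small lookup. The sketch below is only an encoding of the combinations named in the text; the instruction classes (`Kind`) and the function `can_pair` are hypothetical names, and two assumptions are made: the order of the two instructions does not matter, and any combination not listed is treated as not pairable.

```python
from enum import Enum, auto


class Kind(Enum):
    INT_OP = auto()
    FP_OP = auto()
    INT_LOAD = auto()
    FP_LOAD = auto()
    INT_STORE = auto()
    FP_STORE = auto()
    INT_BRANCH = auto()
    FP_BRANCH = auto()


OPERATES = {Kind.INT_OP, Kind.FP_OP}
LOAD_STORES = {Kind.INT_LOAD, Kind.FP_LOAD, Kind.INT_STORE, Kind.FP_STORE}

# The two combinations the text explicitly excludes.
FORBIDDEN = {
    frozenset({Kind.INT_OP, Kind.FP_STORE}),
    frozenset({Kind.FP_OP, Kind.INT_STORE}),
}

# Pairings named in the text that do not involve a load or store.
OTHER_ALLOWED = {
    frozenset({Kind.INT_OP, Kind.FP_OP}),
    frozenset({Kind.INT_OP, Kind.INT_BRANCH}),
    frozenset({Kind.FP_OP, Kind.FP_BRANCH}),
}


def can_pair(a: Kind, b: Kind) -> bool:
    """Return True if the two instruction classes may issue in the same cycle."""
    pair = frozenset({a, b})
    if pair in FORBIDDEN:
        return False
    if pair in OTHER_ALLOWED:
        return True
    # Any load/store with any operate (the exceptions were rejected above).
    return (a in LOAD_STORES and b in OPERATES) or (b in LOAD_STORES and a in OPERATES)


print(can_pair(Kind.INT_LOAD, Kind.FP_OP))   # True
print(can_pair(Kind.INT_OP, Kind.FP_STORE))  # False
```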

The I-box contains two translation lookaside buffers (TLBs) for translating virtual addresses to physical addresses. These TLBs are referred to as instruction translation buffers (ITBs). The ITBs cache recently used page table entries for the instruction stream. An eight-entry ITB is used for 8 KB pages and a four-entry ITB for 4 MB pages. Both ITBs are fully associative and use a not-last-used replacement algorithm.

Execution

Execution begins during stage five for all instructions. The register files are read during stage four. The pipelines beginning at stage five cannot be stalled.

Integer unit

The integer unit is responsible for executing integer instructions. It consists of the integer register file (IRF) and the E-box. The IRF contains thirty-two 64-bit registers and has four read ports and two write ports that are equally divided between the integer unit and the branch unit. The E-box contains an adder, a logic unit, a barrel shifter and a multiplier. Except for multiply, shift and byte manipulation instructions, most integer instructions are completed by the end of stage five and thus have a latency of one cycle. The barrel shifter is pipelined, but shift and byte manipulation instructions are not completed until the end of stage six, and thus have a latency of two cycles. The multiplier was not pipelined in order to save die area, so multiply instructions have a variable latency of 19 to 23 cycles depending on the operands. In stage seven, integer instructions write their results to the IRF.

Address unit

The address unit, also known as the "A-box", executes load and store instructions. To enable the address unit and integer unit to operate in parallel, the address unit has its own displacement adder, which it uses to calculate virtual addresses, instead of using the adder in the integer unit. A 32-entry fully associative translation lookaside buffer (TLB) is used to translate virtual addresses into physical addresses. This TLB is referred to as the data translation buffer (DTB). The 21064 implements a 43-bit virtual address and a 34-bit physical address, and is therefore capable of addressing 8 TB of virtual memory and 16 GB of physical memory.
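The memory limits quoted above follow directly from the address widths. The short Python check below shows the arithmetic, assuming byte addressing and binary units (1 TB = 2^40 bytes, 1 GB = 2^30 bytes); the function name is illustrative only.

```python
VIRTUAL_ADDRESS_BITS = 43
PHYSICAL_ADDRESS_BITS = 34

TB = 2**40  # binary terabyte
GB = 2**30  # binary gigabyte


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 2**address_bits


print(addressable_bytes(VIRTUAL_ADDRESS_BITS) // TB)   # 8  (TB of virtual memory)
print(addressable_bytes(PHYSICAL_ADDRESS_BITS) // GB)  # 16 (GB of physical memory)
```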

Store instructions result in data buffered in a 4-entry by 32-byte write buffer. The write buffer improved performance by reducing the number of writes on the system bus by merging data from adjacent stores and by temporarily delaying stores, enabling loads to be serviced quicker as the system bus is not utilized as often.
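The effect of merging adjacent stores can be illustrated with a toy model. The sketch below assumes that stores falling in the same aligned 32-byte block share one entry, that a store never crosses a block boundary, and that the oldest entry is written out when all four entries are in use; the real drain policy is not described in the text, and the class and method names are hypothetical.

```python
from collections import OrderedDict

ENTRY_BYTES = 32  # each entry holds one 32-byte block
NUM_ENTRIES = 4   # four entries, as described above


class WriteBuffer:
    def __init__(self) -> None:
        # Maps aligned block address -> {byte offset: value}, oldest entry first.
        self.entries: "OrderedDict[int, dict[int, int]]" = OrderedDict()
        self.bus_writes = 0

    def store(self, addr: int, data: bytes) -> None:
        """Buffer a store, merging it into an existing entry for the same block."""
        offset = addr % ENTRY_BYTES
        block = addr - offset
        if block not in self.entries:
            if len(self.entries) == NUM_ENTRIES:
                self._drain_oldest()  # assumed policy: evict the oldest entry
            self.entries[block] = {}
        for i, byte in enumerate(data):
            self.entries[block][offset + i] = byte

    def _drain_oldest(self) -> None:
        self.entries.popitem(last=False)
        self.bus_writes += 1  # one bus write per drained entry

    def flush(self) -> None:
        while self.entries:
            self._drain_oldest()


wb = WriteBuffer()
for i in range(4):
    wb.store(0x100 + 8 * i, bytes(8))  # four adjacent 8-byte stores
wb.flush()
print(wb.bus_writes)  # prints 1: the four stores merged into one bus write
```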

Floating-point unit

The floating-point unit consists of the floating-point register file (FRF) and the F-box. The FRF contains thirty-two 64-bit registers and has three read ports and two write ports. The F-box contained a floating-point pipeline and a non-pipelined divide unit which retired one bit per cycle.

The floating-point register file is read and the data formatted into fraction, exponent, and sign in stage four. If executing add instructions, the adder calculates the exponent difference, and a predictive leading one or zero detector using input operands for normalizing the result is initiated. If executing multiply instructions, a 3× multiplicand is generated.

In stages five and six, alignment or a normalization shift and sticky-bit calculations are performed for adds and subtracts. Multiply instructions are multiplied in a pipelined, two-way interleaved array which uses a radix-8 Booth algorithm. In stage eight, final addition is performed in parallel with rounding. Floating-point instructions write their results to the FRF in stage ten.

Instructions executed in the pipeline have a six-cycle latency. Single-precision (32-bit) and double-precision (64-bit) divides, which are executed in the non-pipelined divide unit, have a latency of 31 and 61 cycles, respectively.

Caches

The 21064 has two on-die primary caches: an 8 KB data cache (known as the D-cache) using a write-through write policy and an 8 KB instruction cache (known as the I-cache). Both caches are direct-mapped for single-cycle access and have a 32-byte line size. The caches are built with six-transistor static random access memory (SRAM) cells that have an area of 98 µm². The caches are 1,024 cells wide by 66 cells tall, with the top two rows used for redundancy.
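The cache parameters above determine how an address maps to a cache line. The Python sketch below derives the line count and splits an address into tag, index and offset fields for an 8 KB direct-mapped cache with 32-byte lines; the function name and the sample address are illustrative. It also checks that 1,024 by 64 usable cells match the 8 KB data capacity, on the assumption that each cell stores one data bit.

```python
CACHE_BYTES = 8 * 1024
LINE_BYTES = 32

NUM_LINES = CACHE_BYTES // LINE_BYTES      # 256 lines
OFFSET_BITS = LINE_BYTES.bit_length() - 1  # 5 bits select a byte within a line
INDEX_BITS = NUM_LINES.bit_length() - 1    # 8 bits select the line


def split_address(addr: int) -> tuple[int, int, int]:
    """Return (tag, index, offset) for a direct-mapped lookup."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(split_address(0x2345))           # (1, 26, 5)
print(split_address(0x2345 + 0x2000))  # (2, 26, 5): same line, different tag

# 66 rows minus 2 redundant rows, 1,024 cells per row, one bit per cell (assumed).
usable_bits = 1024 * (66 - 2)
print(usable_bits == CACHE_BYTES * 8)  # True
```

Two addresses that are 8 KB apart share an index and therefore compete for the same line, which is the usual cost of a direct-mapped organization.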

An optional external secondary cache, known as the B-cache, with capacities of 128 KB to 16 MB was supported. The cache operated at one-third to one-sixteenth of the internal clock frequency, or 66.67 to 12.5 MHz at 200 MHz. The B-cache is direct-mapped and has a default line size of 128 bytes, which could be configured to a larger size. The B-cache is accessed via the system bus.

External interface

The external interface is a 128-bit data bus that operated at half to one-eighth the internal clock rate, or 100 to 25 MHz at 200 MHz. The width of the bus was configurable; systems using the 21064 could instead have a 64-bit external interface. The external interface also included a 34-bit address bus.
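The bus clock range follows from dividing the internal clock by the configured ratio. The Python sketch below reproduces that arithmetic and adds an idealized peak transfer rate, which assumes one full-width transfer on every bus cycle; that assumption is not a figure from the text, and the function names are hypothetical.

```python
def bus_clock_mhz(internal_mhz: float, divisor: int) -> float:
    """External bus clock for a given internal clock and divide ratio."""
    return internal_mhz / divisor


def ideal_peak_mb_per_s(width_bits: int, clock_mhz: float) -> float:
    """Upper bound in MB/s (10^6 bytes), assuming one transfer per bus cycle."""
    return (width_bits / 8) * clock_mhz


fastest = bus_clock_mhz(200, 2)  # half the internal clock
slowest = bus_clock_mhz(200, 8)  # one-eighth of the internal clock
print(fastest, slowest)                   # 100.0 25.0
print(ideal_peak_mb_per_s(128, fastest))  # 1600.0
print(ideal_peak_mb_per_s(64, fastest))   # 800.0
```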

Fabrication

The 21064 contained 1.68 million transistors. The original EV4 was fabricated by Digital in its CMOS-4 process, which has a 0.75 µm feature size and three levels of aluminium interconnect. The EV4 measures 13.9 mm by 16.8 mm, for an area of 233.52 mm². The later EV4S was fabricated in CMOS-4S, a 10% optical shrink of CMOS-4 with a 0.675 µm feature size. This version measured 12.4 mm by 15.0 mm, for an area of 186 mm².
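The die areas above can be checked from the stated dimensions. The Python sketch below also estimates what a uniform 10% linear shrink of the EV4 would give, assuming both dimensions scale by exactly 0.9; the result is close to, though not identical with, the EV4S dimensions quoted, which suggests the layout was not scaled perfectly uniformly. The function names are illustrative.

```python
def die_area_mm2(width_mm: float, height_mm: float) -> float:
    """Rectangular die area in square millimetres."""
    return width_mm * height_mm


def shrink(width_mm: float, height_mm: float, linear_factor: float) -> tuple[float, float]:
    """Scale both die dimensions by the same linear factor."""
    return width_mm * linear_factor, height_mm * linear_factor


ev4 = die_area_mm2(13.9, 16.8)
ev4s = die_area_mm2(12.4, 15.0)
ideal_w, ideal_h = shrink(13.9, 16.8, 0.9)  # assumed uniform 10% shrink

print(round(ev4, 2))                             # 233.52
print(round(ev4s, 2))                            # 186.0
print(round(die_area_mm2(ideal_w, ideal_h), 1))  # 189.2
```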

The 21064 used a 3.3-volt (V) power supply. The EV4 dissipated a maximum of 30 W at 200 MHz. The EV4S dissipates a maximum of 21.0 W at 150 MHz, 22.5 W at 166 MHz, and 27.0 W at 200 MHz.

Package

The 21064 is packaged in a 431-pin alumina-ceramic pin grid array (PGA) measuring 61.72 mm by 61.72 mm. Of the 431 pins, 291 were for signals and 140 were for power and ground. The heatsink is directly attached to the package, secured by nuts attached to two studs protruding from the tungsten heat spreader.


Alpha 21064A

The Alpha 21064A, introduced as the DECchip 21064A, code-named EV45, is a further development of the Alpha 21064 introduced in October 1993. It operated at clock frequencies of 200, 225, 233, 275 and 300 MHz. The 225 MHz model was replaced by the 233 MHz model on 6 July 1994, which at introduction was priced at US$788 in quantities of 5,000, 10% less than the 225 MHz model it replaced. On the same day, the price of the 275 MHz model was also reduced by 25% to US$1,083 in quantities of 5,000. The 300 MHz model was announced and sampled on 2 October 1995 and was shipped in December 1995. There was also one model, the 21064A-275-PC, that was restricted to running Windows NT or operating systems that use the Windows NT memory management model.

The 21064A succeeded the original 21064 as the high-end Alpha microprocessor. It subsequently saw the most use in high-end systems. Users included:
  • Digital in some models of its DEC 3000 AXP, DEC 4000 AXP and DEC 7000/10000 AXP systems
  • Aspen Systems in its Alpine workstation
  • BTG, who used a 275 MHz model in its Action AXP275 RISC PC
  • Carrera Computers in its Cobra AXP 275 workstation
  • NekoTech, who used a 275 MHz model overclocked by 5% to 289 MHz in their Mach 2-289-T workstation
  • Network Appliance (now NetApp), who used a 275 MHz model in its storage systems

The 21064A had a number of microarchitectural improvements over the 21064. The primary caches were improved in two ways: the capacity of the I-cache and D-cache was doubled from 8 KB to 16 KB, and parity protection was added to the cache tag and cache data arrays. Floating-point divides have a lower latency due to an improved divider that retires two bits per cycle on average. Branch prediction was improved by a larger 4,096-entry by 2-bit branch history table (BHT).
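The text does not say how the two history bits are used. One common way to use two bits per entry is a saturating counter, and the Python sketch below assumes that scheme purely for illustration; the indexing, the initial counter value and all names are likewise assumptions rather than details of the 21064A.

```python
BHT_ENTRIES = 4096  # 4,096-entry by 2-bit table, as described above


class TwoBitBHT:
    """Assumed scheme: 2-bit saturating counters; values 2 and 3 predict taken."""

    def __init__(self, entries: int = BHT_ENTRIES) -> None:
        self.counters = [0] * entries  # assumed to start at strongly not-taken

    def _index(self, pc: int) -> int:
        # Assumption: index by the low-order bits of the 4-byte instruction address.
        return (pc >> 2) % len(self.counters)

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(bht: TwoBitBHT, pc: int, outcomes: list[bool]) -> int:
    """Replay a sequence of branch outcomes and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if bht.predict(pc) != taken:
            misses += 1
        bht.update(pc, taken)
    return misses


# A loop branch taken nine times and then not taken, executed ten times.
loop = ([True] * 9 + [False]) * 10
print(count_mispredictions(TwoBitBHT(), 0x1000, loop))  # prints 12
```

After the first pass warms up the counter, this scheme mispredicts only the loop exit, because a single not-taken outcome does not flip the prediction for the next loop entry.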

The 21064A contains 2.8 million transistors and measures 14.5 mm by 10.5 mm, for an area of 152.25 mm². It was fabricated by Digital in its fifth-generation CMOS process, CMOS-5, a 0.5 µm process with four levels of aluminium interconnect.

Alpha 21066

The Alpha 21066, introduced as the DECchip 21066, code-named LCA4 (Low Cost Alpha), is a low-cost variant of the Alpha 21064. Samples were introduced on 10 September 1993, with volume shipments in early 1994. At the time of introduction, the 166 MHz Alpha 21066 was priced at US$385 in quantities of 5,000. A 100 MHz model, intended for embedded systems, also existed. Sampling of this model began in late 1994, with volume shipments in the third quarter of 1995. The Microprocessor Report recognized the Alpha 21066 as the first microprocessor with an integrated PCI controller.

The Alpha 21066 was intended for use in low-cost applications, specifically personal computers running Windows NT. Digital used various models of the Alpha 21066 in its Multia clients, AXPpci 33 original equipment manufacturer (OEM) motherboards and AXPvme single board computers. Outside of Digital, users included Aspen Systems in its Alpine workstation, Carrera Computers in its Pantera I workstation, NekoTech in its Mach 1-166 personal computer (which used a 166 MHz model), and Parsys in its TransAlpha TA9000 Series supercomputers.

Due to the process shrink, the Alpha 21066 was able to include features that were desirable in cost-sensitive embedded systems. These features include an on-die B-cache and memory controller with ECC support, a functionally limited graphics accelerator supporting up to 8 MB of VRAM for implementing a framebuffer, a PCI controller and a phase-locked loop (PLL) clock generator for multiplying a 33 MHz external clock signal to the desired internal clock frequency.
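The clock multiplication performed by the PLL can be illustrated with a short Python sketch. It assumes whole-number multipliers and a nominal 33 MHz input; the multipliers the PLL actually supported are not stated here, and the function names are hypothetical.

```python
# Illustrative only: assumes the PLL multiplies by a whole number.
EXTERNAL_CLOCK_MHZ = 33.0  # external clock frequency given in the text


def pll_multiplier(target_mhz, external_mhz=EXTERNAL_CLOCK_MHZ):
    """Return the whole-number multiplier closest to target / external."""
    if external_mhz <= 0:
        raise ValueError("external clock must be positive")
    return round(target_mhz / external_mhz)


def internal_clock_mhz(multiplier, external_mhz=EXTERNAL_CLOCK_MHZ):
    """Return the internal clock produced by a given multiplier."""
    return multiplier * external_mhz


if __name__ == "__main__":
    m = pll_multiplier(166)          # nominal 166 MHz model
    print(m, internal_clock_mhz(m))  # prints: 5 165.0
```

Under these assumptions a nominal 166 MHz part corresponds to a multiplier of 5; the small gap between 165 and 166 MHz comes from rounding the external clock to 33 MHz.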

The memory controller supported 64 KB to 2 MB of B-cache and 2 to 512 MB of memory. The ECC implementation was capable of detecting 1-, 2- and 4-bit errors and correcting 1-bit errors. To reduce cost, the Alpha 21066 has a 64-bit system bus, which reduced the number of pins and thus the size of the package. The reduced width of the system bus also reduced bandwidth and thus performance by 20%, which was deemed acceptable.

The 21066 contained 1.75 million transistors and measured 17.0 by 12.3 mm, for an area of 209.1 mm². It was fabricated in CMOS-4S, a 0.675 µm process with three levels of interconnect. The 21066 was packaged in a 287-pin CPGA measuring 57.404 by 57.404 mm.

Alpha 21066A

The Alpha 21066A, code-named LCA45, is a low-cost variant of the Alpha 21064A. It was announced on 14 November 1994, with samples of 100 and 233 MHz models introduced on the same day. Both models were shipped in March 1995. When announced, the 100 and 233 MHz models were priced at $175 and $360, respectively, in quantities of 5,000. A 266 MHz model was later made available.

The 21066A was second-sourced by Mitsubishi Electric as the M36066A. It was the first Alpha microprocessor to be fabricated by the company. Parts running at 100 and 233 MHz were announced in November 1994. At the time of the announcement, engineering samples were set for December 1994, commercial samples for July 1995 and volume quantities for September 1995. The 233 MHz part was priced at $490 in quantities of 1,000.

Although it was based on the 21064A, the 21066A did not have the 16 KB instruction and data caches. A feature specific to the 21066A was power management – the microprocessor's internal clock frequency could be adjusted by software.

Digital used various models of the 21066A in their products which had previously used the 21066. Outside of Digital, Tadpole Technology used a 233 MHz model in their ALPHAbook 1 notebook.

The 21066A contained 1.8 million transistors on a die measuring 14.8 by 10.9 mm, for an area of 161.32 mm². It was fabricated in Digital's fifth-generation CMOS process, CMOS-5, a 0.5 µm process with three levels of interconnect. Mitsubishi Electric fabricated the M36066A in its own 0.5 µm three-level-metal process.
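The die areas quoted for the 21066 and 21066A follow directly from their dimensions. The minimal Python sketch below reproduces the arithmetic and compares the two dies; the function name is hypothetical and the figures are taken from this article.

```python
def die_area_mm2(width_mm, height_mm):
    """Return the die area in square millimetres, rounded to 2 decimals."""
    return round(width_mm * height_mm, 2)


area_21066 = die_area_mm2(17.0, 12.3)   # 209.1 mm² (CMOS-4S)
area_21066a = die_area_mm2(14.8, 10.9)  # 161.32 mm² (CMOS-5)

# Fraction of the original die area that the shrunk die occupies.
shrink = area_21066a / area_21066

if __name__ == "__main__":
    print(area_21066, area_21066a)
    print(f"21066A die is about {shrink:.0%} of the 21066 die area")
```

By this calculation the 21066A die occupies roughly three quarters of the area of the 21066 die.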

Alpha 21068

The Alpha 21068, introduced as the DECchip 21068, is a version of the 21066 positioned for embedded systems. It was identical to the 21066 but was offered at a lower clock rate to reduce power dissipation and cost. Samples were introduced on 10 September 1993 with volume shipments in early 1994. It operated at 66 MHz and had a 9 W maximum power dissipation. At the time of introduction, the 21068 was priced at US$221 each in quantities of 5,000. On 6 June 1994, Digital announced that it was cutting the price by 16% to US$186, effective on 3 July 1994.
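The price cut can be checked with a few lines of Python. This is a sketch of the arithmetic only; the function names are hypothetical, and rounding to the nearest dollar is an assumption.

```python
def discounted_price(price_usd, cut_percent):
    """Apply a percentage cut and round to the nearest dollar (assumed)."""
    return round(price_usd * (1 - cut_percent / 100))


def percent_cut(old_usd, new_usd):
    """Return the size of a price cut as a percentage of the old price."""
    return (old_usd - new_usd) / old_usd * 100


if __name__ == "__main__":
    print(discounted_price(221, 16))        # prints: 186
    print(round(percent_cut(221, 186), 1))  # prints: 15.8
```

A 16% cut from US$221 gives US$185.64, which rounds to the quoted US$186. Working backwards, the exact reduction is about 15.8%.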

The Alpha 21068 was used by Digital in their AXPpci 33 motherboard and the AXPvme 64 and 64LC single-board computers.

Alpha 21068A

The Alpha 21068A, introduced as the DECchip 21068A, is a variant of the Alpha 21066A for embedded systems. It operated at a clock frequency of 100 MHz.

Chipsets

Initially, there was no standard chipset for the 21064 and 21064A. Digital's computers used custom application-specific integrated circuits (ASICs) to interface the microprocessor to the system. As this raised development costs for third parties who wished to develop Alpha-based products, Digital developed a standard chipset, the DECchip 21070 (Apecs), for original equipment manufacturers (OEMs).

There were two models of the 21070, the DECchip 21071 and the DECchip 21072. The 21071 was intended for workstations whereas the 21072 was intended for high-end workstations or low-end uniprocessor servers. The two models differed in memory subsystem features: the 21071 has a 64-bit memory bus and supports 8 MB to 2 GB of parity-protected memory whereas the 21072 has a 128-bit memory bus and supports 16 MB to 4 GB of ECC-protected memory.

The chipset consisted of three chip designs: the COMANCHE B-cache and memory controller, the DECADE data slice and the EPIC PCI controller. The DECADE chips implemented the data paths in 32-bit slices and therefore the 21071 has two such chips while the 21072 has four. The EPIC chip has a 32-bit path to the DECADE chips.
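The number of DECADE chips in each model follows from dividing the memory bus width by the 32-bit slice width. The Python sketch below shows that relationship; the names are hypothetical, and the bus widths are those given for the 21071 and 21072.

```python
SLICE_WIDTH_BITS = 32  # each DECADE chip implements a 32-bit slice

# Memory bus widths of the two 21070 models, as described in this article.
MEMORY_BUS_BITS = {"21071": 64, "21072": 128}


def decade_chip_count(bus_bits, slice_bits=SLICE_WIDTH_BITS):
    """Return how many data-slice chips are needed to span the bus."""
    if bus_bits % slice_bits != 0:
        raise ValueError("bus width must be a multiple of the slice width")
    return bus_bits // slice_bits


if __name__ == "__main__":
    for model, bits in MEMORY_BUS_BITS.items():
        print(model, decade_chip_count(bits))
    # prints:
    # 21071 2
    # 21072 4
```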

The 21070 was introduced on 10 January 1994, with samples available. Volume shipments began in mid-1994. In quantities of 5,000, the 21071 was priced at $90 and the 21072 at $120.

Users of the 21070 included Carrera Computers for its Pantera workstations and Digital in some models of its AlphaStations and uniprocessor AlphaServers.

Further reading

  • "DEC Enters Microprocessor Business with Alpha". (4 March 1992). Microprocessor Report, Volume 6, Number 3.
  • "DEC's Alpha Architecture Premiers". (4 March 1992). Microprocessor Report, Volume 6, Number 3.
  • "Digital Plans Broad Alpha Processor Family". (18 November 1992). Microprocessor Report, Volume 6, Number 3.
  • "Digital Reveals PCI Chip Sets For Alpha". (12 July 1993). Microprocessor Report, Volume 7, Number 9.
  • "Alpha Hits Low End with Digital's 21066". (13 September 1993). Microprocessor Report, Volume 7, Number 12.
  • Bhandarkar, Dileep P. (1995). Alpha Architecture and Implementations. Digital Press.
  • Fox, Thomas F. (1994). "The design of high-performance microprocessors at Digital". Proceedings of the 31st Annual ACM-IEEE Design Automation Conference. pp. 586–591.
  • Gronowski, Paul E. et al. (May 1998). "High-performance microprocessor design". IEEE Journal of Solid-State Circuits 33 (5): pp. 676–686.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 