Berkeley RISC
Encyclopedia
Berkeley RISC was one of two seminal research projects into RISC-based microprocessor
Microprocessor
A microprocessor incorporates the functions of a computer's central processing unit on a single integrated circuit, or at most a few integrated circuits. It is a multipurpose, programmable device that accepts digital data as input, processes it according to instructions stored in its memory, and...

 design taking place under ARPA's VLSI project
VLSI Project
DARPA's VLSI Project provided research funding to a wide variety of university-based teams in an effort to improve the state of the art in microprocessor design, then known as VLSI. Although little known, notably in comparison to their work on what became the internet, the VLSI Project is likely...

. RISC was led by David Patterson at the University of California, Berkeley
University of California, Berkeley
The University of California, Berkeley , is a teaching and research university established in 1868 and located in Berkeley, California, USA...

 between 1980 and 1984, while the other was taking place only a short drive away at Stanford University
Stanford University
The Leland Stanford Junior University, commonly referred to as Stanford University or Stanford, is a private research university on an campus located near Palo Alto, California. It is situated in the northwestern Santa Clara Valley on the San Francisco Peninsula, approximately northwest of San...

 under their MIPS
MIPS architecture
MIPS is a reduced instruction set computer instruction set architecture developed by MIPS Technologies . The early MIPS architectures were 32-bit, and later versions were 64-bit...

 effort starting in 1981 and running until 1984. Berkeley's project was so successful that it became the name for all similar designs to follow, even the MIPS would become known as a "RISC processor". The RISC design was later commercialized as the SPARC
SPARC
SPARC is a RISC instruction set architecture developed by Sun Microsystems and introduced in mid-1987....

 processor.

The RISC concept

Both RISC and MIPS were developed from the realization that the vast majority of programs did not use the vast majority of a processor's instructions. In one calculation it was found that the entire Unix
Unix
Unix is a multitasking, multi-user computer operating system originally developed in 1969 by a group of AT&T employees at Bell Labs, including Ken Thompson, Dennis Ritchie, Brian Kernighan, Douglas McIlroy, and Joe Ossanna...

 system, when compiled
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...

, used only 30% of the available instructions on the Motorola 68000
Motorola 68000
The Motorola 68000 is a 16/32-bit CISC microprocessor core designed and marketed by Freescale Semiconductor...

. Much of the circuitry in the m68k, and similar designs, was dedicated to decoding these instructions which were never being used. The RISC idea was to include only those instructions that were really used, using those transistors to speed the system up instead.

To do this, RISC concentrated on adding many more registers
Processor register
In computer architecture, a processor register is a small amount of storage available as part of a CPU or other digital processor. Such registers are addressed by mechanisms other than main memory and can be accessed more quickly...

, small bits of memory holding temporary values that can be accessed at zero cost. This contrasts with normal main memory, which might take several cycles to access. By providing more registers, and making sure the compilers actually used them, programs should run much faster. Additionally the speed of the processor would be more closely defined by its clock speed, because less of its time would be spent waiting for memory accesses. Transistor for transistor, a RISC design would outperform a conventional CPU, hopefully by a lot.

On the downside, the instructions being removed were generally performing several "sub-instructions". For instance, the ADD instruction of a traditional design would generally come in several flavours, one that added the numbers in two registers and placed it in a third, another that added numbers found in main memory and put the result in a register, etc. The RISC designs, on the other hand, included only a single flavour of any particular instruction, the ADD, for instance, would always use registers for all operands. This forced the programmer to write additional instructions to load the values from memory, if needed, making a RISC program "less dense".

In the era of expensive memory this was a real concern, notably because memory was also much slower than the CPU. Since a RISC design's ADD would actually require four instructions (two loads, an add, and a save), the machine would have to do much more memory access to read the extra instructions, potentially slowing it down considerably. This was offset to some degree by the fact that the new designs used what was then a very large instruction word of 32-bit
32-bit
The range of integer values that can be stored in 32 bits is 0 through 4,294,967,295. Hence, a processor with 32-bit memory addresses can directly access 4 GB of byte-addressable memory....

s, allowing small constants to be folded directly into the instruction instead of having to be loaded separately. Additionally, the results of one operation are often used soon after by another, so by skipping the write to memory and storing the result in a register, the program did not end up much larger, and could in theory run much faster. For instance, a string of instructions carrying out a series of mathematical operations might require only a few loads from memory, while the majority of the numbers being used are either loaded in the instructions themselves, or intermediate values in the registers.

But to the casual observer, it was not clear whether or not the RISC concept would improve performance, or even make it worse. The only way to be sure was to actually simulate it, and in test after test, every simulation showed an enormous overall benefit in performance from this design.

Where the two projects, RISC and MIPS, differed was in the handling of the registers. MIPS simply added lots of them and left it to the compilers to make use of them. RISC, on the other hand, added circuitry to the CPU to "help" the compiler. RISC used the concept of register window
Register window
In computer engineering, the use of register windows is a technique to improve the performance of a particularly common operation, the procedure call...

s, in which the entire "register file" was broken down into blocks, allowing the compiler to "see" one block for global variables, and another for local variables.

The idea was to make one particularly common instruction, the procedure call, extremely easy to implement in the compilers. Almost all computer languages use a system known as a activation record or stack frame that contains the address of who called it, the data that was passed in, and any results that need to be returned. In the vast majority of cases these frames are small, typically with three or less inputs and one or no outputs. In the Berkeley design, then, the entire procedure stack would most likely fit entirely within the register window.

In this case the call into and return from a procedure is simple and extremely fast. A single instruction is called to set up a new block of registers, operands passed in on the "low end" of the new frame, and then the code jumps into the procedure. On return, the results are placed in the frame at the same end, and the code exits. The register windows are set up to overlap at the ends, meaning that the results from the call simply "appear" in the window of the code that called it, with no data having to be copied. Thus the common procedure call did not have to interact with main memory, greatly speeding it up.

On the downside, this approach meant that procedures with large numbers of local variables were problematic, and ones with less led to registers -an expensive resource- being wasted. It was Stanford's work on compilers that led them to ignore the register window concept, believing that a smart compiler could make better use of the registers than a fixed system in hardware.

RISC I

The first attempt to implement the RISC concept was originally known as Gold. Work on the design started in 1980 as part of a VLSI design course, but the then-complicated design crashed almost all existing design tools. The team had to spend considerable amounts of time improving or re-writing the tools, and even with these new tools it took just under an hour to extract the design on a VAX 11/780
VAX
VAX was an instruction set architecture developed by Digital Equipment Corporation in the mid-1970s. A 32-bit complex instruction set computer ISA, it was designed to extend or replace DEC's various Programmed Data Processor ISAs...

.

The final design, as RISC I, was published ACM
Association for Computing Machinery
The Association for Computing Machinery is a learned society for computing. It was founded in 1947 as the world's first scientific and educational computing society. Its membership is more than 92,000 as of 2009...

 in 1981. It had 44,500 transistors implementing 31 instructions and a register file containing 78 32-bit registers. This allowed for six register windows containing 14 registers each, with an additional 18 globals. The control and instruction decode section occupied only 6% of the die, whereas the typical design of the era used about 50% for the same role. The register file took up most of that space.

RISC I also featured a two-stage instruction pipeline
Instruction pipeline
An instruction pipeline is a technique used in the design of computers and other digital electronic devices to increase their instruction throughput ....

 for additional speed, but without the complex instruction re-ordering of more modern designs. This makes conditional branches a problem, because the compiler has to fill the instruction following a conditional branch (the so-called "branch delay slot
Branch delay slot
In computer architecture, a delay slot is an instruction slot that gets executed without the effects of a preceding instruction. The most common form is a single arbitrary instruction located immediately after a branch instruction on a RISC or DSP architecture; this instruction will execute even if...

"), with something selected to be "safe" (i.e., not dependent on the outcome of the conditional). Sometimes the only suitable instruction in this case is NOP
NOP
In computer science, NOP or NOOP is an assembly language instruction, sequence of programming language statements, or computer protocol command that effectively does nothing at all....

. A notable number of later RISC-style designs still require the consideration of branch delay.

After a month of validation and debugging, the design was sent to the innovative MOSIS
MOSIS
MOSIS is probably the oldest integrated circuit foundry service and one of the first Internet services other than supercomputing services and basic infrastructure such as E-mail or FTP....

 fab
Foundry (electronics)
In the microelectronics industry a semiconductor fabrication plant is a factory where devices such as integrated circuits are manufactured....

 for production on June 22, 1981, using a 2 μm (2,000 nm) process. A variety of delays forced them to abandon their masks four separate times, and wafers with working examples did not arrive back at Berkeley until May 1982. The first working RISC I "computer" (actually a checkout board) ran on June 11th. In testing, the chips proved to have lesser performance than expected. In general, an instruction would take 2 μs to complete, while the original design allotted for about 400 ns (five times as fast). The precise reasons for this problem were never fully explained. However, throughout testing it was clear that certain instructions did run at the expected speed, suggesting the problem was physical, not logical.

Had the design worked at full speed, performance would have been excellent. Simulations using a variety of small programs compared the 4 MHz RISC I to the 5 MHz 32-bit
32-bit
The range of integer values that can be stored in 32 bits is 0 through 4,294,967,295. Hence, a processor with 32-bit memory addresses can directly access 4 GB of byte-addressable memory....

 VAX 11/780
VAX
VAX was an instruction set architecture developed by Digital Equipment Corporation in the mid-1970s. A 32-bit complex instruction set computer ISA, it was designed to extend or replace DEC's various Programmed Data Processor ISAs...

 and the 5 MHz 16-bit
16-bit
-16-bit architecture:The HP BPC, introduced in 1975, was the world's first 16-bit microprocessor. Prominent 16-bit processors include the PDP-11, Intel 8086, Intel 80286 and the WDC 65C816. The Intel 8088 was program-compatible with the Intel 8086, and was 16-bit in that its registers were 16...

 Zilog Z8000
Zilog Z8000
The Z8000 is a 16-bit microprocessor introduced by Zilog in 1979. The architecture was designed by Bernard Peuto while the logic and physical implementation was done by Masatoshi Shima, assisted by a small group of people. The Z8000 was not Z80-compatible, and although it saw steady use well into...

 showed this clearly. Program size was about 30% larger than the VAX but very close to that of the Z8000, validating the argument that the higher code density of CISC designs was not actually all that impressive in reality. In terms of overall performance, the RISC I was twice as fast as the VAX, and about four times that of the Z8000. More interestingly, the programs ended up performing about the same overall amount of memory access because the large register file dramatically improved the odds the needed operand was already on-chip.

It is important to put this performance in context. Even though the RISC design had run slower than the VAX, it made no difference to the importance of the design. RISC allowed for the production of a true 32-bit processor on a real chip die using what was already an older fab. Traditional designs simply could not do this; with so much of the chip surface dedicated to decoder logic, a true 32-bit design like the Motorola 68020
Motorola 68020
The Motorola 68020 is a 32-bit microprocessor from Motorola, released in 1984. It is the successor to the Motorola 68010 and is succeeded by the Motorola 68030...

 required newer fabs before becoming practical. Using the same fabs, RISC I could have largely outperformed the competition.

RISC II

While the RISC I design ran into delays, work at Berkeley had already turned to the new Blue design. Work on Blue progressed slower than Gold, due both to the lack of a pressing need now that Gold was going to fab, as well as changeovers in the classes and students staffing the effort. This pace also allowed them to add in several new features that would end up improving the design considerably.

The key difference was a simpler cache circuitry that eliminated one line per bit (from three to two), dramatically shrinking the register file size. The change also required much tighter bus timing, but this was a small price to pay and in order to meet the needs several other parts of the design were sped up as well.

The savings due to the new design were tremendous. Whereas Gold contained a total of 78 registers in 6 windows, Blue contained 138 registers broken into 8 windows of 16 registers each, with another 10 globals. This expansion of the register file increases the chance that a given procedure can fit all of its local storage in registers, as well as increasing the nesting depth. Nevertheless the larger register file used up less transistors, and the final Blue design, fabed as RISC II, implemented all of the RISC instruction set with only 39,000 transistors.

The other major change was to include an "instruction-format expander", which invisibly "up-converted" 16-bit instructions into a 32-bit format. This allowed smaller instructions, typically things with one or no operands, like NOP, to be stored in memory in a smaller 16-bit format and stuff two such instructions into a single machine word. The instructions would be invisibly expanded back to 32-bit versions before they reached the ALU
Arithmetic logic unit
In computing, an arithmetic logic unit is a digital circuit that performs arithmetic and logical operations.The ALU is a fundamental building block of the central processing unit of a computer, and even the simplest microprocessors contain one for purposes such as maintaining timers...

, meaning that no changes were needed in the core logic. This simple technique yielded a surprising 30% improvement in code density, making an otherwise identical program on Blue run faster than on Gold due to the decreased number of memory accesses.

RISC II proved to be much more successful in silicon, and entered testing, outperforming almost all minicomputers on almost all tasks. For instance, performance ranged from 85% of VAX speed to 256% on a variety of loads, that is, the RISC II often outperformed the VAX by two times. RISC II was also benched against the famous Motorola 68000
Motorola 68000
The Motorola 68000 is a 16/32-bit CISC microprocessor core designed and marketed by Freescale Semiconductor...

, then considered to be the best commercial chip implementation, and outperformed it by 140% to 420%

Follow-ons

Work on the original RISC designs ended with RISC II, but the concept itself lived on at Berkeley. The basic core was re-used in SOAR in 1984, basically a RISC converted to run Smalltalk
Smalltalk
Smalltalk is an object-oriented, dynamically typed, reflective programming language. Smalltalk was created as the language to underpin the "new world" of computing exemplified by "human–computer symbiosis." It was designed and created in part for educational use, more so for constructionist...

 (in the same way that it could be claimed RISC ran C), and later in the similar VLSI-BAM that ran PROLOG
Prolog
Prolog is a general purpose logic programming language associated with artificial intelligence and computational linguistics.Prolog has its roots in first-order logic, a formal logic, and unlike many other programming languages, Prolog is declarative: the program logic is expressed in terms of...

 instead of Smalltalk. Another effort was SPUR, which was a full set of chips needed to build a complete 32-bit workstation
Workstation
A workstation is a high-end microcomputer designed for technical or scientific applications. Intended primarily to be used by one person at a time, they are commonly connected to a local area network and run multi-user operating systems...

.

RISC is less famous, but more influential, for being the basis of the commercial SPARC
SPARC
SPARC is a RISC instruction set architecture developed by Sun Microsystems and introduced in mid-1987....

 processor design from Sun Microsystems
Sun Microsystems
Sun Microsystems, Inc. was a company that sold :computers, computer components, :computer software, and :information technology services. Sun was founded on February 24, 1982...

. It was the SPARC that first clearly demonstrated the power of the RISC concept; when they shipped in the first SPARCstation
SPARCstation
The SPARCstation, SPARCserver and SPARCcenter product lines were a series of SPARC-based computer workstations and servers in desktop, deskside and rack-based form factor developed and sold by Sun Microsystems...

s they outperformed anything on the market. This led to virtually every Unix
Unix
Unix is a multitasking, multi-user computer operating system originally developed in 1969 by a group of AT&T employees at Bell Labs, including Ken Thompson, Dennis Ritchie, Brian Kernighan, Douglas McIlroy, and Joe Ossanna...

 vendor hurrying for a RISC design of their own, leading to designs like the DEC Alpha
DEC Alpha
Alpha, originally known as Alpha AXP, is a 64-bit reduced instruction set computer instruction set architecture developed by Digital Equipment Corporation , designed to replace the 32-bit VAX complex instruction set computer ISA and its implementations. Alpha was implemented in microprocessors...

 and PA-RISC
PA-RISC family
PA-RISC is an instruction set architecture developed by Hewlett-Packard. As the name implies, it is a reduced instruction set computer architecture, where the PA stands for Precision Architecture...

, while SGI
Silicon Graphics
Silicon Graphics, Inc. was a manufacturer of high-performance computing solutions, including computer hardware and software, founded in 1981 by Jim Clark...

 purchased MIPS Computer Systems. By 1986 most large chip vendors followed, working on efforts like the Motorola 88000
Motorola 88000
The 88000 is a RISC instruction set architecture developed by Motorola. The 88000 was Motorola's attempt at a home-grown RISC architecture, started in the 1980s. The 88000 arrived on the market some two years after the competing SPARC and MIPS...

, Fairchild Clipper, AMD 29000 and the PowerPC
PowerPC
PowerPC is a RISC architecture created by the 1991 Apple–IBM–Motorola alliance, known as AIM...

.

Techniques developed for and alongside the idea of the reduced instruction set have also been adopted in successively more powerful implementations and extensions of the traditional "complex" x86 architecture
X86 architecture
The term x86 refers to a family of instruction set architectures based on the Intel 8086 CPU. The 8086 was launched in 1978 as a fully 16-bit extension of Intel's 8-bit based 8080 microprocessor and also introduced segmentation to overcome the 16-bit addressing barrier of such designs...

. Much of a modern microprocessor's transistor count is devoted to large caches, many pipeline
Instruction pipeline
An instruction pipeline is a technique used in the design of computers and other digital electronic devices to increase their instruction throughput ....

 stages, superscalar
Superscalar
A superscalar CPU architecture implements a form of parallelism called instruction level parallelism within a single processor. It therefore allows faster CPU throughput than would otherwise be possible at a given clock rate...

 instruction dispatch, branch prediction
Branch predictor
In computer architecture, a branch predictor is a digital circuit that tries to guess which way a branch will go before this is known for sure. The purpose of the branch predictor is to improve the flow in the instruction pipeline...

and other modern techniques which are applicable regardless of instruction architecture. The amount of silicon dedicated to instruction decoding on a modern x86 implementation is proportionately quite small, so the distinction between "complex" and RISC processor implementations has become blurred.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK