Register file

 




 
 


A register file is an array of processor registers in a central processing unit (CPU). Modern integrated circuit-based register files are usually implemented by way of fast static RAMs with multiple ports. Such RAMs are distinguished by having dedicated read and write ports, whereas ordinary multiported SRAMs will usually read and write through the same ports.

The instruction set architecture of a CPU will almost always define a set of registers which are used to stage data between memory and the functional units on the chip. In simpler CPUs, these architectural registers correspond one-for-one to the entries in a physical register file within the CPU. More complicated CPUs use register renaming, so that the mapping of which physical entry stores a particular architectural register changes dynamically during execution. The register file is part of the architecture and visible to the programmer, whereas caches are transparent.
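The dynamic mapping used by register renaming can be illustrated with a minimal sketch. The class `RenameMap`, its free list, and the register counts are hypothetical and chosen only for illustration; real renaming hardware differs in many details.

```python
class RenameMap:
    """Toy architectural-to-physical register map (hypothetical model)."""

    def __init__(self, num_arch, num_phys):
        assert num_phys >= num_arch
        # Initially architectural register i lives in physical entry i.
        self.table = list(range(num_arch))
        # The remaining physical entries are free for new destinations.
        self.free = list(range(num_arch, num_phys))

    def lookup(self, arch_reg):
        """Physical entry currently holding arch_reg (for source operands)."""
        return self.table[arch_reg]

    def rename_dest(self, arch_reg):
        """Allocate a fresh physical entry for a new write to arch_reg."""
        old = self.table[arch_reg]
        new = self.free.pop(0)
        self.table[arch_reg] = new
        return new, old  # 'old' could be recycled once no reader needs it


rmap = RenameMap(num_arch=4, num_phys=8)
first, _ = rmap.rename_dest(2)      # first write to r2
second, prev = rmap.rename_dest(2)  # second write to r2 gets another entry
```

Two successive writes to the same architectural register land in different physical entries, so the second write does not have to wait for readers of the first.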

Implementation


Figure: Regfile array

The usual layout convention is that a simple array is read out vertically. That is, a single word line, which runs horizontally, causes a row of bit cells to put their data on bit lines, which run vertically. Sense amps, which convert low-swing read bit lines into full-swing logic levels, are usually at the bottom (by convention). Larger register files are then sometimes constructed by tiling mirrored and rotated simple arrays.

Register files have one word line per entry per port, one bit line per bit of width per read port, and two bit lines per bit of width per write port. Each bit cell also has a Vdd and Vss. Therefore, the wire pitch area increases as the square of the number of ports, and the transistor area increases linearly. At some point, it may be smaller and/or faster to have multiple redundant register files, with smaller numbers of read ports, than a single register file with all the read ports. The MIPS R8000's integer unit, for example, had a 32-entry, 64-bit register file with nine read ports and four write ports, implemented in a 0.7 µm process, which was large enough to be seen when looking at the chip from arm's length.
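The wire-count rules above can be turned into a rough calculation. The following is a sketch only: the function name `regfile_wires` and the `cell_track_area` figure (horizontal tracks times vertical tracks per bit cell, ignoring Vdd and Vss) are assumptions made for illustration, not a real area model. The 8-read, 2-write comparison is a hypothetical configuration.

```python
def regfile_wires(entries, width, read_ports, write_ports):
    """Count wires using the rules stated above (Vdd/Vss tracks ignored)."""
    ports = read_ports + write_ports
    bit_tracks = read_ports + 2 * write_ports  # vertical tracks per bit cell
    return {
        "word_lines": entries * ports,          # one per entry per port
        "bit_lines": width * bit_tracks,        # 1 per read port, 2 per write port
        "cell_track_area": ports * bit_tracks,  # unitless proxy for wire area
    }


# Port counts of the R8000 integer register file quoted above.
r8000_int = regfile_wires(entries=32, width=64, read_ports=9, write_ports=4)

# Hypothetical comparison: one 8-read/2-write file versus two copies that
# each serve half of the read ports but still accept both writes.
single = regfile_wires(32, 64, read_ports=8, write_ports=2)
half = regfile_wires(32, 64, read_ports=4, write_ports=2)
replicated_area = 2 * half["cell_track_area"]
```

In this toy model the two replicated copies together need less wire-track area than the single file, which is the trade-off the paragraph describes. With many write ports the advantage shrinks, because every copy must keep all of them.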

Decoder


  • The decoder is often broken into a predecoder and the decoder proper.
  • The decoder is a series of AND gates that drive word lines.
  • There is one decoder per read or write port. If the array has four read and two write ports, for example, it has six word lines per bit cell in the array and six AND gates per row in the decoder. Note that the decoder has to be pitch matched to the array, which forces those AND gates to be wide and short.
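A behavioral sketch of one port's decoder, split into a predecode stage and a final row of AND gates, might look as follows. The 2-bit/3-bit split of a 5-bit address and the names `predecode` and `decode_word_lines` are assumptions for illustration; the text does not specify how the address is partitioned.

```python
def predecode(value, bits):
    """One-hot decode of a small group of address bits."""
    return [int(value == i) for i in range(1 << bits)]


def decode_word_lines(addr, enable, low_bits=2, high_bits=3):
    """Word lines for one port: exactly one is high when enable is 1."""
    low_mask = (1 << low_bits) - 1
    low = predecode(addr & low_mask, low_bits)
    high = predecode(addr >> low_bits, high_bits)
    # Decoder proper: one AND gate per row combines one line from each
    # predecoder with the port enable.
    return [
        enable & high[row >> low_bits] & low[row & low_mask]
        for row in range(1 << (low_bits + high_bits))
    ]


# One decoder per port: each port address yields its own set of word lines.
port_addresses = [20, 3, 20]
word_lines_per_port = [decode_word_lines(a, 1) for a in port_addresses]
```

Each port gets its own list of 32 word lines, so a row of the array sees one word line per port, matching the count given in the list above.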


Array


Figure: Regfile cell

The basic scheme for a bit cell:
  • State is stored in a pair of cross-coupled inverters.
  • Data is read out through an NMOS transistor to a bit line.
  • Data is written by shorting one side or the other to ground through a two-NMOS stack.
  • So read ports take one transistor per bit cell, while write ports take four.
Many optimizations are possible:
  • Sharing lines between cells, for example, Vdd and Vss.
  • Read bit lines are often precharged to something between Vdd and Vss.
  • Read bit lines often swing only a fraction of the way to Vdd or Vss. A sense amplifier converts this small-swing signal into a full logic level. Small swing signals are faster because the bit line has little drive but a great deal of parasitic capacitance.
  • Write bit lines may be braided, so that they couple equally to the nearby read bitlines. Because write bitlines are full swing, they can cause significant disturbances on read bitlines.
  • If Vdd is a horizontal line, it can be switched off, by yet another decoder, if any of the write ports are writing that line during that cycle. This optimization increases the speed of the write.
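The basic bit cell scheme can be sketched as a small behavioral model. This is a toy under stated assumptions: the class `BitCell` is hypothetical, the inverter pair is assumed to cost four transistors (two CMOS inverters), and electrical effects such as precharge and sensing are not modeled.

```python
class BitCell:
    """Behavioral toy model of one multiported register-file bit cell."""

    def __init__(self, read_ports, write_ports):
        self.read_ports = read_ports
        self.write_ports = write_ports
        self.q = 0  # state held by the inverter pair

    def write(self, word_line, data):
        # Each side has a two-NMOS stack to ground: one gate on the word
        # line, the other on a data line (true or complement), so a side is
        # pulled low only when both are high.
        if word_line and data == 1:
            self.q = 1  # complement side shorted to ground
        elif word_line and data == 0:
            self.q = 0  # true side shorted to ground

    def read(self, word_line):
        # The read transistor conducts only when the word line is high;
        # otherwise the bit line is not driven (modeled here as None).
        return self.q if word_line else None

    def transistor_count(self):
        # Assumption: 4 transistors for the inverter pair, plus 1 per read
        # port and 4 per write port as stated in the list above.
        return 4 + self.read_ports + 4 * self.write_ports


cell = BitCell(read_ports=9, write_ports=4)
cell.write(word_line=1, data=1)
```

The transistor count grows linearly with the number of ports, in contrast to the wire area discussed earlier.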


Microarchitecture


Most register files make no special provision to prevent multiple write ports from writing the same entry simultaneously. Instead, the instruction scheduling hardware ensures that only one instruction in any particular cycle writes a particular entry. If multiple instructions targeting the same register are issued, all but one have their write enables turned off.
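As a rough sketch of the write-enable handling described above, the function below keeps only one write per destination register in a cycle. Which write survives is not stated in the text; this example assumes the last one in program order wins, and the name `resolve_write_enables` is hypothetical.

```python
def resolve_write_enables(dests):
    """Return one write enable per port, given destination registers in
    program order. Assumption: the last write to a register wins."""
    last_writer = {dest: port for port, dest in enumerate(dests)}
    return [last_writer[dest] == port for port, dest in enumerate(dests)]


# Ports 0 and 2 both target register 3, so port 0 is disabled.
enables = resolve_write_enables([3, 5, 3])
```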

The crossed inverters take some finite time to settle after a write operation, during which a read operation will either take longer or return garbage. It is common to have bypass multiplexors that bypass written data to the read ports when a simultaneous read and write to the same entry is commanded. These bypass multiplexors are often just part of a larger bypass network that forwards results that have not yet been committed between functional units.
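The effect of the bypass multiplexers can be modeled behaviorally. The sketch below is an illustration only: the class `BypassedRegFile` and its `cycle` method are hypothetical, and it assumes the scheduler has already ensured at most one write per entry per cycle.

```python
class BypassedRegFile:
    """Toy register file in which same-cycle reads see written data."""

    def __init__(self, entries):
        self.regs = [0] * entries

    def cycle(self, reads, writes):
        """reads: list of entry indices; writes: list of (index, value).
        Returns the values seen by the read ports in this cycle."""
        pending = dict(writes)
        # Bypass mux: if an entry is being written this cycle, forward the
        # write data to the read port instead of reading the array.
        results = [pending[r] if r in pending else self.regs[r] for r in reads]
        for index, value in writes:
            self.regs[index] = value
        return results


rf = BypassedRegFile(entries=8)
same_cycle = rf.cycle(reads=[1], writes=[(1, 42)])
next_cycle = rf.cycle(reads=[1, 2], writes=[(2, 7)])
```

The read of entry 1 in the first cycle returns the value being written rather than the stale array contents, which is the behavior the bypass provides.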

The register file is usually pitch matched to the datapath that it serves. Pitch matching avoids having the many buses passing over the datapath turn corners, which would use a lot of area. But since every unit must have the same bit pitch, every unit in the datapath ends up with the bit pitch forced by the widest unit, which can waste area in the other units. Register files often set the pitch of a datapath, because they have two wires per bit per write port and because all the bit lines must contact the silicon at every bit cell.

Area can sometimes be saved, on machines with multiple units in a datapath, by having two datapaths side-by-side, each of which has smaller bit pitch than a single datapath would have. This case usually forces multiple copies of a register file, one for each datapath.

The Alpha 21264 (EV6), for instance, had two copies of the integer register file, and took an extra cycle to propagate data between the two. The issue logic tried to reduce the number of operations forwarding data between the two. The MIPS R8000 floating-point unit had two copies of the floating-point register file, each with four write and four read ports, and wrote both copies at the same time.

Processors that do register renaming can arrange for each functional unit to write to a subset of the physical register file. This arrangement can eliminate the need for multiple write ports per bit cell, for a large savings in area. The resulting register file, effectively a stack of register files with single write ports, then benefits from replication and subsetting the read ports. At the limit, this technique would place a stack of 1-write, 2-read regfiles at the inputs to each functional unit. Since regfiles with a small number of ports are often dominated by transistor area, it is best not to push this technique to this limit, but it is useful all the same.

The SPARC ISA defines register windows, in which the 5-bit architectural names of the registers actually point into a window on a much larger register file, with hundreds of entries. Implementing multiported register files with hundreds of entries requires a lot of area. The register window slides by 16 registers when moved, so that each architectural register name can refer to only a small number of registers in the larger array, e.g. architectural register r20 can only refer to physical registers #20, #36, #52, #68, #84, #100, #116, if there are just seven windows in the physical file.
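The r20 example above follows from a simple index calculation, sketched below. This is a simplification that only reproduces the arithmetic in the text: the function name `physical_index` is hypothetical, and details such as non-windowed global registers and wraparound at the last window are deliberately ignored.

```python
WINDOW_STRIDE = 16  # the window slides by 16 registers, as stated above


def physical_index(arch_reg, window):
    """Physical entry named by a 5-bit architectural register number in a
    given window (simplified: no globals, no wraparound)."""
    assert 0 <= arch_reg < 32
    return arch_reg + WINDOW_STRIDE * window


# Physical entries that r20 can refer to with seven windows.
r20_entries = [physical_index(20, w) for w in range(7)]
```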

To save area, some SPARC implementations implement a 32-entry register file, in which each cell has seven "bits". Only one bit is readable and writable through the external ports, but the contents of the bits can be rotated. A rotation moves the register window in a single cycle. Because most of the wires accomplishing the state movement are local, tremendous bandwidth is possible with little power.

This same technique is used in the R10000 register renaming mapping file, which stores a 6-bit virtual register number for each of the physical registers. In the renaming file, the renaming state is checkpointed whenever a branch is taken, so that when a branch is detected to be mispredicted, the old renaming state can be recovered in a single cycle. (See Register renaming.)
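Checkpointing and recovery of renaming state can be sketched in software as follows. This is a loose illustration, not the R10000 design: the class `CheckpointedMap` is hypothetical, it uses a plain architectural-to-physical table rather than the per-physical-register layout described above, and it copies whole tables where hardware would use local shadow cells.

```python
class CheckpointedMap:
    """Toy rename table with branch checkpoints (hypothetical model)."""

    def __init__(self, num_arch):
        self.table = list(range(num_arch))
        self.checkpoints = []

    def remap(self, arch_reg, phys_reg):
        self.table[arch_reg] = phys_reg

    def checkpoint(self):
        """Save the current mapping at a branch; returns a checkpoint id."""
        self.checkpoints.append(list(self.table))
        return len(self.checkpoints) - 1

    def restore(self, checkpoint_id):
        """Recover the mapping saved at a mispredicted branch and discard
        that checkpoint and any younger ones."""
        self.table = list(self.checkpoints[checkpoint_id])
        del self.checkpoints[checkpoint_id:]


rename = CheckpointedMap(num_arch=4)
branch = rename.checkpoint()
rename.remap(1, 9)      # speculative rename after the branch
rename.restore(branch)  # branch mispredicted: undo the speculative state
```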



See also

  • Sum addressed decoder