All Topics  
Microarchitecture

 

   Email Print
   Bookmark   Link






 

Microarchitecture



 
 
In computer engineering
Computer engineering

Computer Engineering is a discipline that combines elements of both Electrical Engineering and Computer Science. Computer engineers are electrical engineers that have additional training in the areas of software design and hardware-software integration....
, microarchitecture (sometime abbreviated to µarch or uarch) is a description of the electrical circuitry of a computer
Computer

A computer is a machine that manipulates Data according to a list of Code .The first devices that resemble modern computers date to the mid-20th century , although the computer concept and various machines similar to computers existed earlier....
, central processing unit
Central processing unit

A central processing unit is an electronic circuit that can execute computer programs. This broad definition can easily be applied to many early computers that existed long before the term "CPU" ever came into widespread usage....
, or digital signal processor
Digital signal processor

A digital signal processor is a specialized microprocessor designed specifically for digital signal processing, generally in real-time computing....
 that is sufficient for completely describing the operation of the hardware.

In academic circles, the term computer organization is used, while in the computer industry, the term microarchitecture is more often used. Microarchitecture and instruction set architecture (ISA) together constitute the field of computer architecture
Computer architecture

Computer architecture in computer engineering is the conceptual design and fundamental operational structure of a computer system. It is a blueprint and functional description of requirements and design implementations for the various parts of a computer, focusing largely on the way by which the central processing unit performs internally an...
.

Etymology of the term
Since the 1950s, many computers used microprogramming to implement their control logic
Control logic

Control logic is the part of a software architecture that controls what the program will do. This part of the program is also called the controller....
 which decoded the program instructions and executed them.






Discussion
Ask a question about 'Microarchitecture'
Start a new discussion about 'Microarchitecture'
Answer questions from other users
Full Discussion Forum



Encyclopedia


In computer engineering
Computer engineering

Computer Engineering is a discipline that combines elements of both Electrical Engineering and Computer Science. Computer engineers are electrical engineers that have additional training in the areas of software design and hardware-software integration....
, microarchitecture (sometime abbreviated to µarch or uarch) is a description of the electrical circuitry of a computer
Computer

A computer is a machine that manipulates Data according to a list of Code .The first devices that resemble modern computers date to the mid-20th century , although the computer concept and various machines similar to computers existed earlier....
, central processing unit
Central processing unit

A central processing unit is an electronic circuit that can execute computer programs. This broad definition can easily be applied to many early computers that existed long before the term "CPU" ever came into widespread usage....
, or digital signal processor
Digital signal processor

A digital signal processor is a specialized microprocessor designed specifically for digital signal processing, generally in real-time computing....
 that is sufficient for completely describing the operation of the hardware.

In academic circles, the term computer organization is used, while in the computer industry, the term microarchitecture is more often used. Microarchitecture and instruction set architecture (ISA) together constitute the field of computer architecture
Computer architecture

Computer architecture in computer engineering is the conceptual design and fundamental operational structure of a computer system. It is a blueprint and functional description of requirements and design implementations for the various parts of a computer, focusing largely on the way by which the central processing unit performs internally an...
.

Etymology of the term


Since the 1950s, many computers used microprogramming to implement their control logic
Control logic

Control logic is the part of a software architecture that controls what the program will do. This part of the program is also called the controller....
 which decoded the program instructions and executed them. The bits within the microprogram words controlled the processor at the level of electrical signals.

The term microarchitecture was used to describe the units that were controlled by the microprogram words, as opposed to architecture that was visible and documented for programmers. While architecture usually had to be compatible between hardware generations, the underlying microarchitecture could be easily changed.

Relation to instruction set architecture

The microarchitecture is related to, but not the same as, the instruction set architecture
Instruction set

An instruction set is a list of all the instruction , and all their variations, that a processor can execute.Instructions include:* Arithmetic such as add and subtract...
. The instruction set architecture is roughly the same as the programming model of a processor as seen by an assembly language
Assembly language

An assembly language is a low-level language for programming computers. It implements a symbolic representation of the numeric machine codes and other constants needed to program a particular CPU architecture....
 programmer or compiler writer, which includes the execution model, processor register
Processor register

In computer architecture, a processor register is a small amount of Computer storage available on the CPU whose contents can be accessed more quickly than storage available elsewhere....
s, address and data formats etc. The microarchitecture (or computer organization) is mainly a lower level structure and therefore governs a large number of details that are hidden in the programming model. It describes the constituent parts of the processor and how these interconnect and interoperate to implement the architectural specification.

The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectual elements of the machine, which may be everything from single gates and latches, via registers, LUTs, multiplexers and counters, to complete ALUs and even larger elements. The actual electronic circuitry that implements these elements is called the implementation of that microarchitecture (also including layout, packaging, and other physical details). This implementation-level can, in turn, be further subdivided into the important transistor-level foundation on which the actual logic design is built at - mainly - the gate-level and up. This may govern what kinds of transistors are used as well as which basic gate and latch-building structures and logic implementation types are chosen.

Transistors (if used, this includes older designs) may be bipolar, pMOS, enhancement/depletion mode nMOS, or pMOS+nMOS (CMOS), all in several manufacturing variants and connected in various ways: Basic logic implementation structures may use fully static and/or complementary gates only, or it may employ pass-transistors or various forms of dynamic or clocked logic, such as domino logic, or other dynamic circuitry types using two or four phases; pseudo nMOS may be included in ordinary CMOS topologies to save space, differential and/or low level signaling may be used, etc.

A few important points:
  • A single microarchitecture, especially if it includes microcode
    Microcode

    Microcode is a layer of lowest-level instructions involved in the implementation of machine code instructions in many computers and other processors; it resides in a special high-speed memory and translates machine instructions into sequences of detailed circuit-level operations....
    , can be used to implement many different instruction sets, by means of changing the control store
    Control store

    A control store is the part of a Central processing unit control unit that stores the CPU's microprogram. It is usually accessed by a microsequencer....
    . This can be quite complicated and cumbersome however, even when simplified by microcode and/or table structures in ROMs or PLAs.


  • Two machines may have the same microarchitecture, and hence the same block diagram, but radically different hardware implementations. This governs both the level of the electronic circuitry and the (even more) physical level of manufacturing (of both ICs and/or discrete components).


  • Machines with different microarchitectures may have the same instruction set architecture, and thus be capable of executing the same programs. New microarchitectures and/or circuitry solutions, along with advances in semiconductor manufacturing, are what allows newer generations of processors to achieve higher performance.


Simplified descriptions

A very simplified high level description — common in marketing — may show only fairly basic characteristics, such as bus-widths, along with various types of execution units and other large systems, such as branch prediction and cache memories, pictured as simple blocks — perhaps with some important attributes or characteristics noted. Some details regarding pipeline structure (like fetch, decode, assign, execute, write-back) may sometimes also be included.

Aspects of microarchitecture


The pipelined
Instruction pipeline

File:5 Stage Pipeline.svgAn instruction pipeline is a technique used in the design of computers and other digital electronic devices to increase their instruction throughput ....
 datapath
Datapath

A datapath is a collection of digital electronics, such as arithmetic logic units or multiplication ALUs, that perform data processing operations....
 is the most commonly used datapath design in microarchitecture today. This technique is used in most modern microprocessors, microcontrollers, and DSPs. The pipelined architecture allows multiple instructions to overlap in execution, much like an assembly line. The pipeline includes several different stages which are fundamental in microarchitecture designs. Some of these stages include instruction fetch, instruction decode, execute, and write back. Some architectures include other stages such as memory access. The design of pipelines is one of the central microarchitectural tasks.

Execution units are also essential to microarchitecture. Execution units include arithmetic logic unit
Arithmetic logic unit

In computing, an arithmetic logic unit is a digital circuit that performs arithmetic and logicaloperations. The ALU is a fundamental building block of the central processing unit of a computer, and even the simplest microprocessors contain one for purposes such as maintaining timers....
s (ALU), floating point unit
Floating point unit

A floating-point unit is a part of a computer system specially designed to carry out operations on floating point numbers. Typical operations are addition, subtraction, multiplication, division , and square root....
s (FPU), load/store units, branch prediction, and SIMD
SIMD

In computing, SIMD is a technique employed to achieve data level parallelism....
. These units perform the operations or calculations of the processor. The choice of the number of execution units, their latency and throughput is a central microarchitectural design task. The size, latency, throughput and connectivity of memories within the system are also microarchitectural decisions.

System-level design decisions such as whether or not to include peripherals, such as memory controller
Memory controller

The memory controller is a digital circuit which manages the flow of data going to and from the main memory. It can be a separate chip or integrated into another chip, such as on the Die of a microprocessor....
s, can be considered part of the microarchitectural design process. This includes decisions on the performance-level and connectivity of these peripherals.

Unlike architectural design, where achieving a specific performance level is the main goal, microarchitectural design pays closer attention to other constraints. Since microarchitecture design decisions directly affect what goes into a system, attention must be paid to such issues as:

  • chip area/cost
  • power consumption
  • logic complexity
  • ease of connectivity
  • manufacturability
  • ease of debugging
  • testability


Microarchitectural concepts

In general, all CPUs, single-chip microprocessors or multi-chip implementations run programs by performing the following steps:
  1. read an instruction and decode it
  2. find any associated data that is needed to process the instruction
  3. process the instruction
  4. write the results out


Complicating this simple-looking series of steps is the fact that the memory hierarchy, which includes caching, main memory and non-volatile storage like hard disk
Hard disk

A hard disk drive , commonly referred to as a hard drive, hard disk, or fixed disk drive, is a non-volatile storage device which stores digitally encoded data on rapidly rotating hard disk platters with magnetic surfaces....
s, (where the program instructions and data reside) has always been slower than the processor itself. Step (2) often introduces a lengthy (in CPU terms) delay while the data arrives over the computer bus
Computer bus

In computer architecture, a bus is a subsystem that transfers data between computer components inside a computer or between computers. Each bus defines its set of connectors to physically plug devices, cards or cables together....
. A considerable amount of research has been put into designs that avoid these delays as much as possible. Over the years, a central goal was to execute more instructions in parallel, thus increasing the effective execution speed of a program. These efforts introduced complicated logic and circuit structures. Initially these techniques could only be implemented on expensive mainframes or supercomputers due to the amount of circuitry needed for these techniques. As semiconductor manufacturing progressed, more and more of these techniques could be implemented on a single semiconductor chip.

See Article Central Processing Unit
Central processing unit

A central processing unit is an electronic circuit that can execute computer programs. This broad definition can easily be applied to many early computers that existed long before the term "CPU" ever came into widespread usage....
 for a more detailed discussion on operation basics.


See Article History of general purpose CPUs
History of general purpose CPUs

The history of general purpose CPUs is a continuation of the earlier history of computing hardware....
 for a more detailed discussion on the development history of CPUs.


What follows is a survey of micro-architectural techniques that are common in modern CPUs.

Instruction set choice

Instruction sets has shifted over the years, from originally very simple to sometimes very complex (in various respects). In recent years, load-store architectures, VLIW and EPIC
Explicitly Parallel Instruction Computing

Explicitly Parallel Instruction Computing is a term coined in 1997 by the Itanium to describe a computing paradigm that began to be researched in the early 1980s....
 types have been in fashion. Architectures that are dealing with data parallelism
Data parallelism

Data parallelism is a form of parallelization of computing across multiple central processing units in parallel computing environments. Data parallelism focuses on distributing the data across different parallel computing nodes....
 include SIMD
SIMD

In computing, SIMD is a technique employed to achieve data level parallelism....
 and Vectors
Vector processor

A vector processor, or array processor, is a Central processing unit design where the instruction set includes operations that can perform mathematical operations on multiple data elements simultaneously....
. Some labels used to denote classes of CPU architectures are not pariculary descriptive, especially so the CISC label; many early designs, retroactively denoted "CISC" are in fact significantly simpler than modern RISC processors (in several respects).

However, the choice of instruction set architecture may greatly affect the complexity of implementing high performance devices. The prominent strategy, used to develop the first RISC processors, was to simplify instructions to a minimum of individual semantic complexity combined with high encoding regularity and simplicity. Such uniform instructions was easily feched, decoded and executed in a pipelined fashion and a simple strategy to reduce the number of logic levels in order to reach high operating frequencies; instruction cache-memories compensated for the higher operating frequency and inherently low code density while large register sets were used to factor out as much of the (slow) memory accesses as possible.

Instruction pipelining

One of the first, and most powerful, techniques to improve performance is the use of the instruction pipeline
Instruction pipeline

File:5 Stage Pipeline.svgAn instruction pipeline is a technique used in the design of computers and other digital electronic devices to increase their instruction throughput ....
. Early processor designs would carry out all of the steps above for one instruction before moving onto the next. Large portions of the circuitry were left idle at any one step; for instance, the instruction decoding circuitry would be idle during execution and so on.

Pipelines improve performance by allowing a number of instructions to work their way through the processor at the same time. In the same basic example, the processor would start to decode (step 1) a new instruction while the last one was waiting for results. This would allow up to four instructions to be "in flight" at one time, making the processor look four times as fast. Although any one instruction takes just as long to complete (there are still four steps) the CPU as a whole "retires" instructions much faster and can be run at a much higher clock speed.

RISC make pipelines smaller and much easier to construct by cleanly separating each stage of the instruction process and making them take the same amount of time — one cycle. The processor as a whole operates in an assembly line
Assembly line

An assembly line is a manufacturing process in which parts are added to a product in a sequential manner using optimally planned logistics to create a finished product much faster than with handcrafting-type methods....
 fashion, with instructions coming in one side and results out the other. Due to the reduced complexity of the Classic RISC pipeline
Classic RISC pipeline

In the history of computer hardware, some early reduced instruction set computer central processing units used a very similar architectural solution, now called a classic RISC pipeline....
, the pipelined core and an instruction cache could be placed on the same size die that would otherwise fit the core alone on a CISC design. This was the real reason that RISC was faster. Early designs like the SPARC
SPARC

SPARC is a Reduced Instruction Set Computer microprocessor instruction set Computer architecture originally designed in 1985 by Sun Microsystems....
 and MIPS
MIPS architecture

MIPS is a RISC instruction set architecture developed by MIPS Technologies . In the mid to late 1990s, it was estimated that one in three RISC microprocessors produced were MIPS implementations....
 often ran over 10 times as fast as Intel and Motorola
Motorola

Motorola, Inc. is an United States, multinational, Fortune 100, telecommunications company based in Schaumburg, Illinois. It is a manufacturer of wireless telephone handsets, also designing and selling wireless network infrastructure equipment such as cellular transmission base stations and signal amplifiers....
 CISC solutions at the same clock speed and price.

Pipelines are by no means limited to RISC designs. By 1986 the top-of-the-line VAX implementation (VAX 8800
VAX 8000

The VAX 8000 is a family of minicomputers developed and manufactured by Digital Equipment Corporation using processors implementing the VAX instruction set architecture ....
) was a heavily pipelined design, slightly predating the first commercial MIPS and SPARC designs. Most modern CPUs (even embedded CPUs) are now pipelined, and microcoded CPUs with no pipelining are seen only in the most area-constrained embedded processors. Large CISC machines, from the VAX 8800 to the modern Pentium 4 and Athlon, are implemented with both microcode and pipelines. Improvements in pipelining and caching are the two major microarchitectural advances that have enabled processor performance to keep pace with the circuit technology on which they are based.

Cache

It was not long before improvements in chip manufacturing allowed for even more circuitry to be placed on the die, and designers started looking for ways to use it. One of the most common was to add an ever-increasing amount of cache memory
CPU cache

A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access computer storage. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations....
 on-die. Cache is simply very fast memory, memory that can be accessed in a few cycles as opposed to "many" needed to talk to main memory. The CPU includes a cache controller which automates reading and writing from the cache, if the data is already in the cache it simply "appears," whereas if it is not the processor is "stalled" while the cache controller reads it in.

RISC designs started adding cache in the mid-to-late 1980s, often only 4 KB in total. This number grew over time, and typical CPUs now have about 512 KB, while more powerful CPUs come with 1 or 2 or even 4, 6, 8 or 12 MB, organized in multiple levels of a memory hierarchy
Memory hierarchy

The hierarchical arrangement of computer storage in current computer architectures is called the memory hierarchy. It is designed to take advantage of memory locality in computer programs....
. Generally speaking, more cache means more speed.

Caches and pipelines were a perfect match for each other. Previously, it didn't make much sense to build a pipeline that could run faster than the access latency of off-chip memory. Using on-chip cache memory instead, meant that a pipeline could run at the speed of the cache access latency, a much smaller length of time. This allowed the operating frequencies of processors to increase at a much faster rate than that of off-chip memory.

Branch prediction


One barrier to achieving higher performance through instruction-level parallelism stems from pipeline stalls and flushes due to branches. Normally, whether a conditional branch will be taken isn't known until late in the pipeline as conditional branches depend on results coming from a register. From the time that the processor's instruction decoder has figured out that it has encountered a conditional branch instruction to the time that the deciding register value can be read out, the pipeline might be stalled for several cycles. On average, every fifth instruction executed is a branch, so that's a high amount of stalling. If the branch is taken, its even worse, as then all of the subsequent instructions which were in the pipeline needs to be flushed.

Techniques such as branch prediction and speculative execution
Speculative execution

In computer science, speculative execution is the execution of Code , the result of which may not be needed. In the context of functional programming, the term "speculative evaluation" is used instead....
 are used to lessen these branch penalties. Branch prediction is where the hardware makes educated guesses on whether a particular branch will be taken. The guess allows the hardware to prefetch instructions without waiting for the register read. Speculative execution is a further enhancement in which the code along the predicted path is executed before it is known whether the branch should be taken or not.

Superscalar

Even with all of the added complexity and gates needed to support the concepts outlined above, improvements in semiconductor manufacturing soon allowed even more logic gates to be used.

In the outline above the processor processes parts of a single instruction at a time. Computer programs could be executed faster if multiple instructions were processed simultaneously. This is what superscalar
Superscalar

A superscalar Central processing unit architecture implements a form of parallel computer called instruction level parallelism within a single processor....
 processors achieve, by replicating functional units such as ALUs. The replication of functional units was only made possible when the die area of a single-issue processor no longer stretched the limits of what could be reliably manufactured. By the late 1980s, superscalar designs started to enter the market place.

In modern designs it is common to find two load units, one store (many instructions have no results to store), two or more integer math units, two or more floating point units, and often a SIMD
SIMD

In computing, SIMD is a technique employed to achieve data level parallelism....
 unit of some sort. The instruction issue logic grows in complexity by reading in a huge list of instructions from memory and handing them off to the different execution units that are idle at that point. The results are then collected and re-ordered at the end.

Out-of-order execution

The addition of caches reduces the frequency or duration of stalls due to waiting for data to be fetched from the memory hierarchy, but does not get rid of these stalls entirely. In early designs a cache miss would force the cache controller to stall the processor and wait. Of course there may be some other instruction in the program whose data is available in the cache at that point. Out-of-order execution
Out-of-order execution

In computer engineering, out-of-order execution, OoOE, is a paradigm used in most high-performance microprocessors to make use of Instruction cycle that would otherwise be wasted by a certain type of costly delay....
 allows that ready instruction to be processed while an older instruction waits on the cache, then re-orders the results to make it appear that everything happened in the programmed order. This technique is also used to avoid other operand dependency stalls, such as an instruction awaiting a result from a long latency floating-point operation or other multi-cycle operations.

Speculative execution

One problem with an instruction pipeline is that there are a class of instructions that must make their way entirely through the pipeline before execution can continue. In particular, conditional branches need to know the result of some prior instruction before "which side" of the branch to run is known. For instance, an instruction that says "if x is larger than 5 then do this, otherwise do that" will have to wait for the results of x to be known before it knows if the instructions for this or that can be fetched.

For a small four-deep pipeline this means a delay of up to three cycles — the decode can still happen. But as clock speeds increase the depth of the pipeline increases with it, and some modern processors may have 20 stages or more. In this case the CPU is being stalled for the vast majority of its cycles every time one of these instructions is encountered.

The solution, or one of them, is speculative execution
Speculative execution

In computer science, speculative execution is the execution of Code , the result of which may not be needed. In the context of functional programming, the term "speculative evaluation" is used instead....
, also known as branch prediction
Branch predictor

In computer architecture, a branch predictor is the part of a central processing unit that determines whether a conditional branch in the instruction flow of a program is likely to be taken or not....
. In reality one side or the other of the branch will be called much more often than the other, so it is often correct to simply go ahead and say "x will likely be smaller than five, start processing that". If the prediction turns out to be correct, a huge amount of time will be saved. Modern designs have rather complex prediction systems, which watch the results of past branches to predict the future with greater accuracy.

Register renaming

Register renaming refers to a technique used to avoid unnecessary serialized execution of program instructions because of the reuse of the same registers by those instructions. Suppose we have two groups of instruction that will use the same register
Processor register

In computer architecture, a processor register is a small amount of Computer storage available on the CPU whose contents can be accessed more quickly than storage available elsewhere....
, one set of instruction is executed first to leave the register to the other set, but if the other set is assigned to a different similar register both sets of instructions can be executed in parallel.

Multiprocessing and multithreading

Computer architects have become stymied by the growing mismatch in CPU operating frequencies and DRAM
Dynamic random access memory

Dynamic random access memory is a type of random access memory that stores each bit of data in a separate capacitor within an integrated circuit....
 access times. None of the techniques that exploited instruction-level parallelism within one program could make up for the long stalls that occurred when data had to be fetched from main memory. Additionally, the large transistor counts and high operating frequencies needed for the more advanced ILP techniques required power dissipation levels that could no longer be cheaply cooled. For these reasons, newer generations of computers have started to exploit higher levels of parallelism that exist outside of a single program or program thread
Thread (computer science)

In computer science, a thread of execution is a Fork of a computer program into two or more Concurrency running task s. The implementation of threads and process es differs from one operating system to another, but in most cases, a thread is contained inside a process....
.

This trend is sometimes known as throughput computing. This idea originated in the mainframe market where online transaction processing emphasized not just the execution speed of one transaction, but the capacity to deal with massive numbers of transactions. With transaction-based applications such as network routing and web-site serving greatly increasing in the last decade, the computer industry has re-emphasized capacity and throughput issues.

One technique of how this parallelism is achieved is through multiprocessing
Multiprocessing

Multiprocessing is the use of two or more CPU within a single computer system. The term also refers to the ability of a system to support more than one processor and/or the ability to allocate tasks between them....
 systems, computer systems with multiple CPUs. Once reserved for high-end mainframes
Mainframe computer

Mainframes are computers used mainly by large organizations for critical applications, typically bulk data processing such as census, industry and consumer statistics, Enterprise Resource Planning, and financial transaction processing....
 and supercomputer
Supercomputer

A supercomputer is a computer that is at the frontline of current processing capacity, particularly speed of calculation. Supercomputers introduced in the 1960s were designed primarily by Seymour Cray at Control Data Corporation , and led the market into the 1970s until Cray left to form his own company, Cray Research....
s, small scale (2-8) multiprocessors servers have become commonplace for the small business market. For large corporations, large scale (16-256) multiprocessors are common. Even personal computer
Personal computer

A personal computer is any general-purpose computer whose original sales price, size, and capabilities make it useful for individuals, and which is intended to be operated directly by an end user, with no intervening computer operator....
s with multiple CPUs have appeared since the 1990s.

With further transistor size reductions made available with semiconductor technology advances, multicore CPUs
Multi-core (computing)

A multi-core processor combines two or more independent cores into a single package composed of a single integrated circuit , called a Die , or more dies packaged together....
 have appeared where multiple CPUs are implemented on the same silicon chip. Initially used in chips targeting embedded markets, where simpler and smaller CPUs would allow multiple instantiations to fit on one piece of silicon. By 2005, semiconductor technology allowed dual high-end desktop CPUs CMP chips to be manufactured in volume. Some designs, such as Sun Microsystems
Sun Microsystems

Sun Microsystems, Inc. is a multinational corporation vendor of computers, computer components, computer software, and information technology services, founded on February 24, 1982....
' UltraSPARC T1
UltraSPARC T1

Sun Microsystems' UltraSPARC T1 microprocessor, known until its 14 November 2005 announcement by its development codename "Niagara", is a multithreading, multicore central processing unit....
 have reverted back to simpler (scalar, in-order) designs in order to fit more processors on one piece of silicon.

Another technique that has become more popular recently is multithreading. In multithreading, when the processor has to fetch data from slow system memory, instead of stalling for the data to arrive, the processor switches to another program or program thread which is ready to execute. Though this does not speed up a particular program/thread, it increases the overall system throughput by reducing the time the CPU is idle.

Conceptually, multithreading is equivalent to a context switch
Context switch

A context switch is the computing process of storing and restoring the State of a Central processing unit such that multiple Process es can share a single CPU resource....
 at the operating system level. The difference is that a multithreaded CPU can do a thread switch in one CPU cycle instead of the hundreds or thousands of CPU cycles a context switch normally requires. This is achieved by replicating the state hardware (such as the register file
Register file

A register file is an array of processor registers in a central processing unit. Modern integrated circuit-based register files are usually implemented by way of fast static RAMs with multiple ports....
 and program counter
Program counter

The program counter, or PC is a processor register that indicates where the computer is in its instruction sequence. Depending on the details of the particular computer, the PC holds either the address of the instruction being executed, or the address of the next instruction to be executed....
) for each active thread.

A further enhancement is simultaneous multithreading
Simultaneous multithreading

Simultaneous multithreading, often abbreviated as SMT, is a technique for improving the overall efficiency of superscalar Central processing unit with Multithreading ....
. This technique allows superscalar CPUs to execute instructions from different programs/threads simultaneously in the same cycle.

See Article History of general purpose CPUs
History of general purpose CPUs

The history of general purpose CPUs is a continuation of the earlier history of computing hardware....
 for other research topics affecting CPU design.

See also


  • Microprocessor
    Microprocessor

    A microprocessor incorporates most or all of the functions of a central processing unit on a single integrated circuit . The first microprocessors emerged in the early 1970s and were used for electronic calculators, using Binary-coded decimal arithmetic on 4-bit Word ....
  • Microcontroller
    Microcontroller

    A microcontroller is a small computer on a single integrated circuit consisting of a relatively simple CPU combined with support functions such as a crystal oscillator, timers, watchdog, serial and analog I/O etc....
  • Digital signal processor
    Digital signal processor

    A digital signal processor is a specialized microprocessor designed specifically for digital signal processing, generally in real-time computing....
  • CPU design
    CPU design

    CPU design is the design engineering task of creating a central processing unit , a component of computer hardware. It is a subfield of electronics engineering and computer engineering....
  • Hardware description language
    Hardware description language

    In electronics, a hardware description language or HDL is any language from a class of computer languages and/or programming languages for formal description of digital logic and electronic circuits....
  • Hardware architecture
    Hardware architecture

    In engineering, hardware architecture refers to the identification of a system's physical components and their interrelationships. This description, often called a hardware design model, allows hardware designers to understand how their components fit into a system architecture and provides software component designers important inf...
  • Harvard architecture
    Harvard architecture

    The Harvard architecture is a computer architecture with physically separate computer storage and signal pathways for instructions and data. The term originated from the Harvard Mark I relay-based computer, which stored instructions on punched tape and data in electro-mechanical counters ....
  • Von Neumann architecture
    Von Neumann architecture

    The von Neumann architecture is a design model for a stored-program digital computer that uses a central processing unit and a single separate computer storage structure to hold both instructions and data ....
  • Multi-core (computing)
    Multi-core (computing)

    A multi-core processor combines two or more independent cores into a single package composed of a single integrated circuit , called a Die , or more dies packaged together....
  • Datapath
    Datapath

    A datapath is a collection of digital electronics, such as arithmetic logic units or multiplication ALUs, that perform data processing operations....
  • Dataflow architecture
    Dataflow architecture

    Dataflow architecture is a computer architecture that directly contrasts the traditional von Neumann architecture or control flow architecture. Dataflow architectures do not have a program counter or the executability and execution of instructions is solely determined based on the availability of input arguments to the instructions....
  • VLSI
  • VHDL
  • Verilog
    Verilog

    In the semiconductor and electronic design industry, Verilog is a hardware description language used to model Electronics#Electronic systems. Verilog HDL, not to be confused with VHDL, is most commonly used in the design, verification, and implementation of Digital circuit logic chips at the Register transfer level level of Abstraction...
  • Stream processing
    Stream processing

    Stream processing is a computer programming paradigm, related to SIMD, that allows some applications to more easily exploit a limited form of parallel computing....
  • Instruction-level parallelism (ILP)


Further reading