All Topics  
Central processing unit

 
Central Processing Unit

   Email Print
   Bookmark   Link






 

Central processing unit



 
 
A central processing unit (CPU) is an electronic circuit that can execute computer program
Computer program

Computer programs are Instruction for a computer. A computer requires programs to function. Moreover, a computer program does not run unless its instructions are executed by a Central processing unit; however, a program may communicate an Algorithm#Formalization of algorithms to people without running....
s. This broad definition can easily be applied to many early computers that existed long before the term "CPU" ever came into widespread usage. The term itself and its initialism have been in use in the computer industry at least since the early 1960s .






Discussion
Ask a question about 'Central processing unit'
Start a new discussion about 'Central processing unit'
Answer questions from other users
Full Discussion Forum



Encyclopedia


80486dx2 Large
A central processing unit (CPU) is an electronic circuit that can execute computer program
Computer program

Computer programs are Instruction for a computer. A computer requires programs to function. Moreover, a computer program does not run unless its instructions are executed by a Central processing unit; however, a program may communicate an Algorithm#Formalization of algorithms to people without running....
s. This broad definition can easily be applied to many early computers that existed long before the term "CPU" ever came into widespread usage. The term itself and its initialism have been in use in the computer industry at least since the early 1960s . The form, design and implementation of CPUs have changed dramatically since the earliest examples, but their fundamental operation has remained much the same.

Early CPUs were custom-designed as a part of a larger, sometimes one-of-a-kind, computer. However, this costly method of designing custom CPUs for a particular application has largely given way to the development of mass-produced processors that are suited for one or many purposes. This standardization trend generally began in the era of discrete transistor
Transistor

In electronics, a transistor is a semiconductor device commonly used to Electronic amplifier or switch Electronics signals. A transistor is made of a solid piece of a semiconductor material, with at least three terminals for connection to an external circuit....
 mainframes
Mainframe computer

Mainframes are computers used mainly by large organizations for critical applications, typically bulk data processing such as census, industry and consumer statistics, Enterprise Resource Planning, and financial transaction processing....
 and minicomputer
Minicomputer

A minicomputer is a class of multi-user computers that lies in the middle range of the computing spectrum, in between the largest multi-user systems and the smallest single-user systems ....
s and has rapidly accelerated with the popularization of the integrated circuit
Integrated circuit

In electronics, an integrated circuit is a miniaturized electronic circuit that has been manufactured in the surface of a thin Wafer of semiconductor material....
 (IC). The IC has allowed increasingly complex CPUs to be designed and manufactured to tolerances on the order of nanometers. Both the miniaturization and standardization of CPUs have increased the presence of these digital devices in modern life far beyond the limited application of dedicated computing machines. Modern microprocessors appear in everything from automobile
Automobile

An automobile or motor car is a wheeled motor vehicle for transportation passengers, which also carries its own car engine or motor. Most definitions of the term specify that automobiles are designed to run primarily on roads, to have seating for one to eight people, to typically have four wheels, and to be constructed principally f...
s to cell phones to children's toys.

History of CPUs


Prior to the advent of machines that resemble today's CPUs, computers such as the ENIAC
ENIAC

ENIAC, short for Electronic Numerical Integrator And Computer, was a general-purpose electronic computer. It was a Turing complete, digital computer capable of being reprogrammed to solve a full range of computing problems....
 had to be physically rewired in order to perform different tasks. These machines are often referred to as "fixed-program computers," since they had to be physically reconfigured in order to run a different program. Since the term "CPU" is generally defined as a software (computer program) execution device, the earliest devices that could rightly be called CPUs came with the advent of the stored-program computer.

The idea of a stored-program computer was already present during ENIAC's design, but was initially omitted so the machine could be finished sooner. On June 30, 1945, before ENIAC was even completed, mathematician John von Neumann
John von Neumann

John von Neumann was a Hungarian American mathematician who made major contributions to a vast range of fields, including set theory, functional analysis, quantum mechanics, ergodic theory, continuous geometry, economics and game theory, computer science, numerical analysis, hydrodynamics , and statistics, as well as many other mathematical...
 distributed the paper entitled "First Draft of a Report on the EDVAC
First Draft of a Report on the EDVAC

The First Draft of a Report on the EDVAC was an unfinished work 101-page document written by John von Neumann and distributed on June 30, 1945 by Herman Goldstine, security officer on the classified ENIAC project....
." It outlined the design of a stored-program computer that would eventually be completed in August 1949 . EDVAC was designed to perform a certain number of instructions (or operations) of various types. These instructions could be combined to create useful programs for the EDVAC to run. Significantly, the programs written for EDVAC were stored in high-speed computer memory rather than specified by the physical wiring of the computer. This overcame a severe limitation of ENIAC, which was the large amount of time and effort it took to reconfigure the computer to perform a new task. With von Neumann's design, the program, or software, that EDVAC ran could be changed simply by changing the contents of the computer's memory.

While von Neumann is most often credited with the design of the stored-program computer because of his design of EDVAC, others before him such as Konrad Zuse
Konrad Zuse

Konrad Zuse was a Germany Civil engineering and computer pioneer. His greatest achievement was the world's first functional program-controlled Turing-complete computer, the Z3 , in 1941 ....
 had suggested similar ideas. Additionally, the so-called Harvard architecture
Harvard architecture

The Harvard architecture is a computer architecture with physically separate computer storage and signal pathways for instructions and data. The term originated from the Harvard Mark I relay-based computer, which stored instructions on punched tape and data in electro-mechanical counters ....
 of the Harvard Mark I
Harvard Mark I

The IBM Automatic Sequence Controlled Calculator , called the Mark I by Harvard University, was the first large-scale automatic digital computer in the USA....
, which was completed before EDVAC, also utilized a stored-program design using punched paper tape
Punched tape

Punched tape or paper tape is a largely obsolete form of data storage, consisting of a long strip of paper in which holes are punched to store data....
 rather than electronic memory. The key difference between the von Neumann and Harvard architectures is that the latter separates the storage and treatment of CPU instructions and data, while the former uses the same memory space for both. Most modern CPUs are primarily von Neumann in design, but elements of the Harvard architecture are commonly seen as well.

Being digital
Digital

A digital system uses discrete values, usually but not always symbolized numerically to represent information for input, processing, transmission, storage, etc....
 devices, all CPUs deal with discrete states and therefore require some kind of switching elements to differentiate between and change these states. Prior to commercial acceptance of the transistor, electrical relays
Relay

A relay is an electrical switch that opens and closes under the control of another electrical circuit. In the original form, the switch is operated by an magnet to open or close one or many sets of contacts....
 and vacuum tube
Vacuum tube

In electronics, a vacuum tube, electron tube , thermionic valve, or just valve is a device used to amplifier, switch, otherwise modify, or create an Electricity signal by controlling the movement of electrons in a low-pressure space....
s (thermionic valves) were commonly used as switching elements. Although these had distinct speed advantages over earlier, purely mechanical designs, they were unreliable for various reasons. For example, building direct current
Direct current

Direct current is the unidirectional flow of electric charge. Direct current is produced by such sources as battery , thermocouples, solar cells, and commutator-type electric machines of the dynamo type....
 sequential logic
Sequential logic

In digital circuit theory, sequential logic is a type of logic circuit whose output depends not only on the present input but also on the history of the input....
 circuits out of relays requires additional hardware to cope with the problem of contact bounce
Switch

In electronics, a switch is an electrical component which can break an electrical circuit, interrupting the Electric current or diverting it from one conductor to another....
. While vacuum tubes do not suffer from contact bounce, they must heat up before becoming fully operational and eventually stop functioning altogether. Usually, when a tube failed, the CPU would have to be diagnosed to locate the failing component so it could be replaced. Therefore, early electronic (vacuum tube based) computers were generally faster but less reliable than electromechanical (relay based) computers.

Tube computers like EDVAC
EDVAC

EDVAC was one of the earliest electronics computers. Unlike its predecessor the ENIAC, it was binary numeral system rather than decimal, and was a Von Neumann architecture machine....
 tended to average eight hours between failures, whereas relay computers like the (slower, but earlier) Harvard Mark I
Harvard Mark I

The IBM Automatic Sequence Controlled Calculator , called the Mark I by Harvard University, was the first large-scale automatic digital computer in the USA....
 failed very rarely . In the end, tube based CPUs became dominant because the significant speed advantages afforded generally outweighed the reliability problems. Most of these early synchronous CPUs ran at low clock rate
Clock rate

The clock rate is the fundamental rate in cycles per second for the frequency of the clock in any synchronous circuit. For example, a crystal oscillator frequency reference typically is synonymous with a fixed sinusoidal waveform, a clock rate is that frequency reference translated by electronic circuitry into a corresponding square wav...
s compared to modern microelectronic designs (see below for a discussion of clock rate). Clock signal frequencies ranging from 100 kHz
Hertz

The hertz is a measure of frequency per unit of time, or the number of list of cycles per second. It is the SI base unit of frequency in the International System of Units , and is used worldwide in both general-purpose and scientific contexts....
 to 4 MHz were very common at this time, limited largely by the speed of the switching devices they were built with.

Discrete transistor and IC CPUs

Pdp 8i Cpu
The design complexity of CPUs increased as various technologies facilitated building smaller and more reliable electronic devices. The first such improvement came with the advent of the transistor
Transistor

In electronics, a transistor is a semiconductor device commonly used to Electronic amplifier or switch Electronics signals. A transistor is made of a solid piece of a semiconductor material, with at least three terminals for connection to an external circuit....
. Transistorized CPUs during the 1950s and 1960s no longer had to be built out of bulky, unreliable, and fragile switching elements like vacuum tube
Vacuum tube

In electronics, a vacuum tube, electron tube , thermionic valve, or just valve is a device used to amplifier, switch, otherwise modify, or create an Electricity signal by controlling the movement of electrons in a low-pressure space....
s and electrical relays
Relay

A relay is an electrical switch that opens and closes under the control of another electrical circuit. In the original form, the switch is operated by an magnet to open or close one or many sets of contacts....
. With this improvement more complex and reliable CPUs were built onto one or several printed circuit board
Printed circuit board

A printed circuit board, or PCB, is used to mechanically support and electrically connect electronic components using Conductor pathways, or signal traces, industrial etchinged from copper sheets laminated onto a non-conductive substrate....
s containing discrete (individual) components.

During this period, a method of manufacturing many transistors in a compact space gained popularity. The integrated circuit
Integrated circuit

In electronics, an integrated circuit is a miniaturized electronic circuit that has been manufactured in the surface of a thin Wafer of semiconductor material....
 (IC) allowed a large number of transistors to be manufactured on a single semiconductor
Semiconductor

A semiconductor is a material that has electrical conductivity between those of a Electrical conductor and an electrical insulation; it can vary over that wide range either permanently or dynamically....
-based die
Die (integrated circuit)

A die in the context of integrated circuits is a small block of semiconducting material, on which a given functional circuit is fabricated.Typically, integrated circuits are produced in large batches on a single wafer of electronic-grade silicon through processes such as photolithography....
, or "chip." At first only very basic non-specialized digital circuits such as NOR gate
NOR gate

The NOR gate is a digital logic gate that implements logical NOR - it behaves according to the truth table to the right. A HIGH output results if both the inputs to the gate are LOW ....
s were miniaturized into ICs. CPUs based upon these "building block" ICs are generally referred to as "small-scale integration" (SSI) devices. SSI ICs, such as the ones used in the Apollo guidance computer
Apollo Guidance Computer

The Apollo Guidance Computer was the first recognizably modern embedded system, used in Real-time computing by astronaut pilot to collect and provide flight information, and to automatically control all of the navigational functions of the Apollo spacecraft....
, usually contained transistor counts numbering in multiples of ten. To build an entire CPU out of SSI ICs required thousands of individual chips, but still consumed much less space and power than earlier discrete transistor designs. As microelectronic technology advanced, an increasing number of transistors were placed on ICs, thus decreasing the quantity of individual ICs needed for a complete CPU. MSI and LSI (medium- and large-scale integration) ICs increased transistor counts to hundreds, and then thousands.

In 1964 IBM
IBM

International Business Machines Corporation, abbreviated IBM and nicknamed "Big Blue" , is a multinational corporation computer technology and consulting corporation headquartered in Armonk, New York, New York, United States....
 introduced its System/360
System/360

The IBM System/360 is a mainframe computer system family announced by IBM on April 7, 1964. It was the first family of computers making a clear distinction between computer architecture and implementation, allowing IBM to release a suite of compatible designs at different price points....
 computer architecture which was used in a series of computers that could run the same programs with different speed and performance. This was significant at a time when most electronic computers were incompatible with one another, even those made by the same manufacturer. To facilitate this improvement, IBM utilized the concept of a microprogram (often called "microcode"), which still sees widespread usage in modern CPUs . The System/360 architecture was so popular that it dominated the mainframe computer
Mainframe computer

Mainframes are computers used mainly by large organizations for critical applications, typically bulk data processing such as census, industry and consumer statistics, Enterprise Resource Planning, and financial transaction processing....
 market for the decades and left a legacy that is still continued by similar modern computers like the IBM zSeries
ZSeries

IBM System z, or earlier IBM eServer zSeries, is a brand name designated by IBM to all its mainframe computers.In 2000, IBM rebranded the existing System/390 to IBM eServer zSeries with the e depicted in IBM's red trademarked symbol....
. In the same year (1964), Digital Equipment Corporation
Digital Equipment Corporation

Digital Equipment Corporation was a pioneering United States company in the computer industry. It is often referred to within the computing industry as DEC ....
 (DEC) introduced another influential computer aimed at the scientific and research markets, the PDP-8
PDP-8

The PDP-8 was the first successful commercial minicomputer, produced by Digital Equipment Corporation in the 1960s. DEC introduced it on 22 March 1965, and sold more than 50,000 systems, the most of any computer up to that date....
. DEC would later introduce the extremely popular PDP-11
PDP-11

The PDP-11 was a series of 16-bit minicomputers sold by Digital Equipment Corporation from 1970 into the 1990s. Though not explicitly conceived as successor to DEC's PDP-8 computer in the Programmed Data Processor series of computers , the PDP-11 replaced the PDP-8 in many Real-time computing....
 line that originally was built with SSI ICs but was eventually implemented with LSI components once these became practical. In stark contrast with its SSI and MSI predecessors, the first LSI implementation of the PDP-11 contained a CPU composed of only four LSI integrated circuits .

Transistor-based computers had several distinct advantages over their predecessors. Aside from facilitating increased reliability and lower power consumption, transistors also allowed CPUs to operate at much higher speeds because of the short switching time of a transistor in comparison to a tube or relay. Thanks to both the increased reliability as well as the dramatically increased speed of the switching elements (which were almost exclusively transistors by this time), CPU clock rates in the tens of megahertz were obtained during this period. Additionally while discrete transistor and IC CPUs were in heavy usage, new high-performance designs like SIMD
SIMD

In computing, SIMD is a technique employed to achieve data level parallelism....
 (Single Instruction Multiple Data) vector processor
Vector processor

A vector processor, or array processor, is a Central processing unit design where the instruction set includes operations that can perform mathematical operations on multiple data elements simultaneously....
s began to appear. These early experimental designs later gave rise to the era of specialized supercomputer
Supercomputer

A supercomputer is a computer that is at the frontline of current processing capacity, particularly speed of calculation. Supercomputers introduced in the 1960s were designed primarily by Seymour Cray at Control Data Corporation , and led the market into the 1970s until Cray left to form his own company, Cray Research....
s like those made by Cray Inc.

Microprocessors

Intel 80486dx2 Bottom
The introduction of the microprocessor
Microprocessor

A microprocessor incorporates most or all of the functions of a central processing unit on a single integrated circuit . The first microprocessors emerged in the early 1970s and were used for electronic calculators, using Binary-coded decimal arithmetic on 4-bit Word ....
 in the 1970s significantly affected the design and implementation of CPUs. Since the introduction of the first microprocessor (the Intel 4004
Intel 4004

The Intel 4004 is a 4-bit central processing unit released by Intel Corporation in 1971. The 4004 is the first complete CPU on one chip, the first commercially available microprocessor, a feat made possible by the use of the new silicon gate technology allowing the integration of a higher number of transistors and a faster speed than was pos...
) in 1970 and the first widely used microprocessor (the Intel 8080
Intel 8080

The Intel 8080 was an early microprocessor designed and manufactured by Intel. The 8-bit microprocessor was released in April 1974 running at 2 megahertz , and is generally considered to be the first truly usable microprocessor....
) in 1974, this class of CPUs has almost completely overtaken all other central processing unit implementation methods. Mainframe and minicomputer manufacturers of the time launched proprietary IC development programs to upgrade their older computer architecture
Computer architecture

Computer architecture in computer engineering is the conceptual design and fundamental operational structure of a computer system. It is a blueprint and functional description of requirements and design implementations for the various parts of a computer, focusing largely on the way by which the central processing unit performs internally an...
s, and eventually produced instruction set compatible microprocessors that were backward-compatible with their older hardware and software. Combined with the advent and eventual vast success of the now ubiquitous personal computer
Personal computer

A personal computer is any general-purpose computer whose original sales price, size, and capabilities make it useful for individuals, and which is intended to be operated directly by an end user, with no intervening computer operator....
, the term "CPU" is now applied almost exclusively to microprocessors.

Previous generations of CPUs were implemented as discrete components and numerous small integrated circuit
Integrated circuit

In electronics, an integrated circuit is a miniaturized electronic circuit that has been manufactured in the surface of a thin Wafer of semiconductor material....
s (ICs) on one or more circuit boards. Microprocessors, on the other hand, are CPUs manufactured on a very small number of ICs; usually just one. The overall smaller CPU size as a result of being implemented on a single die means faster switching time because of physical factors like decreased gate parasitic capacitance
Capacitance

In electromagnetism and electronics, capacitance is the ability of a body to hold an electrical charge.Capacitance is also a measure of the amount of electric charge stored for a given electric potential....
. This has allowed synchronous microprocessors to have clock rates ranging from tens of megahertz to several gigahertz. Additionally, as the ability to construct exceedingly small transistors on an IC has increased, the complexity and number of transistors in a single CPU has increased dramatically. This widely observed trend is described by Moore's law
Moore's Law

Moore's law describes a long-term trend in the history of computing hardware. Since the invention of the integrated circuit in 1958, the number of transistors that can be placed inexpensively on an integrated circuit has increased exponential growth, doubling approximately every two years....
, which has proven to be a fairly accurate predictor of the growth of CPU (and other IC) complexity to date.

While the complexity, size, construction, and general form of CPUs have changed drastically over the past sixty years, it is notable that the basic design and function has not changed much at all. Almost all common CPUs today can be very accurately described as von Neumann stored-program machines. As the aforementioned Moore's law continues to hold true, concerns have arisen about the limits of integrated circuit transistor technology. Extreme miniaturization of electronic gates is causing the effects of phenomena like electromigration
Electromigration

Electromigration is the transport of material caused by the gradual movement of the ions in a Conductor due to the momentum transfer between conducting electrons and diffusing metal atoms....
 and subthreshold leakage
Subthreshold leakage

Subthreshold leakage or subthreshold conduction or subthreshold drain current is the electric current that flows between the source and drain of a MOSFET when the transistor is in the subthreshold region, that is, for gate-to-source voltages below the threshold voltage....
 to become much more significant. These newer concerns are among the many factors causing researchers to investigate new methods of computing such as the quantum computer
Quantum computer

A quantum computer is a device for computation that makes direct use of quantum mechanical phenomena, such as quantum superposition and quantum entanglement, to perform operations on data....
, as well as to expand the usage of parallelism
Parallel computing

Parallel computing is a form of computing in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved Concurrency ....
 and other methods that extend the usefulness of the classical von Neumann model.

CPU operation

The fundamental operation of most CPUs, regardless of the physical form they take, is to execute a sequence of stored instructions called a program. The program is represented by a series of numbers that are kept in some kind of computer memory. There are four steps that nearly all CPUs use in their operation: fetch, decode, execute, and writeback.

The first step, fetch, involves retrieving an instruction
Instruction (computer science)

In computer science, an instruction is a single operation of a central processing unit defined by an instruction set architecture. In a broader sense, an "instruction" may be any representation of an element of an executable program, such as a bytecode....
 (which is represented by a number or sequence of numbers) from program memory. The location in program memory is determined by a program counter
Program counter

The program counter, or PC is a processor register that indicates where the computer is in its instruction sequence. Depending on the details of the particular computer, the PC holds either the address of the instruction being executed, or the address of the next instruction to be executed....
 (PC), which stores a number that identifies the current position in the program. In other words, the program counter keeps track of the CPU's place in the current program. After an instruction is fetched, the PC is incremented by the length of the instruction word in terms of memory units. Often the instruction to be fetched must be retrieved from relatively slow memory, causing the CPU to stall while waiting for the instruction to be returned. This issue is largely addressed in modern processors by caches and pipeline architectures (see below).

The instruction that the CPU fetches from memory is used to determine what the CPU is to do. In the decode step, the instruction is broken up into parts that have significance to other portions of the CPU. The way in which the numerical instruction value is interpreted is defined by the CPU's instruction set architecture(ISA). Often, one group of numbers in the instruction, called the opcode, indicates which operation to perform. The remaining parts of the number usually provide information required for that instruction, such as operands for an addition operation. Such operands may be given as a constant value (called an immediate value), or as a place to locate a value: a register
Processor register

In computer architecture, a processor register is a small amount of Computer storage available on the CPU whose contents can be accessed more quickly than storage available elsewhere....
 or a memory address, as determined by some addressing mode. In older designs the portions of the CPU responsible for instruction decoding were unchangeable hardware devices. However, in more abstract and complicated CPUs and ISAs, a microprogram is often used to assist in translating instructions into various configuration signals for the CPU. This microprogram is sometimes rewritable so that it can be modified to change the way the CPU decodes instructions even after it has been manufactured.

After the fetch and decode steps, the execute step is performed. During this step, various portions of the CPU are connected so they can perform the desired operation. If, for instance, an addition operation was requested, an arithmetic logic unit (ALU) will be connected to a set of inputs and a set of outputs. The inputs provide the numbers to be added, and the outputs will contain the final sum. The ALU contains the circuitry to perform simple arithmetic and logical operations on the inputs (like addition and bitwise operations). If the addition operation produces a result too large for the CPU to handle, an arithmetic overflow flag in a flags register may also be set.

The final step, writeback, simply "writes back" the results of the execute step to some form of memory. Very often the results are written to some internal CPU register for quick access by subsequent instructions. In other cases results may be written to slower, but cheaper and larger, main memory. Some types of instructions manipulate the program counter rather than directly produce result data. These are generally called "jumps" and facilitate behavior like loops
Control flow

In computer science control flow refers to the order in which the individual statement , Instruction or function calls of an imperative programming or functional programming computer program are execution or evaluated....
, conditional program execution (through the use of a conditional jump), and functions
Subroutine

In computer science, a subroutine or subprogram is a portion of computer code within a larger computer program, which performs a specific task and is relatively independent of the remaining code....
 in programs. Many instructions will also change the state of digits in a "flags" register. These flags can be used to influence how a program behaves, since they often indicate the outcome of various operations. For example, one type of "compare" instruction considers two values and sets a number in the flags register according to which one is greater. This flag could then be used by a later jump instruction to determine program flow.

After the execution of the instruction and writeback of the resulting data, the entire process repeats, with the next instruction cycle normally fetching the next-in-sequence instruction because of the incremented value in the program counter. If the completed instruction was a jump, the program counter will be modified to contain the address of the instruction that was jumped to, and program execution continues normally. In more complex CPUs than the one described here, multiple instructions can be fetched, decoded, and executed simultaneously. This section describes what is generally referred to as the "Classic RISC pipeline," which in fact is quite common among the simple CPUs used in many electronic devices (often called microcontroller). It largely ignores the important role of CPU cache
CPU cache

A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access computer storage. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations....
, and therefore the access stage of the pipeline.

Design and implementation


Integer range

The way a CPU represents numbers is a design choice that affects the most basic ways in which the device functions. Some early digital computers used an electrical model of the common decimal
Decimal

The decimal numeral system has 10 as its Base . It is the most widely used numeral system....
 (base ten) numeral system
Numeral system

A numeral system is a writing system for expressing numerals , and a mathematical notation for representing numbers of a given set, using graphemes or symbols in a consistent manner....
 to represent numbers internally. A few other computers have used more exotic numeral systems like ternary
Balanced ternary

Balanced ternary is a Non-standard positional numeral systems , useful for comparison logic. It is a Ternary numeral system system, but unlike the standard ternary system, the digits have the values -1, 0, and 1....
 (base three). Nearly all modern CPUs represent numbers in binary
Binary numeral system

The binary numeral system, or notation with a radix of 2. Owing to its straightforward implementation in digital electronic circuitry using logic gates, the binary system is used internally by all modern computers....
 form, with each digit being represented by some two-valued physical quantity such as a "high" or "low" volt
Volt

The volt is the SI SI derived unit of electric potential difference or electromotive force, commonly known as voltage. It is named in honor of the Lombard physicist Alessandro Volta , who invented the voltaic pile, possibly the first chemical battery ....
age.

Mos 6502ad 4585 Top
Related to number representation is the size and precision of numbers that a CPU can represent. In the case of a binary CPU, a bit refers to one significant place in the numbers a CPU deals with. The number of bits (or numeral places) a CPU uses to represent numbers is often called "word size
Word (computer science)

In computing, "word" is a term for the natural unit of data used by a particular computer design. A word is simply a fixed-sized group of bits that are handled together by the machine....
", "bit width", "data path width", or "integer precision" when dealing with strictly integer numbers (as opposed to floating point). This number differs between architectures, and often within different parts of the very same CPU. For example, an 8-bit
8-bit

Eight-bit CPUs normally use an 8-bit data bus and a 16-bit address bus which means that their address space is limited to 64 KBs. This is not a "natural law", however, so there are exceptions....
 CPU deals with a range of numbers that can be represented by eight binary digits (each digit having two possible values), that is, 28 or 256 discrete numbers. In effect, integer size sets a hardware limit on the range of integers the software run by the CPU can utilize.

Integer range can also affect the number of locations in memory the CPU can address (locate). For example, if a binary CPU uses 32 bits to represent a memory address, and each memory address represents one octet
Octet (computing)

In computing, an octet is a grouping of eight bits.Octet, with the only exception noted below, always refers to an entity having exactly eight bits....
 (8 bits), the maximum quantity of memory that CPU can address is 232 octets, or 4 GiB
Gib

Gib may refer to:* A castrated male cat or ferret* Gibibit , a unit of information used, for example, to quantify computer memory or storage capacity...
. This is a very simple view of CPU address space
Address space

In computing, an address space defines a range of discrete addresses, each of which may correspond to a physical or virtual memory register, a Node , peripheral device, disk sector or other logical or physical entity....
, and many designs use more complex addressing methods like paging
Bank switching

Bank switching was a technique common in 8-bit microcomputer systems, to increase the amount of addressable random-access memory and read-only memory without extending the address bus....
 in order to locate more memory than their integer range would allow with a flat address space.

Higher levels of integer range require more structures to deal with the additional digits, and therefore more complexity, size, power usage, and general expense. It is not at all uncommon, therefore, to see 4- or 8-bit microcontroller
Microcontroller

A microcontroller is a small computer on a single integrated circuit consisting of a relatively simple CPU combined with support functions such as a crystal oscillator, timers, watchdog, serial and analog I/O etc....
s used in modern applications, even though CPUs with much higher range (such as 16, 32, 64, even 128-bit) are available. The simpler microcontrollers are usually cheaper, use less power, and therefore dissipate less heat, all of which can be major design considerations for electronic devices. However, in higher-end applications, the benefits afforded by the extra range (most often the additional address space) are more significant and often affect design choices. To gain some of the advantages afforded by both lower and higher bit lengths, many CPUs are designed with different bit widths for different portions of the device. For example, the IBM System/370
System/370

The IBM System/370 was a model range of IBM mainframes announced on June 30, 1970 as the successors to the System/360 family. The series maintained backward compatibility with the S/360, allowing an easy migration path for customers; this, plus improved performance, were the dominant themes of the product announcement....
 used a CPU that was primarily 32 bit, but it used 128-bit precision inside its floating point
Floating point

In computing, floating point describes a system for numerical representation in which a String of digits represents a rational number.The term floating point refers to the fact that the radix point can "float": that is, it can be placed anywhere relative to the Significant figures of the number....
 units to facilitate greater accuracy and range in floating point numbers . Many later CPU designs use similar mixed bit width, especially when the processor is meant for general-purpose usage where a reasonable balance of integer and floating point capability is required.

Clock rate


Most CPUs, and indeed most sequential logic
Sequential logic

In digital circuit theory, sequential logic is a type of logic circuit whose output depends not only on the present input but also on the history of the input....
 devices, are synchronous
Synchronous circuit

A synchronous circuit is a digital circuit in which the parts are synchronized by a clock signal.In an ideal synchronous circuit, every change in the logical levels of its storage components is simultaneous....
 in nature. That is, they are designed and operate on assumptions about a synchronization signal. This signal, known as a clock signal, usually takes the form of a periodic square wave
Square wave

A square wave is a kind of non-sinusoidal waveform, most typically encountered in electronics and signal processing. An ideal square wave alternates regularly and instantaneously between two levels....
. By calculating the maximum time that electrical signals can move in various branches of a CPU's many circuits, the designers can select an appropriate period
Frequency

Frequency is the number of occurrences of a repeating event per unit time. It is also referred to as temporal frequency.The period is the duration of one cycle in a repeating event, so the period is the reciprocal of the frequency....
 for the clock signal.

This period must be longer than the amount of time it takes for a signal to move, or propagate, in the worst-case scenario. In setting the clock period to a value well above the worst-case propagation delay, it is possible to design the entire CPU and the way it moves data around the "edges" of the rising and falling clock signal. This has the advantage of simplifying the CPU significantly, both from a design perspective and a component-count perspective. However, it also carries the disadvantage that the entire CPU must wait on its slowest elements, even though some portions of it are much faster. This limitation has largely been compensated for by various methods of increasing CPU parallelism (see below).

However architectural improvements alone do not solve all of the drawbacks of globally synchronous CPUs. For example, a clock signal is subject to the delays of any other electrical signal. Higher clock rates in increasingly complex CPUs make it more difficult to keep the clock signal in phase (synchronized) throughout the entire unit. This has led many modern CPUs to require multiple identical clock signals to be provided in order to avoid delaying a single signal significantly enough to cause the CPU to malfunction. Another major issue as clock rates increase dramatically is the amount of heat that is dissipated by the CPU. The constantly changing clock causes many components to switch regardless of whether they are being used at that time. In general, a component that is switching uses more energy than an element in a static state. Therefore, as clock rate increases, so does heat dissipation, causing the CPU to require more effective cooling solutions.

One method of dealing with the switching of unneeded components is called clock gating
Clock gating

Clock gating is one of the power-saving techniques used on many synchronous circuits including the Pentium 4 Central processing unit. To save power, clock gating support adds additional logic to a circuit to prune the clock tree, thus disabling portions of the circuitry so that its Flip-flop do not change state: their switching power consum...
, which involves turning off the clock signal to unneeded components (effectively disabling them). However, this is often regarded as difficult to implement and therefore does not see common usage outside of very low-power designs. Another method of addressing some of the problems with a global clock signal is the removal of the clock signal altogether. While removing the global clock signal makes the design process considerably more complex in many ways, asynchronous (or clockless) designs carry marked advantages in power consumption and heat dissipation in comparison with similar synchronous designs. While somewhat uncommon, entire asynchronous CPU
Asynchronous circuit

An asynchronous circuit is a electrical network in which the parts are largely autonomous. They are not governed by a clock circuit or global clock signal, but instead need only wait for the signals that indicate completion of instructions and operations....
s have been built without utilizing a global clock signal. Two notable examples of this are the ARM
ARM architecture

The ARM architecture is a 32-bit RISC central processing unit architecture developed by ARM Limited that is widely used in embedded system designs....
 compliant AMULET
AMULET microprocessor

AMULET is a series of microprocessors that implement the ARM architecture processor architecture. Developed by the group under the University of Manchester's computer science University of Manchester School of Computer Science , AMULET is unique from other ARM implementations in that it is an Asynchronous circuit#Asynchronous CPU, not makin...
 and the MIPS
MIPS architecture

MIPS is a RISC instruction set architecture developed by MIPS Technologies . In the mid to late 1990s, it was estimated that one in three RISC microprocessors produced were MIPS implementations....
 R3000 compatible MiniMIPS. Rather than totally removing the clock signal, some CPU designs allow certain portions of the device to be asynchronous, such as using asynchronous ALUs
Arithmetic logic unit

In computing, an arithmetic logic unit is a digital circuit that performs arithmetic and logicaloperations. The ALU is a fundamental building block of the central processing unit of a computer, and even the simplest microprocessors contain one for purposes such as maintaining timers....
 in conjunction with superscalar pipelining to achieve some arithmetic performance gains. While it is not altogether clear whether totally asynchronous designs can perform at a comparable or better level than their synchronous counterparts, it is evident that they do at least excel in simpler math operations. This, combined with their excellent power consumption and heat dissipation properties, makes them very suitable for embedded computers .

Parallelism

Nopipeline
The description of the basic operation of a CPU offered in the previous section describes the simplest form that a CPU can take. This type of CPU, usually referred to as subscalar, operates on and executes one instruction on one or two pieces of data at a time.

This process gives rise to an inherent inefficiency in subscalar CPUs. Since only one instruction is executed at a time, the entire CPU must wait for that instruction to complete before proceeding to the next instruction. As a result the subscalar CPU gets "hung up" on instructions which take more than one clock cycle to complete execution. Even adding a second execution unit (see below) does not improve performance much; rather than one pathway being hung up, now two pathways are hung up and the number of unused transistors is increased. This design, wherein the CPU's execution resources can operate on only one instruction at a time, can only possibly reach scalar performance (one instruction per clock). However, the performance is nearly always subscalar (less than one instruction per cycle).

Attempts to achieve scalar and better performance have resulted in a variety of design methodologies that cause the CPU to behave less linearly and more in parallel. When referring to parallelism in CPUs, two terms are generally used to classify these design techniques. Instruction level parallelism
Instruction level parallelism

Instruction-level parallelism is a measure of how many of the operations in a computer program can be performed simultaneously. Consider the following program:...
 (ILP) seeks to increase the rate at which instructions are executed within a CPU (that is, to increase the utilization of on-die execution resources), and thread level parallelism (TLP) purposes to increase the number of threads
Thread (computer science)

In computer science, a thread of execution is a Fork of a computer program into two or more Concurrency running task s. The implementation of threads and process es differs from one operating system to another, but in most cases, a thread is contained inside a process....
 (effectively individual programs) that a CPU can execute simultaneously. Each methodology differs both in the ways in which they are implemented, as well as the relative effectiveness they afford in increasing the CPU's performance for an application.

Instruction level parallelism
Fivestagespipeline
One of the simplest methods used to accomplish increased parallelism is to begin the first steps of instruction fetching and decoding before the prior instruction finishes executing. This is the simplest form of a technique known as instruction pipelining, and is utilized in almost all modern general-purpose CPUs. Pipelining allows more than one instruction to be executed at any given time by breaking down the execution pathway into discrete stages. This separation can be compared to an assembly line, in which an instruction is made more complete at each stage until it exits the execution pipeline and is retired.

Pipelining does, however, introduce the possibility for a situation where the result of the previous operation is needed to complete the next operation; a condition often termed data dependency conflict. To cope with this, additional care must be taken to check for these sorts of conditions and delay a portion of the instruction pipeline if this occurs. Naturally, accomplishing this requires additional circuitry, so pipelined processors are more complex than subscalar ones (though not very significantly so). A pipelined processor can become very nearly scalar, inhibited only by pipeline stalls (an instruction spending more than one clock cycle in a stage).

Superscalarpipeline
Further improvement upon the idea of instruction pipelining led to the development of a method that decreases the idle time of CPU components even further. Designs that are said to be superscalar include a long instruction pipeline and multiple identical execution units. In a superscalar pipeline, multiple instructions are read and passed to a dispatcher, which decides whether or not the instructions can be executed in parallel (simultaneously). If so they are dispatched to available execution units, resulting in the ability for several instructions to be executed simultaneously. In general, the more instructions a superscalar CPU is able to dispatch simultaneously to waiting execution units, the more instructions will be completed in a given cycle.

Most of the difficulty in the design of a superscalar CPU architecture lies in creating an effective dispatcher. The dispatcher needs to be able to quickly and correctly determine whether instructions can be executed in parallel, as well as dispatch them in such a way as to keep as many execution units busy as possible. This requires that the instruction pipeline is filled as often as possible and gives rise to the need in superscalar architectures for significant amounts of CPU cache
CPU cache

A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access computer storage. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations....
. It also makes hazard
Hazard (computer architecture)

In computer architecture, a hazard is a potential problem that can happen in a Instruction pipelined central processing unit. It refers to the possibility of erroneous computation when a CPU tries to simultaneously execute multiple instructions which exhibit data dependence....
-avoiding techniques like branch prediction, speculative execution
Speculative execution

In computer science, speculative execution is the execution of Code , the result of which may not be needed. In the context of functional programming, the term "speculative evaluation" is used instead....
, and out-of-order execution
Out-of-order execution

In computer engineering, out-of-order execution, OoOE, is a paradigm used in most high-performance microprocessors to make use of Instruction cycle that would otherwise be wasted by a certain type of costly delay....
 crucial to maintaining high levels of performance. By attempting to predict which branch (or path) a conditional instruction will take, the CPU can minimize the number of times that the entire pipeline must wait until a conditional instruction is completed. Speculative execution often provides modest performance increases by executing portions of code that may or may not be needed after a conditional operation completes. Out-of-order execution somewhat rearranges the order in which instructions are executed to reduce delays due to data dependencies.

In the case where a portion of the CPU is superscalar and part is not, the part which is not suffers a performance penalty due to scheduling stalls. The original Intel Pentium (P5) had two superscalar ALUs which could accept one instruction per clock each, but its FPU could not accept one instruction per clock. Thus the P5 was integer superscalar but not floating point superscalar. Intel's successor to the Pentium architecture, P6
Intel P6

The P6 microarchitecture is the sixth generation Intel x86 microprocessor architecture, released in 1995 and is sometimes referenced as i686. It was succeeded by the Intel NetBurst microarchitecture in 2000, but eventually revived in the Pentium M line of microprocessors....
, added superscalar capabilities to its floating point features, and therefore afforded a significant increase in floating point instruction performance.

Both simple pipelining and superscalar design increase a CPU's ILP by allowing a single processor to complete execution of instructions at rates surpassing one instruction per cycle (IPC). Most modern CPU designs are at least somewhat superscalar, and nearly all general purpose CPUs designed in the last decade are superscalar. In later years some of the emphasis in designing high-ILP computers has been moved out of the CPU's hardware and into its software interface, or ISA. The strategy of the very long instruction word
Very long instruction word

Very Long Instruction Word or VLIW refers to a Central processing unit architecture designed to take advantage of instruction level parallelism ....
 (VLIW) causes some ILP to become implied directly by the software, reducing the amount of work the CPU must perform to boost ILP and thereby reducing the design's complexity.

Thread level parallelism
Another strategy of achieving performance is to execute multiple programs or threads
Thread (computer science)

In computer science, a thread of execution is a Fork of a computer program into two or more Concurrency running task s. The implementation of threads and process es differs from one operating system to another, but in most cases, a thread is contained inside a process....
 in parallel. This area of research is known as parallel computing
Parallel computing

Parallel computing is a form of computing in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved Concurrency ....
. In Flynn's taxonomy
Flynn's Taxonomy

Flynn's taxonomy is a classification of computer architectures, proposed by Michael J. Flynn in 1966....
, this strategy is known as Multiple Instructions-Multiple Data or MIMD.

One technology used for this purpose was multiprocessing
Multiprocessing

Multiprocessing is the use of two or more CPU within a single computer system. The term also refers to the ability of a system to support more than one processor and/or the ability to allocate tasks between them....
 (MP). The initial flavor of this technology is known as symmetric multiprocessing
Symmetric multiprocessing

In computing, symmetric multiprocessing or SMP involves a multiprocessor computer-architecture where two or more identical processors can connect to a single shared main memory....
 (SMP), where a small number of CPUs share a coherent view of their memory system. In this scheme, each CPU has additional hardware to maintain a constantly up-to-date view of memory. By avoiding stale views of memory, the CPUs can cooperate on the same program and programs can migrate from one CPU to another. To increase the number of cooperating CPUs beyond a handful, schemes such as non-uniform memory access
Non-Uniform Memory Access

Non-Uniform Memory Access or Non-Uniform Memory Architecture is a computer storage design used in multiprocessors, where the memory access time depends on the memory location relative to a processor....
 (NUMA) and directory-based coherence protocols were introduced in the 1990s. SMP systems are limited to a small number of CPUs while NUMA systems have been built with thousands of processors. Initially, multiprocessing was built using multiple discrete CPUs and boards to implement the interconnect between the processors. When the processors and their interconnect are all implemented on a single silicon chip, the technology is known as a multi-core
Multi-core (computing)

A multi-core processor combines two or more independent cores into a single package composed of a single integrated circuit , called a Die , or more dies packaged together....
 microprocessor.

It was later recognized that finer-grain parallelism existed with a single program. A single program might have several threads (or functions) that could be executed separately or in parallel. Some of earliest examples of this technology implemented input/output
Input/output

In computing, input/output, or I/O, refers to the communication between an information processing system , and the outside world ? possibly a human, or another information processing system....
 processing such as direct memory access
Direct memory access

Direct memory access is a feature of modern computers and microprocessors that allows certain hardware subsystems within the computer to access system Computer storage for reading and/or writing independently of the central processing unit....
 as a separate thread from the computation thread. A more general approach to this technology was introduced in the 1970s when systems were designed to run multiple computation threads in parallel. This technology is known as multi-threading (MT). This approach is considered more cost-effective than multiprocessing, as only a small number of components within a CPU is replicated in order to support MT as opposed to the entire CPU in the case of MP. In MT, the execution units and the memory system including the caches are shared among multiple threads. The downside of MT is that the hardware support for multithreading is more visible to software than that of MP and thus supervisor software like operating systems have to undergo larger changes to support MT. One type of MT that was implemented is known as block multithreading, where one thread is executed until it is stalled waiting for data to return from external memory. In this scheme, the CPU would then quickly switch to another thread which is ready to run, the switch often done in one CPU clock cycle. Another type of MT is known as simultaneous multithreading
Simultaneous multithreading

Simultaneous multithreading, often abbreviated as SMT, is a technique for improving the overall efficiency of superscalar Central processing unit with Multithreading ....
, where instructions of multiple threads are executed in parallel within one CPU clock cycle.

For several decades from the 1970s to early 2000s, the focus in designing high performance general purpose CPUs was largely on achieving high ILP through technologies such as pipelining, caches, superscalar execution, out-of-order execution, etc. This trend culminated in large, power-hungry CPUs such as the Intel Pentium 4
Pentium 4

The Pentium 4 brand refers to Intel's line of single-core mainstream Desktop computer and laptop central processing units introduced on November 20, 2000 ....
. By the early 2000s, CPU designers were thwarted from achieving higher performance from ILP techniques due to the growing disparity between CPU operating frequencies and main memory operating frequencies as well as escalating CPU power dissipation owing to more esoteric ILP techniques.

CPU designers then borrowed ideas from commercial computing markets such as transaction processing
Transaction processing

In computer science, transaction processing is information processing that is divided into individual, indivisible operations, called transactions. Each transaction must succeed or fail as a complete unit; it cannot remain in an intermediate state....
, where the aggregate performance of multiple programs, also known as throughput
Throughput

In communication networks, such as Ethernet or packet radio, throughput is the average rate of successful message delivery over a communication channel....
 computing, was more important than the performance of a single thread or program.

This reversal of emphasis is evidenced by the proliferation of dual and multiple core CMP (chip-level multiprocessing) designs and notably, Intel's newer designs resembling its less superscalar P6
Intel P6

The P6 microarchitecture is the sixth generation Intel x86 microprocessor architecture, released in 1995 and is sometimes referenced as i686. It was succeeded by the Intel NetBurst microarchitecture in 2000, but eventually revived in the Pentium M line of microprocessors....
 architecture. Late designs in several processor families exhibit CMP, including the x86-64
X86-64

x86-64 is a superset of the x86. x86-64 Central processing units can run existing 32-bit or 16-bit x86 programs at full speed, but also support new programs written with a 64-bit address space and other additional capabilities....
 Opteron
Opteron

The Opteron is Advanced Micro Devices's x86 server Central processing unit line, and was the first processor to implement the AMD64 instruction set architecture ....
 and Athlon 64 X2
Athlon 64 X2

The Athlon 64 X2 is the first multi-core desktop computer Central processing unit designed by AMD. It is essentially a processor consisting of two Athlon 64 cores joined together on one Die with additional control logic....
, the SPARC
SPARC

SPARC is a Reduced Instruction Set Computer microprocessor instruction set Computer architecture originally designed in 1985 by Sun Microsystems....
 UltraSPARC T1
UltraSPARC T1

Sun Microsystems' UltraSPARC T1 microprocessor, known until its 14 November 2005 announcement by its development codename "Niagara", is a multithreading, multicore central processing unit....
, IBM POWER4
POWER4

The POWER4 chip is a CPU that implements the 64-bit PowerPC Power Architecture. Released in 2001, the POWER4 chip is based on the previous POWER3 chip design....
 and POWER5
POWER5

POWER5 is a microprocessor developed and fabricated by IBM. It is an improved variant of the highly successful POWER4. The principal improvements are support for simultaneous multithreading and an Semiconductor-die cutting memory controller....
, as well as several video game console
Video game console

A video game console is an game development that produces a video signal which can be used with a display device to display a video game. The term "video game console" is used to distinguish a machine designed for consumers to buy and use solely for playing video games from a personal computer, which has many other functions, or arcade machi...
 CPUs like the Xbox 360
Xbox 360

The Xbox 360 is the second video game console produced by Microsoft, and the successor to the Xbox. The Xbox 360 competes with Sony's PlayStation 3 and Nintendo's Wii as part of the History of video game consoles of video game consoles....
's triple-core PowerPC design, and the PS3's 8-core Cell microprocessor.

Data parallelism

A less common but increasingly important paradigm of CPUs (and indeed, computing in general) deals with data parallelism. The processors discussed earlier are all referred to as some type of scalar device. As the name implies, vector processors deal with multiple pieces of data in the context of one instruction. This contrasts with scalar processors, which deal with one piece of data for every instruction. Using Flynn's taxonomy
Flynn's Taxonomy

Flynn's taxonomy is a classification of computer architectures, proposed by Michael J. Flynn in 1966....
, these two schemes of dealing with data are generally referred to as SISD
SISD

In computing, SISD is a term referring to an computer architecture in which a single processor, an uniprocessor, executes a single instruction stream, to operate on data stored in a single memory....
 (single instruction, single data) and SIMD
SIMD

In computing, SIMD is a technique employed to achieve data level parallelism....
 (single instruction, multiple data), respectively. The great utility in creating CPUs that deal with vectors of data lies in optimizing tasks that tend to require the same operation (for example, a sum or a dot product
Dot product

In mathematics, the dot product, also known as the scalar product, is an operation which takes two vector over the real numbers R and returns a real-valued scalar quantity....
) to be performed on a large set of data. Some classic examples of these types of tasks are multimedia
Multimedia

Multimedia is media and content that utilizes a combination of different content format. The term can be used as a noun or as an adjective describing a medium as having multiple content forms....
 applications (images, video, and sound), as well as many types of scientific and engineering tasks. Whereas a scalar CPU must complete the entire process of fetching, decoding, and executing each instruction and value in a set of data, a vector CPU can perform a single operation on a comparatively large set of data with one instruction. Of course, this is only possible when the application tends to require many steps which apply one operation to a large set of data.

Most early vector CPUs, such as the Cray-1
Cray-1

The Cray-1 was a supercomputer designed by a team including Seymour Cray for Cray Research. The first Cray-1 system was installed at Los Alamos National Laboratory in 1976, and it went on to become one of the best known and most successful supercomputers in history....
, were associated almost exclusively with scientific research and cryptography
Cryptography

Cryptography is the practice and study of hiding information. In modern times cryptography is considered a branch of both mathematics and computer science and is affiliated closely with information theory, computer security and engineering....
 applications. However, as multimedia has largely shifted to digital media, the need for some form of SIMD in general-purpose CPUs has become significant. Shortly after floating point execution units
Floating point unit

A floating-point unit is a part of a computer system specially designed to carry out operations on floating point numbers. Typical operations are addition, subtraction, multiplication, division , and square root....
 started to become commonplace to include in general-purpose processors, specifications for and implementations of SIMD execution units also began to appear for general-purpose CPUs. Some of these early SIMD specifications like Intel's MMX were integer-only. This proved to be a significant impediment for some software developers, since many of the applications that benefit from SIMD primarily deal with floating point
Floating point

In computing, floating point describes a system for numerical representation in which a String of digits represents a rational number.The term floating point refers to the fact that the radix point can "float": that is, it can be placed anywhere relative to the Significant figures of the number....
 numbers. Progressively, these early designs were refined and remade into some of the common, modern SIMD specifications, which are usually associated with one ISA. Some notable modern examples are Intel's SSE
Streaming SIMD Extensions

In computing, Streaming SIMD Extensions is a SIMD instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in their Pentium III series processors as a reply to AMD's 3DNow! ....
 and the PowerPC-related AltiVec
AltiVec

AltiVec is a floating point and integer SIMD instruction set designed and owned by Apple Inc., International Business Machines and Freescale Semiconductor, formerly the Semiconductor Products Sector of Motorola, , and implemented on versions of the PowerPC including Motorola's PowerPC G4, IBM's PowerPC 970 and POWER6 processors, and P.A....
 (also known as VMX).

See also


External links

Microprocessor designers
  • - Advanced Micro Devices
    Advanced Micro Devices

    Advanced Micro Devices, Inc. is an United States multinational corporation semiconductor industry company based in Sunnyvale, California, that develops Central processing unit and related technologies for commercial and consumer markets....
    , a designer of primarily x86-compatible personal computer CPUs.
  • - ARM Ltd, one of the few CPU designers that profits solely by licensing their designs rather than manufacturing them. ARM architecture
    ARM architecture

    The ARM architecture is a 32-bit RISC central processing unit architecture developed by ARM Limited that is widely used in embedded system designs....
     microprocessors are among the most popular in the world for embedded applications.
  • (formerly of Motorola
    Motorola

    Motorola, Inc. is an United States, multinational, Fortune 100, telecommunications company based in Schaumburg, Illinois. It is a manufacturer of wireless telephone handsets, also designing and selling wireless network infrastructure equipment such as cellular transmission base stations and signal amplifiers....
    ) - Freescale Semiconductor
    Freescale Semiconductor

    Freescale Semiconductor, Inc. is an American semiconductor manufacturer. It was created by the divestiture of the Semiconductor Products Sector of Motorola in 2004....
    , designer of several embedded and SoC
    System-on-a-chip

    System-on-a-chip or system on chip refers to integrating all components of a computer or other Electronics system into a single integrated circuit ....
     PowerPC based processors.
  • - Microelectronics division of IBM
    IBM

    International Business Machines Corporation, abbreviated IBM and nicknamed "Big Blue" , is a multinational corporation computer technology and consulting corporation headquartered in Armonk, New York, New York, United States....
    , which is responsible for many POWER
    IBM POWER

    POWER is a RISC instruction set architecture designed by International Business Machines. The name is a backronym for Performance Optimization With Enhanced RISC....
     and PowerPC
    PowerPC

    PowerPC is a RISC instruction set architecture created by the 1991 Apple Inc.?IBM?Motorola alliance, known as AIM alliance. Originally intended for personal computers, PowerPC CPUs have since become popular embedded system and high-performance processors....
     based designs, including many of the CPUs utilized in late video game console
    Video game console

    A video game console is an game development that produces a video signal which can be used with a display device to display a video game. The term "video game console" is used to distinguish a machine designed for consumers to buy and use solely for playing video games from a personal computer, which has many other functions, or arcade machi...
    s.
  • - Intel, a maker of several notable CPU lines, including IA-32
    IA-32

    IA-32 , often generically called x86 or x86-32, is the instruction set architecture of Intel's most commercially successful microprocessors....
    , IA-64, and XScale. Also a producer of various peripheral chips for use with their CPUs.
  • - MIPS Technologies
    MIPS Technologies

    MIPS Technologies, Inc. , formerly MIPS Computer Systems, is most widely known for developing the MIPS architecture and a series of pioneering Reduced instruction set computer Central processing unit....
    , developers of the MIPS architecture
    MIPS architecture

    MIPS is a RISC instruction set architecture developed by MIPS Technologies . In the mid to late 1990s, it was estimated that one in three RISC microprocessors produced were MIPS implementations....
    , a pioneer in RISC designs.
  • - , developers of the , , and .
  • - Sun Microsystems
    Sun Microsystems

    Sun Microsystems, Inc. is a multinational corporation vendor of computers, computer components, computer software, and information technology services, founded on February 24, 1982....
    , developers of the SPARC
    SPARC

    SPARC is a Reduced Instruction Set Computer microprocessor instruction set Computer architecture originally designed in 1985 by Sun Microsystems....
     architecture, a RISC design.
  • - Texas Instruments
    Texas Instruments

    Texas Instruments , better known in the electronics industry as TI, is an United States company based in Dallas, Texas, Texas, United States, renowned for developing and commercializing semiconductor and computer technology....
     semiconductor division. Designs and manufactures several types of low-power microcontrollers among their many other semiconductor products.
  • - Transmeta
    Transmeta

    Transmeta Corporation was a United States-based corporation that licensed low power semiconductor intellectual property. Transmeta originally produced very long instruction word code morphing microprocessors, with a focus on reducing power consumption in electronic devices....
     Corporation. Creators of low-power x86 compatibles like Crusoe
    Crusoe

    Crusoe may refer to:* Crusoe , a 2008 television series based on the novel Robinson Crusoe* Crusoe , a 1989 film by Caleb Deschanel, based on the novel...
     and Efficeon.
  • - Taiwanese maker of low-power x86-compatible CPUs.


Further reading