Multi-core (computing)

A multi-core processor is a single computing component with two or more independent actual processors (called "cores"), which are the units that read and execute program instructions. The instructions are ordinary CPU instructions such as add, move data, and branch, but the multiple cores can run multiple instructions at the same time, increasing overall speed for programs amenable to parallel computing. Manufacturers typically integrate the cores onto a single integrated circuit die (known as a chip multiprocessor or CMP), or onto multiple dies in a single chip package.

Processors were originally developed with only one core. A many-core processor is a multi-core processor in which the number of cores is large enough that traditional multi-processor techniques are no longer efficient, largely because of congestion in supplying instructions and data to the many processors. The many-core threshold is roughly in the range of several tens of cores; above this threshold, network-on-chip technology is advantageous. Tilera processors feature a switch in each core to route data through an on-chip mesh network to lessen the data congestion, enabling their core count to scale up to 100 cores.
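Each per-core switch in a mesh has to decide which neighbouring switch should receive a packet next. A common, simple scheme for two-dimensional meshes is dimension-order ("XY") routing: move along the X dimension until the column matches, then along Y. The sketch below is a generic illustration of that technique, not Tilera's actual router logic; the coordinate convention and the function name `xy_route` are assumptions.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a core's switch in the mesh


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Return the switches a packet visits under X-then-Y dimension-order
    routing on a 2D mesh, including the source and destination."""
    x, y = src
    path = [(x, y)]
    # First travel along the X dimension until the column matches.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Then travel along the Y dimension until the row matches.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    route = xy_route((0, 0), (2, 1))
    print(route)           # [(0, 0), (1, 0), (2, 0), (2, 1)]
    print(len(route) - 1)  # 3 hops
```

Because every packet follows the same fixed X-then-Y order, this scheme cannot form cyclic channel dependencies on a mesh and so avoids routing deadlock; the hop count is simply the Manhattan distance between the two cores.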

A dual-core processor has two cores (e.g. AMD Phenom II X2, Intel Core Duo), a quad-core processor contains four cores (e.g. AMD Phenom II X4 and the quad-core models of Intel's 2010 Core i5 and i7 lines), a hexa-core processor contains six cores (e.g. AMD Phenom II X6, Intel Core i7 Extreme Edition 980X), and an octa-core processor contains eight cores (e.g. AMD FX-8150). A multi-core processor implements multiprocessing in a single physical package. Designers may couple cores in a multi-core device tightly or loosely. For example, cores may or may not share caches, and they may implement message passing or shared-memory inter-core communication methods. Common network topologies to interconnect cores include bus, ring, two-dimensional mesh, and crossbar. Homogeneous multi-core systems include only identical cores; heterogeneous multi-core systems have cores that are not identical. Just as with single-processor systems, cores in multi-core systems may implement architectures such as superscalar, VLIW, vector processing, SIMD, or multithreading.
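The difference between shared-memory and message-passing communication can be illustrated in software with threads. In the sketch below, one function has workers update a single shared total guarded by a lock, and the other has workers send their partial results as messages on a queue so that only one thread ever touches the total. The function names are hypothetical. In the standard CPython build a global interpreter lock keeps these threads from executing Python code on several cores at once, so the sketch shows the two communication patterns, not a hardware speedup.

```python
import queue
import threading


def shared_memory_sum(chunks):
    """Workers add partial sums into one shared variable guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # serialize the read-modify-write on the shared total
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(chunks):
    """Workers send partial sums as messages; only the receiver owns the total."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))  # send one message, share no mutable state

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    total = sum(mailbox.get() for _ in threads)  # receive one message per worker
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    data = list(range(1, 101))
    chunks = [data[i::4] for i in range(4)]  # split the work four ways
    print(shared_memory_sum(chunks), message_passing_sum(chunks))  # 5050 5050
```

The lock in the first version plays the role that coherent shared caches play in hardware, while the queue in the second corresponds to an explicit inter-core message channel.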

Multi-core processors are widely used across many application domains including general-purpose, embedded, network, digital signal processing (DSP), and graphics.

The improvement in performance gained by the use of a multi-core processor depends very much on the software algorithms used and their implementation. In particular, possible gains are limited by the fraction of the software that can be parallelized to run on multiple cores simultaneously; this effect is described by Amdahl's law. In the best case, so-called embarrassingly parallel problems may realize speedup factors near the number of cores, or even more if the problem is split up enough to fit within each core's cache(s), avoiding use of much slower main system memory. Most applications, however, are not accelerated so much unless programmers invest a prohibitive amount of effort in refactoring the whole problem. The parallelization of software is a significant ongoing topic of research.
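Amdahl's law can be written as S(n) = 1 / ((1 - p) + p/n), where p is the fraction of the original run time that can be parallelized and n is the number of cores. A minimal sketch of that bound follows; the function name is hypothetical, and it deliberately ignores the cache effects that can make real speedups superlinear.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when `parallel_fraction` of the original run
    time can be spread perfectly across `cores` cores (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        print(p, [round(amdahl_speedup(p, n), 2) for n in (2, 4, 16, 1024)])
```

For example, with 90% of the work parallelizable, four cores give at most about 3.1 times the single-core speed, and no number of cores can exceed 10 times, because the serial 10% still runs on one core.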

Terminology


The terms multi-core and dual-core most commonly refer to some sort of central processing unit (CPU), but are sometimes also applied to digital signal processors (DSP) and system-on-a-chip (SoC). The terms are generally used only to refer to multi-core microprocessors that are manufactured on the same integrated circuit die; separate microprocessor dies in the same package are generally referred to by another name, such as multi-chip module. This article uses the terms "multi-core" and "dual-core" for CPUs manufactured on the same integrated circuit, unless otherwise noted.

In contrast to multi-core systems, the term multi-CPU refers to multiple physically separate processing units (which often contain special circuitry to facilitate communication with one another).

The terms many-core and massively multi-core are sometimes used to describe multi-core architectures with an especially high number of cores (tens or hundreds).

Some systems use many soft microprocessor cores placed on a single FPGA. Each "core" can be considered a "semiconductor intellectual property core" as well as a CPU core.

Development


As manufacturing technology improves, reducing the size of individual gates, physical limits of semiconductor-based microelectronics have become a major design concern. These physical limitations can cause significant heat dissipation and data synchronization problems. Various other methods are used to improve CPU performance. Some instruction-level parallelism (ILP) methods such as superscalar pipelining are suitable for many applications, but are inefficient for others that contain difficult-to-predict code. Many applications are better suited to thread-level parallelism (TLP) methods, and multiple independent CPUs are commonly used to increase a system's overall TLP. A combination of increased available space (due to refined manufacturing processes) and the demand for increased TLP led to the development of multi-core CPUs.

Commercial incentives


Several business motives drive the development of dual-core architectures. For decades, it was possible to improve performance of a CPU by shrinking the area of the integrated circuit, which drove down the cost per device on the IC. Alternatively, for the same circuit area, more transistors could be utilized in the design, which increased functionality, especially for CISC architectures. Clock rates also increased by orders of magnitude in the decades of the late 20th century, from several megahertz in the 1980s to several gigahertz in the early 2000s.

As manufacturing techniques reach theoretical limits in miniaturization and clock speed, parallel computing in the form of multi-core processors has been increasingly pursued to improve overall processing performance. Placing multiple cores on the same CPU chip could in turn boost sales, which could then fund further research and development of multi-core processors. Intel has produced a 48-core processor for research in cloud computing.

Technical factors


Since computer manufacturers have long implemented symmetric multiprocessing (SMP) designs using discrete CPUs, the issues regarding implementing multi-core processor architecture and supporting it with software are well known.

Additionally:
  • Utilizing a proven processing-core design without architectural changes reduces design risk significantly.
  • For general-purpose processors, much of the motivation for multi-core processors comes from greatly diminished gains in processor performance from increasing the operating frequency. This is due to three primary factors:
    1. The memory wall: the increasing gap between processor and memory speeds. This effect pushes cache sizes larger in order to mask the latency of memory. This helps only to the extent that memory bandwidth is not the bottleneck in performance.
    2. The ILP wall: the increasing difficulty of finding enough parallelism in a single instruction stream to keep a high-performance single-core processor busy.
    3. The power wall: the trend of power consumption rising much faster than operating frequency, because higher clock rates generally also require higher supply voltages. This increase can be mitigated by "shrinking" the processor by using smaller traces for the same logic. The power wall poses manufacturing, system design and deployment problems that have not been justified in the face of the diminished gains in performance due to the memory wall and ILP wall.
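The power wall is usually explained with the standard first-order model of CMOS dynamic (switching) power, where α is the activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. Under the simplifying assumption that the supply voltage must scale in proportion to frequency, dynamic power grows roughly with the cube of frequency:

```latex
P_{\text{dyn}} \approx \alpha \, C \, V^{2} f,
\qquad V \propto f \;\Longrightarrow\; P_{\text{dyn}} \propto f^{3}
```

Under the same idealized assumption, and only for perfectly parallel work, two cores running at half the frequency and voltage deliver the same throughput as one full-speed core for about 2 \cdot (1/2)^3 = 1/4 of the dynamic power. Leakage power and the limited room for lowering voltage make real savings smaller, but this is the basic argument for trading clock speed for core count.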


In order to continue delivering regular performance improvements for general-purpose processors, manufacturers such as Intel and AMD have turned to multi-core designs, accepting higher manufacturing costs in exchange for higher performance in some applications and systems. Multi-core architectures are being developed, but so are the alternatives. An especially strong contender for established markets is the further integration of peripheral functions into the chip.

Advantages


The proximity of multiple CPU cores on the same die allows the cache coherency circuitry to operate at a much higher clock rate than is possible if the signals have to travel off-chip. Combining equivalent CPUs on a single die significantly improves the performance of cache snoop (also called bus snooping) operations. Put simply, this means that signals between different CPUs travel shorter distances, and therefore those signals degrade less. These higher-quality signals allow more data to be sent in a given time period, since individual signals can be shorter and do not need to be repeated as often.
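To make "cache snoop operations" concrete, here is a toy model of the classic MSI write-invalidate snooping protocol for a single cache line. It is a textbook simplification (no data values or write-back traffic, no Exclusive or Owned states as in MESI or MOESI, no timing), and the class and method names are hypothetical; real processors implement this logic in hardware. Each bus transaction in the model stands for one of the inter-core signals discussed above.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"  # this cache holds the only, dirty copy
    SHARED = "S"    # clean copy; other caches may also hold it
    INVALID = "I"   # no valid copy


class SnoopingBus:
    """One cache line shared by several caches that snoop each other's requests."""

    def __init__(self, n_caches: int):
        self.states = [State.INVALID] * n_caches

    def read(self, cpu: int) -> None:
        if self.states[cpu] is State.INVALID:
            # Bus read: a cache holding the line Modified supplies it and
            # downgrades to Shared (the write-back itself is not modeled).
            for other, st in enumerate(self.states):
                if other != cpu and st is State.MODIFIED:
                    self.states[other] = State.SHARED
            self.states[cpu] = State.SHARED
        # A read that hits in M or S needs no bus traffic.

    def write(self, cpu: int) -> None:
        if self.states[cpu] is not State.MODIFIED:
            # Bus read-exclusive or upgrade: every other copy is invalidated.
            for other in range(len(self.states)):
                if other != cpu:
                    self.states[other] = State.INVALID
            self.states[cpu] = State.MODIFIED

    def snapshot(self) -> str:
        return "".join(s.value for s in self.states)


if __name__ == "__main__":
    bus = SnoopingBus(2)
    bus.read(0)
    print(bus.snapshot())   # SI
    bus.read(1)
    print(bus.snapshot())   # SS
    bus.write(1)
    print(bus.snapshot())   # IM
    bus.read(0)
    print(bus.snapshot())   # SS
```

Every state change in another cache corresponds to a snooped bus message, which is why keeping those messages on-die, with short and fast wires, pays off directly in coherence performance.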

The largest boost in performance will likely be noticed in improved response time while running CPU-intensive processes, like antivirus scans, ripping/burning media (requiring file conversion), or file searching. For example, if an automatic virus scan runs while a movie is being watched, the application playing the movie is far less likely to be starved of processor power, because the operating system can schedule the antivirus program on a different core from the one running the movie playback.

Assuming that the die can physically fit into the package, multi-core CPU designs require much less printed circuit board (PCB) space than do multi-chip SMP designs. Also, a dual-core processor uses slightly less power than two coupled single-core processors, principally because of the decreased power required to drive signals external to the chip. Furthermore, the cores can share some circuitry, like the L2 cache and the interface to the front-side bus (FSB). In terms of competing technologies for the available silicon die area, multi-core design can make use of proven CPU core library designs and produce a product with lower risk of design error than devising a new wider core design. Also, adding more cache suffers from diminishing returns.

Multi-core chips also allow higher performance at lower energy. This can be a big factor in mobile devices that operate on batteries. Because each core in a multi-core chip is generally more energy-efficient, the chip as a whole can be more efficient than a single large monolithic core, delivering higher performance for less energy. However, the challenge of writing parallel code can offset this benefit.

Disadvantages


Maximizing the utilization of the computing resources provided by multi-core processors requires adjustments both to the operating system (OS) support and to existing application software. Also, the ability of multi-core processors to increase application performance depends on the use of multiple threads within applications. The situation is improving: for example, Valve Corporation's Source engine offers multi-core support, and Crytek has developed similar technologies for CryEngine 2, which powers their game Crysis. Emergent Game Technologies' Gamebryo engine includes their Floodgate technology, which simplifies multicore development across game platforms. In addition, Apple Inc.'s Mac OS X Snow Leopard has a built-in multi-core facility called Grand Central Dispatch for Intel CPUs.

Integration of a multi-core chip drives chip production yields down, and multi-core chips are more difficult to manage thermally than lower-density single-chip designs. Intel has partially countered the yield problem by building its quad-core designs from two dual-core dies in a single package, so any two working dual-core dies can be used, as opposed to producing four cores on a single die and requiring all four to work to produce a quad-core. From an architectural point of view, ultimately, single CPU designs may make better use of the silicon surface area than multiprocessing cores, so a development commitment to this architecture may carry the risk of obsolescence. Finally, raw processing power is not the only constraint on system performance. Two processing cores sharing the same system bus and memory bandwidth limit the real-world performance advantage. If a single core is close to being memory-bandwidth limited, going to dual-core might give only a 30% to 70% improvement. If memory bandwidth is not a problem, a 90% improvement can be expected. An application that used two CPUs could even end up running faster on one dual-core processor if communication between the CPUs was the limiting factor, which would count as more than 100% improvement.
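The yield argument for packaging two dual-core dies can be made concrete with a deliberately simple model: assume each core works independently with probability p and a die is usable only if all of its cores work. A monolithic quad-core die is then good with probability p^4, while a dual-core die is good with probability p^2, and good dual-core dies can be tested and paired. The function below is a hypothetical illustration of that toy model, not Intel's data; it ignores packaging cost, die-area overhead and leftover unpaired dies.

```python
def expected_quad_products(core_yield: float, core_sites: int = 8) -> dict:
    """Expected quad-core products from the same number of core sites.

    Toy assumption: each core works independently with probability
    `core_yield`, and a die is usable only if all of its cores work.
    """
    monolithic_dies = core_sites // 4
    dual_dies = core_sites // 2
    # Monolithic: a quad-core die needs all four of its cores to work.
    monolithic = monolithic_dies * core_yield ** 4
    # Two-die package: good dual-core dies are paired, giving about half as
    # many quad-core packages as good dual-core dies (leftovers ignored).
    paired = dual_dies * core_yield ** 2 / 2
    return {"monolithic": monolithic, "two_dual_dies": paired}


if __name__ == "__main__":
    print(expected_quad_products(0.9))
```

With a 90% per-core yield, eight core sites give about 1.31 expected quad-core parts as two monolithic dies versus about 1.62 as four paired dual-core dies, since p^2 > p^4 whenever 0 < p < 1.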

Trends


The general trend in processor development has moved from dual-, tri-, quad-, hexa-, and octo-core chips to ones with tens or even hundreds of cores. In addition, multi-core chips mixed with simultaneous multithreading, memory-on-chip, and special-purpose "heterogeneous" cores promise further performance and efficiency gains, especially in processing multimedia, recognition and networking applications. There is also a trend of improving energy efficiency by focusing on performance-per-watt with advanced fine-grain or ultra fine-grain power management and dynamic voltage and frequency scaling (e.g. in laptop computers and portable media players).

Architecture


The composition and balance of the cores in multi-core architecture show great variety. Some architectures use one core design repeated consistently ("homogeneous"), while others use a mixture of different cores, each optimized for a different, "heterogeneous" role.

The article "CPU designers debate multi-core future" by Rick Merritt (EE Times, 2008) includes these comments:
"Chuck Moore [...] suggested computers should be more like cellphones, using a variety of specialty cores to run modular software scheduled by a high-level applications programming interface.
[...] Atsushi Hasegawa, a senior chief engineer at Renesas, generally agreed. He suggested the cellphone's use of many specialty cores working in concert is a good model for future multi-core designs.
[...] Anant Agarwal, founder and chief executive of startup Tilera, took the opposing view. He said multi-core chips need to be homogeneous collections of general-purpose cores to keep the software model simple."

Software impact


An outdated version of an anti-virus application may create a new thread for a scan process while its GUI thread waits for commands from the user (e.g. to cancel the scan). In such cases, a multicore architecture is of little benefit to the application itself, because a single thread does all the heavy lifting and the work cannot be balanced evenly across multiple cores. Programming truly multithreaded code often requires complex coordination of threads and can easily introduce subtle, hard-to-find bugs when threads interleave their operations on shared data (a thread-safety problem). Consequently, such code is much more difficult to debug than single-threaded code when it breaks. There has been a perceived lack of motivation for writing consumer-level threaded applications, because consumer demand for maximum utilisation of computer hardware has been relatively rare. Although threaded applications incur little additional performance penalty on single-processor machines, the extra development overhead has been difficult to justify given the preponderance of single-processor machines. Also, serial tasks such as decoding the entropy encoding algorithms used in video codecs are inherently sequential and cannot be parallelized within a single stream, because each result generated is used to help create the next result of the entropy decoding algorithm.
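The anti-virus pattern described above can be sketched in Python. This is a hypothetical illustration, not any real product's code: the `Scanner` class, its toy `.exe` rule, and the `sleep` standing in for per-file work are all assumptions. It shows the two points made in the paragraph: one worker thread does all the scanning (so extra cores sit idle), and the progress counter shared with the GUI thread must be guarded by a lock to stay thread-safe.

```python
import threading
import time


class Scanner:
    """Hypothetical scanner: one worker thread does all of the heavy lifting."""

    def __init__(self, files):
        self.files = list(files)
        self.cancel_requested = threading.Event()  # set by the GUI thread
        self._lock = threading.Lock()              # guards progress shared with the GUI
        self._scanned = 0

    @property
    def scanned(self):
        with self._lock:
            return self._scanned

    def _scan_one(self, name):
        time.sleep(0.001)              # stand-in for the real per-file work
        return name.endswith(".exe")   # toy "suspicious file" rule (assumption)

    def run(self):
        """Runs on the worker thread; polls the cancel flag between files."""
        flagged = []
        for name in self.files:
            if self.cancel_requested.is_set():
                break
            if self._scan_one(name):
                flagged.append(name)
            with self._lock:           # the GUI thread may read this concurrently
                self._scanned += 1
        return flagged


if __name__ == "__main__":
    scanner = Scanner([f"file{i}.exe" for i in range(1000)])
    result = {}
    worker = threading.Thread(target=lambda: result.update(flagged=scanner.run()))
    worker.start()
    time.sleep(0.05)                   # the GUI thread idles, waiting for input
    scanner.cancel_requested.set()     # e.g. the user clicks "Cancel"
    worker.join()
    print(f"scanned {scanner.scanned} of 1000 files before cancelling")
```

However many cores the machine has, only the worker thread does real work here, which is why such an application gains little from a multi-core CPU.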

Given the increasing emphasis on multicore chip design, which stems from the grave thermal and power-consumption problems posed by any further significant increase in processor clock speeds, the extent to which software can be multithreaded to take advantage of these new chips is likely to be the single greatest constraint on computer performance in the future. If developers are unable to design software that fully exploits the resources provided by multiple cores, they will ultimately reach an insurmountable performance ceiling.
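The "performance ceiling" can be made concrete with Amdahl's law, a standard model that the text above does not name. Assuming a fixed workload in which a fraction `p` can be spread perfectly over `n` cores while the rest (for example, entropy decoding) stays serial, the speedup is at most 1 / ((1 - p) + p / n), which approaches 1 / (1 - p) as cores are added. The function names below are illustrative.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the work uses `cores` cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def speedup_ceiling(parallel_fraction: float) -> float:
    """Limit of amdahl_speedup as the number of cores grows without bound."""
    serial = 1.0 - parallel_fraction
    return float("inf") if serial == 0.0 else 1.0 / serial


if __name__ == "__main__":
    for n in (2, 4, 8, 64):
        print(n, round(amdahl_speedup(0.9, n), 2))
    print("ceiling:", round(speedup_ceiling(0.9), 2))
```

With `parallel_fraction=0.9`, eight cores give a bound of about 4.7x, and no number of cores can exceed 10x: the serial 10% sets the ceiling.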

The telecommunications market was one of the first to need a new design for parallel datapath packet processing, and it adopted these multi-core processors very quickly for both the datapath and the control plane. These MPUs are expected to replace the traditional network processors that were based on proprietary microcode or picocode.

Parallel programming techniques can benefit from multiple cores directly. Some existing parallel programming models such as Cilk++, OpenMP, OpenHMPP, FastFlow, Skandium, and MPI (Message Passing Interface) can be used on multi-core platforms. Intel introduced a new abstraction for C++ parallelism called Threading Building Blocks (TBB). Other research efforts include the Codeplay Sieve System, Cray's Chapel, Sun's Fortress, and IBM's X10.
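Several of the models above (OpenMP's `reduction` clause, Cilk++'s `cilk_spawn`/`cilk_sync`, TBB's `parallel_reduce`) express the same fork-join reduction: split the data into chunks, reduce each chunk on its own core, then combine the partial results. The sketch below shows that pattern with Python's standard `concurrent.futures`; it is an analogue, not the API of any of those systems, and the function names are hypothetical. In CPython, threads do not speed up pure-Python CPU-bound work because of the global interpreter lock, so for real multi-core execution one would pass `executor_cls=ProcessPoolExecutor` from inside an `if __name__ == "__main__":` guard.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def _partial_sum_of_squares(bounds):
    """Work for one task: sum i*i over the half-open range [lo, hi)."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))


def parallel_sum_of_squares(n, workers=None, executor_cls=ThreadPoolExecutor):
    """Fork: one chunk of [0, n) per worker. Join: add the partial sums."""
    if n <= 0:
        return 0
    workers = workers or os.cpu_count() or 1
    step = -(-n // workers)  # ceiling division so every index is covered
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with executor_cls(max_workers=workers) as pool:
        # Chunks are reduced independently; the final combine is serial.
        return sum(pool.map(_partial_sum_of_squares, chunks))


if __name__ == "__main__":
    print(parallel_sum_of_squares(1_000_000, workers=4))
```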

Multi-core processing has also affected modern computational software development. Developers programming in newer languages might find that their languages offer little or no direct support for multi-core processing. They may then turn to numerical libraries that wrap code written in languages such as C and Fortran, which typically perform numerical computations faster than newer languages such as C#. Intel's MKL and AMD's ACML are written in these native languages and take advantage of multi-core processing.

Managing concurrency plays a central role in developing parallel applications. The basic steps in designing parallel applications are:

Partitioning: The partitioning stage of a design is intended to expose opportunities for parallel execution. Hence, the focus is on defining a large number of small tasks in order to yield what is termed a fine-grained decomposition of a problem.

Communication: The tasks generated by a partition are intended to execute concurrently but cannot, in general, execute independently. The computation to be performed in one task will typically require data associated with another task. Data must then be transferred between tasks so as to allow computation to proceed. This information flow is specified in the communication phase of a design.

Agglomeration: In the third stage, development moves from the abstract toward the concrete. Developers revisit decisions made in the partitioning and communication phases with a view to obtaining an algorithm that will execute efficiently on some class of parallel computer. In particular, developers consider whether it is useful to combine, or agglomerate, tasks identified by the partitioning phase, so as to provide a smaller number of tasks, each of greater size. They also determine whether it is worthwhile to replicate data and/or computation.

Mapping: In the fourth and final stage of the design of parallel algorithms, the developers specify where each task is to execute. This mapping problem does not arise on uniprocessors or on shared-memory computers that provide automatic task scheduling.
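The four steps can be traced on a small example. The sketch below is hypothetical and assumes a 1-D smoothing step (each interior cell becomes the average of itself and its two neighbours, with fixed boundary cells). Partitioning gives one task per cell; communication is the need for neighbour values; agglomeration groups cells into contiguous blocks so each block only needs one "halo" cell from each side; mapping is left to a thread pool's scheduler, as on a shared-memory machine.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_step(u):
    """Reference version: one smoothing step; the two boundary cells stay fixed."""
    if len(u) < 3:
        return list(u)
    # Partitioning: one fine-grained task per interior cell.
    # Communication: each task needs its left and right neighbours.
    inner = [(u[i - 1] + u[i] + u[i + 1]) / 3 for i in range(1, len(u) - 1)]
    return [u[0]] + inner + [u[-1]]


def _update_block(task):
    """Agglomerated task: update cells [lo, hi) from a copy with one halo cell per side."""
    lo, hi, halo = task  # halo == u[lo - 1 : hi + 1]
    return [(halo[k - 1] + halo[k] + halo[k + 1]) / 3 for k in range(1, hi - lo + 1)]


def parallel_step(u, blocks=4, workers=4):
    if len(u) < 3:
        return list(u)
    n = len(u)
    size = -(-(n - 2) // blocks)  # Agglomeration: interior cells per block (ceiling)
    tasks = []
    for lo in range(1, n - 1, size):
        hi = min(lo + size, n - 1)
        # Communication shrinks to the halo: one neighbour cell on each side of the block.
        tasks.append((lo, hi, u[lo - 1 : hi + 1]))
    # Mapping: left to the pool's scheduler (automatic task scheduling).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(_update_block, tasks))
    return [u[0]] + [x for part in parts for x in part] + [u[-1]]
```

Because every block computes the same expression as the serial loop, `parallel_step(u)` should return exactly `serial_step(u)` for any block count, which makes the decomposition easy to check.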

On the server side, by contrast, multicore processors are ideal, because many users can connect to a site simultaneously and each connection can have its own independent thread of execution. This allows Web servers and application servers to achieve much better throughput.
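A thread-per-connection server, the model described above, can be sketched with Python's standard `socketserver.ThreadingTCPServer`. The upper-casing handler is a hypothetical stand-in for real request processing. One caveat: in CPython, threads overlap well while waiting on I/O, but CPU-heavy request handling needs multiple processes (or a runtime without a global interpreter lock) to use several cores at once.

```python
import socketserver
import threading


class UpperCaseHandler(socketserver.StreamRequestHandler):
    """Serves one client connection; ThreadingTCPServer gives each its own thread."""

    def handle(self):
        for line in self.rfile:           # blocks only this client's thread
            self.wfile.write(line.upper())


def start_server(host="127.0.0.1", port=0):
    """Start a thread-per-connection server in the background (port 0 = any free port)."""
    server = socketserver.ThreadingTCPServer((host, port), UpperCaseHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Two clients can stay connected at the same time and each is answered independently; with a single-threaded `TCPServer`, the second client would wait until the first disconnected.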

Licensing


Typically, proprietary enterprise-server software is licensed "per processor". In the past a CPU was a processor and most computers had only one CPU, so there was no ambiguity.

Now there is the possibility of counting cores as processors and charging a customer for multiple licenses for a multi-core CPU. However, the trend seems to be counting dual-core chips as a single processor: Microsoft, Intel, and AMD support this view. Microsoft has said it would treat a socket as a single processor.

Oracle counts an AMD X2 or Intel dual-core CPU as a single processor but applies different counts to other types, especially processors with more than two cores. IBM and HP count a multi-chip module as multiple processors. If multi-chip modules counted as one processor, CPU makers would have an incentive to build large, expensive multi-chip modules so that their customers save on software licensing. The industry appears to be slowly heading towards counting each die (see Integrated circuit) as a processor, no matter how many cores each die has.
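The difference between the counting rules above is simple arithmetic, sketched below for illustration. The policy names and the `Socket` type are hypothetical and do not reproduce any vendor's actual licence terms. The example machine assumes two Core 2 Quad-style packages, each holding two dual-core dies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Socket:
    """One CPU package: `dies` silicon dies, each with `cores_per_die` cores."""
    dies: int
    cores_per_die: int


def licenses_needed(sockets, policy):
    """Count licences under a hypothetical per-socket, per-die, or per-core rule."""
    if policy == "per_socket":
        return len(sockets)
    if policy == "per_die":
        return sum(s.dies for s in sockets)
    if policy == "per_core":
        return sum(s.dies * s.cores_per_die for s in sockets)
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    quad = Socket(dies=2, cores_per_die=2)  # two dual-core dies in one package
    box = [quad, quad]                      # a two-socket machine
    for policy in ("per_socket", "per_die", "per_core"):
        print(policy, licenses_needed(box, policy))
```

The same eight-core machine needs 2, 4, or 8 licences depending on the rule, which is why the choice of counting unit matters to customers.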

Embedded applications


Embedded computing operates in an area of processor technology distinct from that of "mainstream" PCs. The same technological drivers towards multicore apply here too. Indeed, in many cases the application is a "natural" fit for multicore technologies, if the task can easily be partitioned between the different processors.

In addition, embedded software is typically developed for a specific hardware release, making issues of software portability, legacy code, or support for independent developers less critical than they are for PC or enterprise computing. As a result, it is easier for developers to adopt new technologies, and there is a greater variety of multicore processing architectures and suppliers.

Multi-core network processing devices have become mainstream, with companies such as Freescale Semiconductor, Cavium Networks, Wintegra and Broadcom all manufacturing products with eight processors. For the system developer, a key challenge is how to exploit all the cores in these devices to achieve maximum networking performance at the system level, despite the performance limitations inherent in an SMP operating system. To address this issue, companies such as 6WIND provide portable packet processing software architected so that the networking data plane runs in a fast path environment outside the OS, while retaining full compatibility with standard OS APIs.
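One common way to spread datapath work across cores, not described in the text above and not taken from any vendor's software, is flow hashing: each packet's 5-tuple is hashed to pick a core, so all packets of one flow land on the same core and stay in order without locking. The sketch below assumes a simplified `Packet` record and uses CRC-32 from the standard `zlib` module as the hash (Python's built-in `hash()` on strings is randomized per process, so it would not give stable placement).

```python
import zlib
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int
    seq: int  # position within its flow, used here only to check ordering


def core_for(pkt, n_cores):
    """Hash the flow 5-tuple so every packet of a flow maps to the same core."""
    key = f"{pkt.src_ip}|{pkt.dst_ip}|{pkt.src_port}|{pkt.dst_port}|{pkt.proto}"
    return zlib.crc32(key.encode()) % n_cores


def dispatch(packets, n_cores):
    """Distribute packets to per-core queues, preserving arrival order in each queue."""
    queues = [[] for _ in range(n_cores)]
    for pkt in packets:
        queues[core_for(pkt, n_cores)].append(pkt)
    return queues
```

Each per-core queue could then be drained by its own worker; the trade-off is that one very busy flow cannot be split across cores.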

In digital signal processing the same trend applies: Texas Instruments has the three-core TMS320C6488 and four-core TMS320C5441, and Freescale the four-core MSC8144 and six-core MSC8156 (both have stated they are working on eight-core successors). Newer entries include the Storm-1 family from Stream Processors, Inc., with 40 and 80 general-purpose ALUs per chip, all programmable in C as a SIMD engine, and Picochip, with three hundred processors on a single die, focused on communication applications.

Commercial

  • Adapteva Epiphany, a many-core processor architecture with up to 4096 processors on-chip
  • Aeroflex Gaisler LEON3, a multi-core SPARC processor that also exists in a fault-tolerant version.
  • Ageia PhysX, a multi-core physics processing unit.
  • Ambric Am2045, a 336-core Massively Parallel Processor Array (MPPA)
  • AMD
    • Athlon 64, Athlon 64 FX and Athlon 64 X2 family, dual-core desktop processors.
    • Athlon II, dual-, triple-, and quad-core desktop processors.
    • Opteron, dual-, quad-, hex-, 8-, 12-, and 16-core server/workstation processors.
    • Phenom, dual-, triple-, and quad-core processors.
    • Phenom II, dual-, triple-, quad-, and hex-core desktop processors.
    • Sempron X2, dual-core entry level processors.
    • Turion 64 X2, dual-core laptop processors.
    • Radeon and FireStream multi-core GPU/GPGPU (10 cores, 16 5-issue wide superscalar stream processors per core)
  • Analog Devices Blackfin BF561, a symmetrical dual-core processor
  • ARM MPCore is a fully synthesizable multicore container for ARM11 MPCore and ARM Cortex-A9 MPCore processor cores, intended for high-performance embedded and entertainment applications.
  • ASOCS ModemX, up to 128 cores, wireless applications.
  • Azul Systems
    • Vega 1, a 24-core processor, released in 2005.
    • Vega 2, a 48-core processor, released in 2006.
    • Vega 3, a 54-core processor, released in 2008.
  • Broadcom SiByte SB1250, SB1255 and SB1455.
  • ClearSpeed
    • CSX700, 192-core processor, released in 2008 (32/64-bit floating point; Integer ALU)
  • Cradle Technologies CT3400 and CT3600, both multi-core DSPs.
  • Cavium Networks Octeon, a 16-core MIPS MPU.
  • Freescale Semiconductor QorIQ series processors, up to 8 cores, Power Architecture MPU.
  • Hewlett-Packard PA-8800 and PA-8900, dual-core PA-RISC processors.
  • IBM
    • POWER4, the world's first non-embedded dual-core processor, released in 2001.
    • POWER5, a dual-core processor, released in 2004.
    • POWER6, a dual-core processor, released in 2007.
    • POWER7, a 4-, 6-, or 8-core processor, released in 2010.
    • PowerPC 970MP, a dual-core processor, used in the Apple Power Mac G5.
    • Xenon, a triple-core, SMT-capable, PowerPC microprocessor used in the Microsoft Xbox 360 game console.
  • Sony/IBM/Toshiba's Cell processor, a nine-core processor with one general-purpose PowerPC core and eight specialized SPUs (Synergistic Processing Units) optimized for vector operations, used in the Sony PlayStation 3.
  • Infineon Danube, a dual-core, MIPS-based, home gateway processor.
  • Intel
    • Celeron Dual-Core, the first dual-core processor for the budget/entry-level market.
    • Core Duo, a dual-core processor.
    • Core 2 Duo, a dual-core processor.
    • Core 2 Quad, 2 dual-core dies packaged in a multi-chip module.
    • Core i3, Core i5 and Core i7, a family of multi-core processors, the successor of the Core 2 Duo and the Core 2 Quad.
    • Itanium 2, a dual-core processor.
    • Pentium D, 2 single-core dies packaged in a multi-chip module.
    • Pentium Extreme Edition, 2 single-core dies packaged in a multi-chip module.
    • Pentium Dual-Core, a dual-core processor.
    • Teraflops Research Chip (Polaris), a 3.16 GHz, 80-core processor prototype, which the company originally stated would be released by 2011.
    • Xeon dual-, quad-, hexa-, octo- and 10-core processors.
  • IntellaSys
    • SEAforth 40C18, a 40-core processor
    • SEAforth24, a 24-core processor designed by Charles H. Moore
  • NetLogic Microsystems
    • XLP, a 32-core, quad-threaded MIPS64 processor
    • XLR, an eight-core, quad-threaded MIPS64 processor
    • XLS, an eight-core, quad-threaded MIPS64 processor
  • Nvidia
    • GeForce 9 multi-core GPU (8 cores, 16 scalar stream processors per core)
    • GeForce 200 multi-core GPU (10 cores, 24 scalar stream processors per core)
    • Tesla multi-core GPGPU (10 cores, 24 scalar stream processors per core)
  • Parallax Propeller P8X32, an eight-core microcontroller.
  • picoChip PC200 series, 200–300 cores per device for DSP and wireless.
  • Plurality HAL series, a tightly coupled, hardware-synchronized processor with 16–256 cores and shared L1 memory.
  • Rapport Kilocore KC256, a 257-core microcontroller with a PowerPC core and 256 8-bit "processing elements". Rapport is now out of business.
  • SiCortex "SiCortex node", six MIPS64 cores on a single chip.
  • Sun Microsystems
    • MAJC 5200, a two-core VLIW processor.
    • UltraSPARC IV and UltraSPARC IV+, dual-core processors.
    • UltraSPARC T1, an eight-core, 32-thread processor.
    • UltraSPARC T2, an eight-core, 64-concurrent-thread processor.
    • UltraSPARC T3, a sixteen-core, 128-concurrent-thread processor.
  • Texas Instruments
    • TMS320C80 MVP, a five-core multimedia video processor.
    • TMS320C66, 2-, 4- and 8-core DSPs.
  • Tilera
    • TILE64, a 64-core 32-bit processor.
    • TILE-Gx, a 100-core 64-bit processor.
  • XMOS Software Defined Silicon, quad-core XS1-G4.
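
Several entries above give per-core figures (stream processors per core) or chip-wide totals (concurrent threads per chip), and the two can be converted into each other by simple multiplication or division. The following is a minimal Python sketch that uses only the figures stated in this list. The dictionary names and helper functions are illustrative and hypothetical. The division assumes that threads are spread evenly across cores.

```python
# Per-core figures stated in the list: (cores, scalar stream processors per core).
gpus = {
    "GeForce 9": (8, 16),
    "GeForce 200": (10, 24),
    "Tesla": (10, 24),
}

# Chip-wide figures stated in the list: (cores, concurrent hardware threads).
sparcs = {
    "UltraSPARC T1": (8, 32),
    "UltraSPARC T2": (8, 64),
    "UltraSPARC T3": (16, 128),
}


def total_units(cores: int, per_core: int) -> int:
    """Total processing units on a chip: cores multiplied by units per core."""
    return cores * per_core


def threads_per_core(cores: int, threads: int) -> int:
    """Hardware threads per core, assuming an even split across cores (hypothetical helper)."""
    return threads // cores


if __name__ == "__main__":
    for name, (cores, per_core) in gpus.items():
        print(f"{name}: {total_units(cores, per_core)} stream processors in total")
    for name, (cores, threads) in sparcs.items():
        print(f"{name}: {threads_per_core(cores, threads)} threads per core")
```

With these figures, the GeForce 9 entry works out to 128 stream processors in total, and the GeForce 200 and Tesla entries each work out to 240. The UltraSPARC T1 runs 4 threads per core, while the T2 and T3 each run 8.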

Academic

  • MIT, 16-core RAW processor
  • University of California, Davis, Asynchronous array of simple processors (AsAP)
    • 36-core 610 MHz AsAP
    • 167-core 1.2 GHz AsAP2
  • University of Washington, Wavescalar processor
  • University of Texas, Austin, TRIPS processor

See also

  • Race condition
  • Multicore Association
  • Multithreading (computer architecture)
  • Multiprocessing
  • Hyper-threading
  • Symmetric multiprocessing (SMP)
  • Simultaneous multithreading (SMT)
  • Multitasking
  • OpenHMPP HPC Open Standard for Manycore Programming
  • Parallel computing
  • PureMVC MultiCore – a modular programming framework
  • XMTC
  • Parallel Random Access Machine
  • Partitioned global address space (PGAS)
  • Thread
  • GPGPU
  • CUDA
  • OpenCL (Open Computing Language), a framework for heterogeneous execution
  • Ateji PX, an extension of the Java language for parallelism

External links