Simultaneous multithreading

 

Simultaneous multithreading, often abbreviated as SMT, is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads of execution to better utilize the resources provided by modern processor architectures.

Details

Multithreading is similar in concept to preemptive multitasking but is implemented at the thread level of execution in modern superscalar processors.

Simultaneous multithreading (SMT) is one of the two main implementations of multithreading, the other being temporal multithreading. In temporal multithreading, instructions from only one thread can occupy a given pipeline stage at a time. In simultaneous multithreading, instructions from more than one thread can be executing in a given pipeline stage at the same time. This is achieved without major changes to the basic processor architecture: the main additions needed are the ability to fetch instructions from multiple threads in a cycle and a larger register file to hold data from multiple threads. The number of concurrent threads is a design choice, but practical limits on chip complexity have restricted most SMT implementations to two threads.
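To make the pipeline-stage distinction concrete, the following Python sketch counts how many issue slots of a hypothetical 4-wide superscalar core get filled under temporal versus simultaneous multithreading. The per-cycle "ready" counts, the fixed thread order used to fill SMT slots, and the function names are illustrative assumptions; the model treats each cycle independently and ignores fetch, dependences, and resource conflicts.

```python
from typing import List


def temporal_issue(ready: List[List[int]], width: int) -> int:
    """Temporal (fine-grained) multithreading: in each cycle a single thread,
    chosen round-robin, owns all `width` issue slots."""
    n_threads = len(ready)
    issued = 0
    for cycle in range(len(ready[0])):
        owner = cycle % n_threads
        issued += min(width, ready[owner][cycle])
    return issued


def smt_issue(ready: List[List[int]], width: int) -> int:
    """Simultaneous multithreading: in each cycle the issue slots are filled
    from every thread (in fixed thread order here) until `width` is reached."""
    issued = 0
    for cycle in range(len(ready[0])):
        free_slots = width
        for thread in ready:
            take = min(free_slots, thread[cycle])
            issued += take
            free_slots -= take
    return issued


# Hypothetical per-cycle counts of independent instructions each thread
# could issue (its available ILP); invented for illustration only.
ready = [
    [2, 1, 0, 3],  # thread A
    [1, 2, 2, 1],  # thread B
]
WIDTH = 4
total_slots = WIDTH * len(ready[0])
print("temporal:", temporal_issue(ready, WIDTH), "of", total_slots)
print("SMT:     ", smt_issue(ready, WIDTH), "of", total_slots)
```

With these made-up traces, temporal multithreading fills 5 of the 16 issue slots and SMT fills 12, because SMT can use slots that the owning thread would otherwise leave empty.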

Because the technique is essentially an efficiency measure, and because it inevitably increases contention for shared resources, its effectiveness can be difficult to measure or agree on. Some researchers have shown that the extra threads can be used to proactively seed a shared resource such as a cache, improving the performance of another single thread, and argue that this shows SMT is not just an efficiency solution. Others use SMT to provide redundant computation for some level of error detection and recovery.

In most current cases, however, SMT is used to hide memory latency, improve efficiency, and increase the throughput of computations per amount of hardware used.
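A common back-of-the-envelope way to see the latency-hiding argument is a simple utilization model, sketched below in Python. It assumes, hypothetically, that every thread does `run` cycles of useful work and then stalls for `latency` cycles on memory, and that the core can switch to another ready thread at no cost. Real SMT cores issue several instructions per cycle and share many more resources, so this only illustrates the trend.

```python
def utilization(n_threads: int, run: int, latency: int) -> float:
    """Fraction of cycles spent on useful work when every thread alternates
    `run` busy cycles with a `latency`-cycle memory stall, assuming the core
    switches between ready threads at zero cost."""
    if n_threads < 1 or run < 1 or latency < 0:
        raise ValueError("need n_threads >= 1, run >= 1, latency >= 0")
    return min(1.0, n_threads * run / (run + latency))


def threads_to_hide_latency(run: int, latency: int) -> int:
    """Smallest thread count at which the model reaches full utilization,
    i.e. ceil((run + latency) / run) computed with integer arithmetic."""
    return (run + latency + run - 1) // run


# Hypothetical workload: 10 cycles of work between 30-cycle cache misses.
for n in (1, 2, 4):
    print(n, "thread(s):", utilization(n, run=10, latency=30))
print("threads needed:", threads_to_hide_latency(run=10, latency=30))
```

With 10 busy cycles per 30-cycle miss, one thread keeps the core busy 25% of the time, two threads 50%, and four threads saturate it in this idealized model.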

Taxonomy

In processor design, there are two ways to increase on-chip parallelism with modest resource requirements: the superscalar technique, which tries to exploit instruction-level parallelism (ILP), and the multithreading approach, which exploits thread-level parallelism (TLP).

Superscalar means executing multiple instructions at the same time, while chip-level multithreading (CMT) executes instructions from multiple threads within one processor chip at the same time. There are several ways to support more than one thread within a chip:
  • Interleaved multithreading: interleaved issue of multiple instructions from different threads, also referred to as temporal multithreading. It can be further divided into fine-grain and coarse-grain multithreading, depending on how often issue switches between threads. Fine-grain multithreading, as in a barrel processor, issues instructions from a different thread every cycle, while coarse-grain multithreading switches to another thread only when the currently executing thread causes a long-latency event (such as a page fault). Coarse-grain multithreading is more common because it requires fewer context switches between threads. For example, Intel's Montecito processor uses coarse-grain multithreading, while Sun's UltraSPARC T1 uses fine-grain multithreading. For processors that have only one pipeline per core, interleaved multithreading is the only possible approach, because such a core can issue at most one instruction per cycle.
  • Simultaneous multithreading (SMT): issue of multiple instructions from multiple threads in one cycle. The processor must be superscalar to do so.
  • Chip-level multiprocessing (CMP or multicore): integration of two or more superscalar processors into one chip, each executing threads independently.
  • Any combination of multithreading, SMT, and CMP.
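The difference between fine-grain and coarse-grain interleaving can be sketched as two thread-selection policies. In this Python sketch the instruction streams, the flag marking a long-latency instruction, and the assumption that a stalled thread is ready again by the time it is next selected are all simplifications chosen for illustration; the code records which thread owns each cycle and counts thread switches.

```python
from typing import List


def fine_grain_schedule(n_threads: int, n_cycles: int) -> List[int]:
    """Barrel-style fine-grain multithreading: a new thread every cycle."""
    return [cycle % n_threads for cycle in range(n_cycles)]


def coarse_grain_schedule(streams: List[List[bool]], n_cycles: int) -> List[int]:
    """Coarse-grain multithreading: stay on the current thread and switch to
    the next one only after it issues a long-latency instruction (True)."""
    n = len(streams)
    pc = [0] * n       # next instruction index for each thread
    current = 0
    schedule = []
    for _ in range(n_cycles):
        schedule.append(current)
        stream = streams[current]
        long_latency = stream[pc[current] % len(stream)]  # streams repeat
        pc[current] += 1
        if long_latency:
            current = (current + 1) % n  # switch while the miss is serviced
    return schedule


def count_switches(schedule: List[int]) -> int:
    """Number of cycle boundaries at which the issuing thread changes."""
    return sum(1 for a, b in zip(schedule, schedule[1:]) if a != b)


# Hypothetical streams: True marks an instruction that causes a long stall.
streams = [[False, False, True], [False, True]]
coarse = coarse_grain_schedule(streams, n_cycles=8)
fine = fine_grain_schedule(len(streams), n_cycles=8)
print("coarse:", coarse, "switches:", count_switches(coarse))
print("fine:  ", fine, "switches:", count_switches(fine))
```

For these sample streams, the barrel-style policy changes threads at all 7 cycle boundaries, while the coarse-grain policy switches only twice, which is the reduced context switching attributed to coarse-grain designs.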


The key factor that distinguishes these approaches is how many instructions the processor can issue in one cycle and how many threads those instructions come from. For example, Sun Microsystems' UltraSPARC T1 (known as "Niagara" until its 14 November 2005 release) is a multicore processor that uses fine-grain multithreading rather than simultaneous multithreading, because each core can issue only one instruction at a time.
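The classification rule in the previous paragraph can be written as a small function. The `ChipConfig` fields and the label strings are hypothetical names chosen for this Python sketch; the UltraSPARC T1 example combines the single-issue cores described above with the T1's four hardware threads per core, and the SMT example is an invented design.

```python
from dataclasses import dataclass


@dataclass
class ChipConfig:
    cores: int                      # cores on the chip
    issue_width: int                # max instructions one core issues per cycle
    hw_threads: int                 # hardware thread contexts per core
    threads_issuing_per_cycle: int  # threads that may issue in the same cycle


def classify(cfg: ChipConfig) -> str:
    """Rough label following the taxonomy above (illustrative only)."""
    limit = min(cfg.issue_width, cfg.hw_threads)
    if not 1 <= cfg.threads_issuing_per_cycle <= limit:
        raise ValueError(
            "threads_issuing_per_cycle must be in 1..min(issue_width, hw_threads)")
    if cfg.hw_threads == 1:
        per_core = "superscalar" if cfg.issue_width > 1 else "scalar"
    elif cfg.threads_issuing_per_cycle > 1:
        per_core = "SMT"  # only reachable when issue_width > 1
    else:
        per_core = "interleaved (temporal) multithreading"
    return f"CMP + {per_core}" if cfg.cores > 1 else per_core


# UltraSPARC T1: 8 single-issue cores, 4 hardware threads each.
print(classify(ChipConfig(cores=8, issue_width=1, hw_threads=4,
                          threads_issuing_per_cycle=1)))
# A hypothetical single-core, 4-wide, 2-thread SMT design.
print(classify(ChipConfig(cores=1, issue_width=4, hw_threads=2,
                          threads_issuing_per_cycle=2)))
```

The validation step encodes the point made in the list above: a core cannot issue from more threads per cycle than it has issue slots, so a single-issue core can only interleave.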

Historical implementations

While multithreading CPUs have been around since the 1950s, simultaneous multithreading was first researched by IBM in 1968. The first major commercial microprocessor developed with SMT was the Alpha 21464 (EV8). This microprocessor was developed by DEC in coordination with Dean Tullsen of the University of California, San Diego, and Susan Eggers and Hank Levy of the University of Washington. The microprocessor was never released, because the Alpha line of microprocessors was discontinued shortly before HP acquired Compaq, which had in turn acquired DEC. Dean Tullsen's work was also used to develop the Hyper-Threading (HTT) versions of the Intel Pentium 4 microprocessors, such as "Northwood" and "Prescott".

Modern commercial implementations

The Intel Pentium 4 was the first modern desktop processor to implement simultaneous multithreading, starting with the 3.06 GHz model released in 2002, and Intel has since introduced the technique into a number of its processors. Intel calls the functionality Hyper-Threading Technology (HTT); it provides a basic two-thread SMT engine. Intel claims up to a 30% speed improvement compared with an otherwise identical, non-SMT Pentium 4. The performance improvement is very application-dependent, and some programs actually slow down slightly when HTT is turned on, due to increased contention for resources such as bandwidth, caches, TLBs, and re-order buffer entries. This is generally the case for poorly written data access routines that cause high-latency inter-cache transactions (cache thrashing) on multiprocessor systems. Programs written before multiprocessor and multicore designs were prevalent commonly did not optimize cache access, because a single-CPU system has only one cache, which is always coherent with itself. On a multiprocessor system, each CPU or core typically has its own cache, which is interlinked with the caches of the other CPUs or cores in the system to maintain cache coherency. If thread A accesses memory location [00] and thread B then accesses memory location [01], this can cause an inter-cache transaction, particularly when the cache line is 2 bytes or larger (as it is on all modern processors), because both locations then fall within the same line.
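The [00]/[01] example comes down to simple address arithmetic: two byte addresses share a cache line when they fall in the same aligned block of line-size bytes. This situation, where threads touch different data that happen to share a line, is commonly called false sharing, and a common mitigation is padding each thread's data out to a full line. The Python sketch below assumes a 64-byte line, which is typical of current x86 processors but is an assumption here; the helper names are hypothetical.

```python
LINE_SIZE = 64  # assumed cache-line size in bytes (typical of current x86 CPUs)


def same_cache_line(addr_a: int, addr_b: int, line_size: int = LINE_SIZE) -> bool:
    """True if two byte addresses fall in the same aligned cache line."""
    return addr_a // line_size == addr_b // line_size


def padded_offset(slot: int, line_size: int = LINE_SIZE) -> int:
    """Byte offset that gives each per-thread slot a cache line of its own."""
    return slot * line_size


# Thread A touches [00], thread B touches [01]: same line, so a write by one
# thread can force coherence traffic for the other core's copy of the line.
print(same_cache_line(0x00, 0x01))                          # True
# Addresses on either side of a line boundary do not collide.
print(same_cache_line(63, 64))                              # False
# Padding each thread's data to a full line keeps them apart.
print(same_cache_line(padded_offset(0), padded_offset(1)))  # False
```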

The latest MIPS architecture designs include an SMT system known as "MIPS MT". MIPS MT provides for both heavyweight virtual processing elements and lighter-weight hardware microthreads. RMI, a Cupertino-based startup, is the first MIPS vendor to provide a processor system-on-chip (SoC) based on 8 cores, each of which runs 4 threads. The threads can be run in fine-grain mode, where a different thread can be executed each cycle. The threads can also be assigned priorities.
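Fine-grain thread selection with priorities can be illustrated with a weighted round-robin selector. The Python sketch below uses smooth weighted round-robin, a general-purpose scheduling technique swapped in for illustration; it is not the documented selection logic of MIPS MT or RMI hardware, and the integer priority values are hypothetical.

```python
from collections import Counter
from typing import List


def priority_schedule(priorities: List[int], n_cycles: int) -> List[int]:
    """Fine-grain thread selection with priorities, using smooth weighted
    round-robin: a thread with priority p gets about p / sum(priorities) of
    the cycles, and its cycles are spread out rather than bunched together."""
    if not priorities or any(p <= 0 for p in priorities):
        raise ValueError("priorities must be positive")
    total = sum(priorities)
    credit = [0] * len(priorities)
    schedule = []
    for _ in range(n_cycles):
        for i, p in enumerate(priorities):
            credit[i] += p                      # every thread earns its weight
        chosen = max(range(len(priorities)), key=lambda i: credit[i])
        credit[chosen] -= total                 # the winner pays for the cycle
        schedule.append(chosen)
    return schedule


# Four hardware threads on one core; thread 0 is given triple priority.
sched = priority_schedule([3, 1, 1, 1], n_cycles=6)
print(sched)           # [0, 1, 0, 2, 3, 0]
print(Counter(sched))  # thread 0 gets 3 of the 6 cycles
```

Spreading a high-priority thread's cycles out, rather than giving it consecutive cycles, keeps the other threads from waiting long stretches for a turn.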

The IBM POWER5, announced in May 2004, comes as either a dual-core DCM (dual-chip module) or a quad-core or eight-core MCM (multi-chip module), with each core including a two-thread SMT engine. IBM's implementation is more sophisticated than earlier ones: it can assign different priorities to the various threads, it is more fine-grained, and the SMT engine can be turned on and off dynamically to better execute workloads where SMT would not increase performance. This is IBM's second implementation of generally available hardware multithreading.

Although many people reported that Sun Microsystems' UltraSPARC T1 (known as "Niagara" until its 14 November 2005 release) and the upcoming processor codenamed "Rock" (to be launched around 2009) are implementations of SPARC focused almost entirely on exploiting SMT and CMP techniques, Niagara does not actually use SMT. Sun refers to these combined approaches as "CMT", and to the overall concept as "Throughput Computing". Niagara has 8 cores, but each core has only one pipeline, so it actually uses fine-grained multithreading. Unlike SMT, where instructions from multiple threads share the issue window each cycle, the processor uses a round-robin policy to issue instructions from the next active thread each cycle. This makes it more similar to a barrel processor. Sun Microsystems' Rock processor is different: it has more complex cores with more than one pipeline.
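The round-robin "next active thread" policy described above can be sketched as follows. This Python model assumes a 4-thread core and a per-cycle mask marking which threads are able to issue (for example, not waiting on a cache miss); it is a simplification for illustration, not Sun's actual selection logic, and the function names are hypothetical.

```python
from typing import List, Optional


def next_active_thread(last: int, active: List[bool]) -> Optional[int]:
    """Return the first active thread after `last` in round-robin order,
    skipping stalled threads, or None if every thread is stalled."""
    n = len(active)
    for step in range(1, n + 1):
        candidate = (last + step) % n
        if active[candidate]:
            return candidate
    return None


def issue_sequence(active_by_cycle: List[List[bool]]) -> List[Optional[int]]:
    """Thread chosen in each cycle (None means an idle cycle)."""
    n = len(active_by_cycle[0])
    last = n - 1  # so that thread 0 is considered first
    chosen_per_cycle = []
    for active in active_by_cycle:
        chosen = next_active_thread(last, active)
        chosen_per_cycle.append(chosen)
        if chosen is not None:
            last = chosen
    return chosen_per_cycle


T, F = True, False
# Hypothetical activity masks for a 4-thread core over five cycles.
masks = [
    [T, T, T, T],
    [T, F, T, T],  # thread 1 stalled: it is skipped
    [T, T, T, T],
    [T, T, T, T],
    [F, F, F, F],  # every thread stalled: the pipeline idles
]
print(issue_sequence(masks))  # [0, 2, 3, 0, None]
```

Unlike a strict barrel rotation, skipping stalled threads means a cycle is lost only when no thread at all is ready.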

The Intel Atom, released in 2008, is the first Intel product to feature SMT (marketed as Hyper-Threading) without supporting instruction reordering, speculative execution, or register renaming. Intel reintroduced Hyper-Threading with the Nehalem microarchitecture, after its absence from the Core microarchitecture.

See also

  • Thread (computer science), the fundamental software entity scheduled by the operating system kernel to execute on a CPU or processor (core)
  • Symmetric multiprocessing, where the system (or partition of a larger computer hardware platform) contains more than one CPU or processor (core) and the operating system kernel is not restricted in which of the available CPUs (cores) a given thread can be scheduled to execute on


External links