Bulldozer (processor)
Bulldozer is the codename Advanced Micro Devices (AMD) has given to one of the next-generation CPU cores after the K10 microarchitecture for the company's M-SPACE design methodology, with the core specifically aimed at 10-watt to 125-watt TDP computing products. Bulldozer is a completely new design developed from the ground up. AMD claims dramatic performance-per-watt improvements in HPC applications with Bulldozer cores. Desktop products implementing the Bulldozer core were released on October 12, 2011.

The Bulldozer cores support most of the instruction sets currently implemented in Intel processors (including SSE4.1, SSE4.2, AES, CLMUL, and AVX) as well as future instruction sets proposed by AMD (XOP and FMA4).

Basic description

According to AMD, Bulldozer-based CPUs are based on GlobalFoundries' 32 nm Silicon on insulator (SOI) process technology and utilize a new approach to multithreaded computer performance that, according to press notes, "balances dedicated and shared computer resources to provide a highly compact, high core count design that is easily replicated on a chip for performance scaling." In other words, by eliminating some of the redundancies that naturally creep into multicore designs, AMD hoped to take better advantage of its hardware capabilities, while using less power.

Bulldozer-based implementations built on 32 nm SOI with HKMG arrived in October 2011 for both servers and desktops. The server segment included the dual-chip 16-core Opteron processor codenamed Interlagos (for Socket G34) and the single-chip 4–8 core Valencia (for Socket C32), while the 4–8 core Zambezi targeted desktops on Socket AM3+.

Bulldozer is the first major redesign of AMD's processor architecture since 2003, when the firm launched its Athlon 64/Opteron (K8) processors. The design features two 128-bit FMA-capable FPUs, which can be combined into one 256-bit FPU, accompanied by two integer cores with four pipelines each (the fetch/decode stage is shared). Bulldozer also introduces a shared L2 cache. AMD calls this design a "Bulldozer module". A 16-core processor design would feature eight of these modules, and the operating system will recognize each module as two physical cores.

The module, described as two cores, can be contrasted with a single Intel core with HyperThreading. The difference between the two approaches is that Bulldozer provides dedicated schedulers and integer units for each thread, whereas in Intel's core all threads must compete for available execution resources.

Bulldozer Module

  • AMD has introduced a new microarchitecture building block called a module. In terms of hardware complexity and functionality, a module is midway between a dual-core processor (in which each core is fully independent) and a single processor core that has two SMT threads (in which each thread shares most of the hardware resources with the other thread).
    • A module consists of two tightly coupled, "conventional" x86 out-of-order processing engines. Each processing engine shares the early pipeline stages (e.g. instruction fetch, decode), the FPUs, and the L2 cache with its sibling in the module.
  • Each module has the following independent hardware resources:
    • up to 2048 kB L2 cache per module (shared between the cores in a module)
    • 16 kB four-way L1 data cache (way-predicted) per core and two-way 64 kB L1 instruction cache per module, one way for each of the two cores
    • Two dedicated integer cores
      - each consists of two ALUs and two AGUs, which are capable of a total of four independent arithmetic and memory operations per clock per core
      - duplicating the integer schedulers and execution pipelines offers dedicated hardware to each of the two threads, which significantly increases performance in multithreaded integer applications
      - the second integer core increases the Bulldozer module die area by around 12%, which at chip level adds about 5% of total die space
    • Two symmetrical 128-bit FMAC (fused multiply–add capability) floating-point pipelines per module that can be unified into one large 256-bit-wide unit if one of the integer cores dispatches an AVX instruction, and two symmetrical x87/MMX/SSE-capable FPPs for backward compatibility with SSE2 non-optimized software
  • Multiple modules share an L3 cache as well as an Advanced Dual-Channel Memory Sub-System (IMC - Integrated Memory Controller).
  • A module has 213 million transistors in an area of 30.9 mm² (including 2 MB L2 cache) on an Orochi die
  • A dual-core Bulldozer processor has a single module, a quad-core processor has two modules and an octo-core processor has four modules.
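
The module arithmetic in the list above can be sketched in a few lines of Python. This is only an illustration built from the figures quoted in this section (two cores and up to 2048 kB of L2 per module, 213 million transistors in 30.9 mm²); the class and function names are hypothetical and do not come from AMD.

```python
from dataclasses import dataclass

# Figures quoted in the list above; the names below are illustrative only.
CORES_PER_MODULE = 2            # each module is presented as two cores
L2_PER_MODULE_KB = 2048         # "up to 2048 kB L2 cache per module"
MODULE_TRANSISTORS = 213e6      # per module, including 2 MB of L2
MODULE_AREA_MM2 = 30.9          # per module on an Orochi die


@dataclass(frozen=True)
class BulldozerChip:
    """A chip described only by its module count (hypothetical helper)."""
    modules: int

    @property
    def cores(self) -> int:
        # Core count as marketed: two integer cores per module.
        return self.modules * CORES_PER_MODULE

    @property
    def l2_total_mb(self) -> float:
        # Total L2 assuming every module carries the full 2048 kB.
        return self.modules * L2_PER_MODULE_KB / 1024


def module_transistor_density() -> float:
    """Millions of transistors per square millimetre inside one module."""
    return MODULE_TRANSISTORS / MODULE_AREA_MM2 / 1e6


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        chip = BulldozerChip(modules=n)
        print(f"{n} module(s): {chip.cores} cores, {chip.l2_total_mb:.0f} MB L2")
    print(f"density: {module_transistor_density():.2f} M transistors/mm²")
```

With one, two and four modules the sketch reproduces the dual-, quad- and eight-core configurations named in the last bullet.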

Instruction set extensions

  • Support for Intel's Advanced Vector Extensions (AVX) instruction set, which supports 256-bit floating point operations, and SSE4.1, SSE4.2, AES, CLMUL, as well as future 128-bit instruction sets proposed by AMD (XOP, FMA4 and CVT16), which have the same functionality as the SSE5 instruction set formerly proposed by AMD, but with compatibility to the AVX coding scheme.
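
On a Linux system, one rough way to see which of these extensions a given CPU reports is to inspect the `flags` line of `/proc/cpuinfo`. The sketch below assumes the usual Linux flag spellings (`sse4_1`, `sse4_2`, `aes`, `pclmulqdq`, `avx`, `xop`, `fma4`); the helper names are hypothetical, and CVT16 is left out because its flag mapping is not given in this article.

```python
from typing import Dict, Set

# Assumed mapping from extension name to the Linux /proc/cpuinfo flag.
EXTENSION_FLAGS: Dict[str, str] = {
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "AES": "aes",
    "CLMUL": "pclmulqdq",
    "AVX": "avx",
    "XOP": "xop",
    "FMA4": "fma4",
}


def parse_flags(cpuinfo_text: str) -> Set[str]:
    """Return the flag set from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supported_extensions(cpuinfo_text: str) -> Dict[str, bool]:
    """Report, per extension, whether its flag appears in the text."""
    flags = parse_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in EXTENSION_FLAGS.items()}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as fh:
            report = supported_extensions(fh.read())
    except OSError:
        report = {}  # not on Linux, or the file is unreadable
    for name, present in report.items():
        print(f"{name:7s} {'yes' if present else 'no'}")
```

Keeping the parser separate from the file read makes it possible to check the logic against a pasted `cpuinfo` dump from another machine.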

Process technology and clock frequency

  • 11-metal-layer 32 nm SOI process with first-generation GlobalFoundries High-K Metal Gate (HKMG)
  • Turbo Core performance boost to increase clock frequency by 500 MHz with all cores active (for most workloads) and further, as TDP headroom permits
  • The chip operates at 0.8 to 1.3 V, achieving clock frequencies of 3.5 GHz or more
  • Min-Max power usage - 10 to 125 watts

Cache and memory interface

  • Up to 8 MB of L3 cache shared among all modules on the same silicon die (8 MB per 4 modules, 16 MB per 8 modules, and so on; 16 MB for a dual-die MCM), divided into four subcaches of 2 MB each, capable of operating at 2.4 GHz or more at 1.1 V
  • Native DDR3-1866 memory support
  • Dual-channel DDR3 integrated memory controller (support for PC3-15000 (DDR3-1866)) for desktop; quad-channel DDR3 integrated memory controller (support for PC3-12800 (DDR3-1600) and registered DDR3) for server/workstation (the new Opteron Valencia and Interlagos)
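
As a rough guide to what these memory configurations mean in practice, theoretical peak bandwidth can be estimated as transfer rate × bus width × channel count. The sketch below assumes the standard 64-bit DDR3 channel width and the nominal transfer rates (1866 and 1600 MT/s); the function name is hypothetical, and real sustained bandwidth will be lower.

```python
def ddr3_peak_bandwidth_gbs(transfer_rate_mts: float,
                            channels: int,
                            bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    transfer_rate_mts: nominal rate in mega-transfers per second,
                       e.g. 1866 for DDR3-1866.
    channels:          number of independent memory channels.
    bus_width_bits:    width of one channel (64 bits assumed for DDR3).
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9


if __name__ == "__main__":
    desktop = ddr3_peak_bandwidth_gbs(1866, channels=2)
    server = ddr3_peak_bandwidth_gbs(1600, channels=4)
    print(f"desktop, dual-channel DDR3-1866: {desktop:.1f} GB/s")
    print(f"server,  quad-channel DDR3-1600: {server:.1f} GB/s")
```

Under these assumptions the desktop configuration peaks at just under 30 GB/s and the quad-channel server configuration at about 51 GB/s.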

I/O and socket interface

  • HyperTransport Technology rev. 3.1 (3.20 GHz, 6.4 GT/s, 25.6 GB/s, 16-bit uplink/16-bit downlink), first implemented in the HY-D1 revision "Magny-Cours" on the Socket G34 Opteron platform in March 2010 and "Lisbon" on the Socket C32 Opteron platform in June 2010
  • Socket AM3+ (AM3b)
    • 942-pin, DDR3 support
    • will retain backward compatibility with Socket AM3 motherboards (at the motherboard manufacturer's discretion and if BIOS updates are provided), although this is not officially supported by AMD; AM3+ motherboards will be backward-compatible with AM3 processors.
  • For the server segment, the existing Socket G34 (LGA1974) and Socket C32 (LGA1207) will be used.
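
The HyperTransport figures quoted in this list (3.20 GHz, 6.4 GT/s, 25.6 GB/s over a 16-bit link in each direction) are related by simple arithmetic, which the sketch below reproduces. It assumes double-data-rate signalling (two transfers per clock) and that the 25.6 GB/s figure is the sum of both directions; the function name is hypothetical.

```python
from typing import NamedTuple


class LinkBandwidth(NamedTuple):
    transfers_gts: float       # giga-transfers per second
    per_direction_gbs: float   # GB/s in one direction
    aggregate_gbs: float       # GB/s summed over both directions


def ht_link_bandwidth(clock_ghz: float, width_bits: int = 16) -> LinkBandwidth:
    """Derive link bandwidth from clock and width, assuming DDR signalling."""
    transfers = clock_ghz * 2                     # two transfers per clock
    per_direction = transfers * width_bits / 8    # bits -> bytes per transfer
    return LinkBandwidth(transfers, per_direction, per_direction * 2)


if __name__ == "__main__":
    link = ht_link_bandwidth(3.2, width_bits=16)
    print(f"{link.transfers_gts:.1f} GT/s, "
          f"{link.per_direction_gbs:.1f} GB/s per direction, "
          f"{link.aggregate_gbs:.1f} GB/s aggregate")
```

With a 3.2 GHz clock and a 16-bit link this gives 6.4 GT/s, 12.8 GB/s per direction and 25.6 GB/s in aggregate, matching the quoted figures.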

Processors

The first revenue shipments of Bulldozer-based Opteron processors were announced on September 7, 2011. The FX-6100, FX-8120, and FX-8150 were released on October 12, 2011. Later Zambezi parts are expected to arrive later in 2011 or in Q1 of 2012.

The expected Zambezi parts are summarized in the table below:

Model     Cores/Modules   Clock Freq.   Turbo Freq.   L2 Cache   TDP
FX-8170   8/4             3.9 GHz       4.5 GHz       8 MB       125 W
FX-8150   8/4             3.6 GHz       4.2 GHz       8 MB       125 W
FX-8120   8/4             3.1 GHz       4.0 GHz       8 MB       125 W
FX-8100   8/4             2.8 GHz       3.7 GHz       8 MB       95 W
FX-6120   6/3             3.6 GHz       4.2 GHz       6 MB       95 W
FX-6100   6/3             3.3 GHz       3.9 GHz       6 MB       95 W
FX-4170   4/2             4.2 GHz       4.3 GHz       4 MB       125 W
FX-4120   4/2             3.9 GHz       4.1 GHz       4 MB       95 W
FX-4100   4/2             3.6 GHz       3.8 GHz       4 MB       95 W

Common to all models: code name Zambezi; 8 MB L3 cache; 1866/2133 MHz memory; unlocked; Turbo Core 2.0; Socket AM3+; 32 nm SOI process technology.

AMD plans two series of Bulldozer-based processors for servers: the Opteron 4200 series (codenamed Valencia, with up to 8 cores) and the Opteron 6200 series (codenamed Interlagos, with up to 16 cores).

"FX" Release

On 12 October 2011, AMD released the first four FX Series Processors of the Bulldozer line (FX-8150, FX-8120, FX-6100, FX-4100) and lifted their NDA on official reviews.

The first Bulldozer CPUs were met with a mixed response. The FX-8150 performed poorly in benchmarks that were not highly threaded, falling behind the second-generation Intel Core i* series processors and matching or even being outperformed by AMD's own Phenom II X6, which was clocked lower. In highly threaded benchmarks, performance varied: the FX-8150 performed anywhere from on par with the Phenom II X6 to slightly better than the Core i7 2600K, depending on the benchmark. Given the more consistent real-world performance of the Core i5 2500K at a more competitive price, these results left many reviewers underwhelmed. The i5 2500K also outperforms most of the previous generation of i7 CPUs, to which the high-end Bulldozer CPUs are more comparable. The processor was also found to be extremely power-hungry under load, especially when overclocked, compared to Intel's Sandy Bridge.

Tom's Hardware commented that the lower than expected performance in multi-threaded workloads may be because of the way Windows 7 currently schedules threads to the cores. They point out that "if Windows were able to utilize an FX-8150’s four modules first, and then backfill each module’s second core, it’d maximize performance with up to four threads running concurrently." This is similar to what happens on Intel CPUs with HyperThreading - Windows 7 "schedules to physical cores before utilizing logical (HyperThreaded) cores."
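
The placement order Tom's Hardware describes (fill one core in every module first, then backfill each module's second core) can be sketched as follows. This is a toy model for illustration only, not how Windows or any real scheduler is implemented; all function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Slot = Tuple[int, int]  # (module index, core index within the module)


def module_first_order(n_modules: int, cores_per_module: int = 2) -> List[Slot]:
    """List core slots so that every module gets one thread before any gets two."""
    return [(module, core)
            for core in range(cores_per_module)
            for module in range(n_modules)]


def place_threads(n_threads: int, n_modules: int) -> List[Slot]:
    """Assign n_threads to slots in module-first order."""
    order = module_first_order(n_modules)
    if n_threads > len(order):
        raise ValueError("more threads than cores in this toy model")
    return order[:n_threads]


def shared_modules(placement: List[Slot]) -> int:
    """Count modules where two threads contend for the shared front end and FPU."""
    per_module = Counter(module for module, _ in placement)
    return sum(1 for count in per_module.values() if count > 1)


if __name__ == "__main__":
    for threads in (2, 4, 6, 8):
        placement = place_threads(threads, n_modules=4)
        print(threads, "threads ->", placement,
              "| shared modules:", shared_modules(placement))
```

On a four-module part, up to four threads land on separate modules in this model, so no two threads share a module's front end or FPU until a fifth thread is added.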

Overclocking was found to increase power draw significantly, but it did yield performance gains.

On 13 October, AMD acknowledged on its blog that "there are some in our community who feel the product performance did not meet their expectations".

2nd Generation

AMD Financial Analyst Day 2010 revealed that the 2nd generation is scheduled for 2012. AMD currently refers to this as Enhanced Bulldozer.

This new generation of Bulldozer core is codenamed Piledriver and will be incorporated into specific desktop and notebook markets:
  • Desktop Performance market (Volan platform): Zambezi will be replaced by Vishera (up to 8 cores). Vishera will feature Turbo Core 3.0 while using the existing Socket AM3+ format and 9xx series chipset of the 1st generation FX-series Zambezi processor. AMD has projected this 2nd generation FX-series processor will be 10% better under digital media workloads.
  • Desktop Budget and Mainstream market (Virgo platform): The Stars-based Llano Fusion APU line will be replaced by Trinity, Weatherford, and Richland Fusion APUs (2 to 4 cores) that are aimed at various price points in the desktop market. This platform will use the Socket FM2 format.
  • Notebook Mainstream and Performance market (Comal platform): Will be the same as mentioned in Desktop Budget/Mainstream market.


At the AMD Fusion Developer Summit (AFDS) 2011, AMD claimed that the notebook variant of Trinity will offer 50% more computational capacity than Llano.

For the server market, two versions are known to be under development:
  • Cost-effective, energy efficient server (1 to 2 CPUs) market: Opteron 4200-series (Valencia; 6 or 8 cores) will be replaced by Sepang (up to 10 cores). Sepang will be using a socket format called C2012. The memory controller will support triple-channel DDR3 memory configuration, and will have PCI Express 3.0 controller support.
  • Enterprise and Mainstream server (2 to 4 CPUs) market: Opteron 6200-series (Interlagos; 8, 12, and 16 cores) will be replaced by Terramar (up to 20 cores). Terramar will be using a socket format called G2012. Like Sepang, it will have a PCI Express 3.0 controller, but it will differ by supporting a quad-channel DDR3 memory configuration.

3rd Generation

AMD has mentioned (by name) a 3rd generation Bulldozer-based line for 2013. It is currently referred to as Next Generation Bulldozer and will be made on the 22 nm FD-SOI manufacturing process.

On 21st September 2011, leaked AMD slides indicated this 3rd generation of Bulldozer core is codenamed Steamroller and will be incorporated into specific desktop and notebook markets:
  • Desktop Budget and Mainstream market (??? platform): The Trinity Fusion APU line will be replaced by the Kaveri Fusion APU line as the 3rd generation A8-, A6-, and A4-series for the desktop market.
  • Notebook Mainstream and Performance market (Indus platform): Will be the same as mentioned in Desktop Budget/Mainstream market. The FCH chipset will be codenamed Bolton.


For the server market, two versions are being planned:
  • Cost-effective, energy efficient server (1 to 2 CPUs) market: Opteron 4200-series Sepang (up to 10 cores) will be replaced by Macau (up to 10 cores). Macau will re-use the C2012 socket format.
  • Enterprise and Mainstream server (2 to 4 CPUs) market: Opteron 6200-series Terramar (up to 20 cores) will be replaced by Dublin (up to 20 cores). Dublin will re-use the G2012 socket format.

4th Generation

On 12th October 2011, AMD revealed Excavator to be the codename for the 4th generation Bulldozer core. It is scheduled for 2014 release.

See also

  • List of AMD FX microprocessors
  • List of future AMD microprocessors
  • AMD Fusion
  • Bobcat, core for sub-20-watt products
  • SSE5

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 