Advanced Vector Extensions
Advanced Vector Extensions (AVX) is an extension to the x86 instruction set architecture for microprocessors from Intel and AMD, proposed by Intel in March 2008. It was first supported by Intel with the Sandy Bridge processor shipping in Q1 2011, and later by AMD with the Bulldozer processor shipping in Q3 2011.

AVX provides new features, new instructions and a new coding scheme.

New features

The width of the SIMD register file is increased from 128 bits to 256 bits, and the registers are renamed from XMM0–XMM15 to YMM0–YMM15. In processors with AVX support, the legacy SSE instructions (which previously operated on 128-bit XMM registers) now operate on the lower 128 bits of the YMM registers.

AVX introduces a three-operand SIMD instruction format, where the destination register is distinct from the two source operands. For example, an SSE instruction using the conventional two-operand form a = a + b can now use a non-destructive three-operand form c = a + b, preserving both source operands. AVX's three-operand format is limited to instructions with SIMD operands (YMM), and does not include instructions with general-purpose registers (e.g. EAX). Such support will first appear in AVX2.
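
As a rough illustration of the non-destructive three-operand form, the following C sketch adds two float arrays eight elements at a time. The function name add_f32 is hypothetical. The intrinsics come from <immintrin.h> and are only compiled in when the compiler defines __AVX__ (for example with GCC's -mavx or Visual Studio's /arch:AVX); otherwise a plain scalar loop does all the work. With AVX enabled, a compiler would typically emit the three-operand vaddps for _mm256_add_ps, although the exact code generated is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* dst[i] = a[i] + b[i]; both inputs are left untouched (c = a + b). */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
#if defined(__AVX__)
    /* Eight single-precision values fill one 256-bit YMM register. */
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);   /* unaligned load */
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 vc = _mm256_add_ps(va, vb);    /* va and vb stay intact */
        _mm256_storeu_ps(dst + i, vc);
    }
#endif
    /* Scalar tail, and the whole loop when AVX is not enabled. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

With AVX enabled, a call such as add_f32(c, a, b, 10) handles one full vector of eight elements and leaves the remaining two to the scalar tail.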

The alignment requirement of SIMD memory operands is relaxed.

New coding scheme

The new VEX coding scheme introduces a new set of code prefixes that extends the opcode space, allows instructions to have more than two operands, and allows SIMD vector registers to be longer than 128 bits.

New instructions

Instruction Description
VBROADCASTSS, VBROADCASTSD, VBROADCASTF128 Copy a 32-bit, 64-bit or 128-bit memory operand to all elements of an XMM or YMM vector register.
VINSERTF128 Replaces either the lower half or the upper half of a 256-bit YMM register with the value of a 128-bit source operand. The other half of the destination is unchanged.
VEXTRACTF128 Extracts either the lower half or the upper half of a 256-bit YMM register and copies the value to a 128-bit destination operand.
VMASKMOVPS, VMASKMOVPD Conditionally reads any number of elements from a SIMD vector memory operand into a destination register, leaving the remaining vector elements unread and setting the corresponding elements in the destination register to zero. Alternatively, conditionally writes any number of elements from a SIMD vector register operand to a vector memory operand, leaving the remaining elements of the memory operand unchanged.
VPERMILPS, VPERMILPD Shuffle 32-bit or 64-bit vector elements, with a register or memory operand as selector.
VPERM2F128 Shuffle the four 128-bit vector elements of two 256-bit source operands into a 256-bit destination operand, with an immediate constant as selector.
VZEROALL Set all YMM registers to zero and tag them as unused. Used when switching between 128-bit use and 256-bit use.
VZEROUPPER Set the upper half of all YMM registers to zero. Used when switching between 128-bit use and 256-bit use.
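
The masked moves are easiest to follow from a scalar model. The C sketch below is a portable reference model of the VMASKMOVPS behaviour described above for eight 32-bit elements, not the instruction itself. The function names are hypothetical, and the rule that an element is selected when the most significant bit of its mask element is set comes from the instruction's documented behaviour rather than from the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8  /* eight 32-bit elements per 256-bit register */

/* An element is selected when the top (sign) bit of its mask word is set. */
static int lane_selected(int32_t m) { return m < 0; }

/* Masked load model: unselected elements are never read and become zero. */
void maskload_f32x8(float dst[LANES], const float *src, const int32_t mask[LANES])
{
    assert(dst != NULL && src != NULL && mask != NULL);
    for (size_t i = 0; i < LANES; i++)
        dst[i] = lane_selected(mask[i]) ? src[i] : 0.0f;
}

/* Masked store model: unselected memory elements are left unchanged. */
void maskstore_f32x8(float *dst, const int32_t mask[LANES], const float src[LANES])
{
    assert(dst != NULL && src != NULL && mask != NULL);
    for (size_t i = 0; i < LANES; i++)
        if (lane_selected(mask[i]))
            dst[i] = src[i];
}
```

A typical use is the tail of a loop: with only the first five mask elements set, the model never touches memory beyond the fifth element, which matches the "leaving the remaining vector elements unread" wording in the table.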

Applications

  • Suitable for floating-point-intensive calculations in multimedia, scientific and financial applications (integer operations are expected in later extensions).
  • Increases parallelism and throughput in floating-point SIMD calculations.
  • Reduces register load due to the non-destructive instructions.

Compiler and assembler support

Recent releases of GCC, starting with version 4.6 (although there was a 4.3 branch with certain support), and the Intel Compiler Suite, starting with version 11.1, support AVX. The Visual Studio 2010 compiler supports AVX via intrinsics and the /arch:AVX switch. The GNU Assembler (GAS) inline assembly functions support these instructions (accessible via GCC), as do Intel primitives and the Intel inline assembler (closely compatible with GAS, although more general in its handling of local references within inline code). Other assemblers, such as the MASM version shipped with VS2010, YASM 1.1.0, FASM, NASM and JWASM, are also reported to support AVX instructions.

Operating system support

AVX adds new register state through the 256-bit wide YMM register file, so explicit operating system support is required to properly save and restore AVX's new registers between context switches. The following operating system versions support AVX:
  • Apple OS X: Support for AVX added in 10.6.8 (Snow Leopard) update released on June 23, 2011.
  • Linux: supported since kernel version 2.6.30, released on June 9, 2009.
  • Windows: supported in Windows 7 SP1, Windows Server 2008 R2 SP1 (hotfix 2517374 is available for the non-SP1 version of Windows Server 2008 R2), and Windows 8.
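
Because AVX depends on the operating system saving the YMM state, applications usually check both the CPU and the OS before using it. The C sketch below shows one way to write that check. The bit positions (CPUID leaf 1, ECX bit 28 for AVX and bit 27 for OSXSAVE; XCR0 bits 1 and 2 for XMM and YMM state) follow Intel's published detection sequence and are not taken from this article. The function names are hypothetical, and the hardware query assumes GCC or Clang on x86, returning 0 elsewhere.

```c
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define AVX_CHECK_HAVE_CPUID 1
#include <cpuid.h>
#endif

#define CPUID1_ECX_OSXSAVE (UINT32_C(1) << 27) /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX     (UINT32_C(1) << 28) /* CPU implements AVX */
#define XCR0_XMM_STATE     (UINT64_C(1) << 1)  /* OS saves XMM registers */
#define XCR0_YMM_STATE     (UINT64_C(1) << 2)  /* OS saves upper YMM halves */

/* Pure decision logic: 1 only if the CPU has AVX and the OS saves YMM state. */
int avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    const uint32_t cpu_bits = CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX;
    const uint64_t os_bits = XCR0_XMM_STATE | XCR0_YMM_STATE;

    if ((cpuid1_ecx & cpu_bits) != cpu_bits)
        return 0;
    return (xcr0 & os_bits) == os_bits;
}

/* Queries the running machine; returns 0 where no query path is available. */
int avx_usable_on_this_machine(void)
{
#ifdef AVX_CHECK_HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;
    uint32_t lo, hi;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & CPUID1_ECX_OSXSAVE))
        return 0; /* XGETBV would fault without OSXSAVE */
    /* Read XCR0 (register index 0 in ECX) into EDX:EAX. */
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return avx_usable(ecx, ((uint64_t)hi << 32) | lo);
#else
    return 0; /* assumption: no detection path on this compiler/target */
#endif
}
```

Keeping the bit test in avx_usable separate from the hardware query makes the decision easy to check with fixed inputs, independent of the machine the code runs on.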


CPUs with AVX

  • Intel
    • Sandy Bridge processor, Q1 2011.
    • Future Ivy Bridge processor, Q1 2012.

  • AMD
    • Bulldozer processor, Q3 2011.


Issues regarding compatibility between future Intel and AMD processors are discussed under XOP instruction set.

Future instruction sets

The VEX coding scheme allows future extensions of the SIMD register size.

Descriptions of other future x86 instruction sets:
  • Intel FMA3
  • AMD FMA4
  • AMD XOP
  • AMD CVT16


Advanced Vector Extensions 2

Advanced Vector Extensions 2 (AVX2), also known as Haswell New Instructions, is an expansion of the AVX instruction set to be first introduced in Intel's Haswell microarchitecture. AVX2 makes the following additions:
  • Expansion of most integer AVX instructions to 256 bits
  • 3-operand general-purpose bit manipulation and multiply
  • Gather support, enabling vector elements to be loaded from non-contiguous memory locations
  • DWORD- and QWORD-granularity any-to-any permutes
  • Vector shifts
  • 3-operand fused multiply-accumulate support
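
Gather can be pictured as an indexed load. The following C sketch (the function name gather_f32 is hypothetical) uses the AVX2 gather intrinsic when the compiler defines __AVX2__, for example with GCC's -mavx2. Otherwise it falls back to the equivalent scalar loop, which also describes what the instruction computes. Indices are assumed to be valid, non-negative element offsets into base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* dst[i] = base[idx[i]]: loads elements from non-contiguous locations. */
void gather_f32(float *dst, const float *base, const int32_t *idx, size_t n)
{
    size_t i = 0;
    assert(n == 0 || (dst != NULL && base != NULL && idx != NULL));
#if defined(__AVX2__)
    for (; i + 8 <= n; i += 8) {
        /* Eight 32-bit indices, then eight floats fetched in one gather. */
        __m256i vi = _mm256_loadu_si256((const __m256i *)(idx + i));
        __m256 v = _mm256_i32gather_ps(base, vi, 4); /* scale = sizeof(float) */
        _mm256_storeu_ps(dst + i, v);
    }
#endif
    /* Scalar tail, and the whole loop when AVX2 is not enabled. */
    for (; i < n; i++)
        dst[i] = base[idx[i]];
}
```

Repeated indices are allowed, so the same source element may appear in several destination positions.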

CPUs with AVX2

  • Intel
    • Haswell processor, 2013.
    • Broadwell processor, 2014.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 