InfiniBand
InfiniBand is a switched fabric communications link used in high-performance computing and enterprise data centers. Its features include high throughput, low latency, quality of service and failover, and it is designed to be scalable. The InfiniBand architecture specification defines a connection between processor nodes and high-performance I/O nodes such as storage devices.

InfiniBand forms a superset of the Virtual Interface Architecture.

Description

Effective theoretical throughput

(actual data rate, not signaling rate)
     SDR            DDR            QDR            FDR            EDR            HDR            NDR
1X   2 Gbit/s       4 Gbit/s       8 Gbit/s       13.64 Gbit/s   25 Gbit/s      125 Gbit/s     750 Gbit/s
4X   8 Gbit/s       16 Gbit/s      32 Gbit/s      54.55 Gbit/s   100 Gbit/s     500 Gbit/s     3000 Gbit/s
12X  24 Gbit/s      48 Gbit/s      96 Gbit/s      163.64 Gbit/s  300 Gbit/s     1500 Gbit/s    9000 Gbit/s


Like Fibre Channel, PCI Express, Serial ATA, and many other modern interconnects, InfiniBand offers point-to-point bidirectional serial links intended for the connection of processors with high-speed peripherals such as disks. In addition to its point-to-point capabilities, InfiniBand also offers multicast operations. It supports several signalling rates and, as with PCI Express, links can be bonded together for additional throughput.

Signaling rate

The SDR serial connection's signalling rate is 2.5 gigabits per second (Gbit/s) in each direction per connection. DDR is 5 Gbit/s and QDR is 10 Gbit/s. FDR is 14.0625 Gbit/s and EDR is 25.78125 Gbit/s per lane.

For SDR, DDR and QDR, links use 8B/10B encoding: every 10 bits sent carry 8 bits of data, making the effective data transmission rate four-fifths of the raw rate. Thus single, double, and quad data rates carry 2, 4, or 8 Gbit/s of useful data, respectively. For FDR and EDR, links use 64B/66B encoding: every 66 bits sent carry 64 bits of data. (Neither of these calculations takes into account the additional physical layer overhead requirements for comma characters or protocol requirements such as StartOfFrame and EndOfFrame.)

Implementers can aggregate links in units of 4 or 12, called 4X or 12X. A 12X QDR link therefore carries 120 Gbit/s raw, or 96 Gbit/s of useful data. Most systems use a 4X aggregate, implying a 10 Gbit/s (SDR), 20 Gbit/s (DDR) or 40 Gbit/s (QDR) connection. Larger systems with 12X links are typically used for cluster and supercomputer interconnects and for inter-switch connections.
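
The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the per-lane signalling rates and encoding ratios come from the text above, while the names SIGNALLING_GBITS, ENCODING and link_rates are made up for this sketch. As noted above, framing and other physical layer overhead are ignored.

```python
from fractions import Fraction

# Per-lane signalling rates in Gbit/s, as quoted in the text.
SIGNALLING_GBITS = {
    "SDR": Fraction("2.5"),
    "DDR": Fraction("5"),
    "QDR": Fraction("10"),
    "FDR": Fraction("14.0625"),
    "EDR": Fraction("25.78125"),
}

# Line-code efficiency: data bits carried per bit sent on the wire.
ENCODING = {
    "SDR": Fraction(8, 10),   # 8B/10B
    "DDR": Fraction(8, 10),   # 8B/10B
    "QDR": Fraction(8, 10),   # 8B/10B
    "FDR": Fraction(64, 66),  # 64B/66B
    "EDR": Fraction(64, 66),  # 64B/66B
}


def link_rates(generation, width):
    """Return (raw, useful) rates in Gbit/s for a 1X, 4X or 12X link."""
    if width not in (1, 4, 12):
        raise ValueError("link width must be 1, 4 or 12")
    raw = SIGNALLING_GBITS[generation] * width
    useful = raw * ENCODING[generation]
    return raw, useful


if __name__ == "__main__":
    for generation in SIGNALLING_GBITS:
        for width in (1, 4, 12):
            raw, useful = link_rates(generation, width)
            print(f"{width:>2}X {generation}: "
                  f"raw {float(raw):8.3f} Gbit/s, "
                  f"useful {float(useful):8.3f} Gbit/s")
```

Exact fractions are used instead of floats so that results such as the 25 Gbit/s useful rate of a single EDR lane come out exactly rather than as rounded approximations.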

The InfiniBand future roadmap also has "HDR" (High Data Rate), due in 2014, and "NDR" (Next Data Rate), due "some time later", but as of June 2010, these data rates were not yet tied to specific speeds.

Latency

The single data rate switch chips have a latency of 200 nanoseconds, DDR switch chips have a latency of 140 nanoseconds and QDR switch chips have a latency of 100 nanoseconds. The end-to-end latency ranges from 1.07 microseconds MPI latency (Mellanox ConnectX QDR HCAs) to 1.29 microseconds MPI latency (QLogic InfiniPath HCAs) to 2.6 microseconds (Mellanox InfiniHost DDR III HCAs). Various InfiniBand host channel adapters (HCAs) exist in the market, each with different latency and bandwidth characteristics. InfiniBand also provides RDMA capabilities for low CPU overhead. The latency for RDMA operations is less than 1 microsecond (Mellanox ConnectX HCAs).
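
As a rough illustration of how the switch chip figures contribute to a path through the fabric, the sketch below multiplies the per-chip latency by the number of switch chips traversed. The linear model is an assumption made for this example, not something the specification states, and it leaves out cable, adapter and software latency. The names SWITCH_CHIP_LATENCY_NS and switch_path_latency_ns are hypothetical.

```python
# Per-chip switch latencies in nanoseconds, as quoted in the text.
SWITCH_CHIP_LATENCY_NS = {"SDR": 200, "DDR": 140, "QDR": 100}


def switch_path_latency_ns(generation, hops):
    """Estimate the latency added by `hops` switch chips on a path.

    Assumes per-chip latencies simply add up, which is a simplification.
    """
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return SWITCH_CHIP_LATENCY_NS[generation] * hops


if __name__ == "__main__":
    for generation in SWITCH_CHIP_LATENCY_NS:
        total = switch_path_latency_ns(generation, 3)
        print(f"{generation}: 3 switch chips add about {total} ns")
```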

Topology

InfiniBand uses a switched fabric topology, as opposed to a hierarchical switched network like traditional Ethernet architectures, although emerging Ethernet fabric architectures propose many benefits which could see Ethernet replace InfiniBand. Most of the network topologies are Fat-Tree (Clos), mesh or 3D-Torus. Recent papers (ISCA'10) demonstrated butterfly topologies as well.

As in the channel model used in most mainframe computers, all transmissions begin or end at a channel adapter. Each processor contains a host channel adapter (HCA) and each peripheral has a target channel adapter (TCA). These adapters can also exchange information for security or quality of service.

Messages

InfiniBand transmits data in packets of up to 4 KB that are taken together to form a message. A message can be:
  • a direct memory access read from, or write to, a remote node (RDMA)
  • a channel send or receive
  • a transaction-based operation (that can be reversed)
  • a multicast transmission
  • an atomic operation
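
The relationship between messages and packets can be sketched as follows. This is a simplified illustration, not the wire format: it only splits a payload into chunks of at most 4 KB, as described above, and omits all packet headers, sequencing and error checking. The names MAX_PACKET_PAYLOAD, segment_message and reassemble are hypothetical.

```python
from typing import List

# Largest packet payload in bytes: the "up to 4 KB" limit from the text.
MAX_PACKET_PAYLOAD = 4096


def segment_message(message: bytes, mtu: int = MAX_PACKET_PAYLOAD) -> List[bytes]:
    """Split a message into packet payloads of at most `mtu` bytes each.

    An empty message yields an empty list in this simplified sketch.
    """
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    return [message[i:i + mtu] for i in range(0, len(message), mtu)]


def reassemble(packets: List[bytes]) -> bytes:
    """Join packet payloads, in order, back into the original message."""
    return b"".join(packets)


if __name__ == "__main__":
    message = bytes(10000)  # a 10,000-byte message of zero bytes
    packets = segment_message(message)
    print([len(p) for p in packets])
    print(reassemble(packets) == message)
```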

Programming

InfiniBand has no standard programming API within the specification. The standard only lists a set of "verbs": functions that must exist. The syntax of these functions is left to the vendors. The de facto standard to date has been the syntax developed by the OpenFabrics Alliance, which was adopted by most of the InfiniBand vendors for Linux, FreeBSD, and Windows. The InfiniBand software stack developed by the OpenFabrics Alliance is released as the "OpenFabrics Enterprise Distribution (OFED)" under a choice of two licenses, GPL2 or BSD, for Linux and FreeBSD, and as "WinOF" under the BSD license for Windows.

History

InfiniBand originated from the 1999 merger of two competing designs:
  1. Future I/O, developed by Compaq, IBM, and Hewlett-Packard
  2. Next Generation I/O (ngio), developed by Intel, Microsoft, and Sun

From the Compaq side, the roots of the technology derived from Tandem's ServerNet. For a short time before the group came up with a new name, InfiniBand was called System I/O.

InfiniBand was originally envisioned as a comprehensive "system area network" that would connect CPUs and provide all high-speed I/O for "back-office" applications. In this role it would potentially replace just about every datacenter I/O standard, including PCI, Fibre Channel, and various networks like Ethernet. Instead, all of the CPUs and peripherals would be connected into a single pan-datacenter switched InfiniBand fabric. This vision offered a number of advantages in addition to greater speed, not the least of which is that I/O workload would be largely lifted from computer and storage. In theory, this should make the construction of clusters much easier, and potentially less expensive, because more devices could be shared and they could be easily moved around as workloads shifted. Proponents of a less comprehensive vision saw InfiniBand as a pervasive, low-latency, high-bandwidth, low-overhead interconnect for commercial datacenters, albeit one that might only connect servers and storage to each other, while leaving more local connections to other protocols and standards such as PCI.

InfiniBand has become a popular interconnect for high-performance computing, and its adoption, as seen in the TOP500 supercomputers list, is growing faster than that of Ethernet. In recent years InfiniBand has been increasingly adopted in enterprise datacenters (for example, Oracle Exadata and Exalogic machines), the financial sector, cloud computing (an InfiniBand-based system won the Best of VMworld award for Cloud Computing) and elsewhere. InfiniBand has been mostly used for high-performance computer cluster applications. A number of the TOP500 supercomputers have used InfiniBand, including the former reigning fastest supercomputer, the IBM Roadrunner. In another example of InfiniBand use within high-performance computing, the Cray XD1 uses built-in Mellanox InfiniBand switches to create a fabric between HyperTransport-connected Opteron-based compute nodes.

SGI, LSI, DDN, Oracle, and Rorke Data, among others, have also released storage utilizing InfiniBand "target adapters". These products essentially compete with architectures such as Fibre Channel, SCSI, and other more traditional connectivity methods. Such target adapter-based disks can become a part of the fabric of a given network, in a fashion similar to DEC VMS clustering. The advantage of this configuration is lower latency and higher availability to nodes on the network (because of the fabric nature of the network). In 2009, the Oak Ridge National Laboratory Spider storage system used this type of InfiniBand-attached storage to deliver over 240 gigabytes per second of bandwidth.

InfiniBand uses copper CX4 cable for SDR and DDR rates, which is also commonly used to connect SAS (Serial Attached SCSI) HBAs to external (SAS) disk arrays. With SAS, this is known as an SFF-8470 connector, and is referred to as an "InfiniBand-style" connector. The latest connectors used with QDR-capable solutions are QSFP (Quad SFP).

In 2008 Oracle Corporation released its HP Oracle Database Machine, built as a RAC database (Real Application Clusters database) with storage provided on its Exadata Storage Server, which utilises InfiniBand as the backend interconnect for all I/O and interconnect traffic. Updated versions of the Exadata Storage system, now using Sun computing hardware, continue to utilize InfiniBand infrastructure.

In 2009, IBM announced a December 2009 release date for their DB2 pureScale offering, a shared-disk clustering scheme (inspired by Parallel Sysplex for DB2 z/OS) that uses a cluster of IBM System p servers (POWER6/7) communicating with each other over an InfiniBand interconnect.

In 2010, scale-out network storage manufacturers increasingly adopted InfiniBand as the primary cluster interconnect for modern NAS designs, like Isilon IQ or IBM SONAS. Since scale-out systems run distributed metadata operations without a "master node", internal low-latency communication is a critical success factor for highest scalability and performance (see TOP500 cluster architectures).

In 2010, Oracle released Exadata and Exalogic machines, which implement InfiniBand QDR at 40 Gbit/s (32 Gbit/s effective) using Sun switches (Sun Network QDR InfiniBand Gateway Switch). The InfiniBand fabric is used to connect compute nodes to each other and to the storage, and also to connect several Exadata and Exalogic machines together.

In June 2011, FDR switches and adapters were announced at the International Supercomputing Conference.

See also

  • InfiniBand Trade Association
  • SCSI RDMA Protocol (SRP)
  • RDMA over Converged Ethernet
  • iWARP
  • List of device bandwidths
  • Optical interconnect
  • Interconnect bottleneck
  • Optical fiber cable
  • Optical communication
  • Parallel optical interface

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 