Scalable Coherent Interface
The Scalable Coherent Interface (SCI) is a high-speed interconnect standard for shared-memory multiprocessing and message passing. IEEE Std 1596-1992, IEEE Standard for Scalable Coherent Interface (SCI), was approved by the IEEE Standards Board on March 19, 1992. The goal was an interconnect that would scale well, provide system-wide coherence, and offer a simple interface; that is, a standard to replace buses in multiprocessor systems without their inherent scalability and performance limitations. The working group soon realized that no form of bus would suffice and settled on point-to-point communication in the form of insertion rings. This approach avoids lumped capacitance, the limits that physical length and the speed of light impose on a bus, and stub reflections, and it allows transactions to proceed in parallel. The use of insertion rings is credited to Manolis Katevenis, who suggested it at one of the early meetings of the working group. The working group was led by David B. Gustavson (chair) and David V. James (vice chair).

History

SCI originally developed from the Futurebus (IEEE 896) program that started in 1987. Soon after the project started, members of the engineering teams predicted that it would already be too slow for the high-end marketplace by the time it was released in the early 1990s. In response, a group spun off to form the SCI standard targeted at this market. SCI was essentially a subset of Futurebus features that could be easily implemented at high speed, along with a few minor additions to make it easier to connect to other systems, such as VMEbus. Most of the individuals behind the standard had a background in high-speed buses. Representatives from many companies in the IT industry and the research community were active participants in the working group, including people from Amdahl, Apple Computer, BB&N, Hewlett Packard, CERN, Dolphin Server Technology, Cray Research, Sequent, AT&T, Digital Equipment Corporation, McDonnell Douglas, National Semiconductor, Stanford Linear Accelerator Center, Tektronix, Texas Instruments, Unisys, the University of Oslo, and the University of Wisconsin.

The original intent was to create a single standard that could be used for all buses in the computer. To quote from the standards website, SCI is a: "combination computer backplane bus, processor memory bus, I/O bus, high performance switch, packet switch, ring, mesh, local area network, optical network, parallel bus, serial bus, information sharing and information communication system that provides distributed directory based cache coherency for a global shared memory model and uses electrical or fiber optic point-to-point unidirectional cables of various widths."

A large part of the intellectual work is credited to David V. James, the major contributor to the specifications, including the executable C code. Stein Gjessing's group at the University of Oslo used formal methods to verify the coherence protocol, and Dolphin Server Technology implemented a node controller chip that included the cache coherence logic.

Different versions and derivatives of SCI have been implemented and used in different applications by companies such as Dolphin Interconnect Solutions, Convex, Data General (whose AViiON systems used cache controller and link controller chips from Dolphin), Sequent, and Cray Research. Dolphin Interconnect Solutions implemented a PCI- and PCI Express-connected derivative of SCI that provides non-coherent shared memory access. This implementation was used by Sun Microsystems for its high-end clusters, by Thales Group, and by several others, including volume applications for message passing within HPC clustering and medical imaging. SCI was also used by Sequent Computer Systems as the processor memory bus in its NUMA-Q systems. Numascale developed a derivative to connect with coherent HyperTransport.

The standard

The standard defines two interface levels: the physical level, which deals with electrical signals, connectors, and mechanical and thermal conditions, and the logical level, which describes the address space, data transfer protocols, cache coherence mechanisms, synchronization primitives, control and status registers, and initialization and error recovery facilities. This structure allows new developments in physical interface technology to be adopted without any redesign at the logical level.

Support for large systems (scalability) is achieved through a distributed directory-based cache coherence model. (The other popular models for cache coherence rely on system-wide eavesdropping, or snooping, of memory transactions, a scheme that does not scale well.) In SCI, each node contains a directory with a pointer to the next node in a linked list of nodes that share a particular cache line.

SCI defines a flat 64-bit address space (16 exabytes), in which 16 bits identify a node (up to 65,536 nodes) and 48 bits give the address within the node (256 terabytes). A node can contain many processors, memory, or both. The SCI standard defines a packet-switched network.
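
The address split can be illustrated with a minimal Python sketch. It assumes that the node identifier occupies the 16 most significant bits of the address; the function names are hypothetical and do not come from the standard.

```python
from typing import Tuple

NODE_BITS = 16                       # bits that identify the node
OFFSET_BITS = 48                     # bits that address memory within the node
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def split_address(addr: int) -> Tuple[int, int]:
    """Split a 64-bit address into (node_id, offset).

    Assumption: the node ID sits in the upper 16 bits.
    """
    if not 0 <= addr < (1 << (NODE_BITS + OFFSET_BITS)):
        raise ValueError("address does not fit in 64 bits")
    return addr >> OFFSET_BITS, addr & OFFSET_MASK


def make_address(node_id: int, offset: int) -> int:
    """Combine a node ID and an offset within the node into one address."""
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node ID does not fit in 16 bits")
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset does not fit in 48 bits")
    return (node_id << OFFSET_BITS) | offset
```

For example, `split_address(make_address(3, 0x1000))` returns `(3, 0x1000)`.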

Topologies

SCI can be used to build systems with different types of switching topologies from centralized to fully distributed switching. With a central switch, each node is connected to the switch with a ringlet (in this case a two-node ring). In distributed switching systems, each node can be connected to a ring of arbitrary length and either all or some of the nodes can be connected to two or more rings. The most common way to describe these multi-dimensional topologies is k-ary n-cubes (or tori). The SCI standard specification mentions several such topologies as examples.

2-D Torus (rings in two dimensions)
The 2-D torus is a combination of rings in two dimensions. Switching between the two dimensions requires a small switching capability in the node. The scheme can be extended to three or more dimensions. The concept of folding rings can also be applied to torus topologies to avoid long connection segments.
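
As a rough illustration of how a packet might cross a 2-D torus built from unidirectional rings, the Python sketch below walks along the ring in the first dimension and then switches to the ring in the second dimension. Dimension-order routing, the coordinate scheme, and the `route` function are assumptions made for this example; the text above does not prescribe a routing algorithm.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def route(src: Coord, dst: Coord, k: int) -> List[Coord]:
    """Return the nodes visited from src to dst in a k x k torus.

    Each ring is unidirectional, so a coordinate can only increase
    (modulo k). The packet travels along its X ring first, then
    switches to the Y ring of the node it has reached.
    """
    for value in (*src, *dst):
        if not 0 <= value < k:
            raise ValueError("coordinate outside the torus")
    x, y = src
    path = [(x, y)]
    while x != dst[0]:          # hops on the ring in the first dimension
        x = (x + 1) % k
        path.append((x, y))
    while y != dst[1]:          # hops on the ring in the second dimension
        y = (y + 1) % k
        path.append((x, y))
    return path
```

In a 4 x 4 torus, `route((3, 0), (1, 2), 4)` visits `(3, 0)`, `(0, 0)`, `(1, 0)`, `(1, 1)`, and `(1, 2)`: the wrap-around from 3 to 0 is what closes each ring.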

Transactions

SCI sends information in packets. Each packet consists of an unbroken sequence of 16-bit symbols, and each symbol is accompanied by a flag bit. A transition of the flag bit from 0 to 1 indicates the start of a packet. The flag returns from 1 to 0 one symbol (for echoes) or four symbols before the end of the packet. A packet contains a header with address, command, and status information, a payload (which may be empty or one of several optional data lengths), and a CRC check symbol. The first symbol in the packet header contains the destination node address. If the address is not within the domain handled by the receiving node, the packet is passed to the output through the bypass FIFO. Otherwise, the packet is fed to a receive queue and may be transferred to a ring in another dimension. All packets are marked when they pass the scrubber (one node is established as the scrubber when the ring is initialized). A packet without a valid destination address is removed when it passes the scrubber for the second time, so that the ring does not fill with packets that would otherwise circulate indefinitely.
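
The accept, bypass, and scrub decisions described above can be sketched for a single ring in Python. This is a simplified model under stated assumptions: it ignores echoes, queues, CRC checking, and transfers to other rings, and the `Packet` class and `circulate` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Packet:
    target: int            # destination node address (first header symbol)
    marked: bool = False   # set when the packet first passes the scrubber


def circulate(ring: List[int], scrubber: int, source: int,
              packet: Packet, max_hops: int = 1000) -> Tuple[str, int, int]:
    """Follow a packet around one ring until it is accepted or scrubbed.

    Returns (outcome, node, hops), where outcome is "delivered" or
    "scrubbed".
    """
    position = ring.index(source)
    for hops in range(1, max_hops + 1):
        position = (position + 1) % len(ring)
        node = ring[position]
        if node == packet.target:
            return ("delivered", node, hops)     # fed to the receive queue
        if node == scrubber:
            if packet.marked:                    # second pass: remove it
                return ("scrubbed", node, hops)
            packet.marked = True                 # first pass: mark it
        # any other node forwards the packet through its bypass FIFO
    raise RuntimeError("packet still circulating after max_hops")
```

On the ring `[0, 1, 2, 3]` with node 0 as scrubber, a packet sent from node 1 to node 3 is delivered after two hops, while a packet addressed to a nonexistent node 9 is marked on its first pass through node 0 and removed on its second.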

Cache coherence

Cache coherence is crucial to ensure data consistency in multiprocessor systems. The simplest form, applied in earlier systems, was based on clearing the cache contents between context switches and disabling the cache for data shared between two or more processors. These methods were feasible when the performance difference between the cache and memory was less than one order of magnitude. Modern processors, with caches more than two orders of magnitude faster than main memory, would not perform anywhere near optimally without more sophisticated methods for data consistency. Bus-based systems use eavesdropping (snooping) methods, since buses are inherently broadcast. Modern systems with point-to-point links use broadcast methods with snoop filter options to improve performance. Since broadcast and eavesdropping are inherently non-scalable, they are not used in SCI.

Instead, SCI uses a distributed directory-based cache coherence protocol with a linked list of nodes containing processors that share a particular cache line. Each node holds a directory for the main memory of the node, with a tag for each line of memory (the memory line has the same length as the cache line). The memory tag holds a pointer to the head of the linked list and a state code for the line (three states: home, fresh, gone). Associated with each node is also a cache for holding remote data, with a directory containing forward and backward pointers to nodes in the linked list sharing the cache line. The tag for the cache has seven states (invalid, only fresh, head fresh, only dirty, head dirty, mid valid, tail valid).
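
The pointer structure described above can be sketched in Python as a doubly linked sharing list for one cache line. This is an illustration only: in a real system the forward and backward pointers live in the cache tags of the individual nodes, whereas here two dictionaries stand in for them, and the memory and cache state codes are not modeled. The class and method names are hypothetical, and inserting new sharers at the head is an assumption of this sketch.

```python
from typing import Dict, List, Optional


class SharingList:
    """Sharing list for one cache line (simplified, states omitted)."""

    def __init__(self) -> None:
        self.head: Optional[int] = None               # pointer in the memory tag
        self.forward: Dict[int, Optional[int]] = {}   # node -> next node toward the tail
        self.backward: Dict[int, Optional[int]] = {}  # node -> previous node toward the head

    def attach(self, node: int) -> None:
        """Add a sharer; in this sketch the newcomer becomes the head."""
        if node in self.forward:
            return
        old_head = self.head
        self.forward[node] = old_head
        self.backward[node] = None
        if old_head is not None:
            self.backward[old_head] = node
        self.head = node

    def detach(self, node: int) -> None:
        """Remove a sharer and reconnect its neighbours."""
        nxt = self.forward.pop(node)
        prev = self.backward.pop(node)
        if prev is None:
            self.head = nxt
        else:
            self.forward[prev] = nxt
        if nxt is not None:
            self.backward[nxt] = prev

    def sharers(self) -> List[int]:
        """Walk the list from head to tail."""
        result, node = [], self.head
        while node is not None:
            result.append(node)
            node = self.forward[node]
        return result
```

After nodes 5, 9, and 2 attach in that order, `sharers()` returns `[2, 9, 5]`; detaching node 9 leaves `[2, 5]`. The storage per node is two pointers regardless of how many nodes share the line.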

The distributed directory is scalable. The overhead for directory-based cache coherence is a constant percentage of the node's memory and cache. This percentage is on the order of 4% for the memory and 7% for the cache.
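
A back-of-the-envelope Python sketch shows why the overhead is a fixed fraction: the tag size depends only on the pointer width and the number of states, not on how many nodes share a line. The 64-byte line size and the tag layout (16-bit node pointers plus the minimum number of state bits) are assumptions made for this example; real tags may carry additional bits.

```python
def tag_overhead(pointers: int, states: int,
                 node_id_bits: int = 16, line_bytes: int = 64) -> float:
    """Fraction of storage spent on one directory tag per line.

    Assumptions: each pointer is one node ID wide, the state code uses
    the fewest bits that can encode `states` values, and a line is
    `line_bytes` long.
    """
    state_bits = (states - 1).bit_length()
    tag_bits = pointers * node_id_bits + state_bits
    return tag_bits / (line_bytes * 8)


# Memory tag: one head pointer, three states (home, fresh, gone).
memory_overhead = tag_overhead(pointers=1, states=3)
# Cache tag: forward and backward pointers, seven states.
cache_overhead = tag_overhead(pointers=2, states=7)
```

Under these assumptions the sketch gives roughly 3.5% for memory and 6.8% for the cache, in the same range as the figures quoted above.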

Concluding remarks

SCI is a standard for connecting the different resources within a multiprocessor computer system (intra-system), and it is not as widely known to the public as, for example, the Ethernet network standard for connecting different systems (inter-system). Different system vendors have implemented different flavors of SCI for their internal system infrastructure. These implementations differ by nature, since they interface to very intricate mechanisms in processors and memory systems and each vendor has to preserve some degree of compatibility for both hardware and software.

See also

  • Dolphin Interconnect Solutions
  • List of device bandwidths
  • NUMAlink
  • QuickRing
  • HIPPI
  • IEEE 1355
  • RapidIO
  • Myrinet
  • QsNet
  • Futurebus
  • InfiniBand

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 