GlusterFS
GlusterFS is a scale-out NAS file system developed by Gluster. It aggregates various storage servers over Ethernet or InfiniBand RDMA interconnect into one large parallel network file system. GlusterFS is built on a stackable user-space design, an approach intended to avoid sacrificing performance. It has found a variety of applications, including cloud computing, biomedical sciences and archival storage. GlusterFS is free software, licensed under the GNU GPL v3.

Gluster, Inc. is the primary commercial sponsor of GlusterFS, and offers both commercial products and support for GlusterFS-based solutions. In October 2011, it was announced that Gluster, Inc. was to be purchased by Red Hat.

Design

GlusterFS has a client and a server component. Servers are typically deployed as storage bricks, with each server running a glusterfsd daemon to export a local file system as a volume. The glusterfs client process, which connects to servers with a custom protocol over TCP/IP, InfiniBand or SDP, composes virtual volumes from multiple remote servers using stackable translators. By default, files are stored whole, but striping of files across multiple remote volumes is also supported. The final volume may then be mounted by the client host through the FUSE mechanism, or accessed via the libglusterfs client library without incurring FUSE file system overhead.
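
The difference between storing files whole and striping them can be illustrated with a little offset arithmetic. The following is a minimal sketch, assuming a simple round-robin layout in which consecutive fixed-size stripes go to consecutive bricks and each brick stores its stripes back to back. The function name, the stripe size and the layout itself are illustrative assumptions, not GlusterFS's actual on-disk format.

```python
def stripe_location(offset, stripe_size, brick_count):
    """Map a byte offset in a striped file to (brick index, offset on that brick).

    Assumes round-robin striping: stripe 0 goes to brick 0, stripe 1 to
    brick 1, and so on, wrapping around after the last brick.
    """
    if offset < 0 or stripe_size <= 0 or brick_count <= 0:
        raise ValueError("offset must be >= 0; stripe_size and brick_count must be > 0")
    stripe_number = offset // stripe_size        # which stripe of the file
    brick = stripe_number % brick_count          # which brick holds that stripe
    local_stripe = stripe_number // brick_count  # stripes this brick already holds before it
    local_offset = local_stripe * stripe_size + offset % stripe_size
    return brick, local_offset


# With 128-byte stripes over 4 bricks, byte 300 falls in stripe 2.
first = stripe_location(0, 128, 4)
third = stripe_location(300, 128, 4)
wrapped = stripe_location(517, 128, 4)
```

Under this assumed layout a large sequential read touches every brick in turn, which is why striping can help a single large file while whole-file storage keeps each file recoverable from one brick.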

Most of the functionality of GlusterFS is implemented as translators, including:
  • File-based mirroring and replication
  • File-based striping
  • File-based load balancing
  • Volume failover
  • I/O scheduling and disk caching
  • Storage quotas
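
The idea of stacking translators can be sketched in a few lines. The example below is a conceptual illustration only: the class names (`Brick`, `Replicate`, `ReadCache`) and their methods are hypothetical and do not correspond to the GlusterFS translator API. It shows a caching layer stacked on a mirroring layer, which in turn sits on two in-memory stand-ins for storage servers, and it shows reads failing over when one server goes offline.

```python
class Brick:
    """Hypothetical stand-in for a server exporting a local file system."""

    def __init__(self, name):
        self.name = name
        self.files = {}
        self.online = True

    def write(self, path, data):
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        self.files[path] = data

    def read(self, path):
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        return self.files[path]


class Replicate:
    """Mirrors each write to every child; reads from the first child that answers."""

    def __init__(self, children):
        self.children = children

    def write(self, path, data):
        successes = 0
        for child in self.children:
            try:
                child.write(path, data)
                successes += 1
            except ConnectionError:
                continue  # skip unreachable replicas
        if successes == 0:
            raise ConnectionError("no replica accepted the write")

    def read(self, path):
        for child in self.children:
            try:
                return child.read(path)
            except (ConnectionError, KeyError):
                continue  # fail over to the next replica
        raise FileNotFoundError(path)


class ReadCache:
    """Keeps a copy of data it has seen, in front of any child translator."""

    def __init__(self, child):
        self.child = child
        self.cache = {}

    def write(self, path, data):
        self.child.write(path, data)
        self.cache[path] = data

    def read(self, path):
        if path not in self.cache:
            self.cache[path] = self.child.read(path)
        return self.cache[path]


# Stack the layers: cache -> replicate -> two bricks.
server_a, server_b = Brick("server-a"), Brick("server-b")
mirror = Replicate([server_a, server_b])
volume = ReadCache(mirror)

volume.write("/report.txt", b"hello")
server_a.online = False                      # simulate a failed server
data_after_failure = mirror.read("/report.txt")
```

Because every layer exposes the same `read` and `write` interface, layers can be reordered or added without changing the ones around them, which is the property the stackable design relies on.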



The GlusterFS server is kept deliberately simple: it exports an existing file system as-is, leaving it up to client-side translators to structure the store. The clients themselves are stateless, do not communicate with each other, and are expected to have translator configurations consistent with each other. GlusterFS relies on an elastic hashing algorithm, rather than using either a centralized or distributed metadata model. With version 3.1 and later of GlusterFS, volumes can be added, deleted, or migrated dynamically, helping to avoid coherency problems, and allowing GlusterFS to scale up to several petabytes on commodity hardware by avoiding bottlenecks that normally affect more tightly coupled distributed file systems.
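
Placing files by hash instead of consulting a metadata server can be sketched as follows. This is a simplified illustration under stated assumptions, not GlusterFS's elastic hashing algorithm itself: the hash function (the first four bytes of an MD5 digest), the equal-sized ranges and the names `name_hash`, `build_layout` and `locate` are all chosen here for the example.

```python
import hashlib


def name_hash(filename):
    """Return a 32-bit hash of a file name (MD5-derived; an assumption of this sketch)."""
    digest = hashlib.md5(filename.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_layout(bricks):
    """Split the 32-bit hash space into contiguous ranges, one per brick."""
    span = 2**32 // len(bricks)
    layout = []
    for index, brick in enumerate(bricks):
        start = index * span
        # The last brick absorbs the remainder so the whole space is covered.
        end = 2**32 - 1 if index == len(bricks) - 1 else start + span - 1
        layout.append((start, end, brick))
    return layout


def locate(filename, layout):
    """Find the brick whose hash range contains the file name's hash."""
    value = name_hash(filename)
    for start, end, brick in layout:
        if start <= value <= end:
            return brick
    raise LookupError(filename)


bricks = ["server-a:/export", "server-b:/export", "server-c:/export"]
layout = build_layout(bricks)
target = locate("results.csv", layout)
```

Any client holding the same layout computes the same location for a given file name, so no client needs to ask a central service or another client where a file lives. Adding a brick in this sketch means rebuilding the layout and moving the files whose ranges changed.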

More comprehensive documentation is maintained on the GlusterFS Wiki; a work-in-progress Russian translation is also available at Ru:GlusterFS Wiki.

Academic References

GlusterFS has been used as the foundation for several pieces of academic research and one survey article.

See also

  • Distributed file system
  • List of file systems, the distributed parallel fault-tolerant file system section
  • Ceph
  • Fraunhofer Parallel File System (FhGFS)
  • MooseFS


The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 