GridFTP
GridFTP is an extension of the standard File Transfer Protocol (FTP) for use with Grid computing. It is defined as part of the Globus toolkit, under the organisation of the Global Grid Forum (specifically, by the GridFTP working group).

The aim of GridFTP is to provide more reliable, higher-performance file transfer for Grid computing applications. This is necessary because Grid computing places heavier demands on data transmission: very large files frequently have to be transferred, and this must be done quickly and reliably.

GridFTP addresses the problem of incompatibility between storage and access systems. Previously, each data provider made its data available in its own way, through its own library of access functions. This made it difficult to obtain data from multiple sources, since each required a different access method, and so divided the total available data into partitions. GridFTP provides a uniform way of accessing the data, encompassing functions from all the different modes of access, by building on and extending the universally accepted FTP standard. FTP was chosen as the basis because of its widespread use, and because it has a well-defined architecture for protocol extensions (which may be dynamically discovered).

Features of GridFTP

GridFTP offers several advantages over normal FTP, including faster transfers and built-in security. It achieves these through the following extensions to normal FTP.

Security with GSI

GSI (Grid Security Infrastructure) is another part of the Globus toolkit. It provides authentication and encryption for file transfers, with user-specified levels of confidentiality and data integrity. FTP itself is inherently insecure and thus open to packet sniffing and eavesdropping; it has traditionally relied on external mechanisms such as SSH and SSL for security.

Third party transfers

A useful feature of FTP is that it allows a local client to initiate a transfer between two remote servers. GridFTP builds on this, adding security and authentication for the local initiator. In FTP terminology, this feature is similar to the File eXchange Protocol (FXP).
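
As a rough illustration of how a client can set up a server-to-server transfer in plain FTP, the sketch below builds the control commands the client would send to each server. It assumes the destination server has already answered a PASV command, and that the client relays the returned address to the source server with PORT. The function names and the example paths are hypothetical, and the sketch does not show the authentication that GridFTP adds.

```python
import re


def parse_pasv_reply(reply):
    """Extract the 'h1,h2,h3,h4,p1,p2' address from a 227 PASV reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError("not a valid PASV reply: %r" % reply)
    return ",".join(match.groups())


def plan_third_party_transfer(pasv_reply, source_path, dest_path):
    """Return (server, command) pairs for the client's two control connections.

    pasv_reply is the destination server's answer to PASV. The source server
    is told to connect to that address, so the file data flows directly
    between the two servers and never passes through the client.
    """
    host_port = parse_pasv_reply(pasv_reply)
    return [
        ("source", "PORT " + host_port),
        ("destination", "STOR " + dest_path),
        ("source", "RETR " + source_path),
    ]


if __name__ == "__main__":
    reply = "227 Entering Passive Mode (192,0,2,10,195,80)"
    for server, command in plan_third_party_transfer(reply, "in.dat", "out.dat"):
        print(server, command)
```

Only control messages travel over the client's connections here; the data connection is opened between the two servers.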

Parallel and striped transfer

GridFTP makes much fuller use of the available bandwidth by allowing multiple simultaneous TCP streams. A file can be downloaded in pieces simultaneously from multiple sources, or in separate parallel streams from a single source, which still uses the bandwidth better than a single stream. Striped and interleaved transfers, again from either multiple sources or a single source, allow further speed increases.
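
The difference between dividing a file among parallel streams and interleaving it across sources can be sketched as two ways of assigning byte ranges. The code below is only an illustration of that bookkeeping, not GridFTP's actual wire format; the function names, and the choice of contiguous ranges for parallel streams and round-robin blocks for striping, are assumptions made for the example.

```python
def split_into_ranges(file_size, num_streams):
    """Divide [0, file_size) into contiguous (offset, length) ranges,
    one per stream, with sizes differing by at most one byte."""
    if num_streams < 1:
        raise ValueError("num_streams must be at least 1")
    base, extra = divmod(file_size, num_streams)
    ranges = []
    offset = 0
    for i in range(num_streams):
        length = base + (1 if i < extra else 0)
        if length > 0:
            ranges.append((offset, length))
        offset += length
    return ranges


def stripe_blocks(file_size, block_size, num_sources):
    """Assign fixed-size blocks to sources round-robin (interleaved)."""
    if block_size < 1 or num_sources < 1:
        raise ValueError("block_size and num_sources must be at least 1")
    assignment = [[] for _ in range(num_sources)]
    offset = 0
    index = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        assignment[index % num_sources].append((offset, length))
        offset += length
        index += 1
    return assignment


if __name__ == "__main__":
    print(split_into_ranges(10, 3))
    print(stripe_blocks(10, 4, 2))
```

Because every piece carries its own offset, the receiver can write pieces into place in whatever order they arrive.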

Partial file transfer

Although FTP has the ability to resume an interrupted file transfer from a specific point in a file, it does not support the transmission of only a certain portion of a file. GridFTP allows a subset of a file to be sent. Such a feature is useful in applications where only small sections of a very large data file are required for processing (a motivating example being the processing of data from a high energy physics experiment, a traditional use of Grid technology).
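
In its simplest form, a partial transfer amounts to sending the bytes identified by an offset and a length. The sketch below shows that idea on a local byte stream; `read_range` is a hypothetical helper for illustration and does not correspond to any GridFTP command.

```python
import io


def read_range(stream, offset, length):
    """Return up to `length` bytes starting at byte `offset` of a seekable stream."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    stream.seek(offset)
    return stream.read(length)


if __name__ == "__main__":
    data = io.BytesIO(b"0123456789")
    # Only the requested slice is read, not the whole file.
    print(read_range(data, 3, 4))
```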

Fault tolerance and restart

GridFTP provides a fault tolerant implementation of FTP, to handle network unavailability and server problems. Transfers can also be automatically restarted if a problem occurs.
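
One way to picture automatic restart is a loop that remembers how many bytes have arrived and, after a failure, asks again from that offset instead of starting over. The sketch below simulates this with a deliberately unreliable reader. All names are hypothetical, and the retry policy (a fixed number of attempts) is an assumption made for the example rather than GridFTP's actual behaviour.

```python
def make_flaky_reader(data, fail_on_calls):
    """Build a reader over `data` that raises ConnectionError on the given call numbers."""
    calls = {"n": 0}

    def read_chunk(offset, length):
        calls["n"] += 1
        if calls["n"] in fail_on_calls:
            raise ConnectionError("simulated network failure")
        return data[offset:offset + length]

    return read_chunk


def transfer_with_restart(read_chunk, total_size, chunk_size=4, max_attempts=5):
    """Fetch total_size bytes, resuming from the last received offset after errors."""
    received = bytearray()
    failures = 0
    while len(received) < total_size:
        wanted = min(chunk_size, total_size - len(received))
        try:
            chunk = read_chunk(len(received), wanted)
        except ConnectionError:
            failures += 1
            if failures >= max_attempts:
                raise
            continue  # retry from the same offset; earlier bytes are kept
        if not chunk:
            raise EOFError("source ended before total_size bytes arrived")
        received.extend(chunk)
    return bytes(received)


if __name__ == "__main__":
    payload = b"abcdefghij"
    reader = make_flaky_reader(payload, fail_on_calls={2, 3})
    print(transfer_with_restart(reader, len(payload)))
```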

Automatic TCP optimisation

The underlying TCP connection in FTP has numerous settings, such as window size and buffer size. GridFTP allows automatic (or manual) negotiation of these settings to provide optimal transfer speeds and reliability. The best settings for a single large file are likely to differ from those for a large group of files.
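
For manual tuning, a common rule of thumb (an assumption here, not something this article states) is to size the TCP buffer to the bandwidth-delay product: the link bandwidth multiplied by the round-trip time. The sketch below computes that value and requests it on a socket using Python's standard `socket` module. The function names are hypothetical, and the operating system may clamp or adjust the size it actually grants.

```python
import socket


def bandwidth_delay_product(bandwidth_bits_per_s, rtt_ms):
    """Bytes that can be in flight on a link: bandwidth x round-trip time."""
    return bandwidth_bits_per_s * rtt_ms // 8000  # /8 bits per byte, /1000 ms per s


def request_buffer_size(sock, size_bytes):
    """Ask the OS for send and receive buffers of size_bytes.

    Returns the receive buffer size the OS reports afterwards, which may
    differ from the requested value.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    # Example figures only: a 1 Gbit/s path with a 50 ms round-trip time.
    size = bandwidth_delay_product(1_000_000_000, 50)
    print("suggested buffer:", size, "bytes")
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        print("granted by OS:", request_buffer_size(sock, size))
```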

External links

  • http://globus.org/toolkit/docs/2.4/datagrid/deliverables/C2WPdraft3.pdf
  • http://globus.org/toolkit/docs/3.2/gridftp/
  • http://www.ogf.org/documents/GFD.20.pdf
  • http://www.ogf.org/documents/GFD.21.pdf
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 