Data set (IBM mainframe)
Encyclopedia
data set dataset (preferred), is a computer file
Computer file
A computer file is a block of arbitrary information, or resource for storing information, which is available to a computer program and is usually based on some kind of durable storage. A file is durable in the sense that it remains available for programs to use after the current program has finished...

 having a record organization. The term pertains to the IBM
IBM
International Business Machines Corporation or IBM is an American multinational technology and consulting corporation headquartered in Armonk, New York, United States. IBM manufactures and sells computer hardware and software, and it offers infrastructure, hosting and consulting services in areas...

 mainframe
Mainframe computer
Mainframes are powerful computers used primarily by corporate and governmental organizations for critical applications, bulk data processing such as census, industry and consumer statistics, enterprise resource planning, and financial transaction processing.The term originally referred to the...

 operating system line, starting with OS/360, and is still used by its successors, including the current z/OS
Z/OS
z/OS is a 64-bit operating system for mainframe computers, produced by IBM. It derives from and is the successor to OS/390, which in turn followed a string of MVS versions.Starting with earliest:*OS/VS2 Release 2 through Release 3.8...

. Those systems historically preferred this term over a file. A dataset is typically stored on direct access storage device
Direct access storage device
In mainframe computers and some minicomputers, a direct access storage device, or DASD , is any secondary storage device which has relatively low access time relative to its capacity....

 (DASD) or magnetic tape
Magnetic tape
Magnetic tape is a medium for magnetic recording, made of a thin magnetizable coating on a long, narrow strip of plastic. It was developed in Germany, based on magnetic wire recording. Devices that record and play back audio and video using magnetic tape are tape recorders and video tape recorders...

, however unit record devices are also supported.

Datasets are not unstructured streams of byte
Byte
The byte is a unit of digital information in computing and telecommunications that most commonly consists of eight bits. Historically, a byte was the number of bits used to encode a single character of text in a computer and for this reason it is the basic addressable element in many computer...

s, but rather are organized in various logical record and block structures determined by the DSORG (data set organization), RECFM (record format), and other parameters. These parameters are specified at the time of the data set allocation (creation), for example with the Job Control Language
Job Control Language
Job Control Language is a scripting language used on IBM mainframe operating systems to instruct the system on how to run a batch job or start a subsystem....

 DD statements. Inside a job they are stored in the Data Control Block
Data Control Block
In IBM mainframe operating systems, such as OS/360, MVS, z/OS, a Data Control Block is a description of a dataset in a program. A DCB is coded in Assembler programs using the DCB macro instruction...

 (DCB), which is a data structure used to access datasets, for example using access method
Access method
An access method is a function of a mainframe operating system that enables access to data on disk, tape or other external devices. They were introduced in 1963 in IBM OS/360 operating system...

s.

Dataset organization

OS/360, the DCB's DSORG parameter specifies how the dataset is organized. It may be physically sequential ("PS"), indexed sequential ("IS"), partitioned ("PO"), or Direct Access ("DA"). Datasets on tape may only be DSORG=PS. The choice of organization depends on how the data is to be accessed, and in particular, how it is to be updated.

Programmers utilize various access method
Access method
An access method is a function of a mainframe operating system that enables access to data on disk, tape or other external devices. They were introduced in 1963 in IBM OS/360 operating system...

s (such as QSAM or VSAM) in programs reading and writing data sets, their choice depending on given data set organization.

Record format (RECFM)

Regardless of organization, the physical structure of each record is essentially the same, and is uniform throughout the dataset. This is specified in the DCB RECFM parameter. RECFM=F means that the records are of fixed length, specified via the LRECL parameter, and RECFM=V specifies a variable-length record. V records when stored on media are prefixed by a Record Descriptor Word (RDW) containing the integer length of the record in bytes. With RECFM=FB and RECFM=VB, multiple logical records are grouped together into a single physical block
Block (data storage)
In computing , a block is a sequence of bytes or bits, having a nominal length . Data thus structured are said to be blocked. The process of putting data into blocks is called blocking. Blocking is used to facilitate the handling of the data-stream by the computer program receiving the data...

 on tape or disk. FB and VB are fixed-blocked, and variable-blocked, respectively. The BLKSIZE parameter specifies the maximum length of the block. RECFM=FBS could be also specified, meaning fixed-blocked standard, meaning the all blocks except the last one were required to be in full BLKSIZE length. RECFM=VBS, or variable-blocked spanned, means a logical record could be spanned across two or more blocks, with flags in the RDW indicating whether a record segment is continued into the next block and/or was continued from the previous one.

This mechanism eliminates the need for using any "delimiter" byte value to separate records. Thus data can be of any type, including binary integers, floating point, or characters, without introducing a false end-of-record condition. The data set is an abstraction of a collection of records, in contrast to files as unstructured streams of bytes.

Partitioned datasets

For example, a PDS or Partitioned Data Set is a dataset containing multiple members, each of which holds a separate sub-data set, similar to a directory
Directory (file systems)
In computing, a folder, directory, catalog, or drawer, is a virtual container originally derived from an earlier Object-oriented programming concept by the same name within a digital file system, in which groups of computer files and other folders can be kept and organized.A typical file system may...

 in other types of file system
File system
A file system is a means to organize data expected to be retained after a program terminates by providing procedures to store, retrieve and update data, as well as manage the available space on the device which contain it. A file system organizes data in an efficient manner and is tuned to the...

s. This type of dataset is often used to hold executable programs (load modules), source program libraries (especially Assembler macro definitions). A PDS may be compared to a Zip
ZIP (file format)
Zip is a file format used for data compression and archiving. A zip file contains one or more files that have been compressed, to reduce file size, or stored as is...

 file on microcomputer
Microcomputer
A microcomputer is a computer with a microprocessor as its central processing unit. They are physically small compared to mainframe and minicomputers...

s, except the files stored in a PDS are not compressed.

The Partitioned Data Set can only allocate on a single volume with the maximum size of 65536 tracks.

Besides members, a PDS consists also of their directory. Each member can be accessed directly using the directory structure. Once a member is located, the data stored in that member is handled in the same manner as a PS (sequential) data set.

Whenever a member is deleted, the space it occupied is unusable for storing other data. Likewise, if a member is re-written, it is stored in a new spot at the back of the PDS and leaves wasted “dead” space in the middle. The only way to recover “dead” space is to perform frequent file compression, that moves all members to the front of the data space and leaves free usable space at the back. (Note that in modern parlance, this kind of operation might be called defragmentation
Defragmentation
In the maintenance of file systems, defragmentation is a process that reduces the amount of fragmentation. It does this by physically organizing the contents of the mass storage device used to store files into the smallest number of contiguous regions . It also attempts to create larger regions of...

 or garbage collection
Garbage collection (computer science)
In computer science, garbage collection is a form of automatic memory management. The garbage collector, or just collector, attempts to reclaim garbage, or memory occupied by objects that are no longer in use by the program...

; data compression
Data compression
In computer science and information theory, data compression, source coding or bit-rate reduction is the process of encoding information using fewer bits than the original representation would use....

 nowadays refers to a different, more complicated concept.) PDS files can only reside on disk in order to use the directory structure to access individual members, not on tape. They are most often used for storing multiple JCL files, utility control statements and executable modules.

An improvement of this scheme is a Partitioned Data Set Extended (PDSE or PDS/E, sometimes just libraries) introduced with MVS/XA system.

PDS/E structure is similar to PDS and is used to store the same types of data. However, PDS/E files have a better directory structure which does not require pre-allocation of directory blocks when the PDS/E is defined (and therefore does not run out of directory blocks if not enough were specified). Also, PDS/E automatically stores members in such a way that compression operation is not needed to reclaim "dead" space. PDS/E files can only reside on disk in order to use the directory structure to access individual members.

See also

  • Volume table of contents (VTOC), a structure describing data sets stored on the disk
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK