Data file
A data file is a computer file which stores data to be used by a computer application or system. It generally does not refer to files that contain instructions or code to be executed (typically called program files), or to files which define the operation or structure of an application or system (which include configuration files, directory files, etc.), but specifically to information used as input, or written as output, by some other software program. Keeping such input and output in files is especially helpful when debugging a program, because the same input can be supplied again and the output examined afterwards.

Most computer programs work with files, because files make it possible to store information permanently. Database programs create files of information. Compilers read source files and generate executable files. A file itself is a sequence of bytes stored on some storage device such as tape, magnetic disk or optical disk. Data files are the files that store data pertaining to a specific application, for later use.

Storage types of data files

The data files can be stored in two ways:
  1. Text files.
  2. Binary files.


A text file (also called an ASCII file) stores information as ASCII characters. A text file contains visible characters, so its contents can be viewed on the monitor or edited with any text editor. In text files, each line of text is terminated (delimited) with a special character known as EOL (End of Line), the newline character. In text files some internal translations take place when this EOL character is read or written; for example, on Windows the newline character '\n' is stored as the two-character sequence carriage return plus line feed and converted back when the file is read.


Examples of text files
  • A file containing a C++ program
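As a minimal sketch of how a program produces and consumes a text file (assuming a C++11 compiler; write_lines, read_lines and the file name notes.txt are hypothetical), the helpers below write each string as one line terminated by '\n' and read the lines back with std::getline, which consumes the end-of-line character:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Write each string as one line of text. '\n' is the end-of-line marker,
// which text mode may translate (for example to "\r\n" on Windows).
bool write_lines(const std::string& path, const std::vector<std::string>& lines)
{
    std::ofstream out(path);              // text mode is the default
    for (const std::string& line : lines)
        out << line << '\n';
    return static_cast<bool>(out);        // false if the stream reported an error
}

// Read the file back one line at a time; getline removes the '\n'.
std::vector<std::string> read_lines(const std::string& path)
{
    std::vector<std::string> lines;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line))
        lines.push_back(line);
    return lines;
}
```

Calling write_lines("notes.txt", {"first line", "second line"}) and then read_lines("notes.txt") should give back the same two strings, and the file can also be opened in any text editor.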


A binary file is a file that contains information in the same format in which the information is held in memory, i.e. in binary form. In a binary file there is no delimiter for a line, and no translations occur. As a result, binary files are faster and easier for a program to read and write than text files. As long as the file does not need to be read by people or ported to a different type of system, binary files are the best way to store program information.

Examples of binary files
  • An executable file
  • An object file
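The difference between the two storage types can be seen by saving the same integer both ways. In this sketch (store_both, file_size and the file names are hypothetical), the text file holds the decimal digits of the number while the binary file holds a copy of its in-memory bytes, so the binary file's size is sizeof(int) regardless of the value:

```cpp
#include <fstream>
#include <string>

// Store the same int once as text (its decimal digits) and once in binary
// (a raw copy of its in-memory bytes).
void store_both(int value, const std::string& text_path, const std::string& bin_path)
{
    std::ofstream text(text_path);                        // text mode
    text << value;                                        // e.g. 21310 -> 5 characters

    std::ofstream bin(bin_path, std::ios::out | std::ios::binary);
    bin.write(reinterpret_cast<const char*>(&value), sizeof value);  // raw bytes
}   // both streams are closed (and flushed) here

// Report a file's size in bytes by seeking to its end.
std::streamoff file_size(const std::string& path)
{
    std::ifstream in(path, std::ios::in | std::ios::binary);
    in.seekg(0, std::ios::end);
    return static_cast<std::streamoff>(in.tellg());
}
```

For 21310 the text file should be 5 bytes long, and the binary file sizeof(int) bytes (4 on most current platforms), but the binary file is only meaningful to a program on a system with the same int layout.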


In C++, a file, at its lowest level, is interpreted simply as a sequence, or stream, of bytes. One aspect of the file I/O library manages the transfer of these bytes; at this level, the notion of a data type is absent. At the user level, on the other hand, a file consists of a sequence of possibly intermixed data types: characters, arithmetic values, class objects. A second aspect of the file I/O library manages the interface between these two levels.

Stream

A stream is a sequence of bytes; the term is a general name given to a flow of data. Different streams are used to represent different kinds of data flow, and each stream is associated with a particular class, which contains member functions and definitions for dealing with that kind of data flow. The stream that supplies data to the program is known as the input stream: it reads data from the file and hands it over to the program. The stream that receives data from the program is known as the output stream: it writes the received data to the file. The following figure illustrates this.
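When the main function of your program is invoked, three predefined streams are already open and available for use. These represent the standard input, standard output and standard error channels that have been established for the process; C++ makes them available as the stream objects cin, cout and cerr (together with clog, a buffered stream that also writes to standard error).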
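To illustrate the two directions of flow, here is a minimal sketch assuming a C++11 compiler; copy_file and the file names used with it are hypothetical. An ifstream carries bytes from one file into the program, an ofstream carries them out to another file, and the predefined cerr stream reports a failure:

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Copy a file byte by byte through an input stream and an output stream.
bool copy_file(const std::string& from, const std::string& to)
{
    std::ifstream in(from, std::ios::in | std::ios::binary);   // input stream
    if (!in) {
        std::cerr << "cannot open " << from << '\n';           // standard error channel
        return false;
    }
    std::ofstream out(to, std::ios::out | std::ios::binary);   // output stream
    char ch;
    while (in.get(ch))      // file -> input stream -> program
        out.put(ch);        // program -> output stream -> file
    return static_cast<bool>(out);
}
```

Opening both files in binary mode keeps the copy byte-for-byte identical, since no end-of-line translation takes place.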

The fstream header file

In C++, file input/output facilities are implemented through a component header file of the C++ standard library. In standard C++ this header is <fstream>; older, pre-standard compilers (such as Turbo C++) call it fstream.h. The fstream library predefines a set of operations for handling file-related input and output, and defines certain classes that help one perform file input and output. For example, the ifstream class ties a file to the program for input; the ofstream class ties a file to the program for output; and the fstream class ties a file to the program for both input and output. These classes derive from istream, ostream and iostream, the classes declared through <iostream> (iostream.h in older compilers), the header file that manages console I/O operations in C++. The following figure shows the stream class hierarchy.

The functions of these classes have been summarised in the following table -
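One practical consequence of this hierarchy is that code written against the base class works with every derived stream. In the sketch below (sum_numbers and numbers.txt are hypothetical), a function taking istream& accepts a file stream, cin, or an in-memory string stream alike, because ifstream and istringstream both derive from istream:

```cpp
#include <fstream>    // ifstream, derived from istream
#include <istream>
#include <sstream>    // istringstream, also derived from istream
#include <string>

// Add up whitespace-separated integers from any input stream.
int sum_numbers(std::istream& in)
{
    int total = 0, n;
    while (in >> n)       // operator>> is inherited from istream
        total += n;
    return total;
}
```

So sum_numbers(std::cin) would read from the keyboard, while an ifstream opened on a file of numbers reads the same way from disk.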

Opening and Closing files in C++

In C++, opening of files can be achieved in two ways -
  1. Using the constructor function of the stream class.
  2. Using the function open.


The first method is preferred when a single file is used with a stream; for managing multiple files with the same stream, the second method is preferred.
  • Opening files using Constructors


ifstream input_file("DataFile");

The above statement creates an object, input_file, of the input file stream class ifstream; the object name is user-defined. When input_file is created, the file DataFile is opened and attached to it, so the data read from DataFile is channelled through the input stream object input_file, as shown in the following figure.
The connection with a file is closed automatically when the input or output stream object expires, i.e. when it goes out of scope (a global object, for instance, expires when the program terminates). You can also close the connection with a file explicitly by using the close() method:

input_file.close();

Closing such a connection does not eliminate the stream; it just disconnects it from the file, and the stream object itself remains. Closing an output file also flushes its buffer, which means that any data remaining in the buffer is written out to the file.
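The following sketch shows both ways a connection ends (save_greeting, load_word and greeting.txt are hypothetical): the ofstream is opened by its constructor and closed automatically when it goes out of scope, while the ifstream is closed explicitly with close() yet remains usable as an object:

```cpp
#include <fstream>
#include <string>

// The constructor opens the file; the destructor closes it (flushing any
// buffered output) when the stream object goes out of scope.
void save_greeting(const std::string& path)
{
    std::ofstream out(path);   // opened by the constructor
    out << "hello";
}                              // out expires here: buffer flushed, file closed

std::string load_word(const std::string& path)
{
    std::ifstream in(path);    // opened by the constructor
    std::string word;
    in >> word;
    in.close();                // explicit close; the object 'in' still exists
    return word;
}
```

After close(), is_open() on the stream returns false, but the same object could be attached to a file again with open().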
  • Opening files using open function


ifstream filin;               // create an input stream
filin.open("Master.dat");     // associate filin stream with file Master.dat
// ... process Master.dat
filin.close();                // terminate association with Master.dat
filin.open("Tran.dat");       // associate filin stream with file Tran.dat
// ... process Tran.dat
filin.close();                // terminate association

A stream can be connected to only one file at a time.
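As a sketch of reusing one stream for several files in turn (count_lines is hypothetical; Master.dat and Tran.dat are the names from the fragment above), each open() attaches the stream to the next file after close() has released the previous one:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Count the lines in several files with a single ifstream object.
int count_lines(const std::vector<std::string>& paths)
{
    std::ifstream filin;
    std::string line;
    int count = 0;
    for (const std::string& path : paths) {
        filin.open(path);                  // attach to the next file
        while (std::getline(filin, line))
            ++count;
        filin.close();                     // detach before the next open()
        filin.clear();                     // reset eof/fail flags left by this file
    }
    return count;
}
```

The clear() call matters on older compilers, where a successful open() did not reset the end-of-file state left over from the previous file; since C++11 it is harmless.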

The concept of File Modes

The file mode describes how a file is to be used: to read from it, to write to it, to append to it, and so on.

stream_object.open ("filename", filemode) ;

The following table lists the filemodes available and their meaning:
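To see what a file mode changes, the sketch below (log_line, read_all and log.txt are hypothetical) writes to the same file with ios::out alone, which truncates an existing file, and with ios::out | ios::app, which sends every write to the end so earlier contents are kept:

```cpp
#include <fstream>
#include <string>

// Write one line, either replacing the file's contents or appending to them.
void log_line(const std::string& path, const std::string& text, bool append)
{
    std::ios::openmode mode = std::ios::out;   // out alone truncates the file
    if (append)
        mode |= std::ios::app;                 // app: every write goes to the end
    std::ofstream out(path, mode);
    out << text << '\n';
}

// Return the whole file as one string, one '\n' after each line.
std::string read_all(const std::string& path)
{
    std::ifstream in(path);
    std::string all, line;
    while (std::getline(in, line))
        all += line + '\n';
    return all;
}
```

Writing "first" without append and then "second" with append leaves both lines in the file; a later write without append replaces them.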

Steps to process a File in your Program

The five steps to use files in your C++ program are:
  1. Determine the type of link required.
  2. Declare a stream for the desired type of link.
  3. Attach the desired file to the stream.
  4. Now process as required.
  5. Close the file - link with stream.

The complete example program:

/* To get roll numbers and marks of the students of a class (from the user) and store these details in a file called "marks.dat" */
#include <fstream>
#include <iostream>
using namespace std;

int main()
{
    ofstream filout ;                        // stream decided and declared - steps 1 & 2
    filout.open("marks.dat", ios :: out) ;   // file linked - step 3
    char ans = 'y' ;                         // process as required - step 4 begins
    int rollno ;
    float marks ;
    while (cin && (ans == 'y' || ans == 'Y')) // also stop if console input fails
    {
        cout << "\n Enter Rollno. :" ;
        cin >> rollno ;
        cout << "\n Enter Marks :" ;
        cin >> marks ;
        filout << rollno << "\n" << marks << "\n" ;
        cout << "\n Want to enter more records?(y/n)..." ;
        cin >> ans ;
    }
    filout.close() ;                         // delink the file - step 5
    return 0 ;
}

File Handling in C++

  • get function:

Prototypes are :

istream & get (char * buf, int num, char delim = '\n') ;

This first form reads characters into a character array pointed to by buf until either num - 1 characters have been read or the character specified by delim has been encountered; the last element is reserved for the terminating null character. For instance,

char line [40] ;
cin.get (line, 40, '$') ;

The above statements will read characters into line until either 39 characters have been read or a '$' character is encountered, whichever occurs earlier. If the input given in response to the above statements is as follows:
Value is $ 177.5

then line will store
Value is

And if the input given is as follows:
The amount is 17.5.

the contents of line will be
The amount is 17.5.
together with the newline that ends the input line, because only '$' acts as the delimiter here: get stops at a '$', after 39 characters, or at the end of the input.

The array pointed to by buf will be null-terminated by get. If no delim character is specified, a newline character acts as the delimiter by default. If the delimiter character is encountered in the input stream, the get function does not extract it; rather, the delimiter character remains in the stream until the next input operation.
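The behaviour described above can be checked with an in-memory stream in place of the keyboard. In this sketch (read_until_dollar is hypothetical), get stops before the '$' and leaves it in the stream, which peek() then shows:

```cpp
#include <sstream>
#include <string>

// Read up to 39 characters or up to a '$', as in cin.get(line, 40, '$').
// 'next' receives the character still waiting in the stream afterwards.
std::string read_until_dollar(std::istream& in, char& next)
{
    char line[40];
    in.get(line, 40, '$');                 // stores at most 39 chars + '\0'
    next = static_cast<char>(in.peek());   // the '$' has not been extracted
    return line;
}
```

With the input "Value is $ 177.5" the function should return "Value is " (with its trailing space) and report '$' as the next character; with 50 consecutive 'x' characters it returns only 39 of them.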

int get () ;

The above second form of get returns the next character from the stream. It returns EOF if the end-of-file is encountered. For instance, the following code fragment illustrates it:

int i ;
char ch ;
ch = i = fin.get () ;

If the input given is A, then the value of i will be 65 (the ASCII value of A) and the value of ch will be A.
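Because get() returns an int, its result can hold every character value plus the distinct EOF value, which is why the fragment above stores it in i before assigning it to ch. A minimal sketch of the usual reading loop (count_chars is hypothetical):

```cpp
#include <cstdio>     // EOF
#include <istream>
#include <sstream>

// Count the characters in a stream using the no-argument get().
// The result is kept in an int so that EOF can be told apart from
// any real character.
int count_chars(std::istream& in)
{
    int count = 0;
    int i;
    while ((i = in.get()) != EOF)
        ++count;
    return count;
}
```

Storing the result directly in a char instead could make the loop stop early on a byte that happens to equal EOF after conversion, or never stop, depending on whether char is signed.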
  • getline function

Prototype is :

istream & getline (char * buf, int num, char delim = '\n') ;

This function is virtually identical to the get(buf, num, delim) version of get. The difference is that getline reads and removes the delimiter (by default the newline character) from the input stream if it is encountered, which is not done by the get function. The following figure explains the difference between the get and getline functions:
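The difference can be observed by peeking at the stream after each call. In this sketch (get_vs_getline is hypothetical), both streams hold "one\ntwo": after get the newline is still waiting, while after getline the next character is already the 't' of the second line:

```cpp
#include <sstream>
#include <string>

// Report which character remains in the stream after get() and after getline().
void get_vs_getline(std::istream& a, std::istream& b, char& after_get, char& after_getline)
{
    char buf[20];
    a.get(buf, 20);                              // stops at '\n', leaves it in a
    after_get = static_cast<char>(a.peek());
    b.getline(buf, 20);                          // stops at '\n', extracts and discards it
    after_getline = static_cast<char>(b.peek());
}
```

This is why a second get(buf, num) call on the same line would read nothing: the newline left behind stops it immediately.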

  • read and write functions :

Blocks of binary data are read and written with C++'s read and write functions. Their prototypes are:

istream & read (char * buf, streamsize num) ;
ostream & write (const char * buf, streamsize num) ;

The read function reads num bytes from the associated stream and puts them in the buffer pointed to by buf; the write function writes num bytes to the associated stream from the buffer pointed to by buf. When a whole object is transferred, its address is cast to char * and num is normally its size, as in fin.read((char *) &buf, sizeof(buf)). Data written to a file using write can only be read accurately using read. The following program writes a structure to the disk and then reads it back using the write and read functions.
#include <cstring>    // for strcpy
#include <fstream>
#include <iostream>
using namespace std;

struct customer
{
    char name [51] ;
    float balance ;
};

int main()
{
    customer savac;
    strcpy(savac.name, "Tina Marshall") ;           // copy value to structure
    savac.balance = 21310.75 ;                      // variable savac
    ofstream fout ;
    fout.open("Saving", ios :: out | ios :: binary) ;   // open output file
    if (!fout)
    {
        cout << "File can't be opened \n" ;
        return 1;
    }
    fout.write((char *) & savac, sizeof(customer)) ;    // write to file
    fout.close() ;                                      // close connection
    // read it back now
    ifstream fin ;
    fin.open("Saving", ios :: in | ios :: binary) ;     // open input file
    fin.read((char *) & savac, sizeof(customer)) ;      // read structure
    cout << savac.name ;                                // display structure now
    cout << " has the balance amount of Rs. " << savac.balance << "\n" ;
    fin.close() ;
    return 0 ;
}

As you can see, only a single call to read or write is necessary to read or write the entire structure; each individual field need not be read or written separately. If the end-of-file is reached before the specified number of characters has been read, read simply stops, and the buffer contains as many characters as were available; the gcount function reports how many characters the last read actually obtained.
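A short read can be detected with gcount(). In this sketch (bytes_read and short.bin are hypothetical), read is asked for more bytes than the file holds; it stores the three that exist, sets the end-of-file and fail flags, and gcount() reports 3:

```cpp
#include <fstream>
#include <string>

// Ask read() for 'wanted' bytes and return how many actually arrived.
std::streamsize bytes_read(const std::string& path, char* buf, std::streamsize wanted)
{
    std::ifstream fin(path, std::ios::in | std::ios::binary);
    fin.read(buf, wanted);     // stops early at end-of-file (sets eofbit and failbit)
    return fin.gcount();       // number of bytes the last read extracted
}
```

When a program reads whole records, comparing gcount() with sizeof(record) is a simple way to notice a truncated last record.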

File pointers and random access

Every file maintains two pointers, called the get_pointer (in an input mode file) and the put_pointer (in an output mode file), which tell the current position in the file where reading or writing will take place. These pointers make random access possible, that is, moving directly to any location in the file instead of moving through it sequentially. In C++, random access is achieved by manipulating the seekg, seekp, tellg and tellp functions. The seekg and tellg functions are for input streams (ifstream), and the seekp and tellp functions are for output streams (ofstream). However, if you use them with an fstream object, the two positions move together, because a file stream keeps a single file position, so tellg and tellp return the same value. The most common forms of these functions are:

istream & seekg (long) ;              // Form 1
istream & seekg (long, seek_dir) ;    // Form 2
ostream & seekp (long) ;              // Form 1
ostream & seekp (long, seek_dir) ;    // Form 2
long tellg () ;
long tellp () ;

When seekg (or seekp) is used according to Form 1, it moves the get_pointer (or put_pointer) to an absolute position, counted in bytes from the beginning of the file (which is byte 0). For example,

ifstream fin ;
ofstream fout ;
fin.seekg(30) ; // will move the get_pointer (in ifstream) to byte number 30 in the file.
fout.seekp(30) ; // will move the put_pointer (in ofstream) to byte number 30 in the file.

When the seekg (or seekp) function is used according to Form 2, it moves the get_pointer (or put_pointer) by the given offset relative to the reference point named by seek_dir: the beginning of the file, the current position, or the end of the file. seek_dir is an enumeration (defined in iostream.h in older compilers; standard C++ names the type ios_base::seekdir) that has the following values:

ios :: beg // refers to beginning of the file
ios :: cur // refers to current position in the file
ios :: end // refers to end of the file

For example,

fin.seekg(30, ios :: beg) ; // go to byte no. 30 from beginning of the file linked with fin.
fin.seekg(-2, ios :: cur) ; // back up 2 bytes from current.
fin.seekg(0, ios :: end) ; // go to the end of the file.
fin.seekg(-5, ios :: end) ; // back up 5 bytes from end of the file.

The methods tellp and tellg return the position (as a byte number) of the put_pointer in an output file and of the get_pointer in an input file, respectively.
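The seek calls above can be traced on a small file. In this sketch (peek_positions and digits.dat are hypothetical), the file holds the ten characters "0123456789", so the character found at each position equals the position itself:

```cpp
#include <fstream>
#include <string>

// Write ten digits, then jump around with seekg() and collect what is found.
std::string peek_positions(const std::string& path)
{
    {
        std::ofstream out(path, std::ios::out | std::ios::binary);
        out << "0123456789";
    }                                              // closed here
    std::ifstream fin(path, std::ios::in | std::ios::binary);
    std::string found;
    fin.seekg(3);                                  // Form 1: absolute byte 3
    found += static_cast<char>(fin.get());         // '3'; get_pointer now at 4
    fin.seekg(-2, std::ios::cur);                  // back up 2 bytes: 4 -> 2
    found += static_cast<char>(fin.get());         // '2'
    fin.seekg(-1, std::ios::end);                  // 1 byte before the end
    found += static_cast<char>(fin.get());         // '9'
    return found;
}
```

The result should be "329"; afterwards, seekg(0, ios::end) followed by tellg() gives 10, the file's length in bytes.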

Error handling during file I/O

Sometimes errors may creep in during file operations. For instance, a file being opened for reading might not exist, a file name used for a new file may already exist, or an attempt could be made to read past the end-of-file. To check for such errors and to ensure smooth processing, C++ file streams inherit stream-state members from the ios class that store information on the status of the file currently in use. The current state of the I/O system is held in an integer, in which the following flags are encoded:

There are several error handling functions supported by class ios that help you read and process the status recorded in a file stream. Following table lists these error handling functions and their meaning :

These functions may be used in the appropriate places in a program to locate the status of a file stream and thereby take the necessary corrective measures. For example :

// ...
ifstream fin ;
fin.open ("Master") ;
while (! fin.fail ())
{
. . . // process the file.
}
if (fin.eof ())
{
. . . // terminate the program.
}
else if (fin.bad ())
{
. . . // report fatal error.
}
else
{
fin.clear () ; // clear error-state flags
. . .
}
// ...
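The state functions can be exercised on two common situations. In this sketch (missing_file_fails, past_end_then_clear and the file name are hypothetical), opening a file that does not exist sets the fail state without the bad state, and reading past the end of the data sets both eof and fail until clear() resets the stream to good:

```cpp
#include <fstream>
#include <sstream>

// A file that cannot be opened leaves the stream in fail() but not bad().
bool missing_file_fails(const char* path)
{
    std::ifstream fin(path);
    return fin.fail() && !fin.bad();
}

// Read one int, then try to read another past the end of the data.
bool past_end_then_clear()
{
    std::istringstream in("42");
    int n;
    in >> n;               // succeeds; reaching the end of the data sets eof
    in >> n;               // fails: nothing is left to read
    bool flagged = in.eof() && in.fail() && !in.bad();
    in.clear();            // reset all error-state flags
    return flagged && in.good();
}
```

The same checks apply unchanged to an ifstream, since both stream types inherit these members from ios.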

Detecting EOF

You can detect when the end-of-file is reached by using the member function eof, which has the prototype
int eof () ;

It returns non-zero when the end-of-file has been reached; otherwise it returns zero. For instance,

ifstream fin ;
fin.open ("Master", ios :: in | ios :: binary) ;
while (! fin.eof ()) // as long as eof is zero,
{ // that is, the file's end is not reached.
. . . // process the file.
}
if (fin.eof ()) // if non-zero
cout << "End of the file reached ! \n" ;

The above code fragment processes a file as long as EOF is not reached. It uses the eof function with the stream object to check for the file's end. Note that eof() becomes non-zero only after an input operation has tried to read past the end of the file, so the loop body should check that each read succeeded before using the data it read.

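To detect the end of file without using eof, you may test the stream object itself: it evaluates to false (in older compilers, to a null pointer) once an input operation has failed, for example at the end of the file. E.g.,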

ifstream fin ;
fin.open ("Master", ios :: in | ios :: binary) ;
while (fin )
{
. . .
}
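A common way to apply this test is to make the read itself the loop condition. The sketch below (average_marks is hypothetical) assumes the roll-number/marks layout written by the marks.dat program earlier in this article; the loop ends as soon as an extraction fails, so the last record is never processed twice, as can happen with while (!fin.eof()) when the file ends with a newline:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Average the marks in a stream of "rollno marks" records.
double average_marks(std::istream& fin)
{
    int rollno;
    float marks;
    double total = 0;
    int count = 0;
    while (fin >> rollno >> marks) {   // false once a read fails, e.g. at end of file
        total += marks;
        ++count;
    }
    return count ? total / count : 0.0;
}
```

An ifstream opened on marks.dat can be passed directly, since ifstream derives from istream.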

Data file categories

Data files come in two broad categories: open and closed.

Closed data file formats
Closed data files (frequently referred to as proprietary format files) have their metadata data elements hidden, obscured or unavailable to users of the file. Application developers do this to discourage users from tampering with or corrupting the data files, or from importing the data into a competitor's application.

Open data file formats
Open data files have their internal structures available to users of the file through a process of metadata publishing. Metadata publishing implies that the structure and semantics of all the possible data elements within a file are available to users.

Examples of open data files include markup formats such as HTML for storing web pages, and XML formats such as SVG for storing scalable vector graphics.

See also

  • Index file
  • Indexed file
  • Database
  • Serialisation
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.