Flat file database
A flat file database describes any of various means to encode a database model (most commonly a table) as a single file (such as .txt or .ini).

Overview

A "flat file" is a plain text or mixed text and binary file which usually contains one record per line or per 'physical' record (for example, on disk or tape). Within such a record, the individual fields can be separated by delimiters, e.g. commas, or have a fixed length. In the latter case, padding may be needed to achieve this length. Extra formatting, such as quoting or escaping, may be needed to avoid delimiter collision, where the delimiter character also appears inside a field. There are no structural relationships between the records.
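
The two field layouts described above, and the delimiter collision problem, can be illustrated with a minimal Python sketch. The sample record, the field widths in WIDTHS, and the helper parse_fixed are assumptions made for illustration only; they do not come from any particular flat file product.

```python
import csv
import io

# Hypothetical record whose address field contains a comma.
record = ["Amy Jones", "12 Main St, Apt 4", "555-0100"]

# Naive delimited encoding: the embedded comma collides with the delimiter,
# so splitting the line yields four fields instead of three.
naive_line = ",".join(record)
naive_fields = naive_line.split(",")

# The csv module avoids the collision by quoting fields that contain a comma.
buffer = io.StringIO()
csv.writer(buffer).writerow(record)
quoted_line = buffer.getvalue().rstrip("\r\n")
quoted_fields = next(csv.reader([quoted_line]))

# Fixed-length encoding: pad every field with spaces to an assumed width.
WIDTHS = (12, 20, 10)
fixed_line = "".join(value.ljust(width) for value, width in zip(record, WIDTHS))


def parse_fixed(line, widths=WIDTHS):
    """Slice a fixed-length record into fields and strip the padding."""
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width].rstrip())
        start += width
    return fields
```

In this sketch the fixed-length form needs no delimiter handling at all, at the cost of padding every record to the same length and truncating nothing only because the assumed widths are large enough for the sample values.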

Typical examples of flat files are /etc/passwd and /etc/group on Unix-like operating systems. Another example of a flat file is a name-and-address list with the fields Name, Address, and Phone Number.
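
As a rough illustration of how such a file is read, the following Python sketch splits one /etc/passwd-style line into its seven colon-separated fields. The sample line, the PasswdEntry type, and the parse_passwd_line function are hypothetical names introduced here; the sketch does not read the real file and does not handle malformed lines.

```python
from typing import NamedTuple


class PasswdEntry(NamedTuple):
    """One record of an /etc/passwd-style file (seven colon-separated fields)."""
    name: str
    password: str
    uid: int
    gid: int
    gecos: str
    home: str
    shell: str


def parse_passwd_line(line: str) -> PasswdEntry:
    """Split a single record on ':' and convert the numeric fields."""
    name, password, uid, gid, gecos, home, shell = line.rstrip("\n").split(":")
    return PasswdEntry(name, password, int(uid), int(gid), gecos, home, shell)


# Made-up sample record, one line of the flat file.
SAMPLE = "amy:x:1000:1000:Amy Example:/home/amy:/bin/sh\n"
entry = parse_passwd_line(SAMPLE)
```

Each line is a self-contained record, and nothing in the file itself links one record to another.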

A list of names, addresses, and phone numbers written on a sheet of paper is a flat file database. This can also be done with any typewriter or word processor. A spreadsheet or text editor program may be used to implement flat file databases.

History

The first uses of computing machines were implementations of simple databases. Herman Hollerith conceived the idea that census data could be represented by holes punched in paper cards and tabulated by machine. He sold his concept to the US Census Bureau; thus, the Census of 1890 was the first ever computerized database, consisting, in essence, of thousands of boxes full of punched cards.

Hollerith's enterprise grew into computer giant IBM, which dominated the data processing market for most of the 20th century. IBM's fixed-length field, 80-column punch cards became the ubiquitous means of inputting electronic data until the 1970s.

In the 1980s, configurable flat-file database computer applications were popular on DOS and the Macintosh. These programs were designed to make it easy for individuals to design and use their own databases, and were almost on par with word processors and spreadsheets in popularity. Examples of flat-file database products were early versions of FileMaker and the shareware PC-File. Some products of the period, like dBase II, offered limited relational capabilities, allowing some data to be shared between files.

Contemporary implementations

FairCom's c-tree is an example of a modern enterprise-level solution, and spreadsheet software is often used for this purpose, but aside from that there are very few programs available today that would allow a novice to create and use a general-purpose flat file database. This functionality is implemented in Microsoft Works (available only for some versions of Windows) and AppleWorks, sometimes named ClarisWorks Office (available for Macintosh and, in some versions, for Windows). Over time, products like Borland's Paradox and Microsoft's Access started offering some relational capabilities, as well as built-in programming languages. Database management systems (DBMS) like MySQL or Oracle generally require programmers to build applications.

Flat file databases are still used internally by many computer applications to store configuration data. Many applications allow users to store and retrieve their own information from flat files using a predefined set of fields. Examples are programs to manage collections of books or appointments. Some small address book applications are essentially single-purpose flat file databases. As of 2011, one of the most popular flat file database engines is SQLite (strictly an embedded relational engine that keeps each database in a single file), which is part of the standard PHP 5 distribution.
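
Configuration storage of the kind mentioned above can be sketched with Python's standard configparser module, which reads INI-style flat files. The section names, keys, and values below are made up for illustration, and the text is parsed from a string rather than from a file on disk.

```python
import configparser

# Hypothetical application settings in INI form.
INI_TEXT = """\
[window]
width = 800
height = 600

[user]
name = Amy
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

# Values are stored as text; typed accessors convert them on request.
width = config.getint("window", "width")
height = config.getint("window", "height")
user_name = config.get("user", "name")
```

The file itself carries no type information: the application decides, by convention, that width and height are integers.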

Data transfer operations

Flat files are used not only as data storage tools in database (DB) and content management (CMS) systems, but also as data transfer tools to remote servers (in which case they become known as information streams).
In recent years, this latter use has largely been replaced by XML files, which not only contain but also describe the data. Those still using flat files to transfer information are typically mainframes running established procedures that are considered too risky to modify.
One criticism often raised against XML as a format for mass data transfer is that an XML file is significantly larger than the equivalent flat file, which is generally reduced to the bare minimum. One solution to this problem is to compress the XML file (a solution that applies equally well to flat files); for XML, this approach has been formalized as EXI (Efficient XML Interchange), which is often used by mobile devices.
Transferring data via EXI rather than flat files is advisable because the compression method does not have to be defined separately, because libraries for reading the file contents are readily available, and because the two communicating systems do not need to agree beforehand on a protocol describing data properties such as position, alignment, type, and format. However, where the sheer mass of data or the inadequacy of legacy systems becomes a problem, flat files may remain the only viable solution. To handle the problems connected with data communication, format, validation, and control, whether the data source is a flat file or an XML file, it is advisable to adopt a Data Quality Firewall.
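
The size argument can be checked on synthetic data with a short Python sketch. The 200 generated records and their field names are assumptions, and general-purpose zlib compression stands in for a dedicated format such as EXI, which the sketch does not implement.

```python
import xml.etree.ElementTree as ET
import zlib

# Synthetic records: (id, name, team). Purely illustrative data.
rows = [(i, f"name{i}", "Blues" if i % 2 else "Reds") for i in range(1, 201)]

# Flat file form: one comma-delimited record per line, no field descriptions.
flat_bytes = "".join(f"{i},{name},{team}\n" for i, name, team in rows).encode("utf-8")

# XML form: every value is wrapped in tags that describe it.
root = ET.Element("records")
for i, name, team in rows:
    rec = ET.SubElement(root, "record")
    ET.SubElement(rec, "id").text = str(i)
    ET.SubElement(rec, "name").text = name
    ET.SubElement(rec, "team").text = team
xml_bytes = ET.tostring(root, encoding="utf-8")

# Byte counts before and after generic compression (zlib, not EXI).
sizes = {
    "flat": len(flat_bytes),
    "xml": len(xml_bytes),
    "flat_compressed": len(zlib.compress(flat_bytes)),
    "xml_compressed": len(zlib.compress(xml_bytes)),
}
```

Printing the sizes dictionary shows how much of the XML overhead comes from the repeated tag names, and how much of it compression recovers for this particular data set; the exact figures depend on the data and should not be generalized.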

XML is being gradually replaced by JSON, YAML and possibly other structured text formats.

Terms

"Flat file database" may be defined very narrowly, or more broadly. The narrower interpretation is correct in database theory; the broader covers the term as generally used.

Strictly, a flat file database should consist of nothing but data and, if records vary in length, delimiters. More broadly, the term refers to any database which exists in a single file in the form of rows and columns, with no relationships or links between records and fields except the table structure.

Terms used to describe different aspects of a database and its tools differ from one implementation to the next, but the concepts remain the same. FileMaker uses the term "Find", while MySQL uses the term "Query"; but the concept is the same. FileMaker "files", in version 7 and above, are equivalent to MySQL "databases", and so forth. To avoid confusing the reader, one consistent set of terms is used throughout this article.

However, the basic terms "record" and "field" are used in nearly every flat file database implementation.

Example database

The following example illustrates the basic elements of a flat-file database. The data arrangement consists of a series of columns and rows organized into a tabular format. This specific example uses only one table.

The columns are: id (a numeric unique ID used to identify records, first column); name (a person's name, second column); and team (the name of an athletic team supported by the person, third column).

Here is an example textual representation of the described data:

id  name   team
1   Amy    Blues
2   Bob    Reds
3   Chuck  Blues
4   Dick   Blues
5   Ethel  Reds
6   Fred   Blues
7   Gilly  Blues
8   Hank   Reds

This type of data representation is quite standard for a flat-file database, although there are some additional considerations that are not readily apparent from the text:
  • Data types: each column in a database table such as the one above is ordinarily restricted to a specific data type. Such restrictions are usually established by convention, but not formally indicated unless the data is transferred to a relational database system.
  • Separated columns: In the above example, individual columns are separated using whitespace characters. This is also called indentation or "fixed-width" data formatting. Another common convention is to separate columns using one or more delimiter characters. There are many different conventions for depicting data such as that above in text (see, e.g., comma-separated values, delimiter-separated values, markup language, programming language). Using delimiters incurs some overhead in locating them every time they are processed (unlike fixed-width formatting), which may have some performance implications. However, use of character delimiters (especially commas) is also a crude form of data compression which may assist overall performance by reducing data volumes, especially for data transmission purposes. Use of character delimiters which include a length component (declarative notation) is comparatively rare but vastly reduces the overhead associated with locating the extent of each field.
  • Relational algebra: Each row or record in the above table meets the standard definition of a tuple under relational algebra (the above example depicts a series of 3-tuples). Additionally, the first row specifies the field names that are associated with the values of each row.
  • Database management system: Since the formal operations possible with a text file are usually more limited than desired, the text in the above example would ordinarily represent an intermediary state of the data prior to being transferred into a database management system.
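
The considerations in the list above can be tied together in a minimal Python sketch: the example text is split on whitespace, the id column is converted to an integer by convention, each row becomes a 3-tuple, and the tuples are loaded into a database management system. SQLite (through Python's sqlite3 module) is used here only as a convenient stand-in, and the table name supporters is a hypothetical choice.

```python
import sqlite3

# The example table as flat text: a header row followed by one record per line.
FLAT_TEXT = """\
id name team
1 Amy Blues
2 Bob Reds
3 Chuck Blues
4 Dick Blues
5 Ethel Reds
6 Fred Blues
7 Gilly Blues
8 Hank Reds
"""

lines = FLAT_TEXT.splitlines()
header = lines[0].split()  # field names taken from the first row

# Each remaining line becomes a 3-tuple; the type of "id" is applied by convention.
records = []
for line in lines[1:]:
    raw_id, name, team = line.split()
    records.append((int(raw_id), name, team))

# Transfer into a DBMS, where the column types are declared formally.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supporters (id INTEGER PRIMARY KEY, name TEXT, team TEXT)")
conn.executemany("INSERT INTO supporters VALUES (?, ?, ?)", records)

# A query that would be awkward to express against the raw text file.
team_counts = dict(conn.execute("SELECT team, COUNT(*) FROM supporters GROUP BY team"))
conn.close()
```

Splitting on whitespace only works here because no value contains a space; a name such as "Mary Ann" would require a delimiter, quoting, or true fixed-width slicing.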
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 