Database

Overview
A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality (for example, the availability of rooms in hotels), in a way that supports processes requiring this information (for example, finding a hotel with vacancies). This definition is very general, and is independent of the technology used.

The term "database" may be narrowed to specify particular aspects of an organized collection of data: it may refer to the logical database, to the physical database as data content in computer data storage, or to many other database sub-definitions.

The term database is correctly applied to the data and their supporting data structures, and not to the database management system (referred to by the acronym DBMS). The data collection together with the DBMS is called a database system.

The term database system implies that the data is managed to some level of quality (measured in terms of accuracy, availability, usability, and resilience), and this in turn often implies the use of a general-purpose database management system (DBMS). A general-purpose DBMS is typically a complex software system that meets many usage requirements, and the databases that it maintains are often large and complex. The utilization of databases is now so widespread that virtually every technology and product relies on databases and DBMSs for its development and commercialization, or may even have them embedded in it. Organizations and companies, from small to large, depend heavily on databases for their operations.

Well-known DBMSs include Oracle, IBM DB2, Microsoft SQL Server, MS Access, PostgreSQL, MySQL, Lotus Domino, and SQLite. A database is not generally portable across different DBMSs, but different DBMSs can interoperate to some degree by using standards such as SQL and ODBC to jointly support a single application. A DBMS also needs to provide effective run-time execution to properly support (e.g., in terms of performance, availability, and security) as many end-users as needed.

The design, construction, and maintenance of a complex database require specialist skills: the staff performing these functions are referred to as database application programmers and database administrators. Their tasks are supported by tools provided either as part of the DBMS or as stand-alone software products. These tools include specialized database languages, including data definition languages (DDL), data manipulation languages (DML), and query languages. These can be seen as special-purpose programming languages, tailored specifically to manipulate databases; sometimes they are provided as extensions of existing programming languages, with added database commands. Database languages are generally specific to one data model, and in many cases they are specific to one DBMS type. The most widely supported database language is SQL, which has been developed for the relational data model and combines the roles of DDL, DML, and a query language.
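
To make the division of roles concrete, here is a minimal sketch using SQLite (mentioned above) through Python's standard sqlite3 module; the room table and its columns are invented purely for illustration. The CREATE TABLE statement plays the DDL role, INSERT and UPDATE play the DML role, and SELECT plays the query-language role, with SQL covering all three.

    import sqlite3

    # Open (or create) a small example database file; the file name is arbitrary.
    conn = sqlite3.connect("hotel_example.db")

    # DDL: define the structure (schema) of the data.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS room (
            room_no INTEGER PRIMARY KEY,
            beds    INTEGER NOT NULL,
            vacant  INTEGER NOT NULL  -- 1 = vacant, 0 = occupied
        )
    """)

    # DML: insert and modify the actual data.
    conn.execute("INSERT OR REPLACE INTO room VALUES (?, ?, ?)", (101, 2, 1))
    conn.execute("INSERT OR REPLACE INTO room VALUES (?, ?, ?)", (102, 1, 1))
    conn.execute("UPDATE room SET vacant = 0 WHERE room_no = ?", (102,))

    # Query language: retrieve information derived from the data.
    for row in conn.execute("SELECT room_no, beds FROM room WHERE vacant = 1"):
        print(row)   # (101, 2)

    conn.commit()
    conn.close()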

A way to classify databases involves the type of their contents, for example: bibliographic, document-text, statistical, or multimedia objects. Another way is by their application area, for example: accounting, music compositions, movies, banking, manufacturing, or insurance.

The database concept


The database concept has evolved since the 1960s to ease increasing difficulties in designing, building, and maintaining complex information systems (typically with many concurrent end-users, and with a large and diverse amount of data). It has evolved together with database management systems, which enable the effective handling of databases. Though the terms database and DBMS define different entities, they are inseparable: a database's properties are determined by its supporting DBMS and vice-versa. The Oxford English Dictionary cites a 1962 technical report as the first to use the term "data-base." With the progress in technology in the areas of processors, computer memory, computer storage, and computer networks, the sizes, capabilities, and performance of databases and their respective DBMSs have grown by orders of magnitude. For decades it has been unlikely that a complex information system could be built effectively without a proper database supported by a DBMS.

No widely accepted exact definition of a DBMS exists. However, a system needs to provide considerable functionality to qualify as a DBMS, and accordingly the data collection it supports needs to meet respective usability requirements (broadly defined by the requirements below) to qualify as a database. Thus, a database and its supporting DBMS are defined here by a set of general requirements listed below. Virtually all existing mature DBMS products meet these requirements to a great extent, while less mature ones either meet them or are converging toward them.

Evolution of database and DBMS technology

See also Database management system#History


The introduction of the term database coincided with the availability of direct-access storage (disks and drums) from the mid-1960s onwards. The term represented a contrast with the tape-based systems of the past, allowing shared interactive use rather than daily batch processing.

In the earliest database systems, efficiency was perhaps the primary concern, but it was already recognized that there were other important objectives. One of the key aims was to make the data independent of the logic of application programs, so that the same data could be made available to different applications.

The first generation of database systems were navigational: applications typically accessed data by following pointers from one record to another. The two main data models at this time were the hierarchical model, epitomized by IBM's IMS system, and the CODASYL model (network model), implemented in a number of products such as IDMS.

The relational model, first proposed in 1970 by Edgar F. Codd, departed from this tradition by insisting that applications should search for data by content, rather than by following links. This was considered necessary to allow the content of the database to evolve without constant rewriting of applications. Relational systems placed heavy demands on processing resources, and it was not until the mid-1980s that computing hardware became powerful enough to allow them to be widely deployed. By the early 1990s, however, relational systems were dominant for all large-scale data processing applications, and they remain dominant today (2011) except in niche areas. The dominant database language is the standard SQL for the relational model, which has also influenced database languages for other data models.

Because the relational model emphasizes search rather than navigation, it does not make relationships between different entities explicit in the form of pointers, but represents them instead using primary keys and foreign keys. While this is a good basis for a query language, it is less well suited as a modeling language. For this reason a different model, the entity-relationship model, which emerged shortly afterwards (1976), gained popularity for database design.
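
A brief sketch of this content-based, key-linked style, again using SQLite via Python's sqlite3 module (the hotel/booking schema is invented for illustration): the relationship between the two tables is expressed by matching key values rather than by stored pointers, and the join locates data by its content.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

    conn.executescript("""
        CREATE TABLE hotel (
            hotel_id INTEGER PRIMARY KEY,
            city     TEXT NOT NULL
        );
        CREATE TABLE booking (
            booking_id INTEGER PRIMARY KEY,
            hotel_id   INTEGER NOT NULL REFERENCES hotel(hotel_id),  -- foreign key, not a pointer
            night      TEXT NOT NULL
        );
    """)

    conn.execute("INSERT INTO hotel VALUES (1, 'Lisbon')")
    conn.execute("INSERT INTO booking VALUES (10, 1, '2011-06-01')")

    # The relationship is followed declaratively, by matching key values on content.
    rows = conn.execute("""
        SELECT h.city, b.night
        FROM hotel h JOIN booking b ON b.hotel_id = h.hotel_id
        WHERE h.city = 'Lisbon'
    """).fetchall()
    print(rows)   # [('Lisbon', '2011-06-01')]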

In the period since the 1970s database technology has kept pace with the increasing resources becoming available from the computing platform: notably the rapid increase in the capacity and speed (and reduction in price) of disk storage, and the increasing capacity of main memory. This has enabled ever larger databases and higher throughputs to be achieved.

The rigidity of the relational model, in which all data is held in tables with a fixed structure of rows and columns, has increasingly been seen as a limitation when handling information that is richer or more varied in structure than the traditional 'ledger-book' data of corporate information systems: for example, document databases, engineering databases, multimedia databases, or databases used in the molecular sciences. Various attempts have been made to address this problem, many of them gathering under banners such as post-relational or NoSQL. Two developments of note are the object database and the XML database. The vendors of relational databases have fought off competition from these newer models by extending the capabilities of their own products to support a wider variety of data types.

General-purpose DBMS


A DBMS has evolved into a complex software system, and its development typically requires thousands of person-years of effort. Some general-purpose DBMSs, such as Oracle, Microsoft SQL Server, and IBM DB2, have been under ongoing development and enhancement for thirty years or more. General-purpose DBMSs aim to satisfy as many applications as possible, which typically makes them even more complex than special-purpose databases. However, the fact that they can be used "off the shelf", as well as their cost amortized over many applications and instances, makes them an attractive alternative (vs. one-time development) whenever they meet an application's requirements.

Though attractive in many cases, a general-purpose DBMS is not always the optimal solution: when certain applications are pervasive, with many operating instances, each with many users, a general-purpose DBMS may introduce unnecessary overhead and too large a "footprint" (a large amount of unneeded, unused software code). Such applications usually justify dedicated development. Typical examples are email systems, though they need to possess certain DBMS properties: email systems are built in a way that optimizes email message handling and management, and they do not need significant portions of general-purpose DBMS functionality.
Types of people involved

Three types of people are involved with a general-purpose DBMS:
  1. DBMS developers - These are the people who design and build the DBMS product, and the only ones who touch its code. They are typically the employees of a DBMS vendor (e.g., Oracle, IBM, Microsoft, Sybase) or, in the case of open source DBMSs (e.g., MySQL), volunteers or people supported by interested companies and organizations. They are typically skilled systems programmers. DBMS development is a complicated task, and some of the popular DBMSs have been under development and enhancement (also to follow progress in technology) for decades.
  2. Application developers and database administrators - These are the people who design and build a database-based application that uses the DBMS. The latter group design the needed database and maintain it. The former write the application programs which the application comprises. Both are well familiar with the DBMS product and use its user interfaces (as well as, usually, other tools) for their work. Sometimes the application itself is packaged and sold as a separate product, which may include the DBMS inside (see embedded database; subject to proper DBMS licensing), or sold separately as an add-on to the DBMS.
  3. Application's end-users (e.g., accountants, insurance people, medical doctors, etc.) - These people know the application and its end-user interfaces, but need not know nor understand the underlying DBMS. Thus, though they are the intended and main beneficiaries of a DBMS, they are only indirectly involved with it.

Database machines and appliances


In the 1970s and 1980s attempts were made to build database systems with integrated hardware and software. The underlying philosophy was that such integration would provide higher performance at lower cost. Examples were IBM System/38, the early offering of Teradata, and the Britton Lee, Inc. database machine.
Another approach to hardware support for database management was ICL's CAFS accelerator, a hardware disk controller with programmable search capabilities.
In the long term these efforts were generally unsuccessful because specialized database machines could not keep pace with the rapid development and progress of general-purpose computers. Thus most database systems nowadays are software systems running on general-purpose hardware, using general-purpose computer data storage. However, this idea is still pursued for certain applications by some companies, such as Netezza and Oracle (Exadata).

Database research


Database research has been an active and diverse area, with many specializations, carried out since the early days of dealing with the database concept in the 1960s. It has strong ties with database technology and DBMS products. Database research has taken place in research and development groups of companies (notably at IBM Research, which contributed technologies and ideas to virtually every DBMS existing today), in research institutes, and in academia. Research has been done both through theory and through prototypes. The interaction between research and database-related product development has been very productive for the database area, and many key concepts and technologies emerged from it. Notable examples are the relational and entity-relationship models, the atomic transaction concept and related concurrency control techniques, query languages and query optimization methods, RAID, and more. Research has provided deep insight into virtually all aspects of databases, though it has not always been pragmatic or effective (and it cannot and should not always be: research is exploratory in nature and does not always lead to accepted or useful ideas). Ultimately, market forces and real needs determine which solutions and related technologies are selected, including among those proposed by research. Occasionally it is not the best or most elegant solution that wins (e.g., SQL). Throughout their history, DBMSs and their databases have to a great extent been the outcome of such research, while real product requirements and challenges have in turn triggered database research directions and sub-areas.

The database research area has several notable dedicated academic journals (e.g., ACM Transactions on Database Systems - TODS, Data and Knowledge Engineering - DKE, and more) and annual conferences (e.g., ACM SIGMOD, ACM PODS, VLDB, IEEE ICDE, and more), as well as an active and quite heterogeneous (subject-wise) research community all over the world.

Database type examples


The following are examples of various database types. Some of them are not mainstream types, but most of them have received special attention (e.g., in research) due to end-user requirements. Some exist as specialized DBMS products, and some have their functionality incorporated into existing general-purpose DBMSs.
An active database is a database that includes an event-driven architecture which can respond to conditions both inside and outside the database. Possible uses include security monitoring, alerting, statistics gathering and authorization.

Most modern relational databases include active database features in the form of database triggers.
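
As an illustration, the following sketch defines a simple audit trigger in SQLite through Python's sqlite3 module; the table and trigger names are hypothetical, and each commercial DBMS has its own trigger dialect.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL NOT NULL);
        CREATE TABLE account_audit (account_id INTEGER, old_balance REAL, new_balance REAL);

        -- The trigger fires automatically whenever a balance is updated.
        CREATE TRIGGER log_balance_change
        AFTER UPDATE OF balance ON account
        BEGIN
            INSERT INTO account_audit VALUES (OLD.id, OLD.balance, NEW.balance);
        END;
    """)

    conn.execute("INSERT INTO account VALUES (1, 100.0)")
    conn.execute("UPDATE account SET balance = 80.0 WHERE id = 1")

    print(conn.execute("SELECT * FROM account_audit").fetchall())   # [(1, 100.0, 80.0)]
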
A cloud database is a database that relies on cloud technology. Both the database and most of its DBMS reside remotely, "in the cloud," while its applications are developed by programmers and later maintained and used by the application's end-users through a web browser and Open APIs. More and more such database products are emerging, both from new vendors and from virtually all established database vendors.
Data warehouses archive data from operational databases and often from external sources such as market research firms. Often operational data undergo transformation on their way into the warehouse, getting summarized, anonymized, reclassified, etc. The warehouse becomes the central source of data for use by managers and other end-users who may not have access to operational data. For example, sales data might be aggregated to weekly totals and converted from internal product codes to UPCs so that they can be compared with ACNielsen data. Some basic and essential components of data warehousing include retrieving, analyzing, and mining data, and transforming, loading, and managing data so as to make them available for further use.

Operations in a data warehouse are typically concerned with bulk data manipulation, and as such, it is unusual and inefficient to target individual rows for update, insert or delete. Bulk native loaders for input data and bulk SQL passes for aggregation are the norm.
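
The kind of summarization described above can be sketched with a single aggregation query; the following uses SQLite via Python's sqlite3 module with an invented sales table, rolling operational rows up into weekly totals much as a warehouse load step might.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sales (sale_date TEXT, internal_code TEXT, amount REAL);
        INSERT INTO sales VALUES
            ('2011-06-01', 'X-17', 19.99),
            ('2011-06-02', 'X-17', 19.99),
            ('2011-06-09', 'X-17', 19.99);
    """)

    # Summarize detailed operational rows into weekly totals per product.
    weekly = conn.execute("""
        SELECT strftime('%Y-%W', sale_date) AS week, internal_code,
               SUM(amount) AS total_amount, COUNT(*) AS units
        FROM sales
        GROUP BY week, internal_code
    """).fetchall()
    print(weekly)
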
The definition of a distributed database is broad, and the term may be used with different meanings. In general it refers to a modular DBMS architecture that allows distinct DBMS instances to cooperate as a single DBMS over processes, computers, and sites, while managing a single database that is itself distributed over multiple computers and different sites.

Examples are databases of local work-groups and departments at regional offices, branch offices, manufacturing plants and other work sites. These databases can include both segments shared by multiple sites, and segments specific to one site and used only locally in that site.
A document-oriented database is utilized to conveniently store, manage, edit, and retrieve documents.
An embedded database system is a DBMS that is tightly integrated with application software requiring access to stored data, in such a way that the DBMS is "hidden" from the application's end-user and requires little or no ongoing maintenance. It is actually a broad technology category that includes DBMSs with differing properties and target markets. The term "embedded database" can be confusing because only a small subset of embedded database products is used in real-time embedded systems such as telecommunications switches and consumer electronics devices.
End-user databases consist of data developed by individual end-users. Examples are collections of documents, spreadsheets, presentations, multimedia, and other files. Several products exist to support such databases. Some of them are much simpler than full-fledged DBMSs, with more elementary DBMS functionality (e.g., not supporting multiple concurrent end-users on the same database), with basic programming interfaces, and with a relatively small "footprint" (not much code to run, compared with "regular" general-purpose DBMSs). However, available general-purpose DBMSs can often also be used for this purpose, if they provide basic user interfaces for straightforward database applications (limited query and data display; no real programming needed), while still providing the database qualities and protections that these DBMSs offer.
A federated database is an integrated database that comprises several distinct databases, each with its own DBMS. It is handled as a single database by a federated database management system (FDBMS), which transparently integrates multiple autonomous DBMSs, possibly of different types (which makes it a heterogeneous database), and provides them with an integrated conceptual view. The constituent databases are interconnected via a computer network, and may be geographically decentralized.

Sometimes the term multi-database is used as a synonym for federated database, though it may refer to a less integrated group of databases (e.g., without an FDBMS and a managed integrated schema) that cooperate in a single application. In this case, distribution middleware is typically used, which usually includes an atomic commit protocol (ACP), e.g., the two-phase commit protocol, to allow distributed (global) transactions (vs. local transactions confined to a single DBMS) across the participating databases.
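
The following is a highly simplified, in-memory sketch of the two-phase commit idea only, not a real FDBMS or network protocol implementation; all class and function names are invented for illustration. A coordinator first collects "prepare" votes from every participant and, only if all vote yes, tells everyone to commit; otherwise everyone rolls back, which keeps the global transaction atomic.

    class Participant:
        """A toy resource manager that can vote on and then apply a change."""

        def __init__(self, name, will_succeed=True):
            self.name = name
            self.will_succeed = will_succeed
            self.committed = False

        def prepare(self):
            # Phase 1: promise to commit if asked (a real DBMS would write a log record here).
            return self.will_succeed

        def commit(self):
            # Phase 2a: make the tentative change durable.
            self.committed = True

        def rollback(self):
            # Phase 2b: discard the tentative change.
            self.committed = False


    def two_phase_commit(participants):
        # Phase 1: the coordinator collects votes from every participant.
        if all(p.prepare() for p in participants):
            # Phase 2: unanimous "yes" -- everyone commits.
            for p in participants:
                p.commit()
            return "committed"
        # Any "no" vote -- everyone rolls back, keeping the global transaction atomic.
        for p in participants:
            p.rollback()
        return "aborted"


    print(two_phase_commit([Participant("db_a"), Participant("db_b")]))         # committed
    print(two_phase_commit([Participant("db_a"), Participant("db_b", False)]))  # aborted
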
A graph database is a kind of NoSQL database that uses graph structures with nodes, edges, and properties to represent and store information. General graph databases that can store any graph are distinct from specialized graph databases such as triplestores and network databases.
The World Wide Web can be thought of as a database, albeit one spread across millions of independent computing systems. Web browsers "process" this data one page at a time, while web crawlers and other software provide the equivalent of database indexes to support search and other activities.
An in-memory database (IMDB; also main memory database or MMDB) is a database that primarily resides in main memory, but is typically backed up by non-volatile computer data storage. Main memory databases are faster than disk-resident databases, since accessing data in memory reduces the I/O activity when, for example, querying the data. In applications where response time is critical, such as telecommunications network equipment, main memory databases are often used.
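
A minimal sketch of the idea using Python's standard sqlite3 module, which can create a purely in-memory database via the special ":memory:" name; the call-record table is invented for illustration, and durability would have to come from a separate backup step.

    import sqlite3

    # ":memory:" creates a database that resides entirely in RAM.
    imdb = sqlite3.connect(":memory:")
    imdb.execute("CREATE TABLE call_record (caller TEXT, callee TEXT, seconds INTEGER)")
    imdb.executemany("INSERT INTO call_record VALUES (?, ?, ?)",
                     [("A", "B", 42), ("A", "C", 7)])

    # Reads never touch the disk, which is why such databases suit response-time-critical uses.
    print(imdb.execute("SELECT caller, SUM(seconds) FROM call_record GROUP BY caller").fetchall())
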
A knowledge base (abbreviated KB, kb or Δ) is a special kind of database for knowledge management, providing the means for the computerized collection, organization, and retrieval of knowledge. It may also be a collection of data representing problems together with their solutions and related experiences.
Operational databases store detailed data about the operations of an organization. They are typically organized by subject matter and process relatively high volumes of updates using transactions. Essentially every major organization on earth uses such databases. Examples include customer databases that record contact, credit, and demographic information about a business's customers; personnel databases that hold information such as salary, benefits, and skills data about employees; enterprise resource planning systems that record details about product components and parts inventory; and financial databases that keep track of the organization's money, accounting, and financial dealings.
A parallel database, run by a parallel DBMS, seeks to improve performance through parallelization of tasks such as loading data, building indexes, and evaluating queries. Parallel databases improve processing and input/output speeds by using multiple central processing units (CPUs) (including multi-core processors) and storage in parallel. In parallel processing, many operations are performed simultaneously, as opposed to serial, sequential processing, where operations are performed with no time overlap.

The major parallel DBMS architectures (which are induced by the underlying hardware architecture) are:
  • Shared memory architecture, where multiple processors share the main memory space, as well as other data storage.
  • Shared disk architecture, where each processing unit (typically consisting of multiple processors) has its own main memory, but all units share the other storage.
  • Shared nothing architecture, where each processing unit has its own main memory and other storage.
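
A minimal, shared-nothing-style sketch in plain Python using the standard multiprocessing module (not a real parallel DBMS; the partitions and product names are invented): each worker owns its own data partition and computes a partial aggregate independently, and the partial results are then merged.

    from multiprocessing import Pool

    # Each "node" owns its own partition of the rows (shared-nothing style).
    partitions = [
        [("widget", 3), ("gadget", 5)],   # node 1's rows
        [("widget", 2), ("gizmo", 8)],    # node 2's rows
        [("gadget", 1), ("gizmo", 4)],    # node 3's rows
    ]

    def partial_sum(rows):
        # Local aggregation over one partition, done independently of the other nodes.
        totals = {}
        for product, qty in rows:
            totals[product] = totals.get(product, 0) + qty
        return totals

    def merge(partials):
        # Combine the partial aggregates into the final result.
        combined = {}
        for totals in partials:
            for product, qty in totals.items():
                combined[product] = combined.get(product, 0) + qty
        return combined

    if __name__ == "__main__":
        with Pool(processes=len(partitions)) as pool:
            partials = pool.map(partial_sum, partitions)   # evaluated in parallel
        print(merge(partials))   # {'widget': 5, 'gadget': 6, 'gizmo': 12}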

An unstructured-data database is intended to store, in a manageable and protected way, diverse objects that do not fit naturally and conveniently into common databases. It may include email messages, documents, journals, multimedia objects, etc. The name may be misleading, since some such objects can be highly structured; however, the possible object collection as a whole does not fit into a predefined structured framework. Most established DBMSs now support unstructured data in various ways, and new dedicated DBMSs are emerging.

Major database usage requirements


The major purpose of a database is to provide the information system (in its broadest sense) that utilizes it with the information the system needs according to its own requirements. A certain broad set of requirements refines this general goal. These database requirements translate into requirements for the respective DBMS, to allow conveniently building a proper database for the given application. If this goal is met by a DBMS, then the designers and builders of the specific database can concentrate on the application's aspects, and need not deal with building and maintaining the underlying DBMS. Also, since a DBMS is complex and expensive to build and maintain, it is not economical to build such a new tool for every application. Rather, it is desirable to provide a flexible tool for handling databases for as many applications as possible, i.e., a general-purpose DBMS.

Functional requirements


Certain general functional requirements need to be met in conjunction with a database. They describe what needs to be defined in a database for any specific application.
The database needs to be based on a data model that is sufficiently rich to describe in the database all the application's relevant aspects. A data definition language exists to describe databases within the data model; such a language is typically specific to the data model.
A database's data model needs to be supported by a sufficiently rich data manipulation language to allow all the database manipulations and information generation (from the data) needed by the respective application. Such a language is typically specific to the data model.
The DB needs built-in security means to protect its content (and users) from the dangers of unauthorized users (either humans or programs). Protection is also needed from types of unintentional breach. Security types and levels should be defined by the database owners.
Manipulating database data often involves processes of several interdependent steps, at different times (e.g., when different people's interactions are involved, as in generating an insurance policy). Data manipulation languages are typically intended to describe what is needed in a single such step, and dealing with multiple steps typically requires writing quite complex programs. Most applications are programmed using common programming languages and software development tools. However, the area of process description has evolved in the frameworks of workflow and business processes, with supporting languages and software packages which considerably simplify the tasks. Traditionally these frameworks have been outside the scope of common DBMSs, but their use has become commonplace, and they are often provided as add-ons to DBMSs.

Operational requirements


A database also needs to meet operational requirements in order to effectively support an application in operation. Though it might be expected that operational requirements are met automatically by a DBMS, in fact this is not the case in most instances: meeting them typically requires substantial design and tuning work by database administrators. This is typically done through specific instructions and operations using special database user interfaces and tools, and thus may be viewed as a set of secondary functional requirements (which are no less important than the primary ones).

Availability

A DB should maintain the needed levels of availability, i.e., the DB needs to be available in such a way that a user's action does not have to wait beyond a certain time range before it starts executing on the DB. Availability also relates to failure and recovery (see Recovery from failure and disaster below): upon failure and during recovery, normal availability is degraded, and special measures are needed to keep satisfying availability requirements.

Performance

Users' actions upon the DB should be executed within needed time ranges.

Isolation between users

When multiple users access the database concurrently, each user's actions should be uninterrupted and unaffected by the actions of other users. Together, these concurrent actions should maintain the DB's consistency (i.e., keep the DB from being corrupted).
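
The isolation requirement can be illustrated with a minimal sketch using Python's built-in sqlite3 module (the file name, table and values are invented for the example): an uncommitted change made over one connection is not visible to a second connection until it is committed.

    # Minimal sketch of isolation between concurrent users, using Python's
    # built-in sqlite3 module; file, table and values are invented.
    import sqlite3

    setup = sqlite3.connect("demo.db")
    setup.execute("CREATE TABLE IF NOT EXISTS accounts (id INTEGER, balance INTEGER)")
    setup.execute("DELETE FROM accounts")
    setup.commit()
    setup.close()

    writer = sqlite3.connect("demo.db")
    reader = sqlite3.connect("demo.db")

    writer.execute("INSERT INTO accounts VALUES (1, 100)")              # transaction still open
    print(reader.execute("SELECT COUNT(*) FROM accounts").fetchone())   # (0,) - not yet visible
    writer.commit()                                                     # transaction ends
    print(reader.execute("SELECT COUNT(*) FROM accounts").fetchone())   # (1,) - now visible

    writer.close()
    reader.close()
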
All computer systems, including DBMSs, are prone to failures for many reasons (both software and hardware related). A failure typically corrupts the DB, often to the extent that it is impossible to repair without special measures. The DBMS should provide automatic recovery-from-failure procedures that repair the DB and return it to a well-defined state.

A different type of failure is caused by a disaster, whether natural (e.g., earthquake, flood, tornado) or man-made (e.g., intentional physical sabotage, destructive acts of war). Recovery from disasters (disaster recovery), which typically incapacitate whole computer systems beyond repair (unlike a software failure or a hardware component failure), requires special protective measures.
Sometimes it is desirable to bring a database back to a previous state, for example when the database is found to be corrupted due to a software error, or when it has been updated with erroneous data. To achieve this, a backup operation is performed occasionally or continuously, and each desired database state (i.e., the values of its data and their embedding in the database's data structures) is kept in dedicated backup files (many techniques exist to do this effectively). When a database administrator decides to bring the database back to a previous state (e.g., by specifying the point in time at which the database was in that state), these files are used to restore it.
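
As a concrete illustration, Python's built-in sqlite3 module exposes a simple online backup call; a minimal sketch, assuming an existing SQLite database file (the file names are hypothetical):

    # Minimal sketch of backing up a database state and restoring it later,
    # using Python's built-in sqlite3 module; file names are hypothetical.
    import sqlite3

    live = sqlite3.connect("app.db")
    saved = sqlite3.connect("app_backup.db")
    live.backup(saved)          # copy the current database state into the backup file
    saved.close()
    live.close()

    # Later, to bring the database back to that saved state:
    saved = sqlite3.connect("app_backup.db")
    live = sqlite3.connect("app.db")
    saved.backup(live)          # overwrite the live database with the saved state
    live.close()
    saved.close()
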
Data independence pertains to a database's life cycle (see Database building, maintaining, and tuning below). It strongly impacts the convenience and cost of maintaining an application and its database, and it has been a major motivation for the emergence and success of the Relational model, as well as for the convergence to a common database architecture. In general, "data independence" means that changes in the database's structure do not require changes in its application's computer programs, and that changes in the database at a certain architectural level (see below) do not affect the database's levels above. Data independence is achieved to a great extent in contemporary DBMSs, but it is not completely attainable and is achieved to different degrees for different types of database structural changes.
The functional areas are domains and subjects that have evolved in order to provide proper answers and solutions to the functional requirements above.
A data model is an abstract structure that provides the means to effectively describe the specific data structures needed to model an application. As such, a data model needs sufficient expressive power to capture the needed aspects of applications. These applications are often typical of commercial companies and other organizations (e.g., manufacturing, human resources, stock, banking). For effective utilization and handling it is desirable that a data model be relatively simple and intuitive. This may conflict with the high expressive power needed to deal with certain complex applications, so any popular general-purpose data model usually strikes a balance between being intuitive and relatively simple and having the high expressive power needed for complex cases. The application's semantics is usually not explicitly expressed in the model, but rather is implicit (and detailed by documentation external to the model) and hinted at by the data item types' names (e.g., "part-number") and their connections (as expressed by the generic data structure types provided by each specific model).

Early data models


These models were popular in the 1960s and 1970s, but nowadays they can be found primarily in old legacy systems. They are characterized primarily by being navigational, with strong connections between their logical and physical representations, and by deficiencies in data independence.
In the Hierarchical model, different record types (representing real-world entities) are embedded in a predefined hierarchical (tree-like) structure. This hierarchy is used as the physical order of records in storage. Record access is done by navigating through the data structure using pointers combined with sequential accessing.

This model has been supported primarily by the IBM IMS (Information Management System) DBMS, one of the earliest DBMSs. Various limitations of the model have been compensated for in later IMS versions by additional logical hierarchies imposed on the base physical hierarchy.
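
A minimal sketch in Python of the kind of record organization the hierarchical model implies (the record types and field names are invented for illustration): each parent record embeds references to its child records, and access is by navigating from the root downwards combined with sequential scanning.

    # Minimal sketch of hierarchical (tree-like) record organization and
    # navigational access; record types and field names are invented.
    class Record:
        def __init__(self, rec_type, fields):
            self.rec_type = rec_type      # e.g., "DEPARTMENT", "EMPLOYEE", "SKILL"
            self.fields = fields
            self.children = []            # subordinate records, in physical order

        def add(self, child):
            self.children.append(child)
            return child

    root = Record("DEPARTMENT", {"name": "Sales"})
    emp = root.add(Record("EMPLOYEE", {"name": "Ada"}))
    emp.add(Record("SKILL", {"skill": "reporting"}))

    # Navigational access: follow parent-to-child pointers, then scan sequentially.
    for employee in root.children:
        for skill in employee.children:
            print(employee.fields["name"], "has skill", skill.fields["skill"])
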
In the Network model, a hierarchical relationship between two record types (representing real-world entities) is established via the set construct. A set consists of circular linked lists in which one record type, the set owner or parent, appears once in each circle, and a second record type, the subordinate or child, may appear multiple times in each circle. In this way a hierarchy may be established between any two record types, e.g., type A is the owner of B. At the same time another set may be defined in which B is the owner of A. Thus all the sets together comprise a general directed graph (ownership defines a direction), or network construct. Access to records is either sequential (usually within each record type) or by navigation in the circular linked lists.

This model is more general and powerful than the hierarchical model, and it was the most popular model before being replaced by the Relational model. It was standardized by CODASYL. Popular DBMS products that utilized it were Cincom Systems' Total and Cullinet's IDMS.
An inverted file or inverted index of a first file, on a field in that file (the inversion field), is a second file in which this field is the key. A record in the second file includes a key value and pointers to the records in the first file where the inversion field has that value. This is also the logical structure of contemporary database indexes. The related inverted file data model utilizes inverted files of primary database files to access needed records in those files directly and efficiently.

Notable for using this data model is the ADABAS DBMS of Software AG, introduced in 1970. ADABAS has gained a considerable customer base and is still in use and supported today. In the 1980s it adopted the Relational model and SQL in addition to its original tools and languages.
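
The idea of an inverted file can be sketched in a few lines of Python (the primary file of records and the inversion field "city" are invented for illustration): the index maps each value of the inversion field to pointers to the matching records in the primary file.

    # Minimal sketch of an inverted file/index over a primary file of records;
    # the records and the inversion field ("city") are invented.
    primary_file = [
        {"id": 1, "name": "Ada",   "city": "London"},
        {"id": 2, "name": "Boris", "city": "Paris"},
        {"id": 3, "name": "Carol", "city": "London"},
    ]

    # Build the inverted file: key = value of the inversion field,
    # value = pointers (here, positions) to the records having that value.
    inverted = {}
    for position, record in enumerate(primary_file):
        inverted.setdefault(record["city"], []).append(position)

    # Direct access through the index instead of scanning the whole primary file.
    for position in inverted.get("London", []):
        print(primary_file[position])
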
The object-oriented paradigm has been applied in areas such as engineering and spatial databases, telecommunications, and various scientific domains. The conglomeration of object-oriented programming and database technology led to this new kind of database. These databases attempt to bring the database world and the application-programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example, as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.

A variety of ways have been tried for storing objects in a database. Some products have approached the problem from the application-programming side, by making the objects manipulated by the program persistent. This typically also requires the addition of some kind of query language, since conventional programming languages do not provide language-level functionality for finding objects based on their information content. Others have approached the problem from the database end, by defining an object-oriented data model for the database and a database programming language that allows full programming capabilities as well as traditional query facilities.
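
Python's standard shelve module gives a rough feel for the application-programming-side approach of making program objects persistent (the class and file name are invented); note that it offers no query language, which is exactly the gap mentioned above.

    # Minimal sketch of object persistence from the programming-language side,
    # using Python's built-in shelve module; class and file name are invented.
    import shelve

    class Customer:
        def __init__(self, name, city):
            self.name = name
            self.city = city

    with shelve.open("customers_store") as store:
        store["c1"] = Customer("Ada", "London")      # the object is persisted as-is

    with shelve.open("customers_store") as store:
        print(store["c1"].city)                      # object reconstructed: "London"
        # No declarative query facility: finding customers by city needs a scan.
        londoners = [c for c in store.values() if c.city == "London"]
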
Products offering a more general data model than the relational model are sometimes classified as post-relational. Alternative terms include "hybrid database", "object-enhanced RDBMS", and others. The data model in such products incorporates relations but is not constrained by E.F. Codd's Information Principle, which requires that all information in the database be represented as values in relations and in no other way.

Some of these extensions to the relational model integrate concepts from technologies that pre-date the relational model. For example, they allow representation of a directed graph with trees on the nodes. The German company sones implements this concept in its GraphDB.

Some post-relational products extend relational systems with non-relational features. Others arrived at much the same place by adding relational features to pre-relational systems. Paradoxically, this allows products that are historically pre-relational, such as PICK and MUMPS, to make a plausible claim to be post-relational.

The resource space model (RSM) is a non-relational data model based on multi-dimensional classification.

Database languages


Database languages are dedicated programming languages, tailored and utilized to
  • define a database (i.e., its specific data types and the relationships among them),
  • manipulate its content (e.g., insert new data occurrences, and update or delete existing ones), and
  • query it (i.e., request information: compute and retrieve any information based on its data).

Database languages are data-model specific, i.e., each language assumes and is based on a certain structure of the data (which typically differs between data models). They typically have commands to instruct execution of the desired operations in the database. Each such command is equivalent to a complex expression (program) in a regular programming language, and thus programming in dedicated (database) languages simplifies the task of handling databases considerably. An expression in a database language is automatically transformed (by a compiler or interpreter, as with regular programming languages) into a proper computer program that runs while accessing the database and provides the needed results. The following are notable examples:

SQL for the Relational model

A major Relational model language supported by all the relational DBMSs and a standard.

SQL was one of the first commercial languages for the relational model. Despite not fully adhering to the relational model as described by Codd, it has become the most widely used database language. Though often described as, and to a great extent being, a declarative language, SQL also includes procedural elements. SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987. Since then the standard has been enhanced several times with added features. However, issues of SQL code portability between major RDBMS products still exist due to lack of full compliance with, or differing interpretations of, the standard. Among the reasons mentioned are the large size and incomplete specification of the standard, as well as vendor lock-in.
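
A minimal sketch of the three roles listed above (definition, manipulation, query) expressed in SQL and run here through Python's built-in sqlite3 module; the table and data are invented for illustration.

    # Minimal sketch of SQL used for definition, manipulation and query,
    # through Python's built-in sqlite3 module; table and data are invented.
    import sqlite3

    conn = sqlite3.connect(":memory:")

    # Definition: declare a relation and the types of its fields.
    conn.execute("CREATE TABLE product (name TEXT, category TEXT, price REAL)")

    # Manipulation: insert and update data occurrences.
    conn.execute("INSERT INTO product VALUES ('notebook', 'stationery', 2.5)")
    conn.execute("UPDATE product SET price = 3.0 WHERE name = 'notebook'")

    # Query: request information derived from the data.
    for row in conn.execute("SELECT name, price FROM product WHERE category = 'stationery'"):
        print(row)

    conn.close()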

OQL for the Object model

An Object model language standard (from the Object Data Management Group, ODMG) that has influenced the design of some of the newer query languages like JDOQL and EJB QL, though they cannot be considered different flavors of OQL.

XQuery for the XML model

XQuery is an XML-based database language (also named XQL).

Database architecture

Database architecture (to be distinguished from DBMS architecture; see below) may be viewed, to some extent, as an extension of data modeling. It is used to conveniently answer the requirements of different end-users of the same database, as well as for other benefits. For example, a financial department of a company needs the payment details of all employees as part of the company's expenses, but not the many other details about employees that are of interest to the human resources department. Thus different departments need different views of the company's database; both views include the employees' payments, possibly at different levels of detail (and presented in different visual forms). To meet such requirements effectively, database architecture consists of three levels: external, conceptual and internal. Clearly separating the three levels was a major feature of the relational database model implementations that dominate 21st century databases.
  • The external level defines how each end-user type understands the organization of its respective relevant data in the database, i.e., the different needed end-user views. A single database can have any number of views at the external level.
  • The conceptual level unifies the various external views into a coherent, global whole. It provides the common denominator of all the external views and comprises all the generic data needed by end-users, i.e., all the data from which any view may be derived/computed. It is provided in the simplest possible form of such generic data and constitutes the backbone of the database. It is outside the scope of the various database end-users; it serves database application developers and is defined by the database administrators who build the database.
  • The internal level (or physical level) is in fact part of the database implementation inside a DBMS (see Implementation section below). It is concerned with cost, performance, scalability and other operational matters. It deals with the storage layout of the conceptual level, provides supporting storage structures such as indexes to enhance performance, and occasionally stores data of individual views (materialized views) computed from the generic data, if performance justification exists for such redundancy. It balances all the external views' (possibly conflicting) performance requirements in an attempt to optimize overall database usage by all its end-users according to the database goals and priorities.


All the three levels are maintained and updated according to changing needs by database administrators who often also participate in the database design.
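
A small sketch of how an external view can be derived from the conceptual level, using an SQL view through Python's built-in sqlite3 module; the tables, columns and view are invented for illustration.

    # Minimal sketch of the external/conceptual separation via an SQL view,
    # using Python's built-in sqlite3 module; the schema is invented.
    import sqlite3

    conn = sqlite3.connect(":memory:")

    # Conceptual level: the generic employee data, kept once for the whole organization.
    conn.execute("""CREATE TABLE employee (
                        id INTEGER PRIMARY KEY, name TEXT,
                        salary INTEGER, home_address TEXT)""")
    conn.execute("INSERT INTO employee VALUES (1, 'Ada', 5000, '1 Main St')")

    # External level: the financial department's view exposes only payment details.
    conn.execute("CREATE VIEW finance_view AS SELECT id, name, salary FROM employee")

    print(conn.execute("SELECT * FROM finance_view").fetchall())
    conn.close()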

The above three-level database architecture also relates to, and is motivated by, the concept of data independence, which has long been described as a desired database property and was one of the major initial driving forces of the Relational model. In the context of this architecture it means that changes made at a certain level do not affect definitions and software developed against higher-level interfaces, and are incorporated at the higher levels automatically. For example, changes at the internal level do not affect application programs written using conceptual-level interfaces, which saves substantial change work that would otherwise be needed.

In summary, the conceptual level is a level of indirection between the internal and the external levels. On one hand it provides a common view of the database, independent of the different external view structures, and on the other hand it is uncomplicated by details of how the data are stored or managed (the internal level). In principle every level, and even every external view, can be presented by a different data model. In practice a given DBMS usually uses the same data model for both the external and the conceptual levels (e.g., the relational model). The internal level, which is hidden inside the DBMS and depends on its implementation (see Implementation section below), requires a different level of detail and uses its own data structure types, typically different in nature from the structures of the external and conceptual levels that are exposed to DBMS users (e.g., the data models above): while the external and conceptual levels are focused on and serve DBMS users, the concern of the internal level is effective implementation detail.

Database security


Database security deals with the various aspects of protecting the database content, its owners, and its users. It ranges from protection against intentional unauthorized database uses to protection against unintentional database access by unauthorized entities (e.g., a person or a computer program).

The following are major areas of database security (among many others).

Access control
Database access control deals with controlling who (a person or a certain computer program) is allowed to access what information in the database. The information may comprise specific database objects (e.g., record types, specific records, data structures), certain computations over certain objects (e.g., query types, or specific queries), or utilizing specific access paths to the former (e.g., using specific indexes or other data structures to access information).

Database access controls are set by personnel specially authorized by the database owner, who use dedicated, protected security DBMS interfaces.

Data security
The definition of data security varies and may overlap with other database security aspects. Broadly, it deals with protecting specific chunks of data, both physically (i.e., from corruption, destruction, or removal; e.g., see physical security) and against the interpretation of the data, or parts of them, into meaningful information (e.g., by looking at the strings of bits that they comprise and deducing specific valid credit-card numbers; e.g., see data encryption).

Database audit
Database auditing primarily involves monitoring that no security breach, in any aspect, has happened. If a security breach is discovered, all possible corrective actions are taken.

Database design
Database design is done before the database is built, to meet the needs of the end-users within the given application or information system that the database is intended to support. The database design defines the needed data and the data structures that the database comprises. A design is typically carried out according to the common three architectural levels of a database (see Database architecture above). First the conceptual level is designed, which defines the overall picture/view of the database and reflects all the real-world elements (entities) the database intends to model, as well as the relationships among them. On top of it the external level, i.e., the various views of the database, is designed according to the (possibly quite different) needs of specific end-user types; more external views can be added later. External view requirements may modify the design of the conceptual level (i.e., add or remove entities and relationships), but usually a well designed conceptual level for an application supports most of the needed external views well. The conceptual view also determines the internal level (which primarily deals with data layout in storage) to a great extent. External view requirements may add supporting storage structures, such as materialized views and indexes, for enhanced performance. Typically the internal level is optimized for top overall performance, in an averaged way that takes into account the (possibly conflicting) performance requirements of the different external views according to their relative importance. While the conceptual and external level designs can usually be done independently of any DBMS (DBMS-independent design software packages exist, possibly with interfaces to some specific popular DBMSs), the internal level design relies heavily on the capabilities and internal data structures of the specific DBMS utilized (see the Implementation section below).

A common way to carry out conceptual level design is to use the Entity-Relationship model (ERM), both in its basic form and with the enhancements it has undergone, since it provides a straightforward, intuitive perception of an application's elements and semantics. An alternative approach, which preceded the ERM, is to use the Relational model and dependencies (mathematical relationships) among the data to normalize the database, i.e., to define the ("optimal") relations (data record or tuple types) in the database. Though a large body of research exists for this method, it is more complex, less intuitive, and not more effective than the ERM method. Thus normalization is used less in practice than the ERM method.

The ERM may be less subtle than normalization in several respects, but it captures the main needed dependencies, which are induced by the keys/identifiers of entities and relationships. The ERM also inherently includes the important inclusion dependencies (i.e., an entity instance that does not exist, because it has not been explicitly inserted, cannot appear in a relationship with other entities), which have usually been ignored in normalization. In addition, the ERM allows entity type generalization (the Is-a relationship) and implied property (attribute) inheritance (similar to that found in the object model).

Another aspect of database design is its security. It involves both defining access control to database objects (e.g., entities, views) and defining security levels and methods for the data itself (see Database security above).

Entities and relationships


The most common database design methods are based on the Entity relationship model (ERM, or ER model). This model views the world in a simplistic but very powerful way: It consists of "Entities" and the "Relationships" among them. Accordingly a database consists of entity and relationship types, each with defined attributes (field types) that model concrete entities and relationships. Modeling a database in this way typically yields an effective one with desired properties (as in some normal forms; see normalization below). Such models can be translated to any other data model required by any specific DBMS for building an effective database.
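
A small sketch of translating an ER design (two entity types and one relationship type between them) into relational tables, using Python's built-in sqlite3 module; the entities, attributes and relationship are invented for illustration.

    # Minimal sketch: mapping ER entities and a many-to-many relationship to tables,
    # using Python's built-in sqlite3 module; all names are invented.
    import sqlite3

    conn = sqlite3.connect(":memory:")

    # Entity types become tables whose keys identify entity instances.
    conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE course  (id INTEGER PRIMARY KEY, title TEXT)")

    # The relationship type ("enrolled in") becomes a table of key pairs;
    # the foreign keys express the inclusion dependency noted earlier.
    conn.execute("""CREATE TABLE enrolled (
                        student_id INTEGER REFERENCES student(id),
                        course_id  INTEGER REFERENCES course(id))""")

    conn.execute("INSERT INTO student VALUES (1, 'Ada')")
    conn.execute("INSERT INTO course  VALUES (10, 'Databases')")
    conn.execute("INSERT INTO enrolled VALUES (1, 10)")
    print(conn.execute("""SELECT s.name, c.title FROM enrolled e
                          JOIN student s ON s.id = e.student_id
                          JOIN course  c ON c.id = e.course_id""").fetchall())
    conn.close()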

Database normalization


In the design of a relational database, the process of organizing database relations to minimize redundancy is called normalization. The goal is to produce well-structured relations so that additions, deletions, and modifications of a field can be made in just one relation (table), without worrying about the appearance and update of the same field in other relations. The process is algorithmic and based on dependencies (mathematical relationships) that exist among the relations' field types. The result of the process is to bring the database relations into a certain "normal form". Several normal forms exist, with different properties.
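
A small sketch of normalization as decomposition, using Python's built-in sqlite3 module; the redundant relation and its decomposition are invented for illustration. After decomposition, a customer's city is stored once rather than repeated in every order row.

    # Minimal sketch of normalization: decompose a redundant relation into two,
    # using Python's built-in sqlite3 module; the schema is invented.
    import sqlite3

    conn = sqlite3.connect(":memory:")

    # Unnormalized: customer_city repeats in every order of the same customer,
    # so changing a customer's city would require updating many rows.
    conn.execute("""CREATE TABLE order_flat (
                        order_id INTEGER, customer_name TEXT, customer_city TEXT)""")

    # Normalized: the dependency customer_name -> customer_city is kept in one place.
    conn.execute("CREATE TABLE customer (name TEXT PRIMARY KEY, city TEXT)")
    conn.execute("""CREATE TABLE orders (
                        order_id INTEGER PRIMARY KEY,
                        customer_name TEXT REFERENCES customer(name))""")

    conn.execute("INSERT INTO customer VALUES ('Ada', 'London')")
    conn.execute("INSERT INTO orders VALUES (1, 'Ada')")
    conn.execute("INSERT INTO orders VALUES (2, 'Ada')")
    conn.execute("UPDATE customer SET city = 'Paris' WHERE name = 'Ada'")  # one update suffices
    conn.close()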

Database building, maintaining, and tuning


After designing a database for an application comes the stage of building the database. Typically an appropriate general-purpose DBMS can be selected for this purpose. A DBMS provides the needed user interfaces to be utilized by database administrators to define the needed application's data structures within the DBMS's respective data model. Other user interfaces are used to select needed DBMS parameters (such as security related and storage allocation parameters).

When the database is ready (all its data structures and other needed components are defined), it is typically populated with the application's initial data (database initialization, which is typically a distinct project, in many cases using specialized DBMS interfaces that support bulk insertion) before being made operational. In some cases the database becomes operational while empty of application data, and data accumulate during its operation.

After the database is built and made operational comes the database maintenance stage: various database parameters may need changing and tuning for better performance, the application's data structures may be changed or added to, new related application programs may be written to add to the application's functionality, and so on.

Database migration between DBMSs

See also Database migration in Data migration



A database built with one DBMS is not portable to another DBMS (i.e., the other DBMS cannot run it). However, in some situations it is desirable to move (migrate) a database from one DBMS to another. The reasons are primarily economical (different DBMSs may have different total costs of ownership, TCO), functional, and operational (different DBMSs may have different capabilities). The migration involves the database's transformation from one DBMS type to another. The transformation should maintain (if possible) the database-related applications (i.e., all related application programs) intact. Thus, the database's conceptual and external architectural levels should be maintained in the transformation. It may be desired that some aspects of the internal architectural level are maintained as well. A complex or large database migration may be a complicated and costly (one-time) project in itself, which should be factored into the decision to migrate, even though tools may exist to help migration between specific DBMSs. Typically a DBMS vendor provides tools to help import databases from other popular DBMSs.
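
One common low-level aid to migration is dumping a database into portable SQL text that the target DBMS can replay (usually after some adjustment for dialect differences); a minimal sketch with Python's built-in sqlite3 module, where the file names are hypothetical:

    # Minimal sketch of exporting a database as portable SQL statements,
    # using Python's built-in sqlite3 module; file names are hypothetical.
    import sqlite3

    source = sqlite3.connect("legacy_app.db")
    with open("legacy_app_dump.sql", "w") as out:
        for statement in source.iterdump():      # CREATE TABLE / INSERT statements
            out.write(statement + "\n")
    source.close()
    # The dump can then be edited for dialect differences and loaded into another DBMS.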

Implementation: Database management systems


or How database usage requirements are met

A database management system (DBMS) is a system that allows one to build and maintain databases, as well as to utilize their data and retrieve information from them. A DBMS defines the database type that it supports, as well as its functionality and operational capabilities. A DBMS provides the internal processes for the external applications built on top of it. The end-users of such an application are usually exposed only to the application itself and do not directly interact with the DBMS. Thus end-users enjoy the effects of the underlying DBMS, but its internals are completely invisible to them. Database designers and database administrators interact with the DBMS through dedicated interfaces to build and maintain the applications' databases, and thus need more knowledge and understanding of how DBMSs operate and of the DBMSs' external interfaces and tuning parameters.

A DBMS consists of software that operates databases, providing storage, access, security, backup and other facilities to meet the needed requirements. DBMSs can be categorized according to the database model(s) they support (such as relational or XML), the type(s) of computer they run on (such as a server cluster or a mobile phone), the query language(s) used to access the database (such as SQL or XQuery), and performance trade-offs (such as maximum scale or maximum speed). Some DBMSs cover more than one entry in these categories, e.g., supporting multiple query languages. Database software typically supports the Open Database Connectivity (ODBC) standard, which allows the database to integrate (to some extent) with other databases.
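
A minimal sketch of accessing a database through ODBC from Python using the third-party pyodbc package (assumed to be installed); the data source name, credentials and table are hypothetical.

    # Minimal sketch of database access via ODBC using the third-party pyodbc
    # package; the DSN, credentials and table name are hypothetical.
    import pyodbc

    conn = pyodbc.connect("DSN=SalesDB;UID=report_user;PWD=secret")
    cursor = conn.cursor()
    cursor.execute("SELECT COUNT(*) FROM orders")
    print(cursor.fetchone()[0])
    conn.close()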

The development of a mature general-purpose DBMS typically takes several years and many man-years. DBMS developers typically update their products to follow and take advantage of progress in computer and storage technologies. Several DBMS products, such as Oracle and IBM DB2, have been in ongoing development since the 1970s and 1980s. Since DBMSs comprise a significant market, computer and storage vendors often take DBMS requirements into account in their own development plans.

DBMS architecture: major DBMS components


DBMS architecture specifies the DBMS's components (including descriptions of their functions) and their interfaces. DBMS architecture is distinct from database architecture. The following are major DBMS components:
  • DBMS external interfaces - They are the means to communicate with the DBMS (both ways, to and from the DBMS) to perform all the operations needed for the DBMS. These can be operations on a database, or operations to operate and manage the DBMS. For example:
- Direct database operations: defining data types, assigning security levels, updating data, querying the database, etc.
- Operations related to DBMS operation and management: backup and restore, database recovery, security monitoring, database storage allocation and database layout configuration monitoring, performance monitoring and tuning, etc.
An external interface can be either a user interface (e.g., typically for a database administrator) or an application programming interface (API) used for communication between an application program and the DBMS.
  • Database language engines (or processors) - Most operations upon databases are performed through expression in Database languages (see above). Languages exist for data definition, data manipulation and queries (e.g., SQL), as well as for specifying various aspects of security, and more. Language expressions are fed into a DBMS through proper interfaces. A language engine processes the language expressions (by a compiler or language interpreter) to extract the intended database operations from the expression in a way that they can be executed by the DBMS.
  • Query optimizer - Performs query optimization on every query to choose for it the most efficient query plan (a partial order (tree) of operations) to be executed to compute the query result; a small sketch of inspecting a chosen plan follows this list.
  • Database engine - Performs the received database operations on the database objects, typically at their higher-level representation.
  • Storage engine - Translates the operations into low-level operations on the storage bits. In some references the storage engine is viewed as part of the database engine.
  • Transaction engine - For correctness and reliability purposes, most DBMS internal operations are performed encapsulated in transactions (see below). Transactions can also be specified externally to the DBMS to encapsulate a group of operations. The transaction engine tracks all the transactions and manages their execution according to the transaction rules (e.g., proper concurrency control, and proper commit or abort for each).
  • DBMS management and operation component - Comprises many components that deal with all the DBMS management and operational aspects like performance monitoring and tuning, backup and restore, recovery from failure, security management and monitoring, database storage allocation and database storage layout monitoring, etc.
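
How a DBMS exposes the optimizer's chosen plan can be sketched with SQLite's EXPLAIN QUERY PLAN statement through Python's built-in sqlite3 module; the table, index and query are invented for illustration.

    # Minimal sketch of inspecting the optimizer's chosen query plan,
    # using SQLite via Python's built-in sqlite3 module; the schema is invented.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, category TEXT, price REAL)")
    conn.execute("CREATE INDEX idx_category ON product(category)")

    # The optimizer decides whether to scan the table or use the index.
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM product WHERE category = ?", ("books",)
    ).fetchall()
    for row in plan:
        print(row)      # e.g., a step that searches product using idx_category
    conn.close()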

Database storage


Database storage is the container of the physical materialization of a database. It comprises the internal (physical) level in the database architecture. It also contains all the information needed (e.g., metadata, "data about the data", and internal data structures) to reconstruct the conceptual level and external level from the internal level when needed. It is not part of the DBMS but rather is manipulated by the DBMS (by its storage engine; see above) to manage the database that resides in it. Though typically accessed by a DBMS through the underlying operating system (often utilizing the operating system's file systems as intermediates for storage layout), storage properties and configuration settings are extremely important for the efficient operation of the DBMS, and thus are closely maintained by database administrators. A DBMS, while in operation, always has its database residing in several types of storage (e.g., memory and external storage). The database data and the additional needed information, possibly in very large amounts, are coded into bits. Data typically reside in storage in structures that look completely different from the way the data appear at the conceptual and external levels, but in ways that attempt to optimize the reconstruction of these levels when needed by users and programs, as well as the computation of additional types of needed information from the data (e.g., when querying the database).

In principle the database storage can be viewed as a linear address space, where every bit of data has its unique address in this address space. In practice only a very small percentage of addresses are kept as initial reference points (which also requires storage); most of the database data is accessed by indirection, using displacement calculations (distance in bits from the reference points) and data structures that define access paths (using pointers) to all needed data in an effective manner, optimized for the needed data access operations.
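
The displacement idea can be sketched with fixed-size records in a plain file using Python's built-in struct module (the record layout and file name are invented): the address of record i is simply i times the record size, measured from a reference point (here, the start of the file).

    # Minimal sketch of access-by-displacement over fixed-size records,
    # using Python's built-in struct module; layout and file name are invented.
    import struct

    RECORD = struct.Struct("i20s")        # record layout: 4-byte int + 20-byte name

    with open("records.bin", "wb") as f:  # write a few fixed-size records
        for i, name in enumerate([b"Ada", b"Boris", b"Carol"]):
            f.write(RECORD.pack(i, name))

    def read_record(f, index):
        f.seek(index * RECORD.size)       # displacement from the reference point
        return RECORD.unpack(f.read(RECORD.size))

    with open("records.bin", "rb") as f:
        print(read_record(f, 2))          # direct access to the third record
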
Coding the data and Error-correcting codes

  • Data are encoded by assigning a bit pattern to each alphabet character, digit, other numerical pattern, and multimedia object. Many standards exist for encoding (e.g., ASCII, JPEG, MPEG-4).
  • By adding bits to each encoded unit, redundancy allows both the detection of errors in coded data and their correction based on mathematical algorithms. Errors occur regularly, with low probability, due to random flipping of bit values, "physical bit fatigue" (loss of the ability of a physical bit in storage to maintain a distinguishable value, 0 or 1), or errors in inter- or intra-computer communication. A random bit flip (e.g., due to random radiation) is typically corrected upon detection. A malfunctioning physical bit, or a group of such bits (the specific defective bit is not always known; the group definition depends on the specific storage device), is typically automatically fenced out, taken out of use by the device, and replaced by another functioning equivalent group in the device, where the corrected bit values are restored (if possible). The cyclic redundancy check (CRC) method is typically used in storage for error detection; a small sketch follows this list.
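
Error detection with a CRC can be sketched with Python's built-in zlib.crc32 function (the payload is invented): a single flipped bit changes the checksum, so the corruption is detected when the stored checksum and the recomputed one disagree.

    # Minimal sketch of CRC-based error detection using Python's built-in zlib;
    # the payload is invented for illustration.
    import zlib

    data = b"customer record #42"
    stored_crc = zlib.crc32(data)            # checksum kept alongside the data

    corrupted = bytearray(data)
    corrupted[0] ^= 0b00000001               # simulate a single flipped bit

    print(zlib.crc32(bytes(corrupted)) == stored_crc)   # False: error detected
    print(zlib.crc32(data) == stored_crc)                # True: data intact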

Data compression


Data compression methods allow, in many cases, a string of bits to be represented by a shorter bit string ("compress") and the original string to be reconstructed ("decompress") when needed. This makes it possible to use substantially less storage (by tens of percent) for many types of data, at the cost of more computation (compressing and decompressing when needed). The trade-off between the storage cost saving and the costs of the related computations and possible delays in data availability is analyzed before deciding whether to keep certain data in the database compressed or not.
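
A small sketch of the compress/decompress trade-off using Python's built-in zlib module; the sample data are invented for illustration.

    # Minimal sketch of lossless data compression using Python's built-in zlib;
    # the sample data are invented.
    import zlib

    original = b"status=OK;" * 1000            # repetitive data compresses well

    compressed = zlib.compress(original)       # the shorter bit string kept in storage
    restored = zlib.decompress(compressed)     # reconstructed when needed

    print(len(original), len(compressed))      # e.g., 10000 versus a few dozen bytes
    assert restored == original                # lossless: the original is fully recovered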

Data compression is typically controlled through the DBMS's data definition interface, but in some cases it may be a default and automatic.
Data encryption


For security reasons, certain types of data (e.g., credit-card information) may be kept encrypted in storage to prevent the possibility of unauthorized information reconstruction from chunks of storage snapshots (taken either via unforeseen vulnerabilities in a DBMS or, more likely, by bypassing it).

Data encryption is typically controlled through the DBMS's data definition interface, but in some cases may be a default and automatic.
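
A small sketch of encrypting a sensitive value before it is stored, using the third-party cryptography package (assumed to be installed); the key handling and the value are invented, and a real deployment would keep the key outside the database.

    # Minimal sketch of encrypting a sensitive field before storage, using the
    # third-party cryptography package; key handling and values are invented.
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()        # in practice the key is managed outside the DB
    cipher = Fernet(key)

    card_number = b"4111111111111111"  # invented sample value
    stored_value = cipher.encrypt(card_number)          # only this ciphertext is stored

    # A raw storage snapshot reveals nothing useful without the key:
    print(stored_value[:16], b"...")
    print(cipher.decrypt(stored_value) == card_number)  # True for authorized use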

Data storage types


This collection of bits describes both the contained database data and its related metadata (i.e., data that describes the contained data and allows computer programs to manipulate the database data correctly). The size of a database can nowadays reach tens of terabytes, where a byte is eight bits. The physical materialization of a bit can employ various existing technologies, while new and improved technologies are constantly under development. Common examples are:
  • Magnetic medium (e.g., in a magnetic disk) - The orientation of the magnetic field in magnetic regions on the surface of a material (two orientation directions, for 0 and 1).
  • Dynamic random-access memory (DRAM) - The state of a miniature electronic circuit consisting of a few transistors (among millions, nowadays) in an integrated circuit (two states, for 0 and 1).

These two examples respectively illustrate two major storage types:
  • Nonvolatile storage can maintain its bit states (0s and 1s) without electrical power supply, or when power supply is interrupted;
  • Volatile storage loses its bit values when power supply is interrupted (i.e., its content is erased).


Sophisticated storage units, which can in fact be effective dedicated parallel computers supporting a large amount of nonvolatile storage, typically must also include components with volatile storage. Some such units employ batteries that can provide power for several hours in case of an external power interruption (e.g., see the EMC Symmetrix), and thus keep the content of the volatile storage parts intact. Just before such a device's batteries lose their power, the device typically backs up its volatile content automatically (into nonvolatile storage) and shuts off to protect its data.

Databases are usually too expensive (in terms of importance and of the resources, e.g., time and money, invested in building them) to be lost to a power interruption. Thus, at any point in time, most of their content resides in nonvolatile storage. Even if, for operational reasons, very large portions of them reside in volatile storage (e.g., tens of gigabytes in volatile memory, for in-memory databases), most of this is backed up in nonvolatile storage. The relatively small portion that temporarily has no nonvolatile backup can be reconstructed by proper automatic database recovery procedures after a loss of volatile storage content.

More examples of storage types:
  • Volatile storage can be found in processors, computer memory (e.g., DRAM), etc.
  • Non-volatile storage types include ROM, EPROM, hard disk drives, flash memory and drives, storage arrays, etc.

Storage metrics


Databases always use several types of storage when operational (and, by implication, several when idle). Different types may differ significantly in their properties, and the optimal mix of storage types is determined by the types and quantities of operations each storage type needs to perform, as well as by considerations like physical space, energy consumption and heat dissipation (which can become critical for a large database). Storage types can be categorized by the following attributes:
  • Volatile/Nonvolatile.
  • Cost of the medium (e.g., per Megabyte), Cost to operate (cost of energy consumed per unit time).
  • Access speed (e.g., bytes per second).
  • Granularity — from fine to coarse (e.g., size in bytes of access operation).
  • Reliability (the probability of spontaneous bit value change under various conditions).
  • Maximal possible number of writes (of any specific bit or group of bits); this can be constrained by the technology used (e.g., "write once" or "write twice") or by "physical bit fatigue," the loss of ability to distinguish between the 0 and 1 states after many state changes (e.g., in flash memory).
  • Power needed to operate (Energy per time; energy per byte accessed), Energy efficiency, Heat to dissipate.
  • Packaging density (e.g., realistic number of bytes per volume unit).

Protecting storage device content: Device mirroring (replication) and RAID

See also Disk storage replication


While the malfunction of a group of bits may be resolved by error detection and correction mechanisms (see above), a storage device malfunction requires different solutions. The following solutions are commonly used and are valid for most storage devices:
  • Device mirroring (replication) - A common solution is to constantly maintain an identical copy of the device's content on another device (typically of the same type). The downside is that this doubles the storage, and both devices (copies) need to be updated simultaneously, with some overhead and possibly some delays. The upside is the possibility of concurrent reads of the same data group by two independent processes, which increases performance. When one of the replicated devices is detected to be defective, the other copy is still operational and is used to generate a new copy on another device (usually one available and operational in a pool of stand-by devices kept for this purpose).
  • Redundant array of independent disks (RAID) - This method generalizes device mirroring by allowing one device in a group of N devices to fail and be replaced, with its content restored (device mirroring is RAID with N=2). RAID groups of N=5 or N=6 are common. N>2 saves storage compared with N=2, at the cost of more processing during both regular operation (often with reduced performance) and defective-device replacement.


Device mirroring and typical RAID are designed to handle a single device failure in the RAID group of devices. However, if a second failure occurs before the RAID group is completely repaired from the first failure, then data can be lost. The probability of a single failure is typically small, so the probability of two failures in the same RAID group in close time proximity is much smaller (approximately the probability squared, i.e., multiplied by itself). If a database cannot tolerate even this smaller probability of data loss, then the RAID group itself is replicated (mirrored). In many cases such mirroring is done geographically remotely, in a different storage array, also to handle recovery from disasters (see disaster recovery above).
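To illustrate why a RAID group with N>2 can lose one device and still recover its content, here is a minimal XOR-parity sketch in Python; the striping, block layout, and rebuild logic of real RAID implementations are far more involved:

    from functools import reduce

    def xor_blocks(blocks):
        # Bytewise XOR of equally sized blocks.
        return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

    data_blocks = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]   # content of the N-1 data devices
    parity = xor_blocks(data_blocks)                      # stored on the Nth device

    # One device fails: its block is rebuilt from the survivors plus the parity block.
    lost_index = 2
    survivors = [blk for i, blk in enumerate(data_blocks) if i != lost_index]
    rebuilt = xor_blocks(survivors + [parity])
    assert rebuilt == data_blocks[lost_index]
    print("rebuilt:", rebuilt)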

Database storage layout


Database bits are laid out in storage in data structures and groupings that can take advantage both of known, effective algorithms for retrieving and manipulating them and of the storage's own properties. Typically the storage itself is designed to meet the requirements of the various areas that make extensive use of storage, including databases. A DBMS in operation always utilizes several storage types simultaneously (e.g., memory and external storage), each with its respective layout methods.
Database storage hierarchy

A database, while in operation, resides simultaneously in several types of storage. By the nature of contemporary computers, most of the part of the database inside the computer that hosts the DBMS resides (partially replicated) in volatile storage. The data (pieces of the database) being processed/manipulated reside inside a processor, possibly in the processor's caches. These data are read from/written to memory, typically through a computer bus (so far, typically volatile storage components). Computer memory communicates data (transferred to/from) external storage, typically through standard storage interfaces or networks (e.g., Fibre Channel, iSCSI). A storage array, a common external storage unit, typically has a storage hierarchy of its own, from a fast cache, typically consisting of (volatile and fast) DRAM, which is connected (again via standard interfaces) to drives, possibly of different speeds, like flash drives and magnetic disk drives (non-volatile). The drives may be connected to magnetic tapes, on which the least active parts of a large database, or database backup generations, typically reside.

A correlation typically exists between storage speed and price, and the faster storage is typically volatile.
Data structures

A data structure is an abstract construct that embeds data in a well-defined manner. An efficient data structure allows the data to be manipulated in efficient ways. The data manipulation may include data insertion, deletion, updating, and retrieval in various modes. A given data structure type may be very effective for certain operations and very ineffective for others. A data structure type is selected during DBMS development to best meet the operations needed for the types of data it contains. The type of data structure selected for a certain task also typically takes into consideration the type of storage it resides in (e.g., speed of access, minimal size of the storage chunk accessed, etc.). In some DBMSs database administrators have the flexibility to select among data structure options to contain user data, for performance reasons. Sometimes the data structures have selectable parameters for tuning database performance.

Databases may store data in many data structure types. Common examples are the following:
  • ordered/unordered flat files
  • hash tables
  • B+ trees
  • ISAM
  • heaps

Application data and DBMS data

A typical DBMS does not store only the data of the application it serves. In order to handle the application data, the DBMS needs to store that data in data structures which themselves comprise additional, specific data. In addition, the DBMS needs its own data structures and many types of bookkeeping data, like indexes and logs. The DBMS's data is an integral part of the database and may comprise a substantial portion of it.
Data dictionary

A DBMS Data dictionary, or Catalog, is a repository with data about the data and data structures, as well as associated data for various functions related to them (e.g., datum-names and types for language processors, and attributes for security management). Various DBMS components use this repository for their operation.
Database indexing

Indexing is a technique for improving database performance. The many types of indexes share the common property that they reduce the need to examine every entry when running a query. In large databases, this can reduce query time/cost by orders of magnitude. The simplest form of index is a sorted list of values that can be searched using a binary search, with an adjacent reference to the location of each entry, analogous to the index at the back of a book. The same data can have multiple indexes (an employee database could be indexed by both last name and hire date).
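A minimal Python sketch of the "sorted list" form of index described above, using the standard-library bisect module; real DBMS indexes are usually B+ trees or hash structures and store references to storage locations rather than the toy row ids used here:

    import bisect

    employees = [("Smith", 3), ("Adams", 1), ("Jones", 2), ("Brown", 4)]  # (last name, row id)

    # Build a sorted index over last names once...
    index = sorted(employees)
    keys = [name for name, _ in index]

    # ...so a lookup is a binary search instead of a scan of every row.
    def find_rows(last_name):
        lo = bisect.bisect_left(keys, last_name)
        hi = bisect.bisect_right(keys, last_name)
        return [row_id for _, row_id in index[lo:hi]]

    print(find_rows("Jones"))   # [2]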

Indexes affect performance, but not results. Database designers can add or remove indexes without changing application logic, reducing maintenance costs as the database grows and database usage evolves.

Given a particular query, the DBMS' query optimizer is responsible for devising the most efficient strategy for finding matching data.

Indexes speed up data access, but they consume space in the database and must be updated each time the data is altered; they therefore speed data access but slow data maintenance. These two properties determine whether a given index is worth its cost.
Database data clustering

In many cases a substantial performance improvement is gained if different types of database objects that are usually utilized together are laid out in storage in proximity, i.e., clustered. This usually allows the needed related objects to be retrieved from storage in a minimal number of input operations (each of which can be substantially time-consuming). Even for in-memory databases, clustering provides a performance advantage, due to the common use of large caches for input-output operations in memory, with similar resulting behavior.

For example, it may be beneficial to cluster the record of an item in stock with all of its respective order records. The decision of whether to cluster certain objects depends on the objects' utilization statistics, object sizes, cache sizes, storage types, etc. In a relational database, clustering the two respective relations "Items" and "Orders" saves the expensive execution of a join operation between the two relations whenever such a join is needed in a query (the join result is already materialized in storage by the clustering, ready to be utilized). A sketch of the idea appears below.
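A toy Python sketch of the idea (the names and layout are purely illustrative): when each item's orders are stored next to the item, the "join" is already materialized, whereas separately stored relations require matching rows at query time.

    # Unclustered layout: two separate relations, joined at query time.
    items  = {1: "bolt", 2: "nut"}
    orders = [(101, 1, 500), (102, 2, 200), (103, 1, 50)]   # (order_id, item_id, qty)

    def orders_for_item(item_id):
        return [o for o in orders if o[1] == item_id]       # join work on every query

    # Clustered layout: each item's orders are stored with the item, so reading
    # the item brings its orders along in the same access.
    clustered = {
        1: {"name": "bolt", "orders": [(101, 500), (103, 50)]},
        2: {"name": "nut",  "orders": [(102, 200)]},
    }

    print(orders_for_item(1))
    print(clustered[1]["orders"])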
Database materialized views

Often storage redundancy is employed to increase performance. A common example is storing materialized views, which are frequently needed external views. Storing such external views saves the expensive computation of them each time they are needed.
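A minimal, purely illustrative Python sketch of the trade-off: the "view" here is an aggregate that would be expensive to recompute over a large table, so its result is stored and refreshed when the base data changes instead of being recomputed on every read.

    orders = [("bolt", 500), ("nut", 200), ("bolt", 50)]

    # External view: total quantity ordered per item.
    def compute_view():
        totals = {}
        for item, qty in orders:
            totals[item] = totals.get(item, 0) + qty
        return totals

    materialized = compute_view()        # stored result, redundant with the base data

    def add_order(item, qty):
        orders.append((item, qty))
        materialized[item] = materialized.get(item, 0) + qty   # refresh on write

    add_order("nut", 25)
    print(materialized)                  # a read is now a lookup, not a recomputation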
Database and database object replication

See also Replication below


Occasionally a database employs storage redundancy by replicating database objects (with one or more copies) to increase data availability (both to improve the performance of simultaneous accesses by multiple end-users to the same database object, and to provide resiliency in case of a partial failure of a distributed database). Updates of a replicated object need to be synchronized across the object's copies. In many cases the entire database is replicated.

Database transactions



As with every software system, a DBMS that operates in a faulty computing environment is prone to failures of many kinds. A failure can corrupt the respective database unless special measures are taken to prevent this. A DBMS achieves certain levels of fault tolerance by encapsulating operations within transactions. The concept of a database transaction (or atomic transaction) has evolved in order to enable both well-understood database system behavior in a faulty environment, where crashes can happen at any time, and recovery from a crash to a well-understood database state. A database transaction is a unit of work, typically encapsulating a number of operations over a database (e.g., reading a database object, writing, acquiring a lock, etc.), an abstraction supported in databases and also in other systems. Each transaction has well-defined boundaries in terms of which program/code executions are included in that transaction (determined by the transaction's programmer via special transaction commands).

The ACID rules



Every database transaction obeys the following rules:
  • Atomicity - Either the effects of all of a transaction's operations remain, or none of them do ("all or nothing" semantics), when the transaction is completed (committed or aborted, respectively). In other words, to the outside world a committed transaction appears (by its effects on the database) to be indivisible and atomic, while an aborted transaction leaves no effect on the database at all, as if it had never existed (a minimal illustration appears after this list).
  • Consistency - Every transaction must leave the database in a consistent (correct) state, i.e., maintain the predetermined integrity rules of the database (constraints upon and among the database's objects). A transaction must transform a database from one consistent state to another consistent state (however, it is the responsibility of the transaction's programmer to make sure that the transaction itself is correct, i.e., that it performs what it intends to perform from the application's point of view, while the predefined integrity rules are enforced by the DBMS). Thus, since a database can normally be changed only by transactions, all of the database's states are consistent. An aborted transaction does not change the database state it started from, as if it had never existed (atomicity above).
  • Isolation - Transactions cannot interfere with each other (as an end result of their executions). Moreover, usually (depending on concurrency control method) the effects of an incomplete transaction are not even visible to another transaction. Providing isolation is the main goal of concurrency control.
  • Durability - The effects of successful (committed) transactions must persist through crashes (typically by recording the transaction's effects and its commit event in non-volatile memory).
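A minimal illustration of atomicity using the Python standard-library sqlite3 module; any DBMS with transactions behaves analogously, and the table and the simulated failure are, of course, made up for the example:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()

    def transfer(src, dst, amount, fail_midway=False):
        try:
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            if fail_midway:
                raise RuntimeError("simulated crash between the two updates")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.commit()      # both updates become durable together
        except Exception:
            conn.rollback()    # atomicity: the partial update is undone

    transfer("alice", "bob", 30, fail_midway=True)
    print(list(conn.execute("SELECT name, balance FROM accounts ORDER BY name")))
    # [('alice', 100), ('bob', 0)] -- the half-finished transfer left no trace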

Isolation, concurrency control, and locking



Isolation provides the ability for multiple users to operate on the database at the same time without corrupting the data.
  • Concurrency control comprises the underlying mechanisms in a DBMS that handle isolation and guarantee related correctness. It is heavily utilized by the database and storage engines (see above) both to guarantee the correct execution of concurrent transactions and (by different mechanisms) the correctness of other DBMS processes. The transaction-related mechanisms typically constrain the timing of database data-access operations (transaction schedules) to certain orders characterized by the serializability and recoverability schedule properties. Constraining the execution of database access operations typically means reduced performance (in rates of execution), so concurrency control mechanisms are typically designed to provide the best performance possible under those constraints. Often, when possible without harming correctness, the serializability property is compromised for better performance. However, recoverability cannot be compromised, since doing so typically results in a quick database integrity violation.
  • Locking is the most common transaction concurrency control method in DBMSs, used to provide both serializability and recoverability for correctness. In order to access a database object, a transaction first needs to acquire a lock for that object. Depending on the type of access operation (e.g., reading or writing the object) and on the lock type, acquiring the lock may be blocked and postponed if another transaction is holding a lock for that object (a minimal sketch of this pattern follows the list).
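As a minimal sketch of the locking pattern, here is a thread-level analogy in Python using the standard-library threading.Lock; a real DBMS lock manager tracks lock types, per-object lock tables, and deadlocks, none of which appear here:

    import threading

    # A shared "database object" protected by a per-object lock.
    balance = 0
    balance_lock = threading.Lock()

    def transaction(deposit, times):
        global balance
        for _ in range(times):
            # A writer must acquire the object's lock first; a concurrent
            # writer is blocked until the lock is released.
            with balance_lock:
                current = balance             # read
                balance = current + deposit   # write

    threads = [threading.Thread(target=transaction, args=(1, 100_000)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(balance)   # 400000 -- without the lock, lost updates could make this smaller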

Query optimization


A query is a request for information from a database. It can be as simple as "find the address of the person with SS# 123-123-1234," or more complex, like "find the average salary of all employed married men in California between the ages of 30 and 39 who earn less than their wives." Query results are generated by accessing the relevant database data and manipulating it in a way that yields the requested information. Since database structures are complex, in most cases, and especially for not-very-simple queries, the data needed for a query can be collected from a database by accessing it in different ways, through different data structures, and in different orders. Each way typically requires different processing time. Processing times of the same query may have a large variance, from a fraction of a second to hours, depending on the way selected. The purpose of query optimization, which is an automated process, is to find the way to process a given query in minimum time. The large possible variance in time justifies performing query optimization, though finding the exact optimal way to execute a query, among all possibilities, is typically very complex, time-consuming by itself, possibly too costly, and often practically impossible. Thus query optimization typically tries to approximate the optimum by comparing several common-sense alternatives, to provide in a reasonable time a "good enough" plan which typically does not deviate much from the best possible result.
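A toy sketch of such a cost-based choice between access paths, in Python; the cost formulas and constants are illustrative assumptions (random index fetches are costed higher than sequential reads), not any specific DBMS's cost model:

    import math

    SEQ_ROW_COST = 1.0      # sequential read of one row
    RAND_ROW_COST = 10.0    # random fetch of one row via an index
    INDEX_PROBE_COST = 4.0  # descending the index structure

    def cost_full_scan(n_rows):
        return n_rows * SEQ_ROW_COST

    def cost_index_lookup(n_rows, selectivity):
        matching = selectivity * n_rows
        return INDEX_PROBE_COST * math.log2(max(n_rows, 2)) + matching * RAND_ROW_COST

    def choose_plan(n_rows, selectivity, has_index):
        plans = {"full scan": cost_full_scan(n_rows)}
        if has_index:
            plans["index lookup"] = cost_index_lookup(n_rows, selectivity)
        return min(plans, key=plans.get), plans

    # A highly selective predicate on a large, indexed table: the index wins.
    print(choose_plan(10_000_000, selectivity=0.0001, has_index=True))
    # A predicate matching most rows: the sequential full scan is cheaper.
    print(choose_plan(10_000_000, selectivity=0.6, has_index=True))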

DBMS support for the development and maintenance of a database and its application



A DBMS typically intends to provide a convenient environment for developing, and later maintaining, an application built around its respective database type. A DBMS either provides such tools itself or allows integration with external tools. Examples of such tools relate to database design, application programming, application program maintenance, database performance analysis and monitoring, database configuration monitoring, DBMS hardware configuration (a DBMS and its related database may span computers, networks, and storage units) and related database mapping (especially for a distributed DBMS), storage allocation and database layout monitoring, storage migration, etc.

See also

  • Comparison of relational database management systems
  • Comparison of database tools
  • Data hierarchy
  • Data store
  • Database-centric architecture

Further reading


  • Ling Liu and Tamer M. Özsu (Eds.) (2009). Encyclopedia of Database Systems. 4100 p., 60 illus. ISBN 978-0-387-49616-0. Table of contents available at http://refworks.springer.com/mrw/index.php?id=1217
  • Beynon-Davies, P. Database Systems, 3rd edition. Palgrave, Houndmills, Basingstoke, 2004.
  • Connolly, Thomas and Carolyn Begg. Database Systems. New York: Harlow, 2002.
  • Gray, J. and Reuter, A. Transaction Processing: Concepts and Techniques, 1st edition. Morgan Kaufmann Publishers, 1992.
  • Kroenke, David M. and David J. Auer. Database Concepts, 3rd edition. New York: Prentice, 2007.
  • Teorey, T.; Lightstone, S. and Nadeau, T. Database Modeling & Design: Logical Design, 4th edition. Morgan Kaufmann Press, 2005. ISBN 0-12-685352-5.