The
relational model for
databaseA database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...
management is a
database modelA database model is the theoretical foundation of a database and fundamentally determines in which manner data can be stored, organized, and manipulated in a database system. It thereby defines the infrastructure offered by a particular database system...
based on
firstorder predicate logicFirstorder logic is a formal logical system used in mathematics, philosophy, linguistics, and computer science. It goes by many names, including: firstorder predicate calculus, the lower predicate calculus, quantification theory, and predicate logic...
, first formulated and proposed in 1969 by
Edgar F. CoddEdgar Frank "Ted" Codd was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases...
.
The purpose of the relational model is to provide a
declarativeIn computer science, declarative programming is a programming paradigm that expresses the logic of a computation without describing its control flow. Many languages applying this style attempt to minimize or eliminate side effects by describing what the program should accomplish, rather than...
method for specifying data and queries: users directly state what information the database contains and what information they want from it, and let the database management system software take care of describing data structures for storing the data and retrieval procedures for answering queries.
IBMInternational Business Machines Corporation or IBM is an American multinational technology and consulting corporation headquartered in Armonk, New York, United States. IBM manufactures and sells computer hardware and software, and it offers infrastructure, hosting and consulting services in areas...
's original implementation of Codd's ideas was System R. There have been several commercial and open source products based on Codd's ideas, including IBM's
DB2The IBM DB2 Enterprise Server Edition is a relational model database server developed by IBM. It primarily runs on Unix , Linux, IBM i , z/OS and Windows servers. DB2 also powers the different IBM InfoSphere Warehouse editions...
,
Oracle DatabaseThe Oracle Database is an objectrelational database management system produced and marketed by Oracle Corporation....
,
Microsoft SQL ServerMicrosoft SQL Server is a relational database server, developed by Microsoft: It is a software product whose primary function is to store and retrieve data as requested by other software applications, be it those on the same computer or those running on another computer across a network...
,
PostgreSQLPostgreSQL, often simply Postgres, is an objectrelational database management system available for many platforms including Linux, FreeBSD, Solaris, MS Windows and Mac OS X. It is released under the PostgreSQL License, which is an MITstyle license, and is thus free and open source software...
,
MySQLMySQL officially, but also commonly "My Sequel") is a relational database management system that runs as a server providing multiuser access to a number of databases. It is named after developer Michael Widenius' daughter, My...
, and many others. Most of these use the
SQLSQL is a programming language designed for managing data in relational database management systems ....
data definition and query language. A
table in an SQL database schema corresponds to a predicate variable; the contents of a table to a relation; key constraints, other constraints, and SQL queries correspond to predicates. However, SQL databases, including DB2, deviate from the relational model in many details; Codd fiercely argued against deviations that compromise the original principles.
Overview
The relational model's central idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values. The content of the database at any given time is a finite (logical) model of the database, i.e. a set of
relationIn relational model:A relation value, which is assigned to a certain relation variable, is timevarying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....
s, one per predicate variable, such that all predicates are satisfied. A request for information from the database (a database query) is also a predicate.
Alternatives to the relational model
Other models are the hierarchical model and
network modelThe network model is a database model conceived as a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice.The...
. Some
systemSystem is a set of interacting or interdependent components forming an integrated whole....
s using these older architectures are still in use today in
data centerA data center is a facility used to house computer systems and associated components, such as telecommunications and storage systems...
s with high data volume needs, or where existing systems are so complex and abstract it would be cost prohibitive to migrate to systems employing the relational model; also of note are newer
objectoriented databasesAn object database is a database management system in which information is represented in the form of objects as used in objectoriented programming...
.
A recent development is the ObjectRelation typeObject model, which is based on the assumption that any fact can be expressed in the form of one or more binary relationships. The model is used in
Object Role ModelingObject Role Modeling is a method for conceptual modeling, and can be used as a tool for information and rules analysis, ontological analysis, and data modeling in the field of software engineering. Overview :...
(ORM),
RDFThe Resource Description Framework is a family of World Wide Web Consortium specifications originally designed as a metadata data model...
/
Notation 3Notation3, or N3 as it is more commonly known, is a shorthand nonXML serialization of Resource Description Framework models, designed with humanreadability in mind: N3 is much more compact and readable than XML RDF notation...
(N3) and in
Gellish EnglishGellish English is a variant of Gellish and is a formal language, which means that it is structured and formalised subset of natural English that is computer interpretable. Its definition includes an English dictionary of concepts that is arranged in a taxonomy and that is extended into an ontology...
.
The relational model was the first database model to be described in formal mathematical terms. Hierarchical and network databases existed before relational databases, but their specifications were relatively informal. After the relational model was defined, there were many attempts to compare and contrast the different models, and this led to the emergence of more rigorous descriptions of the earlier models; though the procedural nature of the data manipulation interfaces for hierarchical and network databases limited the scope for formalization.
Implementation
There have been several attempts to produce a true implementation of the relational database model as originally defined by
CoddEdgar Frank "Ted" Codd was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases...
and explained by
DateChris Date is an independent author, lecturer, researcher, and consultant, specializing in relational database theory.Biography:Chris Date attended High Wycombe Royal Grammar School from 1951 to 1958 and received his BA in Mathematics from Cambridge University in 1962. He entered the computer...
,
DarwenHugh Darwen is a computer scientist who was an employee of IBM United Kingdom from 1967 to 2004, and has been involved in the history of the relational model. Work :...
and others, but none have been popular successes so far.
RelRel is an open source true relational database management system that implements a significant portion of Chris Date and Hugh Darwen's Tutorial D query language.Primarily intended for teaching purposes, Rel is written in the Java programming language....
is one of the more recent attempts to do this.
History
The relational model was invented by
E.F. (Ted) CoddEdgar Frank "Ted" Codd was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases...
as a general model of data, and subsequently maintained and developed by
Chris DateChris Date is an independent author, lecturer, researcher, and consultant, specializing in relational database theory.Biography:Chris Date attended High Wycombe Royal Grammar School from 1951 to 1958 and received his BA in Mathematics from Cambridge University in 1962. He entered the computer...
and
Hugh DarwenHugh Darwen is a computer scientist who was an employee of IBM United Kingdom from 1967 to 2004, and has been involved in the history of the relational model. Work :...
among others. In
The Third ManifestoThe Third Manifesto is Christopher J. Date's and Hugh Darwen's proposal for future database management systems, a response to two earlier Manifestos with the same purpose. The theme of the manifestos is how to avoid the 'objectrelational impedance mismatch' between objectoriented programming...
(first published in 1995) Date and Darwen show how the relational model can accommodate certain desired objectoriented features.
Controversies
Codd himself, some years after publication of his 1970 model, proposed a threevalued logic (True, False, Missing or
NULLNull is a special marker used in Structured Query Language to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL Null serves to fulfill the requirement that all true relational database management systems support...
) version of it to deal with missing information, and in his
The Relational Model for Database Management Version 2 (1990) he went a step further with a fourvalued logic (True, False, Missing but Applicable, Missing but Inapplicable) version. But these have never been implemented, presumably because of attending complexity. SQL's NULL construct was intended to be part of a threevalued logic system, but fell short of that due to logical errors in the standard and in its implementations.
The model
The fundamental assumption of the relational model is that all
dataThe term data refers to qualitative or quantitative attributes of a variable or set of variables. Data are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which...
is represented as mathematical
n
aryIn logic, mathematics, and computer science, the arity of a function or operation is the number of arguments or operands that the function takes. The arity of a relation is the dimension of the domain in the corresponding Cartesian product...
relationIn relational model:A relation value, which is assigned to a certain relation variable, is timevarying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....
s, an
nary relation being a
subsetIn mathematics, especially in set theory, a set A is a subset of a set B if A is "contained" inside B. A and B may coincide. The relationship of one set being a subset of another is called inclusion or sometimes containment...
of the
Cartesian productIn mathematics, a Cartesian product is a construction to build a new set out of a number of given sets. Each member of the Cartesian product corresponds to the selection of one element each in every one of those sets...
of
n domains. In the mathematical model, reasoning about such data is done in twovalued
predicate logicIn mathematical logic, predicate logic is the generic term for symbolic formal systems like firstorder logic, secondorder logic, manysorted logic or infinitary logic. This formal system is distinguished from other systems in that its formulae contain variables which can be quantified...
, meaning there are two possible
evaluationEvaluation is systematic determination of merit, worth, and significance of something or someone using criteria against a set of standards.Evaluation often is used to characterize and appraise subjects of interest in a wide range of human enterprises, including the arts, criminal justice,...
s for each
propositionIn logic and philosophy, the term proposition refers to either the "content" or "meaning" of a meaningful declarative sentence or the pattern of symbols, marks, or sounds that make up a meaningful declarative sentence...
: either
true or
false (and in particular no third value such as
unknown, or
not applicable, either of which are often associated with the concept of
NULLNull is a special marker used in Structured Query Language to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL Null serves to fulfill the requirement that all true relational database management systems support...
). Some think twovalued
logicIn philosophy, Logic is the formal systematic study of the principles of valid inference and correct reasoning. Logic is used in most intellectual activities, but is studied primarily in the disciplines of philosophy, mathematics, semantics, and computer science...
is an important part of the relational model, while others think a system that uses a form of threevalued logic can still be considered relational.
Data are operated upon by means of a
relational calculusRelational calculus consists of two calculi, the tuple relational calculus and the domain relational calculus, that are part of the relational model for databases and provide a declarative way to specify database queries...
or
relational algebraRelational algebra, an offshoot of firstorder logic , deals with a set of finitary relations that is closed under certain operators. These operators operate on one or more relations to yield a relation...
, these being equivalent in
expressive powerIn computer science, the expressive power of a language describes the ideas expressible in that language.For example, the Web Ontology Language expression language profile lacks ideas which can be expressed in OWL2 RL . OWL2 EL may therefore be said to have less expressive power than OWL2 RL...
.
The relational model of data permits the database designer to create a consistent, logical representation of
informationInformation in its most restricted technical sense is a message or collection of messages that consists of an ordered sequence of symbols, or it is the meaning that can be interpreted from such a message or collection of messages. Information can be recorded or transmitted. It can be recorded as...
. Consistency is achieved by including declared
constraints in the database design, which is usually referred to as the logical schema. The theory includes a process of
database normalizationIn the design of a relational database management system , the process of organizing data to minimize redundancy is called normalization. The goal of database normalization is to decompose relations with anomalies in order to produce smaller, wellstructured relations...
whereby a design with certain desirable properties can be selected from a set of
logically equivalentIn logic, statements p and q are logically equivalent if they have the same logical content.Syntactically, p and q are equivalent if each can be proved from the other...
alternatives. The access plans and other implementation and operation details are handled by the DBMS engine, and are not reflected in the logical model. This contrasts with common practice for SQL DBMSs in which
performance tuningPerformance tuning is the improvement of system performance. This is typically a computer application, but the same methods can be applied to economic markets, bureaucracies or other complex systems. The motivation for such activity is called a performance problem, which can be real or anticipated....
often requires changes to the logical model.
The basic relational building block is the
domainIn data management and database analysis, a data domain refers to all the unique values which a data element may contain. The rule for determining the domain boundary may be as simple as a data type with an enumerated list of values....
or
data typeIn computer programming, a data type is a classification identifying one of various types of data, such as floatingpoint, integer, or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of...
, usually abbreviated nowadays to
type. A
tupleIn mathematics and computer science, a tuple is an ordered list of elements. In set theory, an ntuple is a sequence of n elements, where n is a positive integer. There is also one 0tuple, an empty sequence. An ntuple is defined inductively using the construction of an ordered pair...
is an ordered set of
attribute values. An
attributeIn computing, an attribute is a specification that defines a property of an object, element, or file. It may also refer to or set the specific value for a given instance of such....
is an ordered pair of
attribute name and
type name. An attribute value is a specific valid value for the type of the attribute. This can be either a scalar value or a more complex type.
A relation consists of a
heading and a
body. A heading is a set of attributes. A body (of an
nary relation) is a set of
ntuples. The heading of the relation is also the heading of each of its tuples.
A relation is defined as a set of
ntuples. In both mathematics and the relational database model, a set is an
unordered collection of unique, nonduplicated items, although some DBMSs impose an order to their data. In mathematics, a
tupleIn mathematics and computer science, a tuple is an ordered list of elements. In set theory, an ntuple is a sequence of n elements, where n is a positive integer. There is also one 0tuple, an empty sequence. An ntuple is defined inductively using the construction of an ordered pair...
has an order, and allows for duplication.
E.F. CoddEdgar Frank "Ted" Codd was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases...
originally defined tuples using this mathematical definition. Later, it was one of
E.F. CoddEdgar Frank "Ted" Codd was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases...
's great insights that using attribute names instead of an ordering would be so much more convenient (in general) in a computer language based on relations . This insight is still being used today. Though the concept has changed, the name "tuple" has not. An immediate and important consequence of this distinguishing feature is that in the relational model the
Cartesian productIn mathematics, a Cartesian product is a construction to build a new set out of a number of given sets. Each member of the Cartesian product corresponds to the selection of one element each in every one of those sets...
becomes commutative.
A
tableIn relational databases and flat file databases, a table is a set of data elements that is organized using a model of vertical columns and horizontal rows. A table has a specified number of columns, but can have any number of rows...
is an accepted visual representation of a relation; a tuple is similar to the concept of
rowIn the context of a relational database, a row—also called a record or tuple—represents a single, implicitly structured data item in a table. In simple terms, a database table can be thought of as consisting of rows and columns or fields...
, but note that in the database language SQL the columns and the rows of a table are ordered.
A
relvarIn relational databases, a relvar is a term coined by C. J. Date as an abbreviation for the concept of relation variable, which is the actual term used by the inventor of the relational model, E. F. Codd, regarding the same concept...
is a named variable of some specific relation type, to which at all times some relation of that type is assigned, though the relation may contain zero tuples.
The basic principle of the relational model is the Information Principle: all
informationInformation in its most restricted technical sense is a message or collection of messages that consists of an ordered sequence of symbols, or it is the meaning that can be interpreted from such a message or collection of messages. Information can be recorded or transmitted. It can be recorded as...
is represented by data values in relations. In accordance with this Principle, a
relational databaseA relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...
is a set of relvars and the result of every query is presented as a relation.
The consistency of a relational database is enforced, not by rules built into the applications that use it, but rather by
constraints, declared as part of the logical schema and enforced by the DBMS for all applications. In general, constraints are expressed using relational comparison operators, of which just one, "is subset of" (⊆), is theoretically sufficient. In practice, several useful shorthands are expected to be available, of which the most important are
candidate keyIn the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that# the relation does not have two distinct tuples In the relational model of databases, a candidate key of a relation is a minimal superkey for that...
(really,
superkeyA superkey is defined in the relational model of database organization as a set of attributes of a relation variable for which it holds that in all relations assigned to that variable, there are no two distinct tuples that have the same values for the attributes in this set...
) and
foreign keyIn the context of relational databases, a foreign key is a referential constraint between two tables.A foreign key is a field in a relational table that matches a candidate key of another table...
constraints.
Interpretation
To fully appreciate the relational model of data it is essential to understand the intended
interpretation of a
relationIn relational model:A relation value, which is assigned to a certain relation variable, is timevarying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....
.
The body of a relation is sometimes called its extension. This is because it is to be interpreted as a representation of the
extensionThe extension of a predicatea truthvalued functionis the set of tuples of values that, used as arguments, satisfy the predicate. Such a set of tuples is a relation.For example the statement "d2 is the weekday following d1"...
of some
predicate, this being the set of true
propositionIn logic and philosophy, the term proposition refers to either the "content" or "meaning" of a meaningful declarative sentence or the pattern of symbols, marks, or sounds that make up a meaningful declarative sentence...
s that can be formed by replacing each free variable in that predicate by a name (a term that designates something).
There is a
onetoone correspondenceA bijection is a function giving an exact pairing of the elements of two sets. A bijection from the set X to the set Y has an inverse function from Y to X. If X and Y are finite sets, then the existence of a bijection means they have the same number of elements...
between the free variables of the predicate and the attribute names of the relation heading. Each tuple of the relation body provides attribute values to instantiate the predicate by substituting each of its free variables. The result is a proposition that is deemed, on account of the appearance of the tuple in the relation body, to be true. Contrariwise, every tuple whose heading conforms to that of the relation but which does not appear in the body is deemed to be false. This assumption is known as the
closed world assumptionThe closed world assumption is the presumption that what is not currently known to be true, is false. The same name also refers to a logical formalization of this assumption by Raymond Reiter. The opposite of the closed world assumption is the open world assumption , stating that lack of knowledge...
: it is often violated in practical databases, where the absence of a tuple might mean that the truth of the corresponding proposition is unknown. For example, the absence of the tuple ('John', 'Spanish') from a table of language skills cannot necessarily be taken as evidence that John does not speak Spanish.
For a formal exposition of these ideas, see the section
Settheoretic Formulation, below.
Application to databases
A
data typeIn data management and database analysis, a data domain refers to all the unique values which a data element may contain. The rule for determining the domain boundary may be as simple as a data type with an enumerated list of values....
as used in a typical relational database might be the set of integers, the set of character strings, the set of dates, or the two boolean values
true and
false, and so on. The corresponding
type names for these types might be the strings "int", "char", "date", "boolean", etc. It is important to understand, though, that relational theory does not dictate what types are to be supported; indeed, nowadays provisions are expected to be available for
userdefined types in addition to the
builtin ones provided by the system.
AttributeIn the context of a relational database table, a column is a set of data values of a particular simple type, one for each row of the table. The columns provide the structure according to which the rows are composed....
is the term used in the theory for what is commonly referred to as a
column. Similarly,
table is commonly used in place of the theoretical term
relation (though in SQL the term is by no means synonymous with relation). A table data structure is specified as a list of column definitions, each of which specifies a unique column name and the type of the values that are permitted for that column. An
attribute value is the entry in a specific column and row, such as "John Doe" or "35".
A
tupleIn mathematics and computer science, a tuple is an ordered list of elements. In set theory, an ntuple is a sequence of n elements, where n is a positive integer. There is also one 0tuple, an empty sequence. An ntuple is defined inductively using the construction of an ordered pair...
is basically the same thing as a
rowIn the context of a relational database, a row—also called a record or tuple—represents a single, implicitly structured data item in a table. In simple terms, a database table can be thought of as consisting of rows and columns or fields...
, except in an SQL DBMS, where the column values in a row are ordered. (Tuples are not ordered; instead, each attribute value is identified solely by the
attribute name and never by its ordinal position within the tuple.) An attribute name might be "name" or "age".
A
relationIn relational model:A relation value, which is assigned to a certain relation variable, is timevarying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....
is a
tableIn relational databases and flat file databases, a table is a set of data elements that is organized using a model of vertical columns and horizontal rows. A table has a specified number of columns, but can have any number of rows...
structure definition (a set of column definitions) along with the data appearing in that structure. The structure definition is the
heading and the data appearing in it is the
body, a set of rows. A database
relvarIn relational databases, a relvar is a term coined by C. J. Date as an abbreviation for the concept of relation variable, which is the actual term used by the inventor of the relational model, E. F. Codd, regarding the same concept...
(relation variable) is commonly known as a
base table. The heading of its assigned value at any time is as specified in the table declaration and its body is that most recently assigned to it by invoking some
update operator (typically, INSERT, UPDATE, or DELETE). The heading and body of the table resulting from evaluation of some query are determined by the definitions of the operators used in the expression of that query. (Note that in SQL the heading is not always a set of column definitions as described above, because it is possible for a column to have no name and also for two or more columns to have the same name. Also, the body is not always a set of rows because in SQL it is possible for the same row to appear more than once in the same body.)
SQL and the relational model
SQL, initially pushed as the
standardStandardization is the process of developing and implementing technical standards.The goals of standardization can be to help with independence of single suppliers , compatibility, interoperability, safety, repeatability, or quality....
language for
relational databaseA relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...
s, deviates from the relational model in several places. The current
ISOThe International Organization for Standardization , widely known as ISO, is an international standardsetting body composed of representatives from various national standards organizations. Founded on February 23, 1947, the organization promulgates worldwide proprietary, industrial and commercial...
SQL standard doesn't mention the relational model or use relational terms or concepts. However, it is possible to create a database conforming to the relational model using SQL if one does not use certain SQL features.
The following deviations from the relational model have been noted in SQL. Note that few database servers implement the entire SQL standard and in particular do not allow some of these deviations. Whereas NULL is ubiquitous, for example, allowing duplicate column names within a table or anonymous columns is uncommon.
Duplicate rows
 The same row can appear more than once in an SQL table. The same tuple cannot appear more than once in a relation
In relational model:A relation value, which is assigned to a certain relation variable, is timevarying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....
.
Anonymous columns
 A column in an SQL table can be unnamed and thus unable to be referenced in expressions. The relational model requires every attribute to be named and referenceable.
Duplicate column names
 Two or more columns of the same SQL table can have the same name and therefore cannot be referenced, on account of the obvious ambiguity. The relational model requires every attribute to be referenceable.
Column order significance
 The order of columns in an SQL table is defined and significant, one consequence being that SQL's implementations of Cartesian product and union are both noncommutative. The relational model requires there to be no significance to any ordering of the attributes of a relation.
Views without CHECK OPTION
 Updates to a view defined without CHECK OPTION can be accepted but the resulting update to the database does not necessarily have the expressed effect on its target. For example, an invocation of INSERT can be accepted but the inserted rows might not all appear in the view, or an invocation of UPDATE can result in rows disappearing from the view. The relational model requires updates to a view to have the same effect as if the view were a base relvar.
Columnless tables unrecognized
 SQL requires every table to have at least one column, but there are two relations of degree zero (of cardinality
In SQL , the term cardinality refers to the uniqueness of data values contained in a particular column of a database table. The lower the cardinality, the more duplicated elements in a column. Thus, a column with the lowest possible cardinality would have the same value for every row...
one and zero) and they are needed to represent extensions of predicates that contain no free variables.
NULL
 This special mark can appear instead of a value wherever a value can appear in SQL, in particular in place of a column value in some row. The deviation from the relational model arises from the fact that the implementation of this ad hoc concept in SQL involves the use of threevalued logic
In logic, a manyvalued logic is a propositional calculus in which there are more than two truth values. Traditionally, in Aristotle's logical calculus, there were only two possible values for any proposition...
, under which the comparison of NULL with itself does not yield true but instead yields the third truth value, unknown; similarly the comparison NULL with something other than itself does not yield false but instead yields unknown. It is because of this behaviour in comparisons that NULL is described as a mark rather than a value. The relational model depends on the law of excluded middleIn logic, the law of excluded middle is the third of the socalled three classic laws of thought. It states that for any proposition, either that proposition is true, or its negation is....
under which anything that is not true is false and anything that is not false is true; it also requires every tuple in a relation body to have a value for every attribute of that relation. This particular deviation is disputed by some if only because E.F. Codd himself eventually advocated the use of special marks and a 4valued logic, but this was based on his observation that there are two distinct reasons why one might want to use a special mark in place of a value, which led opponents of the use of such logics to discover more distinct reasons and at least as many as 19 have been noted, which would require a 21valued logic. SQL itself uses NULL for several purposes other than to represent "value unknown". For example, the sum of the empty set is NULL, meaning zero, the average of the empty set is NULL, meaning undefined, and NULL appearing in the result of a LEFT JOIN can mean "no value because there is no matching row in the righthand operand".
Relational operations
Users (or programs) request data from a relational database by sending it a query that is written in a special language, usually a dialect of SQL. Although SQL was originally intended for endusers, it is much more common for SQL queries to be embedded into software that provides an easier user interface. Many web sites, such as Wikipedia, perform SQL queries when generating pages.
In response to a query, the database returns a result set, which is just a list of rows containing the answers. The simplest query is just to return all the rows from a table, but more often, the rows are filtered in some way to return just the answer wanted.
Often, data from multiple tables are combined into one, by doing a
joinAn SQL join clause combines records from two or more tables in a database. It creates a set that can be saved as a table or used as is. A JOIN is a means for combining fields from two tables by using values common to each. ANSI standard SQL specifies four types of JOINs: INNER, OUTER, LEFT, and RIGHT...
. Conceptually, this is done by taking all possible combinations of rows (the
Cartesian productIn mathematics, a Cartesian product is a construction to build a new set out of a number of given sets. Each member of the Cartesian product corresponds to the selection of one element each in every one of those sets...
), and then filtering out everything except the answer. In practice, relational database management systems rewrite ("
optimizeThe query optimizer is the component of a database management system that attempts to determine the most efficient way to execute a query. The optimizer considers the possible query plans for a given input query, and attempts to determine which of those plans will be the most efficient...
") queries to perform faster, using a variety of techniques.
There are a number of relational operations in addition to join. These include project (the process of eliminating some of the columns), restrict (the process of eliminating some of the rows), union (a way of combining two tables with similar structures), difference (which lists the rows in one table that are not found in the other), intersect (which lists the rows found in both tables), and product (mentioned above, which combines each row of one table with each row of the other). Depending on which other sources you consult, there are a number of other operators – many of which can be defined in terms of those listed above. These include semijoin, outer operators such as outer join and outer union, and various forms of division. Then there are operators to rename columns, and summarizing or aggregating operators, and if you permit
relationIn relational model:A relation value, which is assigned to a certain relation variable, is timevarying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....
values as attributes (RVA – relationvalued attribute), then operators such as group and ungroup. The SELECT statement in SQL serves to handle all of these except for the group and ungroup operators.
The flexibility of relational databases allows programmers to write queries that were not anticipated by the database designers. As a result, relational databases can be used by multiple applications in ways the original designers did not foresee, which is especially important for databases that might be used for a long time (perhaps several decades). This has made the idea and implementation of relational databases very popular with businesses.
Database normalization
RelationIn relational model:A relation value, which is assigned to a certain relation variable, is timevarying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....
s are classified based upon the types of anomalies to which they're vulnerable. A database that's in the first normal form is vulnerable to all types of anomalies, while a database that's in the domain/key normal form has no modification anomalies. Normal forms are hierarchical in nature. That is, the lowest level is the first normal form, and the database cannot meet the requirements for higher level normal forms without first having met all the requirements of the lesser normal forms.
Database
An idealized, very simple example of a description of some
relvarIn relational databases, a relvar is a term coined by C. J. Date as an abbreviation for the concept of relation variable, which is the actual term used by the inventor of the relational model, E. F. Codd, regarding the same concept...
s (
relationIn relational model:A relation value, which is assigned to a certain relation variable, is timevarying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....
variables) and their attributes:
 Customer(Customer ID, Tax ID, Name, Address, City, State, Zip, Phone, Email)
 Order(Order No, Customer ID, Invoice No, Date Placed, Date Promised, Terms, Status)
 Order Line(Order No, Order Line No, Product Code, Qty)
 Invoice(Invoice No, Customer ID, Order No, Date, Status)
 Invoice Line(Invoice No, Invoice Line No, Product Code, Qty Shipped)
 Product(Product Code, Product Description)
In this
designDesign as a noun informally refers to a plan or convention for the construction of an object or a system while “to design” refers to making this plan...
we have six relvars: Customer, Order, Order Line, Invoice, Invoice Line and Product. The bold, underlined attributes are
candidate keyIn the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that# the relation does not have two distinct tuples In the relational model of databases, a candidate key of a relation is a minimal superkey for that...
s. The nonbold, underlined attributes are
foreign keyIn the context of relational databases, a foreign key is a referential constraint between two tables.A foreign key is a field in a relational table that matches a candidate key of another table...
s.
Usually one
candidate keyIn the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that# the relation does not have two distinct tuples In the relational model of databases, a candidate key of a relation is a minimal superkey for that...
is arbitrarily chosen to be called the primary key and used in
preferenceDefinitions in different disciplines:The term “preferences” is used in a variety of related, but not identical, ways in the scientific literature. This makes it necessary to make explicit the sense in which the term is used in different social sciences....
over the other candidate keys, which are then called alternate keys.
A
candidate key is a unique
identifierAn identifier is a name that identifies either a unique object or a unique class of objects, where the "object" or class may be an idea, physical [countable] object , or physical [noncountable] substance...
enforcing that no
tupleIn mathematics and computer science, a tuple is an ordered list of elements. In set theory, an ntuple is a sequence of n elements, where n is a positive integer. There is also one 0tuple, an empty sequence. An ntuple is defined inductively using the construction of an ordered pair...
will be duplicated; this would make the
relationIn relational model:A relation value, which is assigned to a certain relation variable, is timevarying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....
into something else, namely a bag, by violating the basic definition of a set. Both foreign keys and superkeys (which includes candidate keys) can be composite, that is, can be composed of several attributes. Below is a tabular depiction of a relation of our example Customer relvar; a relation can be thought of as a value that can be attributed to a relvar.
Customer relation
Customer ID Tax ID Name Address [More fields....]
1234567890 5555512222 Munmun 323 Broadway ...
2223344556 5555523232 Wile E. 1200 Main Street ...
3334445563 5555533323 Ekta 871 1st Street ...
4232342432 5555325523 E. F. Codd 123 It Way ...
If we attempted to
insert a new customer with the ID
1234567890, this would violate the design of the relvar since
Customer ID is a
primary key and we already have a customer
1234567890. The DBMS must reject a
transactionA transaction comprises a unit of work performed within a database management system against a database, and treated in a coherent and reliable way independent of other transactions...
such as this that would render the
databaseA database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...
inconsistent by a violation of an
integrity constraintDatabase integrity ensures that data entered into the database is accurate, valid, and consistent. Any applicable integrity constraints and data validation rules must be satisfied before permitting a change to the database....
.
Foreign keyIn the context of relational databases, a foreign key is a referential constraint between two tables.A foreign key is a field in a relational table that matches a candidate key of another table...
s are integrity constraints enforcing that the
valueIn computer science, a value is an expression which cannot be evaluated any further . The members of a type are the values of that type. For example, the expression "1 + 2" is not a value as it can be reduced to the expression "3"...
of the attribute set is drawn from a
candidate keyIn the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that# the relation does not have two distinct tuples In the relational model of databases, a candidate key of a relation is a minimal superkey for that...
in another
relationIn relational model:A relation value, which is assigned to a certain relation variable, is timevarying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....
. For example in the Order relation the attribute
Customer ID is a foreign key. A
joinJoin may refer to:* Join , to include additional counts or additional defendants on an indictment* Join , a least upper bound of set orders in lattice theory* Join , a type of binary operator...
is the
operationThe general operation as explained on this page should not be confused with the more specific operators on vector spaces. For a notion in elementary mathematics, see arithmetic operation....
that draws on
informationInformation in its most restricted technical sense is a message or collection of messages that consists of an ordered sequence of symbols, or it is the meaning that can be interpreted from such a message or collection of messages. Information can be recorded or transmitted. It can be recorded as...
from several relations at once. By joining relvars from the example above we could
query the database for all of the Customers, Orders, and Invoices. If we only wanted the tuples for a specific customer, we would specify this using a restriction condition.
If we wanted to retrieve all of the Orders for Customer
1234567890, we could
queryInformation retrieval is the area of study concerned with searching for documents, for information within documents, and for metadata about documents, as well as that of searching structured storage, relational databases, and the World Wide Web...
the database to return every row in the Order table with
Customer ID 1234567890 and join the Order table to the Order Line table based on
Order No.
There is a flaw in our
database designDatabase design is the process of producing a detailed data model of a database. This logical data model contains all the needed logical and physical design choices and physical storage parameters needed to generate a design in a Data Definition Language, which can then be used to create a database...
above. The Invoice relvar contains an Order No attribute. So, each tuple in the Invoice relvar will have one Order No, which implies that there is precisely one Order for each Invoice. But in
realityIn philosophy, reality is the state of things as they actually exist, rather than as they may appear or might be imagined. In a wider definition, reality includes everything that is and has been, whether or not it is observable or comprehensible...
an invoice can be created against many orders, or indeed for no particular order. Additionally the Order relvar contains an Invoice No attribute, implying that each Order has a corresponding Invoice. But again this is not always true in the real world. An order is sometimes paid through several invoices, and sometimes paid without an invoice. In other words there can be many Invoices per Order and many Orders per Invoice. This is a
manytomanyIn systems analysis, a manytomany relationship is a type of cardinality that refers to the relationship between two entities A and B in which A may contain a parent row for which there are many children in B and vice versa. For instance, think of A as Authors, and B as Books...
relationship between Order and Invoice (also called a
nonspecific relationship). To represent this relationship in the database a new relvar should be introduced whose
roleA role or a social role is a set of connected behaviours, rights and obligations as conceptualised by actors in a social situation. It is an expected or free or continuously changing behaviour and may have a given individual social status or social position...
is to specify the correspondence between Orders and Invoices:
OrderInvoice(
Order No,
Invoice No)
Now, the Order relvar has a
onetomanyOnetomany may refer to:* Multivalued function, a onetomany function in mathematics* Fat link, a onetomany link in hypertext* Pointtomultipoint communication, communication which has a onetomany relationSee also:*Onetoone...
relationship to the OrderInvoice table, as does the Invoice relvar. If we want to retrieve every Invoice for a particular Order, we can query for all orders where
Order No in the Order relation equals the
Order No in OrderInvoice, and where
Invoice No in OrderInvoice equals the
Invoice No in Invoice.
Settheoretic formulation
Basic notions in the relational model are
relationIn relational model:A relation value, which is assigned to a certain relation variable, is timevarying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....
names and
attribute names. We will represent these as strings such as "Person" and "name" and we will usually use the variables
and
to range over them. Another basic notion is the set of
atomic values that contains values such as numbers and strings.
Our first definition concerns the notion of
tuple, which formalizes the notion of row or record in a table:
TupleIn mathematics and computer science, a tuple is an ordered list of elements. In set theory, an ntuple is a sequence of n elements, where n is a positive integer. There is also one 0tuple, an empty sequence. An ntuple is defined inductively using the construction of an ordered pair...
 A tuple is a partial function
In mathematics, a partial function from X to Y is a function ƒ: X' → Y, where X' is a subset of X. It generalizes the concept of a function by not forcing f to map every element of X to an element of Y . If X' = X, then ƒ is called a total function and is equivalent to a function...
from attribute names to atomic values.
Header
 A header is a finite set of attribute names.
Projection
 The projection of a tuple on a finite set of attributes is .
The next definition defines
relation which formalizes the contents of a table as it is defined in the relational model.
RelationIn relational model:A relation value, which is assigned to a certain relation variable, is timevarying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....
 A relation is a tuple with , the header, and , the body, a set of tuples that all have the domain .
Such a relation closely corresponds to what is usually called the extension of a predicate in
firstorder logicFirstorder logic is a formal logical system used in mathematics, philosophy, linguistics, and computer science. It goes by many names, including: firstorder predicate calculus, the lower predicate calculus, quantification theory, and predicate logic...
except that here we identify the places in the predicate with attribute names. Usually in the relational model a
database schemaA Logical Schema is a data model of a specific problem domain expressed in terms of a particular data management technology. Without being specific to a particular database management product, it is in terms of either relational tables and columns, objectoriented classes, or XML tags...
is said to consist of a set of relation names, the headers that are associated with these names and the constraints that should hold for every instance of the database schema.
Relation universe
 A relation universe over a header is a nonempty set of relations with header .
Relation schema
 A relation schema consists of a header and a predicate that is defined for all relations with header . A relation satisfies a relation schema if it has header and satisfies .
Key constraints and functional dependencies
One of the simplest and most important types of relation constraints is the
key constraint. It tells us that in every instance of a certain relational schema the tuples can be identified by their values for certain attributes.
SuperkeyA superkey is defined in the relational model of database organization as a set of attributes of a relation variable for which it holds that in all relations assigned to that variable, there are no two distinct tuples that have the same values for the attributes in this set...
 A superkey is written as a finite set of attribute names.
 A superkey holds in a relation if:
 and
 there exist no two distinct tuples such that .
 A superkey holds in a relation universe if it holds in all relations in .
 Theorem: A superkey holds in a relation universe over if and only if and holds in .
Candidate keyIn the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that# the relation does not have two distinct tuples In the relational model of databases, a candidate key of a relation is a minimal superkey for that...
 A superkey holds as a candidate key for a relation universe if it holds as a superkey for and there is no proper subset of that also holds as a superkey for .
Functional dependencyA functional dependency is a constraint between two sets of attributes in a relation from a database.Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R, if, and only if, each X value is associated with precisely one Y value...
 A functional dependency (FD for short) is written as for finite sets of attribute names.
 A functional dependency holds in a relation if:
 A functional dependency holds in a relation universe if it holds in all relations in .
Trivial functional dependency
 A functional dependency is trivial under a header if it holds in all relation universes over .
 Theorem: An FD is trivial under a header if and only if .
Closure
 Armstrong's axioms
Armstrong's axioms are a set of axioms used to infer all the functional dependencies on a relational database. They were developed by William W. Armstrong on his paper. The axioms are sound in that they generate only functional dependencies in the closure of a set of functional dependencies when...
: The closure of a set of FDs under a header , written as , is the smallest superset of such that:
 (reflexivity)
 (transitivity) and
 (augmentation)
 Theorem: Armstrong's axioms are sound and complete; given a header and a set of FDs that only contain subsets of , if and only if holds in all relation universes over in which all FDs in hold.
Completion
 The completion of a finite set of attributes under a finite set of FDs , written as , is the smallest superset of such that:
 The completion of an attribute set can be used to compute if a certain dependency is in the closure of a set of FDs.
 Theorem: Given a set of FDs, if and only if .
Irreducible cover
 An irreducible cover of a set of FDs is a set of FDs such that:

 there exists no such that
 is a singleton set and
 .
Algorithm to derive candidate keys from functional dependencies
INPUT: a set
S of FDs that contain only subsets of a header
H
OUTPUT: the set
C of superkeys that hold as candidate keys in
all relation universes over
H in which all FDs in
S hold
begin
C := ∅; // found candidate keys
Q := {
H }; // superkeys that contain candidate keys
while Q <> ∅
do
let
K be some element from
Q;
Q :=
Q – {
K };
minimal :=
true;
for each X>Y in S do
K' := (
K –
Y) ∪
X; // derive new superkey
if K' ⊂
K then
minimal :=
false;
Q :=
Q ∪ {
K' };
end if
end for
if minimal and there is not a subset of
K in
C then
remove all supersets of
K from
C;
C :=
C ∪ {
K };
end if
end while
end
See also
 Domain relational calculus
In computer science, domain relational calculus is a calculus that was introduced by Michel Lacroix and Alain Pirotte as a declarative database query language for the relational data model.In DRC, queries have the form:...
 Life cycle of a relational database
The life cycle of a relational database is the cycle of development and changes that a relational database goes through during the course of its life. The cycle typically consists of several stages. There is a possibility that the database designer/developer can go back to any of the previous stages...
 List of relational database management systems
 Query language
Query languages are computer languages used to make queries into databases and information systems.Broadly, query languages can be classified according to whether they are database query languages or information retrieval query languages...
 Database query language
 Information retrieval query language
An information retrieval query language is a query language used to make queries into database, where the semantics of the query are defined not by a precise rendering of a formal syntax, but by an interpretation of the most suitable results of the query....
 Relation
In set theory and logic, a relation is a property that assigns truth values to ktuples of individuals. Typically, the property describes a possible connection between the components of a ktuple...
 Relational database
A relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...
 Relational database management system
A relational database management system is a database management system that is based on the relational model as introduced by E. F. Codd. Most popular databases currently in use are based on the relational database model....
 The Third Manifesto
The Third Manifesto is Christopher J. Date's and Hugh Darwen's proposal for future database management systems, a response to two earlier Manifestos with the same purpose. The theme of the manifestos is how to avoid the 'objectrelational impedance mismatch' between objectoriented programming...
(TTMTTM may refer to:* TTM, Tata Motors's stock symbol on the New York Stock Exchange* Time to market, the length of time it takes from a product being conceived to its being available for sale* Transtheoretical model of change, a concept in health psychology...
)
 TransRelational model
 Tupleversioning
Tupleversioning is a mechanism used in a relational database management system to store past states of a relation. Normally, only the current state is captured....
Further reading
 Date, C. J., Darwen, H. (2000). Foundation for Future Database Systems: The Third Manifesto, 2nd edition, AddisonWesley
AddisonWesley was a book publisher in Boston, Massachusetts, best known for its textbooks and computer literature. As well as publishing books, AddisonWesley also distributed its technical titles through the Safari Books Online ereference service...
Professional. ISBN 0201709287.
 Date, C. J. (2003). Introduction to Database Systems. 8th edition, AddisonWesley. ISBN 0321197844.
External links