Home      Discussion      Topics      Dictionary      Almanac
Signup       Login
Relational model

Relational model

Discussion
Ask a question about 'Relational model'
Start a new discussion about 'Relational model'
Answer questions from other users
Full Discussion Forum
 
Encyclopedia
The relational model for database
Database
A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...

 management is a database model
Database model
A database model is the theoretical foundation of a database and fundamentally determines in which manner data can be stored, organized, and manipulated in a database system. It thereby defines the infrastructure offered by a particular database system...

 based on first-order predicate logic
First-order logic
First-order logic is a formal logical system used in mathematics, philosophy, linguistics, and computer science. It goes by many names, including: first-order predicate calculus, the lower predicate calculus, quantification theory, and predicate logic...

, first formulated and proposed in 1969 by Edgar F. Codd
Edgar F. Codd
Edgar Frank "Ted" Codd was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases...

.

The purpose of the relational model is to provide a declarative
Declarative programming
In computer science, declarative programming is a programming paradigm that expresses the logic of a computation without describing its control flow. Many languages applying this style attempt to minimize or eliminate side effects by describing what the program should accomplish, rather than...

 method for specifying data and queries: users directly state what information the database contains and what information they want from it, and let the database management system software take care of describing data structures for storing the data and retrieval procedures for answering queries.

IBM
IBM
International Business Machines Corporation or IBM is an American multinational technology and consulting corporation headquartered in Armonk, New York, United States. IBM manufactures and sells computer hardware and software, and it offers infrastructure, hosting and consulting services in areas...

's original implementation of Codd's ideas was System R. There have been several commercial and open source products based on Codd's ideas, including IBM's DB2
IBM DB2
The IBM DB2 Enterprise Server Edition is a relational model database server developed by IBM. It primarily runs on Unix , Linux, IBM i , z/OS and Windows servers. DB2 also powers the different IBM InfoSphere Warehouse editions...

, Oracle Database
Oracle Database
The Oracle Database is an object-relational database management system produced and marketed by Oracle Corporation....

, Microsoft SQL Server
Microsoft SQL Server
Microsoft SQL Server is a relational database server, developed by Microsoft: It is a software product whose primary function is to store and retrieve data as requested by other software applications, be it those on the same computer or those running on another computer across a network...

, PostgreSQL
PostgreSQL
PostgreSQL, often simply Postgres, is an object-relational database management system available for many platforms including Linux, FreeBSD, Solaris, MS Windows and Mac OS X. It is released under the PostgreSQL License, which is an MIT-style license, and is thus free and open source software...

, MySQL
MySQL
MySQL officially, but also commonly "My Sequel") is a relational database management system that runs as a server providing multi-user access to a number of databases. It is named after developer Michael Widenius' daughter, My...

, and many others. Most of these use the SQL
SQL
SQL is a programming language designed for managing data in relational database management systems ....

 data definition and query language. A table in an SQL database schema corresponds to a predicate variable; the contents of a table to a relation; key constraints, other constraints, and SQL queries correspond to predicates. However, SQL databases, including DB2, deviate from the relational model in many details; Codd fiercely argued against deviations that compromise the original principles.

Overview


The relational model's central idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values. The content of the database at any given time is a finite (logical) model of the database, i.e. a set of relation
Relation (database)
In relational model:A relation value, which is assigned to a certain relation variable, is time-varying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....

s, one per predicate variable, such that all predicates are satisfied. A request for information from the database (a database query) is also a predicate.

Alternatives to the relational model


Other models are the hierarchical model and network model
Network model
The network model is a database model conceived as a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice.The...

. Some system
System
System is a set of interacting or interdependent components forming an integrated whole....

s using these older architectures are still in use today in data center
Data center
A data center is a facility used to house computer systems and associated components, such as telecommunications and storage systems...

s with high data volume needs, or where existing systems are so complex and abstract it would be cost prohibitive to migrate to systems employing the relational model; also of note are newer object-oriented databases
Object database
An object database is a database management system in which information is represented in the form of objects as used in object-oriented programming...

.

A recent development is the Object-Relation type-Object model, which is based on the assumption that any fact can be expressed in the form of one or more binary relationships. The model is used in Object Role Modeling
Object role modeling
Object Role Modeling is a method for conceptual modeling, and can be used as a tool for information and rules analysis, ontological analysis, and data modeling in the field of software engineering.- Overview :...

 (ORM), RDF
Resource Description Framework
The Resource Description Framework is a family of World Wide Web Consortium specifications originally designed as a metadata data model...

/Notation 3
Notation 3
Notation3, or N3 as it is more commonly known, is a shorthand non-XML serialization of Resource Description Framework models, designed with human-readability in mind: N3 is much more compact and readable than XML RDF notation...

 (N3) and in Gellish English
Gellish English
Gellish English is a variant of Gellish and is a formal language, which means that it is structured and formalised subset of natural English that is computer interpretable. Its definition includes an English dictionary of concepts that is arranged in a taxonomy and that is extended into an ontology...

.

The relational model was the first database model to be described in formal mathematical terms. Hierarchical and network databases existed before relational databases, but their specifications were relatively informal. After the relational model was defined, there were many attempts to compare and contrast the different models, and this led to the emergence of more rigorous descriptions of the earlier models; though the procedural nature of the data manipulation interfaces for hierarchical and network databases limited the scope for formalization.

Implementation


There have been several attempts to produce a true implementation of the relational database model as originally defined by Codd
Edgar F. Codd
Edgar Frank "Ted" Codd was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases...

 and explained by Date
Christopher J. Date
Chris Date is an independent author, lecturer, researcher, and consultant, specializing in relational database theory.-Biography:Chris Date attended High Wycombe Royal Grammar School from 1951 to 1958 and received his BA in Mathematics from Cambridge University in 1962. He entered the computer...

, Darwen
Hugh Darwen
Hugh Darwen is a computer scientist who was an employee of IBM United Kingdom from 1967 to 2004, and has been involved in the history of the relational model.- Work :...

 and others, but none have been popular successes so far. Rel
Rel (DBMS)
Rel is an open source true relational database management system that implements a significant portion of Chris Date and Hugh Darwen's Tutorial D query language.Primarily intended for teaching purposes, Rel is written in the Java programming language....

 is one of the more recent attempts to do this.

History


The relational model was invented by E.F. (Ted) Codd
Edgar F. Codd
Edgar Frank "Ted" Codd was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases...

 as a general model of data, and subsequently maintained and developed by Chris Date
Christopher J. Date
Chris Date is an independent author, lecturer, researcher, and consultant, specializing in relational database theory.-Biography:Chris Date attended High Wycombe Royal Grammar School from 1951 to 1958 and received his BA in Mathematics from Cambridge University in 1962. He entered the computer...

 and Hugh Darwen
Hugh Darwen
Hugh Darwen is a computer scientist who was an employee of IBM United Kingdom from 1967 to 2004, and has been involved in the history of the relational model.- Work :...

 among others. In The Third Manifesto
The Third Manifesto
The Third Manifesto is Christopher J. Date's and Hugh Darwen's proposal for future database management systems, a response to two earlier Manifestos with the same purpose. The theme of the manifestos is how to avoid the 'object-relational impedance mismatch' between object-oriented programming...

 (first published in 1995) Date and Darwen show how the relational model can accommodate certain desired object-oriented features.

Controversies


Codd himself, some years after publication of his 1970 model, proposed a three-valued logic (True, False, Missing or NULL
Null (SQL)
Null is a special marker used in Structured Query Language to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL Null serves to fulfill the requirement that all true relational database management systems support...

) version of it to deal with missing information, and in his The Relational Model for Database Management Version 2 (1990) he went a step further with a four-valued logic (True, False, Missing but Applicable, Missing but Inapplicable) version. But these have never been implemented, presumably because of attending complexity. SQL's NULL construct was intended to be part of a three-valued logic system, but fell short of that due to logical errors in the standard and in its implementations.

The model


The fundamental assumption of the relational model is that all data
Data
The term data refers to qualitative or quantitative attributes of a variable or set of variables. Data are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which...

 is represented as mathematical n-ary
Arity
In logic, mathematics, and computer science, the arity of a function or operation is the number of arguments or operands that the function takes. The arity of a relation is the dimension of the domain in the corresponding Cartesian product...

 relation
Relation (database)
In relational model:A relation value, which is assigned to a certain relation variable, is time-varying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....

s
, an n-ary relation being a subset
Subset
In mathematics, especially in set theory, a set A is a subset of a set B if A is "contained" inside B. A and B may coincide. The relationship of one set being a subset of another is called inclusion or sometimes containment...

 of the Cartesian product
Cartesian product
In mathematics, a Cartesian product is a construction to build a new set out of a number of given sets. Each member of the Cartesian product corresponds to the selection of one element each in every one of those sets...

 of n domains. In the mathematical model, reasoning about such data is done in two-valued predicate logic
Predicate logic
In mathematical logic, predicate logic is the generic term for symbolic formal systems like first-order logic, second-order logic, many-sorted logic or infinitary logic. This formal system is distinguished from other systems in that its formulae contain variables which can be quantified...

, meaning there are two possible evaluation
Evaluation
Evaluation is systematic determination of merit, worth, and significance of something or someone using criteria against a set of standards.Evaluation often is used to characterize and appraise subjects of interest in a wide range of human enterprises, including the arts, criminal justice,...

s for each proposition
Proposition
In logic and philosophy, the term proposition refers to either the "content" or "meaning" of a meaningful declarative sentence or the pattern of symbols, marks, or sounds that make up a meaningful declarative sentence...

: either true or false (and in particular no third value such as unknown, or not applicable, either of which are often associated with the concept of NULL
Null (SQL)
Null is a special marker used in Structured Query Language to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL Null serves to fulfill the requirement that all true relational database management systems support...

). Some think two-valued logic
Logic
In philosophy, Logic is the formal systematic study of the principles of valid inference and correct reasoning. Logic is used in most intellectual activities, but is studied primarily in the disciplines of philosophy, mathematics, semantics, and computer science...

 is an important part of the relational model, while others think a system that uses a form of three-valued logic can still be considered relational.

Data are operated upon by means of a relational calculus
Relational calculus
Relational calculus consists of two calculi, the tuple relational calculus and the domain relational calculus, that are part of the relational model for databases and provide a declarative way to specify database queries...

 or relational algebra
Relational algebra
Relational algebra, an offshoot of first-order logic , deals with a set of finitary relations that is closed under certain operators. These operators operate on one or more relations to yield a relation...

, these being equivalent in expressive power
Expressive power
In computer science, the expressive power of a language describes the ideas expressible in that language.For example, the Web Ontology Language expression language profile lacks ideas which can be expressed in OWL2 RL . OWL2 EL may therefore be said to have less expressive power than OWL2 RL...

.

The relational model of data permits the database designer to create a consistent, logical representation of information
Information
Information in its most restricted technical sense is a message or collection of messages that consists of an ordered sequence of symbols, or it is the meaning that can be interpreted from such a message or collection of messages. Information can be recorded or transmitted. It can be recorded as...

. Consistency is achieved by including declared constraints in the database design, which is usually referred to as the logical schema. The theory includes a process of database normalization
Database normalization
In the design of a relational database management system , the process of organizing data to minimize redundancy is called normalization. The goal of database normalization is to decompose relations with anomalies in order to produce smaller, well-structured relations...

 whereby a design with certain desirable properties can be selected from a set of logically equivalent
Logical equivalence
In logic, statements p and q are logically equivalent if they have the same logical content.Syntactically, p and q are equivalent if each can be proved from the other...

 alternatives. The access plans and other implementation and operation details are handled by the DBMS engine, and are not reflected in the logical model. This contrasts with common practice for SQL DBMSs in which performance tuning
Performance tuning
Performance tuning is the improvement of system performance. This is typically a computer application, but the same methods can be applied to economic markets, bureaucracies or other complex systems. The motivation for such activity is called a performance problem, which can be real or anticipated....

 often requires changes to the logical model.

The basic relational building block is the domain
Data domain
In data management and database analysis, a data domain refers to all the unique values which a data element may contain. The rule for determining the domain boundary may be as simple as a data type with an enumerated list of values....

 or data type
Data type
In computer programming, a data type is a classification identifying one of various types of data, such as floating-point, integer, or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of...

, usually abbreviated nowadays to type. A tuple
Tuple
In mathematics and computer science, a tuple is an ordered list of elements. In set theory, an n-tuple is a sequence of n elements, where n is a positive integer. There is also one 0-tuple, an empty sequence. An n-tuple is defined inductively using the construction of an ordered pair...

is an ordered set of attribute values. An attribute
Attribute (computing)
In computing, an attribute is a specification that defines a property of an object, element, or file. It may also refer to or set the specific value for a given instance of such....

 is an ordered pair of attribute name and type name. An attribute value is a specific valid value for the type of the attribute. This can be either a scalar value or a more complex type.

A relation consists of a heading and a body. A heading is a set of attributes. A body (of an n-ary relation) is a set of n-tuples. The heading of the relation is also the heading of each of its tuples.

A relation is defined as a set of n-tuples. In both mathematics and the relational database model, a set is an unordered collection of unique, non-duplicated items, although some DBMSs impose an order to their data. In mathematics, a tuple
Tuple
In mathematics and computer science, a tuple is an ordered list of elements. In set theory, an n-tuple is a sequence of n elements, where n is a positive integer. There is also one 0-tuple, an empty sequence. An n-tuple is defined inductively using the construction of an ordered pair...

 has an order, and allows for duplication. E.F. Codd
Edgar F. Codd
Edgar Frank "Ted" Codd was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases...

 originally defined tuples using this mathematical definition. Later, it was one of E.F. Codd
Edgar F. Codd
Edgar Frank "Ted" Codd was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases...

's great insights that using attribute names instead of an ordering would be so much more convenient (in general) in a computer language based on relations . This insight is still being used today. Though the concept has changed, the name "tuple" has not. An immediate and important consequence of this distinguishing feature is that in the relational model the Cartesian product
Cartesian product
In mathematics, a Cartesian product is a construction to build a new set out of a number of given sets. Each member of the Cartesian product corresponds to the selection of one element each in every one of those sets...

 becomes commutative.

A table
Table (database)
In relational databases and flat file databases, a table is a set of data elements that is organized using a model of vertical columns and horizontal rows. A table has a specified number of columns, but can have any number of rows...

 is an accepted visual representation of a relation; a tuple is similar to the concept of row
Row (database)
In the context of a relational database, a row—also called a record or tuple—represents a single, implicitly structured data item in a table. In simple terms, a database table can be thought of as consisting of rows and columns or fields...

, but note that in the database language SQL the columns and the rows of a table are ordered.

A relvar
Relvar
In relational databases, a relvar is a term coined by C. J. Date as an abbreviation for the concept of relation variable, which is the actual term used by the inventor of the relational model, E. F. Codd, regarding the same concept...

is a named variable of some specific relation type, to which at all times some relation of that type is assigned, though the relation may contain zero tuples.

The basic principle of the relational model is the Information Principle: all information
Information
Information in its most restricted technical sense is a message or collection of messages that consists of an ordered sequence of symbols, or it is the meaning that can be interpreted from such a message or collection of messages. Information can be recorded or transmitted. It can be recorded as...

 is represented by data values in relations. In accordance with this Principle, a relational database
Relational database
A relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...

 is a set of relvars and the result of every query is presented as a relation.

The consistency of a relational database is enforced, not by rules built into the applications that use it, but rather by constraints, declared as part of the logical schema and enforced by the DBMS for all applications. In general, constraints are expressed using relational comparison operators, of which just one, "is subset of" (⊆), is theoretically sufficient. In practice, several useful shorthands are expected to be available, of which the most important are candidate key
Candidate key
In the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that# the relation does not have two distinct tuples In the relational model of databases, a candidate key of a relation is a minimal superkey for that...

 (really, superkey
Superkey
A superkey is defined in the relational model of database organization as a set of attributes of a relation variable for which it holds that in all relations assigned to that variable, there are no two distinct tuples that have the same values for the attributes in this set...

) and foreign key
Foreign key
In the context of relational databases, a foreign key is a referential constraint between two tables.A foreign key is a field in a relational table that matches a candidate key of another table...

 constraints.

Interpretation


To fully appreciate the relational model of data it is essential to understand the intended interpretation of a relation
Relation (database)
In relational model:A relation value, which is assigned to a certain relation variable, is time-varying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....

.

The body of a relation is sometimes called its extension. This is because it is to be interpreted as a representation of the extension
Extension (predicate logic)
The extension of a predicatea truth-valued functionis the set of tuples of values that, used as arguments, satisfy the predicate. Such a set of tuples is a relation.For example the statement "d2 is the weekday following d1"...

 of some predicate, this being the set of true proposition
Proposition
In logic and philosophy, the term proposition refers to either the "content" or "meaning" of a meaningful declarative sentence or the pattern of symbols, marks, or sounds that make up a meaningful declarative sentence...

s that can be formed by replacing each free variable in that predicate by a name (a term that designates something).

There is a one-to-one correspondence
Bijection
A bijection is a function giving an exact pairing of the elements of two sets. A bijection from the set X to the set Y has an inverse function from Y to X. If X and Y are finite sets, then the existence of a bijection means they have the same number of elements...

 between the free variables of the predicate and the attribute names of the relation heading. Each tuple of the relation body provides attribute values to instantiate the predicate by substituting each of its free variables. The result is a proposition that is deemed, on account of the appearance of the tuple in the relation body, to be true. Contrariwise, every tuple whose heading conforms to that of the relation but which does not appear in the body is deemed to be false. This assumption is known as the closed world assumption
Closed world assumption
The closed world assumption is the presumption that what is not currently known to be true, is false. The same name also refers to a logical formalization of this assumption by Raymond Reiter. The opposite of the closed world assumption is the open world assumption , stating that lack of knowledge...

: it is often violated in practical databases, where the absence of a tuple might mean that the truth of the corresponding proposition is unknown. For example, the absence of the tuple ('John', 'Spanish') from a table of language skills cannot necessarily be taken as evidence that John does not speak Spanish.

For a formal exposition of these ideas, see the section Set-theoretic Formulation, below.

Application to databases


A data type
Data domain
In data management and database analysis, a data domain refers to all the unique values which a data element may contain. The rule for determining the domain boundary may be as simple as a data type with an enumerated list of values....

as used in a typical relational database might be the set of integers, the set of character strings, the set of dates, or the two boolean values true and false, and so on. The corresponding type names for these types might be the strings "int", "char", "date", "boolean", etc. It is important to understand, though, that relational theory does not dictate what types are to be supported; indeed, nowadays provisions are expected to be available for user-defined types in addition to the built-in ones provided by the system.

Attribute
Column (database)
In the context of a relational database table, a column is a set of data values of a particular simple type, one for each row of the table. The columns provide the structure according to which the rows are composed....

is the term used in the theory for what is commonly referred to as a column. Similarly, table is commonly used in place of the theoretical term relation (though in SQL the term is by no means synonymous with relation). A table data structure is specified as a list of column definitions, each of which specifies a unique column name and the type of the values that are permitted for that column. An attribute value is the entry in a specific column and row, such as "John Doe" or "35".

A tuple
Tuple
In mathematics and computer science, a tuple is an ordered list of elements. In set theory, an n-tuple is a sequence of n elements, where n is a positive integer. There is also one 0-tuple, an empty sequence. An n-tuple is defined inductively using the construction of an ordered pair...

is basically the same thing as a row
Row (database)
In the context of a relational database, a row—also called a record or tuple—represents a single, implicitly structured data item in a table. In simple terms, a database table can be thought of as consisting of rows and columns or fields...

, except in an SQL DBMS, where the column values in a row are ordered. (Tuples are not ordered; instead, each attribute value is identified solely by the attribute name and never by its ordinal position within the tuple.) An attribute name might be "name" or "age".

A relation
Relation (database)
In relational model:A relation value, which is assigned to a certain relation variable, is time-varying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....

is a table
Table (database)
In relational databases and flat file databases, a table is a set of data elements that is organized using a model of vertical columns and horizontal rows. A table has a specified number of columns, but can have any number of rows...

structure definition (a set of column definitions) along with the data appearing in that structure. The structure definition is the heading and the data appearing in it is the body, a set of rows. A database relvar
Relvar
In relational databases, a relvar is a term coined by C. J. Date as an abbreviation for the concept of relation variable, which is the actual term used by the inventor of the relational model, E. F. Codd, regarding the same concept...

(relation variable) is commonly known as a base table. The heading of its assigned value at any time is as specified in the table declaration and its body is that most recently assigned to it by invoking some update operator (typically, INSERT, UPDATE, or DELETE). The heading and body of the table resulting from evaluation of some query are determined by the definitions of the operators used in the expression of that query. (Note that in SQL the heading is not always a set of column definitions as described above, because it is possible for a column to have no name and also for two or more columns to have the same name. Also, the body is not always a set of rows because in SQL it is possible for the same row to appear more than once in the same body.)

SQL and the relational model


SQL, initially pushed as the standard
Standardization
Standardization is the process of developing and implementing technical standards.The goals of standardization can be to help with independence of single suppliers , compatibility, interoperability, safety, repeatability, or quality....

 language for relational database
Relational database
A relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...

s, deviates from the relational model in several places. The current ISO
International Organization for Standardization
The International Organization for Standardization , widely known as ISO, is an international standard-setting body composed of representatives from various national standards organizations. Founded on February 23, 1947, the organization promulgates worldwide proprietary, industrial and commercial...

 SQL standard doesn't mention the relational model or use relational terms or concepts. However, it is possible to create a database conforming to the relational model using SQL if one does not use certain SQL features.

The following deviations from the relational model have been noted in SQL. Note that few database servers implement the entire SQL standard and in particular do not allow some of these deviations. Whereas NULL is ubiquitous, for example, allowing duplicate column names within a table or anonymous columns is uncommon.

Duplicate rows
The same row can appear more than once in an SQL table. The same tuple cannot appear more than once in a relation
Relation (database)
In relational model:A relation value, which is assigned to a certain relation variable, is time-varying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....

.

Anonymous columns
A column in an SQL table can be unnamed and thus unable to be referenced in expressions. The relational model requires every attribute to be named and referenceable.

Duplicate column names
Two or more columns of the same SQL table can have the same name and therefore cannot be referenced, on account of the obvious ambiguity. The relational model requires every attribute to be referenceable.

Column order significance
The order of columns in an SQL table is defined and significant, one consequence being that SQL's implementations of Cartesian product and union are both noncommutative. The relational model requires there to be no significance to any ordering of the attributes of a relation.

Views without CHECK OPTION
Updates to a view defined without CHECK OPTION can be accepted but the resulting update to the database does not necessarily have the expressed effect on its target. For example, an invocation of INSERT can be accepted but the inserted rows might not all appear in the view, or an invocation of UPDATE can result in rows disappearing from the view. The relational model requires updates to a view to have the same effect as if the view were a base relvar.

Columnless tables unrecognized
SQL requires every table to have at least one column, but there are two relations of degree zero (of cardinality
Cardinality (SQL statements)
In SQL , the term cardinality refers to the uniqueness of data values contained in a particular column of a database table. The lower the cardinality, the more duplicated elements in a column. Thus, a column with the lowest possible cardinality would have the same value for every row...

 one and zero) and they are needed to represent extensions of predicates that contain no free variables.

NULL
This special mark can appear instead of a value wherever a value can appear in SQL, in particular in place of a column value in some row. The deviation from the relational model arises from the fact that the implementation of this ad hoc concept in SQL involves the use of three-valued logic
Multi-valued logic
In logic, a many-valued logic is a propositional calculus in which there are more than two truth values. Traditionally, in Aristotle's logical calculus, there were only two possible values for any proposition...

, under which the comparison of NULL with itself does not yield true but instead yields the third truth value, unknown; similarly the comparison NULL with something other than itself does not yield false but instead yields unknown. It is because of this behaviour in comparisons that NULL is described as a mark rather than a value. The relational model depends on the law of excluded middle
Law of excluded middle
In logic, the law of excluded middle is the third of the so-called three classic laws of thought. It states that for any proposition, either that proposition is true, or its negation is....

 under which anything that is not true is false and anything that is not false is true; it also requires every tuple in a relation body to have a value for every attribute of that relation. This particular deviation is disputed by some if only because E.F. Codd himself eventually advocated the use of special marks and a 4-valued logic, but this was based on his observation that there are two distinct reasons why one might want to use a special mark in place of a value, which led opponents of the use of such logics to discover more distinct reasons and at least as many as 19 have been noted, which would require a 21-valued logic. SQL itself uses NULL for several purposes other than to represent "value unknown". For example, the sum of the empty set is NULL, meaning zero, the average of the empty set is NULL, meaning undefined, and NULL appearing in the result of a LEFT JOIN can mean "no value because there is no matching row in the right-hand operand".

Relational operations


Users (or programs) request data from a relational database by sending it a query that is written in a special language, usually a dialect of SQL. Although SQL was originally intended for end-users, it is much more common for SQL queries to be embedded into software that provides an easier user interface. Many web sites, such as Wikipedia, perform SQL queries when generating pages.

In response to a query, the database returns a result set, which is just a list of rows containing the answers. The simplest query is just to return all the rows from a table, but more often, the rows are filtered in some way to return just the answer wanted.

Often, data from multiple tables are combined into one, by doing a join
Join (SQL)
An SQL join clause combines records from two or more tables in a database. It creates a set that can be saved as a table or used as is. A JOIN is a means for combining fields from two tables by using values common to each. ANSI standard SQL specifies four types of JOINs: INNER, OUTER, LEFT, and RIGHT...

. Conceptually, this is done by taking all possible combinations of rows (the Cartesian product
Cartesian product
In mathematics, a Cartesian product is a construction to build a new set out of a number of given sets. Each member of the Cartesian product corresponds to the selection of one element each in every one of those sets...

), and then filtering out everything except the answer. In practice, relational database management systems rewrite ("optimize
Query optimizer
The query optimizer is the component of a database management system that attempts to determine the most efficient way to execute a query. The optimizer considers the possible query plans for a given input query, and attempts to determine which of those plans will be the most efficient...

") queries to perform faster, using a variety of techniques.

There are a number of relational operations in addition to join. These include project (the process of eliminating some of the columns), restrict (the process of eliminating some of the rows), union (a way of combining two tables with similar structures), difference (which lists the rows in one table that are not found in the other), intersect (which lists the rows found in both tables), and product (mentioned above, which combines each row of one table with each row of the other). Depending on which other sources you consult, there are a number of other operators – many of which can be defined in terms of those listed above. These include semi-join, outer operators such as outer join and outer union, and various forms of division. Then there are operators to rename columns, and summarizing or aggregating operators, and if you permit relation
Relation (database)
In relational model:A relation value, which is assigned to a certain relation variable, is time-varying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....

 values as attributes (RVA – relation-valued attribute), then operators such as group and ungroup. The SELECT statement in SQL serves to handle all of these except for the group and ungroup operators.

The flexibility of relational databases allows programmers to write queries that were not anticipated by the database designers. As a result, relational databases can be used by multiple applications in ways the original designers did not foresee, which is especially important for databases that might be used for a long time (perhaps several decades). This has made the idea and implementation of relational databases very popular with businesses.

Database normalization



Relation
Relation (database)
In relational model:A relation value, which is assigned to a certain relation variable, is time-varying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....

s are classified based upon the types of anomalies to which they're vulnerable. A database that's in the first normal form is vulnerable to all types of anomalies, while a database that's in the domain/key normal form has no modification anomalies. Normal forms are hierarchical in nature. That is, the lowest level is the first normal form, and the database cannot meet the requirements for higher level normal forms without first having met all the requirements of the lesser normal forms.

Database


An idealized, very simple example of a description of some relvar
Relvar
In relational databases, a relvar is a term coined by C. J. Date as an abbreviation for the concept of relation variable, which is the actual term used by the inventor of the relational model, E. F. Codd, regarding the same concept...

s (relation
Relation (database)
In relational model:A relation value, which is assigned to a certain relation variable, is time-varying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....

 variables) and their attributes:
  • Customer(Customer ID, Tax ID, Name, Address, City, State, Zip, Phone, Email)
  • Order(Order No, Customer ID, Invoice No, Date Placed, Date Promised, Terms, Status)
  • Order Line(Order No, Order Line No, Product Code, Qty)
  • Invoice(Invoice No, Customer ID, Order No, Date, Status)
  • Invoice Line(Invoice No, Invoice Line No, Product Code, Qty Shipped)
  • Product(Product Code, Product Description)


In this design
Design
Design as a noun informally refers to a plan or convention for the construction of an object or a system while “to design” refers to making this plan...

 we have six relvars: Customer, Order, Order Line, Invoice, Invoice Line and Product. The bold, underlined attributes are candidate key
Candidate key
In the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that# the relation does not have two distinct tuples In the relational model of databases, a candidate key of a relation is a minimal superkey for that...

s
. The non-bold, underlined attributes are foreign key
Foreign key
In the context of relational databases, a foreign key is a referential constraint between two tables.A foreign key is a field in a relational table that matches a candidate key of another table...

s
.

Usually one candidate key
Candidate key
In the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that# the relation does not have two distinct tuples In the relational model of databases, a candidate key of a relation is a minimal superkey for that...

 is arbitrarily chosen to be called the primary key and used in preference
Preference
-Definitions in different disciplines:The term “preferences” is used in a variety of related, but not identical, ways in the scientific literature. This makes it necessary to make explicit the sense in which the term is used in different social sciences....

 over the other candidate keys, which are then called alternate keys.

A candidate key is a unique identifier
Identifier
An identifier is a name that identifies either a unique object or a unique class of objects, where the "object" or class may be an idea, physical [countable] object , or physical [noncountable] substance...

 enforcing that no tuple
Tuple
In mathematics and computer science, a tuple is an ordered list of elements. In set theory, an n-tuple is a sequence of n elements, where n is a positive integer. There is also one 0-tuple, an empty sequence. An n-tuple is defined inductively using the construction of an ordered pair...

 will be duplicated; this would make the relation
Relation (database)
In relational model:A relation value, which is assigned to a certain relation variable, is time-varying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....

 into something else, namely a bag, by violating the basic definition of a set. Both foreign keys and superkeys (which includes candidate keys) can be composite, that is, can be composed of several attributes. Below is a tabular depiction of a relation of our example Customer relvar; a relation can be thought of as a value that can be attributed to a relvar.

Customer relation



Customer ID Tax ID Name Address [More fields....]

1234567890 555-5512222 Munmun 323 Broadway ...
2223344556 555-5523232 Wile E. 1200 Main Street ...
3334445563 555-5533323 Ekta 871 1st Street ...
4232342432 555-5325523 E. F. Codd 123 It Way ...


If we attempted to insert a new customer with the ID 1234567890, this would violate the design of the relvar since Customer ID is a primary key and we already have a customer 1234567890. The DBMS must reject a transaction
Database transaction
A transaction comprises a unit of work performed within a database management system against a database, and treated in a coherent and reliable way independent of other transactions...

 such as this that would render the database
Database
A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...

 inconsistent by a violation of an integrity constraint
Database integrity
Database integrity ensures that data entered into the database is accurate, valid, and consistent. Any applicable integrity constraints and data validation rules must be satisfied before permitting a change to the database....

.

Foreign key
Foreign key
In the context of relational databases, a foreign key is a referential constraint between two tables.A foreign key is a field in a relational table that matches a candidate key of another table...

s
are integrity constraints enforcing that the value
Value (computer science)
In computer science, a value is an expression which cannot be evaluated any further . The members of a type are the values of that type. For example, the expression "1 + 2" is not a value as it can be reduced to the expression "3"...

 of the attribute set is drawn from a candidate key
Candidate key
In the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that# the relation does not have two distinct tuples In the relational model of databases, a candidate key of a relation is a minimal superkey for that...

in another relation
Relation (database)
In relational model:A relation value, which is assigned to a certain relation variable, is time-varying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....

. For example in the Order relation the attribute Customer ID is a foreign key. A join
Join
Join may refer to:* Join , to include additional counts or additional defendants on an indictment* Join , a least upper bound of set orders in lattice theory* Join , a type of binary operator...

is the operation
Operation (mathematics)
The general operation as explained on this page should not be confused with the more specific operators on vector spaces. For a notion in elementary mathematics, see arithmetic operation....

 that draws on information
Information
Information in its most restricted technical sense is a message or collection of messages that consists of an ordered sequence of symbols, or it is the meaning that can be interpreted from such a message or collection of messages. Information can be recorded or transmitted. It can be recorded as...

 from several relations at once. By joining relvars from the example above we could query the database for all of the Customers, Orders, and Invoices. If we only wanted the tuples for a specific customer, we would specify this using a restriction condition.

If we wanted to retrieve all of the Orders for Customer 1234567890, we could query
Information retrieval
Information retrieval is the area of study concerned with searching for documents, for information within documents, and for metadata about documents, as well as that of searching structured storage, relational databases, and the World Wide Web...

 the database to return every row in the Order table with Customer ID 1234567890 and join the Order table to the Order Line table based on Order No.

There is a flaw in our database design
Database design
Database design is the process of producing a detailed data model of a database. This logical data model contains all the needed logical and physical design choices and physical storage parameters needed to generate a design in a Data Definition Language, which can then be used to create a database...

 above. The Invoice relvar contains an Order No attribute. So, each tuple in the Invoice relvar will have one Order No, which implies that there is precisely one Order for each Invoice. But in reality
Reality
In philosophy, reality is the state of things as they actually exist, rather than as they may appear or might be imagined. In a wider definition, reality includes everything that is and has been, whether or not it is observable or comprehensible...

 an invoice can be created against many orders, or indeed for no particular order. Additionally the Order relvar contains an Invoice No attribute, implying that each Order has a corresponding Invoice. But again this is not always true in the real world. An order is sometimes paid through several invoices, and sometimes paid without an invoice. In other words there can be many Invoices per Order and many Orders per Invoice. This is a many-to-many
Many-to-many (data model)
In systems analysis, a many-to-many relationship is a type of cardinality that refers to the relationship between two entities A and B in which A may contain a parent row for which there are many children in B and vice versa. For instance, think of A as Authors, and B as Books...

relationship between Order and Invoice (also called a non-specific relationship). To represent this relationship in the database a new relvar should be introduced whose role
Role
A role or a social role is a set of connected behaviours, rights and obligations as conceptualised by actors in a social situation. It is an expected or free or continuously changing behaviour and may have a given individual social status or social position...

 is to specify the correspondence between Orders and Invoices:

OrderInvoice(Order No,Invoice No)

Now, the Order relvar has a one-to-many
One-to-many
One-to-many may refer to:* Multivalued function, a one-to-many function in mathematics* Fat link, a one-to-many link in hypertext* Point-to-multipoint communication, communication which has a one-to-many relation-See also:*One-to-one...

relationship to the OrderInvoice table, as does the Invoice relvar. If we want to retrieve every Invoice for a particular Order, we can query for all orders where Order No in the Order relation equals the Order No in OrderInvoice, and where Invoice No in OrderInvoice equals the Invoice No in Invoice.
Set-theoretic formulation
Basic notions in the relational model are relation
Relation (database)
In relational model:A relation value, which is assigned to a certain relation variable, is time-varying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....

 names
and attribute names. We will represent these as strings such as "Person" and "name" and we will usually use the variables and to range over them. Another basic notion is the set of atomic values that contains values such as numbers and strings.

Our first definition concerns the notion of tuple, which formalizes the notion of row or record in a table:

Tuple
Tuple
In mathematics and computer science, a tuple is an ordered list of elements. In set theory, an n-tuple is a sequence of n elements, where n is a positive integer. There is also one 0-tuple, an empty sequence. An n-tuple is defined inductively using the construction of an ordered pair...

A tuple is a partial function
Partial function
In mathematics, a partial function from X to Y is a function ƒ: X' → Y, where X' is a subset of X. It generalizes the concept of a function by not forcing f to map every element of X to an element of Y . If X' = X, then ƒ is called a total function and is equivalent to a function...

 from attribute names to atomic values.

Header
A header is a finite set of attribute names.

Projection
The projection of a tuple on a finite set of attributes is .


The next definition defines relation which formalizes the contents of a table as it is defined in the relational model.

Relation
Relation (database)
In relational model:A relation value, which is assigned to a certain relation variable, is time-varying. By using a Data Definition Language , it is able to define relation variables.The following is an example of a heading which consists of three attributes....

A relation is a tuple with , the header, and , the body, a set of tuples that all have the domain .


Such a relation closely corresponds to what is usually called the extension of a predicate in first-order logic
First-order logic
First-order logic is a formal logical system used in mathematics, philosophy, linguistics, and computer science. It goes by many names, including: first-order predicate calculus, the lower predicate calculus, quantification theory, and predicate logic...

 except that here we identify the places in the predicate with attribute names. Usually in the relational model a database schema
Logical schema
A Logical Schema is a data model of a specific problem domain expressed in terms of a particular data management technology. Without being specific to a particular database management product, it is in terms of either relational tables and columns, object-oriented classes, or XML tags...

 is said to consist of a set of relation names, the headers that are associated with these names and the constraints that should hold for every instance of the database schema.

Relation universe
A relation universe over a header is a non-empty set of relations with header .

Relation schema
A relation schema consists of a header and a predicate that is defined for all relations with header . A relation satisfies a relation schema if it has header and satisfies .

Key constraints and functional dependencies


One of the simplest and most important types of relation constraints is the key constraint. It tells us that in every instance of a certain relational schema the tuples can be identified by their values for certain attributes.

Superkey
Superkey
A superkey is defined in the relational model of database organization as a set of attributes of a relation variable for which it holds that in all relations assigned to that variable, there are no two distinct tuples that have the same values for the attributes in this set...

A superkey is written as a finite set of attribute names.
A superkey holds in a relation if:
  • and
  • there exist no two distinct tuples such that .
A superkey holds in a relation universe if it holds in all relations in .
Theorem: A superkey holds in a relation universe over if and only if and holds in .

Candidate key
Candidate key
In the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that# the relation does not have two distinct tuples In the relational model of databases, a candidate key of a relation is a minimal superkey for that...

A superkey holds as a candidate key for a relation universe if it holds as a superkey for and there is no proper subset of that also holds as a superkey for .

Functional dependency
Functional dependency
A functional dependency is a constraint between two sets of attributes in a relation from a database.Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R, if, and only if, each X value is associated with precisely one Y value...

A functional dependency (FD for short) is written as for finite sets of attribute names.
A functional dependency holds in a relation if:
  • and
  • tuples ,
A functional dependency holds in a relation universe if it holds in all relations in .

Trivial functional dependency
A functional dependency is trivial under a header if it holds in all relation universes over .
Theorem: An FD is trivial under a header if and only if .

Closure
Armstrong's axioms
Armstrong's axioms
Armstrong's axioms are a set of axioms used to infer all the functional dependencies on a relational database. They were developed by William W. Armstrong on his paper. The axioms are sound in that they generate only functional dependencies in the closure of a set of functional dependencies when...

: The closure of a set of FDs under a header , written as , is the smallest superset of such that:
  • (reflexivity)
  • (transitivity) and
  • (augmentation)
Theorem: Armstrong's axioms are sound and complete; given a header and a set of FDs that only contain subsets of , if and only if holds in all relation universes over in which all FDs in hold.

Completion
The completion of a finite set of attributes under a finite set of FDs , written as , is the smallest superset of such that:
The completion of an attribute set can be used to compute if a certain dependency is in the closure of a set of FDs.
Theorem: Given a set of FDs, if and only if .

Irreducible cover
An irreducible cover of a set of FDs is a set of FDs such that:
  • there exists no such that
  • is a singleton set and
  • .

Algorithm to derive candidate keys from functional dependencies


INPUT: a set S of FDs that contain only subsets of a header H
OUTPUT: the set C of superkeys that hold as candidate keys in
all relation universes over H in which all FDs in S hold
begin
C := ∅; // found candidate keys
Q := { H }; // superkeys that contain candidate keys
while Q <> ∅ do
let K be some element from Q;
Q := Q – { K };
minimal := true;
for each X->Y in S do
K' := (K – Y) ∪ X; // derive new superkey
if K' K then
minimal := false;
Q := Q ∪ { K' };
end if
end for
if minimal and there is not a subset of K in C then
remove all supersets of K from C;
C := C ∪ { K };
end if
end while
end
See also

  • Domain relational calculus
    Domain relational calculus
    In computer science, domain relational calculus is a calculus that was introduced by Michel Lacroix and Alain Pirotte as a declarative database query language for the relational data model.In DRC, queries have the form:...

  • Life cycle of a relational database
    Life cycle of a relational database
    The life cycle of a relational database is the cycle of development and changes that a relational database goes through during the course of its life. The cycle typically consists of several stages. There is a possibility that the database designer/developer can go back to any of the previous stages...

  • List of relational database management systems
  • Query language
    Query language
    Query languages are computer languages used to make queries into databases and information systems.Broadly, query languages can be classified according to whether they are database query languages or information retrieval query languages...

    • Database query language
    • Information retrieval query language
      Information retrieval query language
      An information retrieval query language is a query language used to make queries into database, where the semantics of the query are defined not by a precise rendering of a formal syntax, but by an interpretation of the most suitable results of the query....


  • Relation
    Relation (mathematics)
    In set theory and logic, a relation is a property that assigns truth values to k-tuples of individuals. Typically, the property describes a possible connection between the components of a k-tuple...

  • Relational database
    Relational database
    A relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...

  • Relational database management system
    Relational database management system
    A relational database management system is a database management system that is based on the relational model as introduced by E. F. Codd. Most popular databases currently in use are based on the relational database model....

  • The Third Manifesto
    The Third Manifesto
    The Third Manifesto is Christopher J. Date's and Hugh Darwen's proposal for future database management systems, a response to two earlier Manifestos with the same purpose. The theme of the manifestos is how to avoid the 'object-relational impedance mismatch' between object-oriented programming...

     (TTM
    TTM
    TTM may refer to:* TTM, Tata Motors's stock symbol on the New York Stock Exchange* Time to market, the length of time it takes from a product being conceived to its being available for sale* Transtheoretical model of change, a concept in health psychology...

    )
  • TransRelational model
  • Tuple-versioning
    Tuple-versioning
    Tuple-versioning is a mechanism used in a relational database management system to store past states of a relation. Normally, only the current state is captured....



Further reading

  • Date, C. J., Darwen, H. (2000). Foundation for Future Database Systems: The Third Manifesto, 2nd edition, Addison-Wesley
    Addison-Wesley
    Addison-Wesley was a book publisher in Boston, Massachusetts, best known for its textbooks and computer literature. As well as publishing books, Addison-Wesley also distributed its technical titles through the Safari Books Online e-reference service...

    Professional. ISBN 0-201-70928-7.
  • Date, C. J. (2003). Introduction to Database Systems. 8th edition, Addison-Wesley. ISBN 0-321-19784-4.

External links