Tuple relational calculus
Encyclopedia
Tuple calculus is a calculus
Calculus (disambiguation)
Calculus in its most general sense is any method or system of calculation.Calculus may refer to:- In mathematics and computer science :...

 that was introduced by Edgar F. Codd
Edgar F. Codd
Edgar Frank "Ted" Codd was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases...

 as part of the relational model
Relational model
The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F...

, in order to provide a declarative database-query language for this data model
Data model
A data model in software engineering is an abstract model, that documents and organizes the business data for communication between team members and is used as a plan for developing applications, specifically how data is stored and accessed....

. It formed the inspiration for the database-query languages QUEL
QUEL query languages
QUEL is a relational database access language, similar in most ways to SQL. It was created as a part of the Ingres effort at University of California, Berkeley, based on Codd's earlier suggested but not implemented Data Sub-Language ALPHA. QUEL was used for a short time in most products based on...

 and SQL
SQL
SQL is a programming language designed for managing data in relational database management systems ....

, of which the latter, although far less faithful to the original relational model and calculus, is now the de-facto-standard database-query language; viz., a dialect of SQL is used by nearly every relational-database-management system
Relational database management system
A relational database management system is a database management system that is based on the relational model as introduced by E. F. Codd. Most popular databases currently in use are based on the relational database model....

. Lacroix and Pirotte proposed domain calculus, which is closer to first-order logic and which showed that both of these calculi (as well as relational algebra
Relational algebra
Relational algebra, an offshoot of first-order logic , deals with a set of finitary relations that is closed under certain operators. These operators operate on one or more relations to yield a relation...

) are equivalent in expressive power. Subsequently, query languages for the relational model were called relationally complete if they could express at least all of these queries.

Relational database

Since the calculus is a query language for relational database
Relational database
A relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...

s we first have to define a relational database. The basic relational building block is the domain
Data domain
In data management and database analysis, a data domain refers to all the unique values which a data element may contain. The rule for determining the domain boundary may be as simple as a data type with an enumerated list of values....

, or data type
Data type
In computer programming, a data type is a classification identifying one of various types of data, such as floating-point, integer, or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of...

. A tuple
Tuple
In mathematics and computer science, a tuple is an ordered list of elements. In set theory, an n-tuple is a sequence of n elements, where n is a positive integer. There is also one 0-tuple, an empty sequence. An n-tuple is defined inductively using the construction of an ordered pair...

 is an ordered multiset
Multiset
In mathematics, the notion of multiset is a generalization of the notion of set in which members are allowed to appear more than once...

 of attribute
Attribute (computing)
In computing, an attribute is a specification that defines a property of an object, element, or file. It may also refer to or set the specific value for a given instance of such....

s, which are ordered pairs of domain and value; or just a row. A relvar (relation variable) is a set of ordered pair
Ordered pair
In mathematics, an ordered pair is a pair of mathematical objects. In the ordered pair , the object a is called the first entry, and the object b the second entry of the pair...

s of domain and name, which serves as the header
Header (information technology)
In information technology, header refers to supplemental data placed at the beginning of a block of data being stored or transmitted. In data transmission, the data following the header are sometimes called the payload or body....

 for a relation. A relation
Relational model
The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F...

 is a set of tuples. Although these relational concepts are mathematically defined, those definitions map loosely to traditional database concepts. A table
Table (database)
In relational databases and flat file databases, a table is a set of data elements that is organized using a model of vertical columns and horizontal rows. A table has a specified number of columns, but can have any number of rows...

 is an accepted visual representation of a relation; a tuple is similar to the concept of row
Row (database)
In the context of a relational database, a row—also called a record or tuple—represents a single, implicitly structured data item in a table. In simple terms, a database table can be thought of as consisting of rows and columns or fields...

.

We first assume the existence of a set C of column names, examples of which are "name", "author", "address" et cetera. We define headers as finite subsets of C. A relational database schema is defined as a tuple
Tuple
In mathematics and computer science, a tuple is an ordered list of elements. In set theory, an n-tuple is a sequence of n elements, where n is a positive integer. There is also one 0-tuple, an empty sequence. An n-tuple is defined inductively using the construction of an ordered pair...

 S = (D, R, h) where D is the domain of atomic values (see relational model
Relational model
The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F...

 for more on the notions of domain and atomic value), R is a finite set of relation names, and
h : R → 2C


a function
Function (mathematics)
In mathematics, a function associates one quantity, the argument of the function, also known as the input, with another quantity, the value of the function, also known as the output. A function assigns exactly one output to each input. The argument and the value may be real numbers, but they can...

 that associates a header with each relation name in R. (Note that this is a simplification from the full relational model where there is more than one domain and a header is not just a set of column names but also maps these column names to a domain.) Given a domain D we define a tuple over D as a partial function
Partial function
In mathematics, a partial function from X to Y is a function ƒ: X' → Y, where X' is a subset of X. It generalizes the concept of a function by not forcing f to map every element of X to an element of Y . If X' = X, then ƒ is called a total function and is equivalent to a function...


t : CD


that maps some column names to an atomic value in D. An example would be (name : "Harry", age : 25).

The set of all tuples over D is denoted as TD. The subset of C for which a tuple t is defined is called the domain of t (not to be confused with the domain in the schema) and denoted as dom(t).

Finally we define a relational database given a schema S = (D, R, h) as a function
db : R → 2TD


that maps the relation names in R to finite subsets of TD, such that for every relation name r in R and tuple t in db(r) it holds that
dom(t) = h(r).


The latter requirement simply says that all the tuples in a relation should contain the same column names, namely those defined for it in the schema.

Atoms

For the construction of the formulae we will assume an infinite set V of tuple variables. The formulas are defined given a database schema S = (D, R, h) and a partial function type : V -> 2C that defines a type assignment that assigns headers to some tuple variables. We then define the set of atomic formula
Atomic formula
In mathematical logic, an atomic formula is a formula with no deeper propositional structure, that is, a formula that contains no logical connectives or equivalently a formula that has no strict subformulas. Atoms are thus the simplest well-formed formulas of the logic...

s
A[S,type] with the following rules:
  1. if v and w in V, a in type(v) and b in type(w) then the formula " v.a = w.b " is in A[S,type],
  2. if v in V, a in type(v) and k denotes a value in D then the formula " v.a = k " is in A[S,type], and
  3. if v in V, r in R and type(v) = h(r) then the formula " r(v) " is in A[S,type].


Examples of atoms are:
  • (t.age = s.age) — t has an age attribute and s has an age attribute with the same value
  • (t.name = "Codd") — tuple t has a name attribute and its value is "Codd"
  • Book(t) — tuple t is present in relation Book.


The formal semantics of such atoms is defined given a database db over S and a tuple variable binding val : V -> TD that maps tuple variables to tuples over the domain in S:
  1. " v.a = w.b " is true if and only if val(v)(a) = val(w)(b)
  2. " v.a = k " is true if and only if val(v)(a) = k
  3. " r(v) " is true if and only if val(v) is in db(r)

Formulas

The atoms can be combined into formulas, as is usual in first-order logic, with the logical operators ∧ (and), ∨ (or) and ¬ (not), and we can use the existential quantifier (∃) and the universal quantifier (∀) to bind the variables. We define the set of formulas F[S,type] inductively with the following rules:
  1. every atom in A[S,type] is also in F[S,type]
  2. if f1 and f2 are in F[S,type] then the formula " f1f2 " is also in F[S,type]
  3. if f1 and f2 are in F[S,type] then the formula " f1f2 " is also in F[S,type]
  4. if f is in F[S,type] then the formula " ¬ f " is also in F[S,type]
  5. if v in V, H a header and f a formula in F[S,type[v->H]] then the formula " ∃ v : H ( f ) " is also in F[S,type], where type[v->H] denotes the function that is equal to type except that it maps v to H,
  6. if v in V, H a header and f a formula in F[S,type[v->H]] then the formula " ∀ v : H ( f ) " is also in F[S,type]


Examples of formulas:
  • t.name = "C. J. Date" ∨ t.name = "H. Darwen"
  • Book(t) ∨ Magazine(t)
  • t : {author, title, subject} ( ¬ ( Book(t) ∧ t.author = "C. J. Date" ∧ ¬ ( t.subject = "relational model")))

Note that the last formula states that all books that are written by C. J. Date have as their subject the relational model. As usual we omit brackets if this causes no ambiguity about the semantics of the formula.

We will assume that the quantifiers quantify over the universe of all tuples over the domain in the schema. This leads to the following formal semantics for formulas given a database db over S and a tuple variable binding val : V -> TD:
  1. " f1f2 " is true if and only if " f1 " is true and " f2 " is true,
  2. " f1f2 " is true if and only if " f1 " is true or " f2 " is true or both are true,
  3. " ¬ f " is true if and only if " f " is not true,
  4. " ∃ v : H ( f ) " is true if and only if there is a tuple t over D such that dom(t) = H and the formula " f " is true for val[v->t], and
  5. " ∀ v : H ( f ) " is true if and only if for all tuples t over D such that dom(t) = H the formula " f " is true for val[v->t].

Queries

Finally we define what a query expression looks like given a schema S = (D, R, h):
{ v : H | f(v) }


where v is a tuple variable, H a header and f(v) a formula in F[S,type] where type = { (v, H) } and with v as its only free variable. The result of such a query for a given database db over S is the set of all tuples t over D with dom(t) = H such that f is true for db and val = { (v, t) }.

Examples of query expressions are:
  • { t : {name} | ∃ s : {name, wage} ( Employee(s) ∧ s.wage = 50.000 ∧ t.name = s.name ) }
  • { t : {supplier, article} | ∃ s : {s#, sname} ( Supplier(s) ∧ s.sname = t.supplier ∧ ∃ p : {p#, pname} ( Product(p) ∧ p.pname = t.article ∧ ∃ a : {s#, p#} ( Supplies(a) ∧ s.s# = a.s# ∧ a.p# = p.p# ) }

Domain-independent queries

Because the semantics of the quantifiers is such that they quantify over all the tuples over the domain in the schema it can be that a query may return a different result for a certain database if another schema is presumed. For example, consider the two schemas S1 = ( D1, R, h ) and S2 = ( D2, R, h ) with domains D1 = { 1 }, D2 = { 1, 2 }, relation names R = { "r1" } and headers h = { ("r1", {"a"}) }. Both schemas have a common instance:
db = { ( "r1", { ("a", 1) } ) }

If we consider the following query expression
{ t : {a} | t.a = t.a }

then its result on db is either { (a : 1) } under S1 or { (a : 1), (a : 2) } under S2. It will also be clear that if we take the domain to be an infinite set, then the result of the query will also be infinite. To solve these problems we will restrict our attention to those queries that are domain independent, i.e., the queries that return the same result for a database under all of its schemas.

An interesting property of these queries is that if we assume that the tuple variables range over tuples over the so-called active domain of the database, which is the subset of the domain that occurs in at least one tuple in the database or in the query expression, then the semantics of the query expressions does not change. In fact, in many definitions of the tuple calculus this is how the semantics of the quantifiers is defined, which makes all queries by definition domain independent.

Safe queries

In order to limit the query expressions such that they express only domain-independent queries a syntactical notion of safe query is usually introduced. To determine whether a query expression is safe we will derive two types of information from a query. The first is whether a variable-column pair t.a is bound to the column of a relation or a constant, and the second is whether two variable-column pairs are directly or indirectly equated (denoted t.v s.w).

For deriving boundedness we introduce the following reasoning rules:
  1. in " v.a = w.b " no variable-column pair is bound,
  2. in " v.a = k " the variable-column pair v.a is bound,
  3. in " r(v) " all pairs v.a are bound for a in type(v),
  4. in " f1f2 " all pairs are bound that are bound either in f1 or in f2,
  5. in " f1f2 " all pairs are bound that are bound both in f1 and in f2,
  6. in " ¬ f " no pairs are bound,
  7. in " ∃ v : H ( f ) " a pair w.a is bound if it is bound in f and w <> v, and
  8. in " ∀ v : H ( f ) " a pair w.a is bound if it is bound in f and w <> v.


For deriving equatedness we introduce the following reasoning rules (next to the usual reasoning rules for equivalence relations: reflexivity, symmetry and transitivity):
  1. in " v.a = w.b " it holds that v.a w.b,
  2. in " v.a = k " no pairs are equated,
  3. in " r(v) " no pairs are equated,
  4. in " f1f2 " it holds that v.a

    w.b if it holds either in f1 or in f2,

  5. in " f1f2 " it holds that v.a w.b if it holds both in f1 and in f2,
  6. in " ¬ f " no pairs are equated,
  7. in " ∃ v : H ( f ) " it holds that w.a

    x.b if it holds in f and w<>v and x<>v, and

  8. in " ∀ v : H ( f ) " it holds that w.a x.b if it holds in f and w<>v and x<>v.


We then say the a query expression { v : H | f(v) } is safe if
  • for every column name a in H we can derive that v.a is equated with a bound pair in f,
  • for every subexpression of f of the form " ∀ w : G ( g ) " we can derive that for every column name a in G we can derive that w.a is equated with a bound pair in g, and
  • for every subexpression of f of the form " ∃ w : G ( g ) " we can derive that for every column name a in G we can derive that w.a is equated with a bound pair in g.


The restriction to safe query expressions does not limit the expressiveness since all domain-independent queries that could be expressed can also be expressed by a safe query expression. This can be proven by showing that for a schema S = (D, R, h), a given set K of constants in the query expression, a tuple variable v and a header H we can construct a safe formula for every pair v.a with a in H that states that its value is in the active domain. For example, assume that K={1,2}, R={"r"} and h = { ("r", {"a, "b"}) } then the corresponding safe formula for v.b is:
v.b = 1 ∨ v.b = 2 ∨ ∃ w ( r(w) ∧ ( v.b = w.a ∨ v.b = w.b ) )

This formula, then, can be used to rewrite any unsafe query expression to an equivalent safe query expression by adding such a formula for every variable v and column name a in its type where it is used in the expression. Effectively this means that we let all variables range over the active domain, which, as was already explained, does not change the semantics if the expressed query is domain independent.

See also

  • Relational algebra
    Relational algebra
    Relational algebra, an offshoot of first-order logic , deals with a set of finitary relations that is closed under certain operators. These operators operate on one or more relations to yield a relation...

  • Relational calculus
    Relational calculus
    Relational calculus consists of two calculi, the tuple relational calculus and the domain relational calculus, that are part of the relational model for databases and provide a declarative way to specify database queries...


    • Domain relational calculus
      Domain relational calculus
      In computer science, domain relational calculus is a calculus that was introduced by Michel Lacroix and Alain Pirotte as a declarative database query language for the relational data model.In DRC, queries have the form:...

      (DRC)
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK