Third normal form
Encyclopedia
In computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, the third normal form (3NF) is a normal form used in database normalization
Database normalization
In the design of a relational database management system , the process of organizing data to minimize redundancy is called normalization. The goal of database normalization is to decompose relations with anomalies in order to produce smaller, well-structured relations...

. 3NF was originally defined by E.F. Codd in 1971. Codd's definition states that a table is in 3NF if and only if
If and only if
In logic and related fields such as mathematics and philosophy, if and only if is a biconditional logical connective between statements....

 both of the following conditions hold:
  • The relation R (table) is in second normal form
    Second normal form
    Second normal form is a normal form used in database normalization. 2NF was originally defined by E.F. Codd in 1971.A table that is in first normal form must meet additional criteria if it is to qualify for second normal form...

     (2NF)
  • Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every super key of R.


A non-prime attribute of R is an attribute that does not belong to any candidate key of R. A transitive dependency
Transitive dependency
In mathematics, a transitive dependency is a functional dependency which holds by virtue of transitivity. A transitive dependency can occur only in a relation that has three or more attributes. Let A, B, and C designate three distinct attributes in the relation...

 is a functional dependency
Functional dependency
A functional dependency is a constraint between two sets of attributes in a relation from a database.Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R, if, and only if, each X value is associated with precisely one Y value...

 in which XZ (X determines Z) indirectly, by virtue of XY and YZ (where it is not the case that YX).

A 3NF definition that is equivalent to Codd's, but expressed differently, was given by Carlo Zaniolo in 1982. This definition states that a table is in 3NF if and only if, for each of its functional dependencies XA, at least one of the following conditions holds:
  • X contains A (that is, XA is trivial functional dependency), or
  • X is a superkey
    Superkey
    A superkey is defined in the relational model of database organization as a set of attributes of a relation variable for which it holds that in all relations assigned to that variable, there are no two distinct tuples that have the same values for the attributes in this set...

    , or
  • A-X, the set difference between A and X is a prime attribute (i.e., A-X is contained within a candidate key
    Candidate key
    In the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that# the relation does not have two distinct tuples In the relational model of databases, a candidate key of a relation is a minimal superkey for that...

    )


Zaniolo's definition gives a clear sense of the difference between 3NF and the more stringent Boyce–Codd normal form (BCNF). BCNF simply eliminates the third alternative ("A is a prime attribute").

"Nothing but the key"

A memorable statement of Codd's definition of 3NF, paralleling the traditional pledge
Sworn testimony
Sworn testimony is evidence given by a witness who has made a commitment to tell the truth. If the witness is later found to have lied whilst bound by the commitment, they can often be charged with the crime of perjury. The types of commitment can include oaths, affirmations and promises which are...

 to give true evidence in a court of law, was given by Bill Kent: "[Every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key." A common variation supplements this definition with the oath: "so help me Codd
Edgar F. Codd
Edgar Frank "Ted" Codd was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases...

".

Requiring existence of "the key" ensures that the table is in 1NF
First normal form
First normal form is a normal form used in database normalization. A relational database table that adheres to 1NF is one that meets a certain minimum set of criteria...

; requiring that non-key attributes be dependent on "the whole key" ensures 2NF
Second normal form
Second normal form is a normal form used in database normalization. 2NF was originally defined by E.F. Codd in 1971.A table that is in first normal form must meet additional criteria if it is to qualify for second normal form...

; further requiring that non-key attributes be dependent on "nothing but the key" ensures 3NF.

Chris Date
Christopher J. Date
Chris Date is an independent author, lecturer, researcher, and consultant, specializing in relational database theory.-Biography:Chris Date attended High Wycombe Royal Grammar School from 1951 to 1958 and received his BA in Mathematics from Cambridge University in 1962. He entered the computer...

 refers to Kent's summary as "an intuitively attractive characterization" of 3NF, and notes that with slight adaptation it may serve as a definition of the slightly-stronger Boyce–Codd normal form: "Each attribute must represent a fact about the key, the whole key, and nothing but the key." The 3NF version of the definition is weaker than Date's BCNF variation, as the former is concerned only with ensuring that non-key attributes are dependent on keys. Prime attributes (which are keys or parts of keys) must not be functionally dependent at all; they each represent a fact about the key in the sense of providing part or all of the key itself. (It should be noted here that this rule applies only to functionally dependent attributes, as applying it to all attributes would implicitly prohibit composite candidate keys, since each part of any such key would violate the "whole key" clause.)

An example of a 2NF table that fails to meet the requirements of 3NF is:
Tournament Winners
Tournament Year Winner Winner Date of Birth
Indiana Invitational 1998 Al Fredrickson 21 July 1975
Cleveland Open 1999 Bob Albertson 28 September 1968
Des Moines Masters 1999 Al Fredrickson 21 July 1975
Indiana Invitational 1999 Chip Masterson 14 March 1977


Because each row in the table needs to tell us who won a particular Tournament in a particular Year, the composite key {Tournament, Year} is a minimal set of attributes guaranteed to uniquely identify a row. That is, {Tournament, Year} is a candidate key for the table.

The breach of 3NF occurs because the non-prime attribute Winner Date of Birth is transitively dependent on the candidate key {Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date of Birth is functionally dependent on Winner makes the table vulnerable to logical inconsistencies, as there is nothing to stop the same person from being shown with different dates of birth on different records.

In order to express the same facts without violating 3NF, it is necessary to split the table into two:
Tournament Winners
Tournament Year Winner
Indiana Invitational 1998 Al Fredrickson
Cleveland Open 1999 Bob Albertson
Des Moines Masters 1999 Al Fredrickson
Indiana Invitational 1999 Chip Masterson
Player Dates of Birth
Player Date of Birth
Chip Masterson 14 March 1977
Al Fredrickson 21 July 1975
Bob Albertson 28 September 1968

Update anomalies cannot occur in these tables, which are both in 3NF.

Derivation of Zaniolo's conditions

The definition of 3NF offered by Carlo Zaniolo in 1982, and given above, is proved in the following way: Let X → A be a nontrivial FD
Functional dependency
A functional dependency is a constraint between two sets of attributes in a relation from a database.Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R, if, and only if, each X value is associated with precisely one Y value...

 (i.e. one where X does not contain A) and let A be a non-key attribute. Also let Y be a key of R. Then Y → X. Therefore A is not transitively dependent on Y if and only if X → Y, that is, if and only if X is a superkey.

Normalization beyond 3NF

Most 3NF tables are free of update, insertion, and deletion anomalies. Certain types of 3NF tables, rarely met with in practice, are affected by such anomalies; these are tables which either fall short of Boyce–Codd normal form (BCNF) or, if they meet BCNF, fall short of the higher normal forms 4NF
Fourth normal form
Fourth normal form is a normal form used in database normalization. Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce–Codd normal form . Whereas the second, third, and Boyce–Codd normal forms are concerned with functional dependencies, 4NF is concerned with a...

 or 5NF
Fifth normal form
Fifth normal form , also known as Project-join normal form is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships...

.

See also

  • Attribute-value system
    Attribute-value system
    An attribute-value system is a basic knowledge representation framework comprising a table with columns designating "attributes" and rows designating "objects" An attribute-value system is a basic knowledge representation framework comprising a table with columns designating "attributes" (also...

  • Boyce–Codd normal form
  • First normal form
    First normal form
    First normal form is a normal form used in database normalization. A relational database table that adheres to 1NF is one that meets a certain minimum set of criteria...

  • Second normal form
    Second normal form
    Second normal form is a normal form used in database normalization. 2NF was originally defined by E.F. Codd in 1971.A table that is in first normal form must meet additional criteria if it is to qualify for second normal form...

  • Fourth normal form
    Fourth normal form
    Fourth normal form is a normal form used in database normalization. Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce–Codd normal form . Whereas the second, third, and Boyce–Codd normal forms are concerned with functional dependencies, 4NF is concerned with a...

  • Fifth normal form
    Fifth normal form
    Fifth normal form , also known as Project-join normal form is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships...


Further reading



External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK