Boyce-Codd normal form
Encyclopedia
Boyce–Codd normal form is a normal form used in database normalization
Database normalization
In the design of a relational database management system , the process of organizing data to minimize redundancy is called normalization. The goal of database normalization is to decompose relations with anomalies in order to produce smaller, well-structured relations...

. It is a slightly stronger version of the third normal form
Third normal form
In computer science, the third normal form is a normal form used in database normalization. 3NF was originally defined by E.F. Codd in 1971. Codd's definition states that a table is in 3NF if and only if both of the following conditions hold:...

 (3NF). A table is in Boyce–Codd normal form if and only if
If and only if
In logic and related fields such as mathematics and philosophy, if and only if is a biconditional logical connective between statements....

 for every one of its nontrivial
Nontrivial
Nontrivial is the opposite of trivial. In contexts where trivial has a formal meaning, nontrivial is its antonym.It is a term common among communities of engineers and mathematicians, to indicate a statement or theorem that is not obvious or easy to prove.-Examples:*In mathematics, it is often...

 dependencies
Functional dependency
A functional dependency is a constraint between two sets of attributes in a relation from a database.Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R, if, and only if, each X value is associated with precisely one Y value...

 X → Y, X is a superkey
Superkey
A superkey is defined in the relational model of database organization as a set of attributes of a relation variable for which it holds that in all relations assigned to that variable, there are no two distinct tuples that have the same values for the attributes in this set...

—that is, X is either a candidate key
Candidate key
In the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that# the relation does not have two distinct tuples In the relational model of databases, a candidate key of a relation is a minimal superkey for that...

 or a superset thereof.

BCNF was developed in 1974 by Raymond F. Boyce
Raymond F. Boyce
Raymond 'Ray' Boyce was an American computer scientist who was known for his research in relational databases.Boyce grew up in New York, and went to college in Providence, Rhode Island. He earned his PhD in computer science at Purdue in 1971 . After leaving Purdue he worked on database projects...

 and Edgar F. Codd
Edgar F. Codd
Edgar Frank "Ted" Codd was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases...

 to address certain types of anomaly not dealt with by 3NF as originally defined.

Chris Date has pointed out that a definition of what we now know as BCNF appeared in a paper by Ian Heath in 1971. Date writes:
"Since that definition predated Boyce and Codd's own definition by some three years, it seems to me that BCNF ought by rights to be called Heath normal form. But it isn't."

3NF tables not meeting BCNF

Only in rare cases does a 3NF table not meet the requirements of BCNF. A 3NF table which does not have multiple overlapping candidate keys is guaranteed to be in BCNF. Depending on what its functional dependencies are, a 3NF table with two or more overlapping candidate keys may or may not be in BCNF

An example of a 3NF table that does not meet BCNF is:
|+ Today's Court Bookings
|-
!Court!!Start Time!!End Time!!Rate Type
|-
|1>
09:30 10:30 >-
|1
11:00 12:00 >-
|1
14:00 15:30 >-
|2
10:00 11:30 >-
|2
11:30 13:30 >-
|2
15:00 16:30

  • Each row in the table represents a court booking at a tennis club that has one hard court (Court 1) and one grass court (Court 2)
  • A booking is defined by its Court and the period for which the Court is reserved
  • Additionally, each booking has a Rate Type associated with it. There are four distinct rate types:
  • SAVER, for Court 1 bookings made by members
  • STANDARD, for Court 1 bookings made by non-members
  • PREMIUM-A, for Court 2 bookings made by members
  • PREMIUM-B, for Court 2 bookings made by non-members


The table's superkeys are:
  • S1 = {Court, Start Time}
  • S2 = {Court, End Time}
  • S3 = {Rate Type, Start Time}
  • S4 = {Rate Type, End Time}
  • S5 = {Court, Start Time, End Time}
  • S6 = {Rate Type, Start Time, End Time}
  • S7 = {Court, Rate Type, Start Time}
  • S8 = {Court, Rate Type, End Time}
  • ST = {Court, Rate Type, Start Time, End Time}, the trivial superkey


Note that even though in the above table Start Time and End Time attributes have no duplicate values for each of them, we still have to admit that in some other days two different bookings on court 1 and court 2 could start at the same time or end at the same time. This is the reason why {Start Time} and {End Time} cannot be considered as the table's superkeys.

However, only S1 to S4 are candidate keys
Candidate key
In the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that# the relation does not have two distinct tuples In the relational model of databases, a candidate key of a relation is a minimal superkey for that...

 (that is, minimal superkeys for that relation) because e.g. S1 ⊂ S5, so S5 cannot be a candidate key.

Recall that 2NF
Second normal form
Second normal form is a normal form used in database normalization. 2NF was originally defined by E.F. Codd in 1971.A table that is in first normal form must meet additional criteria if it is to qualify for second normal form...

 prohibits partial functional dependencies of non-prime attributes (ie an attribute that does not occur in ANY candidate key) on candidate keys, and that 3NF
Third normal form
In computer science, the third normal form is a normal form used in database normalization. 3NF was originally defined by E.F. Codd in 1971. Codd's definition states that a table is in 3NF if and only if both of the following conditions hold:...

 prohibits transitive functional dependencies
Transitive dependency
In mathematics, a transitive dependency is a functional dependency which holds by virtue of transitivity. A transitive dependency can occur only in a relation that has three or more attributes. Let A, B, and C designate three distinct attributes in the relation...

 of non-prime attributes on candidate keys.

In Today's Court Bookings table, there are no non-prime attributes: that is, all attributes belong to some candidate key. Therefore the table adheres to both 2NF and 3NF.

The table does not adhere to BCNF. This is because of the dependency Rate Type → Court, in which the determining attribute (Rate Type) is neither a candidate key nor a superset of a candidate key.

Dependency Rate Type → Court is respected as a Rate Type should only ever apply to a single Court.

The design can be amended so that it meets BCNF:
|+ Rate Types
|-
!Rate Type!!Court!!Member Flag
|-
|SAVER>
1 >-
|STANDARD
1 >-
|PREMIUM-A
2 >-
|PREMIUM-B
2
|+ Today's Bookings
|-
!Rate Type!!Start Time!!End Time
|-
|SAVER>
09:30 >-
|SAVER
11:00 >-
|STANDARD
14:00 >-
|PREMIUM-B
10:00 >-
|PREMIUM-B
11:30 >-
|PREMIUM-A
15:00


The candidate keys for the Rate Types table are {Rate Type} and {Court, Member Flag}; the candidate keys for the Today's Bookings table are {Rate Type, Start Time} and {Rate Type, End Time}. Both tables are in BCNF. Having one Rate Type associated with two different Courts is now impossible, so the anomaly affecting the original table has been eliminated.

Achievability of BCNF

In some cases, a non-BCNF table cannot be decomposed into tables that satisfy BCNF and preserve the dependencies that held in the original table. Beeri and Bernstein showed in 1979 that, for example, a set of functional dependencies {AB → C, C → B} cannot be represented by a BCNF schema. Thus, unlike the first three normal forms, BCNF is not always achievable.

Consider the following non-BCNF table whose functional dependencies follow the {AB → C, C → B} pattern:
|+ Nearest Shops
|-
!Person!!Shop Type!!Nearest Shop
|-
|Davidson>
Optician >-
|Davidson
Hairdresser >-
|Wright
Bookshop >-
|Fuller
Bakery >-
|Fuller
Hairdresser >-
|Fuller
Optician


For each Person / Shop Type combination, the table tells us which shop of this type is geographically nearest to the person's home. We assume for simplicity that a single shop cannot be of more than one type.

The candidate keys of the table are:
  • {Person, Shop Type}
  • {Person, Nearest Shop}


Because all three attributes are prime attributes (i.e. belong to candidate keys), the table is in 3NF. The table is not in BCNF, however, as the Shop Type attribute is functionally dependent on a non-superkey: Nearest Shop.

The violation of BCNF means that the table is subject to anomalies. For example, Eagle Eye might have its Shop Type changed to "Optometrist" on its "Fuller" record while retaining the Shop Type "Optician" on its "Davidson" record. This would imply contradictory answers to the question: "What is Eagle Eye's Shop Type?" Holding each shop's Shop Type only once would seem preferable, as doing so would prevent such anomalies from occurring:
|+ Shop Near Person
|-
!Person!!Shop
|-
|Davidson>
>-
|Davidson
>-
|Wright
>-
|Fuller
>-
|Fuller
>-
|Fuller
|+ Shop
|-
!Shop>
>-
|Eagle Eye
>-
|Snippets
>-
|Merlin Books
>-
|Doughy's
>-
|Sweeney Todd's

In this revised design, the "Shop Near Person" table has a candidate key of {Person, Shop}, and the "Shop" table has a candidate key of {Shop}. Unfortunately, although this design adheres to BCNF, it is unacceptable on different grounds: it allows us to record multiple shops of the same type against the same person. In other words, its candidate keys do not guarantee that the functional dependency {Person, Shop Type} → {Shop} will be respected.

A design that eliminates all of these anomalies (but does not conform to BCNF) is possible. This design consists of the original "Nearest Shops" table supplemented by the "Shop" table described above.
|+ Nearest Shops
|-
!Person!!Shop Type!!Nearest Shop
|-
|Davidson>
Optician >-
|Davidson
Hairdresser >-
|Wright
Bookshop >-
|Fuller
Bakery >-
|Fuller
Hairdresser >-
|Fuller
Optician
|+ Shop
|-
!Shop!!Shop Type
|-
|Eagle Eye>
>-
|Snippets
>-
|Merlin Books
>-
|Doughy's
>-
|Sweeney Todd's

If a referential integrity constraint
Referential integrity
Referential integrity is a property of data which, when satisfied, requires every value of one attribute of a relation to exist as a value of another attribute in a different relation ....

is defined to the effect that {Shop Type, Nearest Shop} from the first table must refer to a {Shop Type, Shop} from the second table, then the data anomalies described previously are prevented.

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK