First normal form
Encyclopedia
First normal form is a normal form used in database normalization
Database normalization
In the design of a relational database management system , the process of organizing data to minimize redundancy is called normalization. The goal of database normalization is to decompose relations with anomalies in order to produce smaller, well-structured relations...

. A relational database
Relational database
A relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...

 table
Table (database)
In relational databases and flat file databases, a table is a set of data elements that is organized using a model of vertical columns and horizontal rows. A table has a specified number of columns, but can have any number of rows...

 that adheres to 1NF is one that meets a certain minimum set of criteria. These criteria are basically concerned with ensuring that the table is a faithful representation of a relation
Relation (mathematics)
In set theory and logic, a relation is a property that assigns truth values to k-tuples of individuals. Typically, the property describes a possible connection between the components of a k-tuple...

and that it is free of repeating groups.

The concept of a "repeating group" is, however, understood in different ways by different theorists. As a consequence, there is no universal agreement as to which features would disqualify a table from being in 1NF. Most notably, 1NF as defined by some authors (for example, Ramez Elmasri and Shamkant B. Navathe, following the precedent established by Edgar F. Codd
Edgar F. Codd
Edgar Frank "Ted" Codd was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases...

) excludes relation-valued attributes (tables within tables); whereas 1NF as defined by other authors (for example, Chris Date) permits them.

1NF tables as representations of relations

According to Date's definition of 1NF, a table is in 1NF if and only if
If and only if
In logic and related fields such as mathematics and philosophy, if and only if is a biconditional logical connective between statements....

 it is "isomorphic
Isomorphism
In abstract algebra, an isomorphism is a mapping between objects that shows a relationship between two properties or operations.  If there exists an isomorphism between two structures, the two structures are said to be isomorphic.  In a certain sense, isomorphic structures are...

 to some relation", which means, specifically, that it satisfies the following five conditions:
Violation of any of these conditions would mean that the table is not strictly relational, and therefore that it is not in 1NF.

Examples of tables (or views
View (database)
In database theory, a view consists of a stored query accessible as a virtual table in a relational database or a set of documents in a document-oriented database composed of the result set of a query or map and reduce functions...

) that would not meet this definition of 1NF are:
  • A table that lacks a unique key
    Unique key
    In relational database design, a unique key can uniquely identify each row in a table, and is closely related to the Superkey concept. A unique key comprises a single column or a set of columns. No two distinct rows in a table can have the same value in those columns if NULL values are not used...

    . Such a table would be able to accommodate duplicate rows, in violation of condition 3.
  • A view whose definition mandates that results be returned in a particular order, so that the row-ordering is an intrinsic and meaningful aspect of the view. This violates condition 1. The tuple
    Tuple
    In mathematics and computer science, a tuple is an ordered list of elements. In set theory, an n-tuple is a sequence of n elements, where n is a positive integer. There is also one 0-tuple, an empty sequence. An n-tuple is defined inductively using the construction of an ordered pair...

    s in true relations are not ordered with respect to each other.
  • A table with at least one nullable
    Null (SQL)
    Null is a special marker used in Structured Query Language to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL Null serves to fulfill the requirement that all true relational database management systems support...

     attribute. A nullable attribute would be in violation of condition 4, which requires every field to contain exactly one value from its column's domain. It should be noted, however, that this aspect of condition 4 is controversial. It marks an important departure from Codd
    Edgar F. Codd
    Edgar Frank "Ted" Codd was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases...

    's later vision of the relational model
    Relational model
    The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F...

    , which made explicit provision for nulls.

Repeating groups

Date's fourth condition, which expresses "what most people think of as the defining feature of 1NF", is concerned with repeating groups. The following scenario illustrates how a database design might incorporate repeating groups, in violation of 1NF.

Domains and values

Suppose a designer wishes to record the names and telephone numbers of customers. He defines a customer table which looks like this:
Customer
Customer ID First Name Surname Telephone Number
123 Robert Ingram 555-861-2025
456 Jane Wright 555-403-1659
789 Maria Fernandez 555-808-9633


The designer then becomes aware of a requirement to record multiple telephone numbers for some customers. He reasons that the simplest way of doing this is to allow the "Telephone Number" field in any given record to contain more than one value:
Customer
Customer ID First Name Surname Telephone Number
123 Robert Ingram 555-861-2025
456 Jane Wright 555-403-1659
555-776-4100
789 Maria Fernandez 555-808-9633


Assuming, however, that the Telephone Number column is defined on some Telephone Number-like domain (e.g. the domain of strings 12 characters in length), the representation above is not in 1NF. 1NF (and, for that matter, the RDBMS) prevents a single field from containing more than one value from its column's domain.

Repeating groups across columns

The designer might attempt to get around this restriction by defining multiple Telephone Number columns:
Customer
Customer ID First Name Surname Tel. No. 1 Tel. No. 2 Tel. No. 3
123 Robert Ingram 555-861-2025
456 Jane Wright 555-403-1659 555-776-4100 555-403-1659
789 Maria Fernandez 555-808-9633


This representation, however, makes use of nullable columns, and therefore does not conform to Date's definition of 1NF (in violation to condition 4). Even if the view is taken that nullable columns are allowed, the design is not in keeping with the spirit of 1NF (in violation to condition 2). Tel. No. 1, Tel. No. 2., and Tel. No. 3. share exactly the same domain and exactly the same meaning; the splitting of Telephone Number into three headings is artificial and causes logical problems. These problems include:
  • Difficulty in querying the table. Answering such questions as "Which customers have telephone number X?" and "Which pairs of customers share a telephone number?" is awkward.
  • Inability to enforce uniqueness of Customer-to-Telephone Number links through the RDBMS. Customer 789 might mistakenly be given a Tel. No. 2 value that is exactly the same as her Tel. No. 1 value.
  • Restriction of the number of telephone numbers per customer to three. If a customer with four telephone numbers comes along, we are constrained to record only three and leave the fourth unrecorded. This means that the database design is imposing constraints on the business process, rather than (as should ideally be the case) vice-versa.

Repeating groups within columns

The designer might, alternatively, retain the single Telephone Number column but alter its domain, making it a string of sufficient length to accommodate multiple telephone numbers:
Customer
Customer ID First Name Surname Telephone Numbers
123 Robert Ingram 555-861-2025
456 Jane Wright 555-403-1659, 555-776-4100
789 Maria Fernandez 555-808-9633


This design is consistent with 1NF, but still presents several design issues. The Telephone Number heading becomes semantically
Semantics
Semantics is the study of meaning. It focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for, their denotata....

 non-specific, as it can now represent either a telephone number, a list of telephone numbers, or indeed anything at all. A query such as "Which pairs of customers share a telephone number?" is more difficult to formulate, given the necessity to cater for lists of telephone numbers as well as individual telephone numbers. Meaningful constraints on telephone numbers are also very difficult to define in the RDBMS with this design.

A design that complies with 1NF

A design that is unambiguously in 1NF makes use of two tables: a Customer Name table and a Customer Telephone Number table.
Customer Name
Customer ID First Name Surname
123 Robert Ingram
456 Jane Wright
789 Maria Fernandez
Customer Telephone Number
Customer ID Telephone Number
123 555-861-2025
456 555-403-1659
456 555-776-4100
789 555-808-9633

Repeating groups of telephone numbers do not occur in this design. Instead, each Customer-to-Telephone Number link appears on its own record. With Customer ID as key fields, a "parent-child" or one-to-many (1:M) relationship exists between the two tables, since a customer record (in the "parent" table, Customer Name) can have many telephone number records (in the "child" table, Customer Telephone Number), but each telephone number usually has one, and only one customer. In the case where several customers could share the same telephone number, an additional column is needed in the Customer Telephone Number table to represent a unique key. It is worth noting that this design meets the additional requirements for second
Second normal form
Second normal form is a normal form used in database normalization. 2NF was originally defined by E.F. Codd in 1971.A table that is in first normal form must meet additional criteria if it is to qualify for second normal form...

 and third normal form (3NF)
Third normal form
In computer science, the third normal form is a normal form used in database normalization. 3NF was originally defined by E.F. Codd in 1971. Codd's definition states that a table is in 3NF if and only if both of the following conditions hold:...

.

Atomicity

Some definitions of 1NF, most notably that of Edgar F. Codd
Edgar F. Codd
Edgar Frank "Ted" Codd was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases...

, make reference to the concept of atomicity. Codd states that the "values in the domains on which each relation is defined are required to be atomic with respect to the DBMS." Codd defines an atomic value as one that "cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions)." Meaning a field should not be divided into parts with more than one kind of data in it such that what one part means to the DBMS depends on another part of the same field.

Hugh Darwen
Hugh Darwen
Hugh Darwen is a computer scientist who was an employee of IBM United Kingdom from 1967 to 2004, and has been involved in the history of the relational model.- Work :...

 and Chris Date have suggested that Codd's concept of an "atomic value" is ambiguous, and that this ambiguity has led to widespread confusion about how 1NF should be understood. In particular, the notion of a "value that cannot be decomposed" is problematic, as it would seem to imply that few, if any, data types are atomic:
  • A character string would seem not to be atomic, as the RDBMS typically provides operators to decompose it into substrings.
  • A fixed-point number would seem not to be atomic, as the RDBMS typically provides operators to decompose it into integer and fractional components.


Date suggests that "the notion of atomicity has no absolute meaning": a value may be considered atomic for some purposes, but may be considered an assemblage of more basic elements for other purposes. If this position is accepted, 1NF cannot be defined with reference to atomicity. Columns of any conceivable data type (from string types and numeric types to array types and table types) are then acceptable in a 1NF table—although perhaps not always desirable (For example, it would be more desirable to separate a Customer Name field into two separate fields as First Name, Surname). Date argues that relation-valued attributes, by means of which a field within a table can contain a table, are useful in rare cases.

Normalization beyond 1NF

Any table that is in second normal form
Second normal form
Second normal form is a normal form used in database normalization. 2NF was originally defined by E.F. Codd in 1971.A table that is in first normal form must meet additional criteria if it is to qualify for second normal form...

 (2NF) or higher is, by definition, also in 1NF (each normal form has more stringent criteria than its predecessor). On the other hand, a table that is in 1NF may or may not be in 2NF; if it is in 2NF, it may or may not be in 3NF
Third normal form
In computer science, the third normal form is a normal form used in database normalization. 3NF was originally defined by E.F. Codd in 1971. Codd's definition states that a table is in 3NF if and only if both of the following conditions hold:...

, and so on.

Normal forms higher than 1NF are intended to deal with situations in which a table suffers from design problems that may compromise the integrity
Data integrity
Data Integrity in its broadest meaning refers to the trustworthiness of system resources over their entire life cycle. In more analytic terms, it is "the representational faithfulness of information to the true state of the object that the information represents, where representational faithfulness...

 of the data within it. For example, the following table is in 1NF, but is not in 2NF and therefore is vulnerable to logical inconsistencies:
Subscriber Email Addresses
Subscriber ID Email Address Subscriber First Name Subscriber Surname
108 steve@aardvarkmail.net Steve Wallace
252 carol@mongoosemail.org Carol Robertson
252 crobertson@aardvarkmail.net Carol Robertson
360 hclark@antelopemail.com Harriet Clark


The table's key
Candidate key
In the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that# the relation does not have two distinct tuples In the relational model of databases, a candidate key of a relation is a minimal superkey for that...

 is {Subscriber ID, Email Address}.

If Carol Robertson changes her surname by marriage, the change must be applied to two rows. If the change is only applied to one row, a contradiction results: the question "What is Customer 252's name?" has two conflicting answers. 2NF addresses this problem. Note that Carol Robertson's record is appearing in the table twice because it has more than one email address related to it.

A practical way to think of 1NF in the above table is to ask a series of questions about the relationships that records (rows) can have between entities (tables) or attributes (columns), based on given business rules or constraints. For example, could a Subscriber record relate to many Email Address records? Could an Email Address record relate to many Subscriber records? In the above table we can see that Carol Robertson has more than one email address. We could answer the questions by saying there's a one-to-many relationship (1:M) between Subscriber and Email Address in the above table, since a subscriber can have many email addresses, and an email address usually has one, and only one subscriber. We would then create a separate table called Subscribers and move the Subscriber First Name and Subscriber Surname columns from the Subscriber Email Addresses table into the new Subscribers table, adding a third column Subscriber ID as the primary key. Thus, a one-to-many relationship exists between the Subscribers table (with Subscriber ID as the primary key) and the Subscriber Email Addresses table (with Subscriber ID as the foreign key). The tables would conform to 2NF
Second normal form
Second normal form is a normal form used in database normalization. 2NF was originally defined by E.F. Codd in 1971.A table that is in first normal form must meet additional criteria if it is to qualify for second normal form...

 in addition to 1NF.

See also

  • Attribute-value system
    Attribute-value system
    An attribute-value system is a basic knowledge representation framework comprising a table with columns designating "attributes" and rows designating "objects" An attribute-value system is a basic knowledge representation framework comprising a table with columns designating "attributes" (also...

  • Entity-attribute-value model
    Entity-Attribute-Value model
    Entity–attribute–value model is a data model to describe entities where the number of attributes that can be used to describe them is potentially vast, but the number that will actually apply to a given entity is relatively modest. In mathematics, this model is known as a sparse matrix...

  • Second normal form
    Second normal form
    Second normal form is a normal form used in database normalization. 2NF was originally defined by E.F. Codd in 1971.A table that is in first normal form must meet additional criteria if it is to qualify for second normal form...

  • Third normal form
    Third normal form
    In computer science, the third normal form is a normal form used in database normalization. 3NF was originally defined by E.F. Codd in 1971. Codd's definition states that a table is in 3NF if and only if both of the following conditions hold:...



Further reading



External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK