Database



 
 
A database is a structured collection of records or data that is stored in a computer system. The structure is achieved by organizing the data according to a database model. The model in most common use today is the relational model. Other models, such as the hierarchical model and the network model, use a more explicit representation of relationships.


Database topics


Architecture


Depending on the intended use, there are a number of database architectures in use. Many databases use a combination of strategies. On-line Transaction Processing (OLTP) systems often use a row-oriented datastore architecture, while data-warehouse and other retrieval-focused applications, such as Google's BigTable or bibliographic database (library catalogue) systems, may use a column-oriented DBMS architecture.
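
The difference between the two layouts can be illustrated with a minimal sketch. It assumes a tiny in-memory catalogue; the sample records and the helper names `to_columns`, `fetch_record` and `total_copies` are hypothetical and not taken from any particular product.

```python
# Hypothetical sample data; the field names are illustrative only.
rows = [
    {"id": 1, "title": "Dune", "year": 1965, "copies": 4},
    {"id": 2, "title": "Emma", "year": 1815, "copies": 2},
    {"id": 3, "title": "Ulysses", "year": 1922, "copies": 1},
]


def to_columns(records):
    """Pivot a row-oriented list of records into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def fetch_record(records, record_id):
    # Row layout: a whole record is retrieved as one unit (OLTP-style lookup).
    return next(r for r in records if r["id"] == record_id)


def total_copies(cols):
    # Column layout: an aggregate reads only the single column it needs.
    return sum(cols["copies"])


columns = to_columns(rows)
```

In the row layout, fetching or updating one record touches a single unit; in the column layout, an aggregate such as `total_copies(columns)` scans one list and ignores the other fields, which is roughly why retrieval-focused systems may favour it.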

Document-oriented databases, XML databases and knowledge bases, as well as frame databases and RDF stores (also known as triple stores), may also use a combination of these architectures in their implementation.

Finally, not all databases have or need a database schema; those that do without one are known as schema-less databases.

Over many years the database industry has been dominated by general-purpose database systems, which offer a wide range of functions that are applicable to many, if not most, circumstances in modern data processing. These have been enhanced with extensible datatypes, pioneered in the PostgreSQL project, to allow a very wide range of applications to be developed.

There are also other types of database which cannot be classified as relational databases.

Database management systems

A computer database relies on software to organize the storage of data. This software is known as a database management system (DBMS). Database management systems are categorized according to the database model that they support. The model tends to determine the query languages that are available to access the database. A great deal of the internal engineering of a DBMS, however, is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between products.

A Relational Database Management System (RDBMS) implements the features of the relational model outlined above. In this context, Date's "Information Principle" states: "the entire information content of the database is represented in one and only one way. Namely as explicit values in column positions (attributes) and rows in relations (tuples). Therefore, there are no explicit pointers between related tables."
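
A small sketch may make the principle concrete. It uses Python's built-in `sqlite3` module; the `author` and `book` tables and their columns are assumptions made up for this illustration. The link between a book and its author exists only as a matching value in the `author_id` column, not as a pointer.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (
        book_id   INTEGER PRIMARY KEY,
        title     TEXT,
        author_id INTEGER REFERENCES author(author_id)
    );
    INSERT INTO author VALUES (1, 'Codd'), (2, 'Date');
    INSERT INTO book VALUES (10, 'An Introduction to Database Systems', 2);
""")

# The relationship is recovered purely by comparing column values in a join.
result = conn.execute("""
    SELECT author.name, book.title
    FROM book JOIN author ON book.author_id = author.author_id
""").fetchall()
```

Here `result` holds the single pair matching the author named Date with the book, obtained without either row storing the location of the other.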

Database models


Post-relational database models
Products offering a more general data model than the relational model are sometimes classified as post-relational. The data model in such products incorporates relations but is not constrained by the Information Principle, which requires that all information is represented by data values in relations.

Some of these extensions to the relational model actually integrate concepts from technologies that pre-date the relational model. For example, they allow representation of a directed graph with trees on the nodes.

Some products implementing such models have been built by extending relational database systems with non-relational features. Others, however, have arrived in much the same place by adding relational features to pre-relational systems. Paradoxically, this allows products that are historically pre-relational, such as PICK and MUMPS, to make a plausible claim to be post-relational in their current architecture.

Object database models
In recent years, the object-oriented paradigm has been applied to database technology, creating a new programming model known as object databases. These databases attempt to bring the database world and the application programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce the key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.
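
The conversion overhead described above can be seen in a minimal sketch, assuming a relational store accessed through Python's `sqlite3` module. The `Customer` class, the `customer` table and the two helper functions are hypothetical; they stand for the hand-written mapping code that an object database aims to make unnecessary.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    """Application-side object; the class and its fields are illustrative."""
    customer_id: int
    name: str


def save_customer(conn, customer):
    # Object -> row: the fields are unpacked into column values by hand.
    conn.execute(
        "INSERT INTO customer (customer_id, name) VALUES (?, ?)",
        (customer.customer_id, customer.name),
    )


def load_customers(conn):
    # Row -> object: each result tuple is repacked into an object by hand.
    cursor = conn.execute(
        "SELECT customer_id, name FROM customer ORDER BY customer_id"
    )
    return [Customer(*row) for row in cursor]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
save_customer(conn, Customer(1, "Ada"))
customers = load_customers(conn)
```

Every class needs a pair of functions like these, and they must be kept in step with both the table definition and the class definition.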

A variety of ways have been tried for storing objects in a database. Some products have approached the problem from the application programming end, by making the objects manipulated by the program persistent. This also typically requires the addition of some kind of query language, since conventional programming languages do not have the ability to find objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.

Database storage structures

Relational database tables and indexes are typically stored in memory or on hard disk in one of many forms: ordered or unordered flat files, ISAM, heaps, hash buckets or B+ trees. These have various advantages and disadvantages discussed further in the main article on this topic. The most commonly used are B+ trees and ISAM.

Object databases use a range of storage mechanisms. Some use virtual memory mapped files to make the native language (C++, Java, etc.) objects persistent. This can be highly efficient but it can make multi-language access more difficult. Others break the objects down into fixed- and varying-length components that are then clustered tightly together in fixed-size blocks on disk and reassembled into the appropriate format either for the client or in the client address space. Another popular technique is to store the objects in tuples, much like a relational database, which the database server then reassembles for the client.

Other important design choices relate to the clustering of data by category (such as grouping data by month, or location), creating pre-computed views known as materialized views, and partitioning data by range or hash. Memory management and storage topology can be important design choices for database designers as well. Just as normalization is used to reduce storage requirements and improve the extensibility of the database, denormalization is conversely often used to reduce join complexity and reduce execution time for queries.

Indexing

All of these databases can take advantage of indexing to increase their speed. This technology has advanced tremendously since its early uses in the 1960s and 1970s. The most common kind of index is a sorted list of the contents of some particular table column, with pointers to the row associated with the value. An index allows a set of table rows matching some criterion to be located quickly. Typically, indexes are also stored in the various forms of data structure mentioned above (such as B-trees, hashes, and linked lists). Usually, a specific technique is chosen by the database designer to increase efficiency in the particular case of the type of index required.
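
As a minimal sketch of the sorted-list index described above, the following uses Python's standard `bisect` module for the binary search. The sample table and the function names `build_index` and `lookup` are assumptions for illustration; a production DBMS would more likely use a B-tree than a flat sorted list.

```python
import bisect

# Hypothetical table: each row is (surname, city).
table = [
    ("Smith", "London"),
    ("Jones", "Paris"),
    ("Adams", "Rome"),
    ("Jones", "Oslo"),
]


def build_index(rows, column):
    """Return a sorted list of (value, row_position) pairs for one column."""
    return sorted((row[column], pos) for pos, row in enumerate(rows))


def lookup(index, rows, value):
    """Binary-search the index, then follow the stored positions to the rows."""
    # (value, -1) sorts before every real entry for this value.
    position = bisect.bisect_left(index, (value, -1))
    matches = []
    while position < len(index) and index[position][0] == value:
        matches.append(rows[index[position][1]])
        position += 1
    return matches


name_index = build_index(table, 0)
jones_rows = lookup(name_index, table, "Jones")
```

The lookup inspects only a logarithmic number of index entries before reaching the first match, instead of scanning every row of the table.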

Most relational DBMSs and some object DBMSs have the advantage that indexes can be created or dropped without changing the existing applications that make use of them. In other words, indexes are transparent to the application or end user querying the database: while they affect performance, any SQL command will run with or without an index to compute the result of the SQL statement. The RDBMS produces a plan of how to execute the query by estimating the run times of the different candidate algorithms and selecting the one it expects to be quickest. Some of the key algorithms that deal with joins are nested loop join, sort-merge join and hash join. Which of these is chosen depends on whether an index exists, what type it is, and its cardinality.
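
Two of the join algorithms named above can be sketched in a few lines. This is a simplified in-memory illustration, not the implementation used by any particular DBMS; the relations are plain lists of tuples and the sample data is hypothetical.

```python
from collections import defaultdict


def nested_loop_join(r, s, r_key, s_key):
    """Compare every tuple of r with every tuple of s."""
    return [(a, b) for a in r for b in s if a[r_key] == b[s_key]]


def hash_join(r, s, r_key, s_key):
    """Build a hash table on r, then probe it once per tuple of s."""
    buckets = defaultdict(list)
    for a in r:                      # build phase
        buckets[a[r_key]].append(a)
    result = []
    for b in s:                      # probe phase
        for a in buckets.get(b[s_key], []):
            result.append((a, b))
    return result


# Hypothetical relations: (dept_id, dept_name) and (employee, dept_id).
departments = [(1, "Sales"), (2, "Research")]
employees = [("Ann", 1), ("Bob", 2), ("Cy", 1)]

nested = nested_loop_join(departments, employees, 0, 1)
hashed = hash_join(departments, employees, 0, 1)
```

Both functions return the same set of pairs, though possibly in a different order. The nested loop performs one comparison per pair of tuples, while the hash join reads each relation once, which is the kind of cost difference a query planner weighs.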

An index speeds up access to data, but it has disadvantages as well. First, every index increases the amount of storage on the hard drive necessary for the database file, and second, the index must be updated each time the data are altered, and this costs time. (Thus an index saves time in the reading of data, but it costs time in entering and altering data. It thus depends on the use to which the data are to be put whether an index is on the whole a net plus or minus in the quest for efficiency.)

A special case of an index is a primary index, or primary key, which is distinguished in that the primary index must ensure a unique reference to a record. Often, for this purpose one simply uses a running index number (ID number). Primary indexes play a significant role in relational databases, and they can speed up access to data considerably.

Transactions and concurrency

In addition to their data model, most practical databases ("transactional databases") attempt to enforce a database transaction model. Ideally, the database software should enforce the ACID rules, summarized here:
  • Atomicity: Either all the tasks in a transaction must be done, or none of them. The transaction must be completed, or else it must be undone (rolled back).
  • Consistency: Every transaction must preserve the integrity constraints (the declared consistency rules) of the database. It cannot place the data in a contradictory state.
  • Isolation: Two simultaneous transactions cannot interfere with one another. Intermediate results within a transaction are not visible to other transactions.
  • Durability: Completed transactions cannot be aborted later or their results discarded. They must persist through (for instance) restarts of the DBMS after crashes.


In practice, many DBMSs allow most of these rules to be selectively relaxed for better performance.
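
Atomicity and consistency can be demonstrated with a short sketch using Python's built-in `sqlite3` module. The `account` table, its `CHECK` constraint and the `transfer` function are assumptions invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"  # declared consistency rule
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, source, target, amount):
    """Move money in one transaction; return False if it was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


failed = transfer(conn, "alice", "bob", 500)   # would overdraw alice
after_failure = balances(conn)
succeeded = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)
```

In the failed transfer, the credit to the target account has already been applied when the debit violates the constraint; the rollback undoes it, so neither change survives.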

Concurrency control is a method used to ensure that transactions are executed in a safe manner and follow the ACID rules. The DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.

Replication

Replication of databases is closely related to transactions. If a database can log its individual actions, it is possible to create a duplicate of the data in real time. The duplicate can be used to improve performance or availability of the whole database system. Common replication concepts include:
  • Master/Slave Replication: All write requests are performed on the master and then replicated to the slaves.
  • Quorum: The results of read and write requests are determined by querying a "majority" of replicas.
  • Multimaster: Two or more replicas sync each other via a transaction identifier.


Parallel synchronous replication of databases enables transactions to be replicated on multiple servers simultaneously, which provides a method for backup and security as well as data availability.
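
The quorum approach listed above can be sketched as follows. This is a toy in-memory model under simplifying assumptions (a version counter per replica, no networking and no concurrent writers); the `Replica` class and the `write` and `read` functions are hypothetical.

```python
class Replica:
    """One copy of a single value, tagged with a version number."""

    def __init__(self):
        self.version = 0
        self.value = None


def quorum_size(replica_count):
    # A strict majority: any two majorities share at least one replica.
    return replica_count // 2 + 1


def write(replicas, reachable, value):
    """Apply a write only if a majority of replicas can be reached."""
    targets = [replicas[i] for i in reachable]
    if len(targets) < quorum_size(len(replicas)):
        return False
    new_version = max(r.version for r in targets) + 1
    for r in targets:
        r.version, r.value = new_version, value
    return True


def read(replicas, reachable):
    """Return the newest value seen among a majority of replicas."""
    targets = [replicas[i] for i in reachable]
    if len(targets) < quorum_size(len(replicas)):
        raise RuntimeError("no quorum")
    return max(targets, key=lambda r: r.version).value


replicas = [Replica() for _ in range(3)]
first = write(replicas, [0, 1], "a")    # replica 2 is unreachable
second = write(replicas, [1, 2], "b")   # replica 0 is unreachable
rejected = write(replicas, [0], "c")    # only a minority is reachable
latest = read(replicas, [0, 2])
```

Because every read majority overlaps every write majority, the read reaches at least one replica holding the most recent version, even though replica 0 still stores the older value.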

Security

Database security denotes the system, processes, and procedures that protect a database from unintended activity.

Security is usually enforced through access control, auditing, and encryption.
  • Access control restricts who can connect to the database and what they can do to it.
  • Auditing logs what action or change has been performed, when and by whom.
  • Encryption: Since security has become a major issue in recent years, many commercial database vendors provide built-in encryption mechanisms. Data is encoded natively into the tables and deciphered "on the fly" when a query comes in. Connections can also be secured and encrypted if required using SSL or a legacy encryption standard, with algorithms such as DSA (digital signatures) and MD5 (hashing) supporting authentication and integrity checking.


Enforcing security is one of the major tasks of the DBA.

In the United Kingdom, legislation protecting the public from unauthorized disclosure of personal information held on databases falls under the Office of the Information Commissioner. United Kingdom-based organizations holding personal data in electronic format (in databases, for example) are required to register with the Data Commissioner.

Locking

Locking is how the database handles multiple concurrent operations; it is the means by which concurrency and a basic form of integrity are managed within the database system. Locks can be applied at the row level, or at other levels such as a page (a basic data block), an extent (an array of multiple pages) or even an entire table. This helps maintain the integrity of the data by ensuring that only one process at a time can modify the same data. On basic filesystem files or folders, only one lock at a time can be set, restricting usage to one process only. Databases, on the other hand, can set and hold multiple locks at the same time on different levels of the physical data structure. How locks are set and how long they last is determined by the database engine's locking scheme, based on the SQL or transactions submitted by users. Generally speaking, little or no activity on the database should translate into no or very light locking.

For most DBMSs on the market, locks are generally either shared or exclusive. An exclusive lock means that no other lock can be acquired on the data object for as long as the exclusive lock lasts. Exclusive locks are usually set while the database needs to change data, as during an UPDATE or DELETE operation.

Shared locks, by contrast, can coexist with one another on the same data structure. They are usually used while the database is reading data, as during a SELECT operation. The number and nature of locks, and the length of time a lock is held on a data block, can have a huge impact on database performance. Bad locking, usually the result of poor SQL requests or an inadequate physical database structure, can lead to disastrous performance.
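
The compatibility rules between the two lock modes can be sketched with a toy lock table. This is a simplified, non-blocking model in which a refused request returns `False` rather than waiting; the `LockTable` class and the resource names are hypothetical.

```python
SHARED, EXCLUSIVE = "S", "X"


class LockTable:
    """Toy lock manager: tracks which owner holds which mode per resource."""

    def __init__(self):
        self.held = {}  # resource -> list of (owner, mode)

    def acquire(self, owner, resource, mode):
        holders = self.held.setdefault(resource, [])
        others = [m for o, m in holders if o != owner]
        if mode == SHARED:
            # Readers coexist with other readers, never with a writer.
            compatible = EXCLUSIVE not in others
        else:
            # A writer needs the resource entirely to itself.
            compatible = not others
        if compatible:
            holders.append((owner, mode))
        return compatible

    def release(self, owner, resource):
        self.held[resource] = [
            (o, m) for o, m in self.held.get(resource, []) if o != owner
        ]


locks = LockTable()
read_1 = locks.acquire("t1", "row:1", SHARED)
read_2 = locks.acquire("t2", "row:1", SHARED)
write_blocked = locks.acquire("t3", "row:1", EXCLUSIVE)
locks.release("t1", "row:1")
locks.release("t2", "row:1")
write_granted = locks.acquire("t3", "row:1", EXCLUSIVE)
read_blocked = locks.acquire("t1", "row:1", SHARED)
```

Two readers hold the row at once, the writer is refused until both have released it, and once the writer holds the row a new reader is refused in turn.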

Default locking behavior is enforced by the isolation level of the data server. Changing the isolation level affects how shared or exclusive locks must be set on the data for the entire database system. The default isolation level is generally 1, at which data cannot be read while it is being modified, so that "ghost data" is never returned to the end user.

At some point, intensive or inappropriate exclusive locking can lead to a deadlock between two locks, in which neither lock can be released because each is trying to acquire a resource held by the other. The database has a fail-safe mechanism and will automatically "sacrifice" one of the locks, releasing the resource. When it does so, the process or transaction holding the sacrificed lock is rolled back.
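
One common way to detect such a situation is to look for a cycle in a "waits-for" graph, sketched below. The graph representation, the function names and the policy of sacrificing the most recently started transaction are all assumptions for illustration; real engines differ in how they detect deadlocks and pick a victim.

```python
def find_cycle(waits_for):
    """Return the transactions forming a waits-for cycle, or None.

    waits_for maps each transaction to the transactions it is waiting on.
    """
    visited = set()

    def visit(node, path):
        if node in path:                  # reached a node already on the path
            return path[path.index(node):]
        if node in visited:               # explored earlier, no cycle found
            return None
        visited.add(node)
        for blocker in waits_for.get(node, ()):
            cycle = visit(blocker, path + [node])
            if cycle:
                return cycle
        return None

    for start in waits_for:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


def choose_victim(cycle, start_times):
    # Hypothetical policy: roll back the most recently started transaction.
    return max(cycle, key=start_times.get)


deadlocked = find_cycle({"T1": ["T2"], "T2": ["T1"]})
no_deadlock = find_cycle({"T1": ["T2"], "T2": ["T3"]})
victim = choose_victim(deadlocked, {"T1": 10, "T2": 25})
```

Rolling back the chosen victim releases its locks, which breaks the cycle and lets the remaining transaction proceed.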

Databases can also be locked for other reasons, such as access restrictions for given levels of user. Some databases are also locked for routine database maintenance, which prevents changes being made during the maintenance. However, many modern databases, PostgreSQL for example, do not lock the database during routine maintenance.

Applications of databases

Databases are used in many applications, spanning virtually the entire range of computer software. Databases are the preferred method of storage for large multiuser applications, where coordination between many users is needed. Even individual users find them convenient, and many electronic mail programs and personal organizers are based on standard database technology. Software database drivers are available for most database platforms so that application software can use a common Application Programming Interface to retrieve the information stored in a database. Two commonly used database APIs are JDBC and ODBC.

See also

  • Comparison of relational database management systems
  • Comparison of database tools
  • Database-centric architecture
  • Database theory
  • Government database
  • Object database
  • Online database
  • Real time database
  • Relational database


Further reading

  • Connolly, Thomas, and Carolyn Begg. Database Systems. New York: Harlow, 2002.
  • Date, C. J. An Introduction to Database Systems, Eighth Edition, Addison Wesley, 2003.
  • Galindo, J., Urrutia, A., Piattini, M., Fuzzy Databases: Modeling, Design and Implementation (FSQL guide). Idea Group Publishing Hershey, USA, 2006.
  • Galindo, J., Ed. Handbook on Fuzzy Information Processing in Databases. Hershey, PA: Information Science Reference (an imprint of Idea Group Inc.), 2008.
  • Gray, J. and Reuter, A. Transaction Processing: Concepts and Techniques, 1st edition, Morgan Kaufmann Publishers, 1992.
  • Kroenke, David M. Database Processing: Fundamentals, Design, and Implementation (1997), Prentice-Hall, Inc., pages 130-144.
  • Kroenke, David M., and David J. Auer. Database Concepts. 3rd ed. New York: Prentice, 2007.
  • Lightstone, S., T. Teorey, and T. Nadeau, Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more, Morgan Kaufmann Press, 2007. ISBN 0-12369-389-6.
  • Shih, J. "", white paper, 2007.
  • Teorey, T.; Lightstone, S. and Nadeau, T. Database Modeling & Design: Logical Design, 4th edition, Morgan Kaufmann Press, 2005. ISBN 0-12-685352-5
  • Tukey, John W. Exploratory Data Analysis. Reading, MA: Addison Wesley, 1977.


External links

  • - a briefing paper at Digital Preservation Europe