A database management system (DBMS) is a software package with computer programs that control the creation, maintenance, and use of a database. It allows database administrators (DBAs) and other specialists to develop databases for an organization's various applications. A database is an integrated collection of data records, files, and other objects. A DBMS allows different user application programs to access the same database concurrently. DBMSs may use a variety of database models, such as the relational model or the object model, to describe and support applications. A DBMS typically supports query languages: high-level, dedicated database languages that considerably simplify writing database application programs. Database languages also simplify organizing the database and retrieving and presenting information from it. A DBMS provides facilities for controlling data access, enforcing data integrity, managing concurrency, and recovering the database after failures and restoring it from backup files, as well as maintaining database security.
Overview
A DBMS is a set of software programs that controls the organization, storage, management, and retrieval of data in a database. DBMSs are categorized according to their data structures or types. The DBMS accepts requests for data from an application program and instructs the operating system to transfer the appropriate data. The queries and responses must be submitted and received according to a format that conforms to one or more applicable protocols. When a DBMS is used, information systems can be changed more easily as the organization's information requirements change. New categories of data can be added to the database without disruption to the existing system.
Database servers are dedicated computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with generous memory and RAID disk arrays used for stable storage. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in large-volume transaction processing environments. DBMSs are found at the heart of most database applications. DBMSs may be built around a custom multitasking kernel with built-in networking support, but modern DBMSs typically rely on a standard operating system to provide these functions.
History
Databases have been in use since the earliest days of electronic computing. Unlike modern systems, which can be applied to widely different databases and needs, the vast majority of older systems were tightly linked to custom databases in order to gain speed at the expense of flexibility. Originally, DBMSs were found only in large organizations with the computer hardware needed to support large data sets.
1960s navigational DBMS
As computers grew in speed and capability, a number of general-purpose database systems emerged; by the mid-1960s several such systems were in commercial use. Interest in a standard began to grow, and Charles Bachman, author of one such product, the Integrated Data Store (IDS), founded the "Database Task Group" within CODASYL, the group responsible for the creation and standardization of COBOL. In 1971 the group delivered its standard, which generally became known as the "Codasyl approach", and soon a number of commercial products based on this approach became available.
The Codasyl approach was based on the "manual" navigation of a linked data set formed into a large network. When the database was first opened, the program was handed back a link to the first record in the database, which also contained pointers to other pieces of data. To find any particular record, the programmer had to step through these pointers one at a time until the required record was returned. Simple queries like "find all the people in India" required the program to walk the entire data set and collect the matching results one by one. There was, essentially, no concept of "find" or "search". This may sound like a serious limitation today, but in an era when most data was stored on magnetic tape, such operations were too expensive to contemplate anyway.
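To make this concrete, here is a minimal Python sketch of pointer-based navigation, assuming a hypothetical `Record` type in which each record holds a pointer to the next one. It is not the Codasyl data manipulation language; it only illustrates that, without a search facility, finding "all the people in India" means following every pointer from the first record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """A hypothetical record in a linked (navigational) data set."""
    name: str
    country: str
    next_record: Optional["Record"] = None  # pointer to the next record


def build_chain(rows):
    """Link (name, country) pairs into a chain and return its first record."""
    first = None
    for name, country in reversed(rows):
        first = Record(name, country, first)
    return first


def find_people_in(first, country):
    """Step through the pointers one at a time, collecting matches."""
    matches = []
    current = first
    while current is not None:
        if current.country == country:
            matches.append(current.name)
        current = current.next_record  # the only way forward is the pointer
    return matches


first = build_chain([("Asha", "India"), ("Bo", "Sweden"), ("Ravi", "India")])
print(find_people_in(first, "India"))  # ['Asha', 'Ravi']
```

Every query, however selective, costs a walk over the whole chain; that cost is what the relational model's search facility was meant to remove.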
IBM also had its own DBMS in 1968, known as IMS. IMS was a development of software written for the Apollo program on the System/360. IMS was generally similar in concept to Codasyl, but used a strict hierarchy for its model of data navigation instead of Codasyl's network model. Both concepts later became known as navigational databases because of the way data was accessed, and Bachman's 1973 Turing Award lecture was titled "The Programmer as Navigator". IMS is classified as a hierarchical database. IDMS and CINCOM's TOTAL database are classified as network databases.
1970s relational DBMS
Edgar Codd worked at IBM in San Jose, California, in one of its offshoot offices that was primarily involved in the development of hard disk systems. He was unhappy with the navigational model of the Codasyl approach, notably the lack of a "search" facility. In 1970, he wrote a number of papers that outlined a new approach to database construction, eventually culminating in the groundbreaking "A Relational Model of Data for Large Shared Data Banks".
In this paper, he described a new system for storing and working with large databases. Instead of records being stored in some sort of linked list of free-form records as in Codasyl, Codd's idea was to use a "table" of fixed-length records. A linked-list system would be very inefficient when storing "sparse" databases where some of the data for any one record could be left empty. The relational model solved this by splitting the data into a series of normalized tables, with optional elements being moved out of the main table to where they would take up room only if needed.
For instance, a common use of a database system is to track information about users: their name, login information, various addresses and phone numbers. In the navigational approach all of these data would be placed in a single record, and unused items would simply not be placed in the database. In the relational approach, the data would be normalized into a user table, an address table and a phone number table (for instance). Records would be created in these optional tables only if the address or phone numbers were actually provided.
Linking the information back together is the key to this system. In the relational model, some bit of information was used as a "key", uniquely defining a particular record. When information was being collected about a user, information stored in the optional (or related) tables would be found by searching for this key. For instance, if the login name of a user is unique, addresses and phone numbers for that user would be recorded with the login name as the key. This "re-linking" of related data back into a single collection is something that traditional computer languages are not designed for.
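The user example can be sketched in Python with plain lists of dictionaries standing in for tables. The table and column names (`users`, `addresses`, `phones`, `login`) are illustrative assumptions; the point is that optional data occupies rows only when it exists, and related rows are found again by searching on the login key.

```python
# Hypothetical normalized tables: each row is a dict, and `login` is the key.
users = [
    {"login": "alice", "name": "Alice Smith"},
    {"login": "bob", "name": "Bob Jones"},
]
# Optional data lives in separate tables; Bob gave no address, so he has no row.
addresses = [
    {"login": "alice", "address": "12 Elm Street"},
]
phones = [
    {"login": "alice", "phone": "555-0100"},
    {"login": "bob", "phone": "555-0199"},
    {"login": "bob", "phone": "555-0142"},
]


def collect_user(login):
    """Re-link one user's data by searching each related table for the key."""
    user = next(u for u in users if u["login"] == login)
    return {
        "name": user["name"],
        "addresses": [a["address"] for a in addresses if a["login"] == login],
        "phones": [p["phone"] for p in phones if p["login"] == login],
    }


print(collect_user("bob"))
# {'name': 'Bob Jones', 'addresses': [], 'phones': ['555-0199', '555-0142']}
```

Note that `collect_user` has to loop over every related table to reassemble a single user, which is exactly the kind of looping Codd's set-oriented language was designed to hide.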
Just as the navigational approach would require programs to loop in order to collect records, the relational approach would require loops to collect information about any one record. Codd's solution to the necessary looping was a set-oriented language, a suggestion that would later spawn the ubiquitous SQL. Using a branch of mathematics known as tuple calculus, he demonstrated that such a system could support all the operations of normal databases (inserting, updating, etc.) as well as providing a simple system for finding and returning sets of data in a single operation.
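As an illustration of the set-oriented style, the sketch below uses Python's built-in `sqlite3` module with an in-memory database; the schema and data are hypothetical. A single SQL statement joins the tables on the login key and returns the whole result set, instead of the program looping over records itself. SQL is used because it is the language that grew out of this idea; it is not Codd's original notation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE users  (login TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE phones (login TEXT REFERENCES users(login), phone TEXT);
    INSERT INTO users  VALUES ('alice', 'Alice Smith'), ('bob', 'Bob Jones');
    INSERT INTO phones VALUES ('alice', '555-0100'),
                              ('bob', '555-0199'),
                              ('bob', '555-0142');
""")

# One declarative statement describes the set wanted; the DBMS does the looping.
rows = conn.execute("""
    SELECT u.name, p.phone
    FROM users AS u
    JOIN phones AS p ON p.login = u.login
    WHERE u.login = ?
    ORDER BY p.phone
""", ("bob",)).fetchall()

print(rows)  # [('Bob Jones', '555-0142'), ('Bob Jones', '555-0199')]
conn.close()
```

The program states *what* it wants (Bob's name paired with each of his phone numbers), and the DBMS decides *how* to find it.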
Codd's paper was picked up by two people at Berkeley, Eugene Wong and Michael Stonebraker. They started a project known as INGRES using funding that had already been allocated for a geographical database project, with student programmers producing the code. Beginning in 1973, INGRES delivered its first test products, which were generally ready for widespread use in 1979. During this time, a number of people moved "through" the group; perhaps as many as 30 people worked on the project, about five at a time. INGRES was similar to System R in a number of ways, including the use of a "language" for data access, known as QUEL. QUEL was in fact relational, having been based on Codd's own Alpha language, but it was later changed to follow SQL, thereby violating many of the same concepts of the relational model as SQL itself.
IBM itself did one test implementation of the relational model, PRTV, and a production one, Business System 12, both now discontinued. Honeywell developed MRDS for Multics, and two newer implementations now exist: Alphora Dataphor and Rel. All other DBMS implementations usually called relational are actually SQL DBMSs.
In 1968, the University of Michigan began development of the Micro DBMS. It was used to manage very large data sets by the US Department of Labor, the Environmental Protection Agency and researchers from the University of Alberta, the University of Michigan and Wayne State University. It ran on mainframe computers using the Michigan Terminal System. The system remained in production until 1996.
Late-1970s SQL DBMS
IBM started working on a prototype system loosely based on Codd's concepts as System R in the early 1970s. The first version was ready in 1974–75, and work then started on multi-table systems in which the data could be split so that all of the data for a record (some of which is optional) did not have to be stored in a single large "chunk". Subsequent multi-user versions were tested by customers in 1978 and 1979, by which time a standardized query language, SQL, had been added. Codd's ideas were establishing themselves as both workable and superior to Codasyl, pushing IBM to develop a true production version of System R, known as SQL/DS, and, later, Database 2 (DB2).
Many of the people involved with INGRES became convinced of the future commercial success of such systems, and formed their own companies to commercialize the work, but with an SQL interface. Sybase, Informix, NonStop SQL and eventually Ingres itself were all being sold as offshoots of the original INGRES product in the 1980s. Even Microsoft SQL Server is actually a rebuilt version of Sybase, and thus descends from INGRES. Only Larry Ellison's Oracle started from a different chain, based on IBM's papers on System R, and beat IBM to market when the first version was released in 1978.
Stonebraker went on to apply the lessons from INGRES to develop a new database, Postgres, which is now known as PostgreSQL. PostgreSQL is often used for global mission-critical applications (the .org and .info domain name registries use it as their primary data store, as do many large companies and financial institutions).
In Sweden, Codd's paper was also read, and Mimer SQL was developed from the mid-1970s at Uppsala University. In 1984, this project was consolidated into an independent enterprise. In the early 1980s, Mimer introduced transaction handling for high robustness in applications, an idea that was subsequently implemented in most other DBMSs.
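The core idea of transaction handling is that a group of changes either takes effect completely or not at all. A minimal sketch using Python's `sqlite3` module (not Mimer SQL) and a hypothetical `accounts` table: a transfer that fails halfway is rolled back, so neither balance changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move money atomically: both updates commit together or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected a negative balance
        return False


def balances():
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


print(transfer("a", "b", 30))   # True
print(transfer("a", "b", 500))  # False: the credit to "b" is undone as well
print(balances())               # {'a': 70, 'b': 80}
```

Without the transaction, the failed transfer would have left the credit to `b` in place while the debit from `a` was rejected.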
1980s object-oriented databases
The 1980s, along with the rise of object-oriented programming, saw a change in how data in databases was handled. Programmers and designers began to treat the data in their databases as objects. That is to say, if a person's data were in a database, that person's attributes, such as their address, phone number, and age, were now considered to belong to that person instead of being extraneous data. This allows relations between data to be relations to objects and their attributes rather than to individual fields.
Another major change for databases in the 1980s was the focus on increasing reliability and access speed. In 1989, two professors from the University of Wisconsin at Madison published an article at an ACM-associated conference outlining their methods for increasing database performance. The idea was to replicate specific important and frequently queried information and store it in a smaller temporary database that linked these key features back to the main database. This meant that a query could search the smaller database much more quickly, rather than searching the entire data set. This idea is closely related to the practice of indexing, which is used by almost every operating system, from Windows to the system that runs Apple iPod devices.
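The underlying idea, a small auxiliary structure that maps a frequently queried value back to rows in the main data, can be sketched in a few lines of Python. This is a simplified hash-style index over a hypothetical list of rows; it is not the method from the 1989 article, nor how any particular DBMS builds its indexes (many use B-trees).

```python
from collections import defaultdict

# Hypothetical main table: a row's position in the list acts as its address.
people = [
    {"name": "Asha", "city": "Pune"},
    {"name": "Bo", "city": "Uppsala"},
    {"name": "Ravi", "city": "Pune"},
    {"name": "Lena", "city": "Madison"},
]


def full_scan(rows, city):
    """Without an index: examine every row."""
    return [r["name"] for r in rows if r["city"] == city]


def build_index(rows, column):
    """Map each value of `column` to the positions of the rows that hold it."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index


def indexed_lookup(rows, index, city):
    """With an index: jump straight to the matching rows."""
    return [rows[pos]["name"] for pos in index.get(city, [])]


city_index = build_index(people, "city")
print(full_scan(people, "Pune"))                   # ['Asha', 'Ravi']
print(indexed_lookup(people, city_index, "Pune"))  # ['Asha', 'Ravi']
```

The trade-off is that the index must be updated whenever rows are inserted, changed, or deleted, in exchange for much faster reads.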
21st century NoSQL databases
In the 21st century, a new trend of NoSQL databases emerged.
These non-relational databases are significantly different from classic relational databases. They often do not require fixed table schemas, avoid join operations by storing denormalized data, and are designed to scale horizontally. Most of them can be classified as either key-value stores or document-oriented databases.
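For contrast with the normalized user, address and phone tables of the relational approach, a document-oriented store would typically keep the same user as one denormalized document, while a key-value store maps a key to an opaque value. The sketch uses plain Python dictionaries and the standard `json` module as stand-ins; it does not use the API of any real NoSQL product, and the key format `user:bob` is a hypothetical convention.

```python
import json

# Document-oriented style: one self-contained, denormalized document per user.
# Nested lists replace the separate address and phone tables, so no join is needed.
documents = {
    "bob": {
        "name": "Bob Jones",
        "addresses": [],
        "phones": ["555-0199", "555-0142"],
    },
}

# Key-value style: the store only sees a key and an opaque serialized value.
kv_store = {}
kv_store["user:bob"] = json.dumps(documents["bob"])


def get_user(key):
    """Fetch a user with a single key lookup; the application decodes the value."""
    raw = kv_store.get(key)
    return None if raw is None else json.loads(raw)


print(get_user("user:bob")["phones"])  # ['555-0199', '555-0142']
print(get_user("user:alice"))          # None
```

Reads become a single lookup, but data shared between documents must be duplicated and kept in sync by the application.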
In recent years there has been high demand for massively distributed databases with high partition tolerance, but according to the CAP theorem it is impossible for a distributed system to simultaneously provide consistency, availability and partition tolerance guarantees. A distributed system can satisfy any two of these guarantees at the same time, but not all three. For that reason, many NoSQL databases use what is called eventual consistency: they provide both availability and partition tolerance, and guarantee only that all replicas converge to the same data once updates stop arriving.
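Eventual consistency can be illustrated with a toy model of two replicas that accept writes independently, for example during a network partition, and later exchange state. Conflicts here are resolved by last-writer-wins on a (timestamp, replica id) pair; that rule is an assumption chosen for simplicity, and real NoSQL systems use a variety of conflict-resolution strategies. The `Replica` class and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A toy replica storing key -> (timestamp, replica_id, value)."""
    replica_id: str
    data: dict = field(default_factory=dict)

    def write(self, key, value, timestamp):
        # Accept the write locally: the replica stays available during a partition.
        self._apply(key, (timestamp, self.replica_id, value))

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[2]

    def _apply(self, key, entry):
        # Last-writer-wins: keep the entry with the larger (timestamp, replica_id).
        current = self.data.get(key)
        if current is None or entry[:2] > current[:2]:
            self.data[key] = entry

    def merge_from(self, other):
        """Anti-entropy step: pull every entry the other replica holds."""
        for key, entry in other.data.items():
            self._apply(key, entry)


a, b = Replica("a"), Replica("b")
a.write("cart", ["book"], timestamp=1)         # during a partition, each side
b.write("cart", ["book", "pen"], timestamp=2)  # accepts writes independently
print(a.read("cart"), b.read("cart"))  # ['book'] ['book', 'pen']

a.merge_from(b)  # once the partition heals, replicas exchange state
b.merge_from(a)
print(a.read("cart") == b.read("cart"))  # True
```

Last-writer-wins silently discards the earlier concurrent write (here, the cart holding only the book); systems that cannot accept such losses use other merge rules.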
The most popular software in that category includes memcached, Redis, MongoDB, CouchDB, Apache Cassandra and HBase.
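Eventual consistency can be pictured with a toy model of two replicas that keep accepting writes during a network partition and reconcile afterwards. This is a minimal sketch, assuming a last-writer-wins rule based on timestamps (one common reconciliation strategy among several). The `Replica` class and `anti_entropy` function are hypothetical and do not describe the internals of any product listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica storing key -> (timestamp, value). Hypothetical, not any product's API."""
    name: str
    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        # Last-writer-wins: keep the write with the highest timestamp.
        # Assumes timestamps are unique; a real system also needs a tie-breaker.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange state in both directions so each replica keeps the newest write per key."""
    for key, (ts, value) in list(a.data.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.write(key, value, ts)


east, west = Replica("east"), Replica("west")

# During a partition, both replicas keep accepting writes (availability).
east.write("cart:42", ["book"], ts=1)
west.write("cart:42", ["book", "pen"], ts=2)
stale = east.read("cart:42")  # the replicas temporarily disagree

# Once the partition heals and no new writes arrive, the replicas converge.
anti_entropy(east, west)
print(stale, east.read("cart:42"), west.read("cart:42"))  # ['book'] ['book', 'pen'] ['book', 'pen']
```

Between the partition and the synchronization, a reader of `east` sees stale data; that window is the price paid for staying available.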
Current trends
In 1998, database management was seen as needing a new style of database to solve the management problems of the day. Researchers realized that the old approaches to database management were becoming too complex and that configuration and management needed to be automated. Surajit Chaudhuri, Gerhard Weikum and Michael Stonebraker were among the pioneers who most strongly influenced thinking about database management systems. They believed that database management needed a more modular approach and that users were required to specify too much. Since this shift in development, more possibilities have opened up: database management is no longer limited to “monolithic entities”. Many solutions have been developed to satisfy the individual needs of users, and the resulting range of database options has made database management more flexible.
Database management has affected the field of technology in several ways. Because organizations' demand for directory services grows as they expand, businesses use directory services that provide prompted searches for company information. Mobile devices can store more than just users' contact information, and can cache and display large amounts of information on small displays. Search engine queries locate data within the World Wide Web. Retailers have benefited from developments in data warehousing, which records customer transactions. Online transactions have become tremendously popular for e-business, and consumers and businesses can make payments securely through some company websites.
Components
- DBMS engine accepts logical requests from various other DBMS subsystems, converts them into physical equivalents, and actually accesses the database and data dictionary as they exist on a storage device.
- Data definition subsystem helps the user create and maintain the data dictionary and define the structure of the files in a database.
- Data manipulation subsystem helps the user to add, change, and delete information in a database and query it for valuable information. Software tools within the data manipulation subsystem are most often the primary interface between the user and the information contained in a database. It allows users to specify their logical information requirements.
- Application generation subsystem contains facilities to help users develop transaction-intensive applications. It usually requires that the user perform a detailed series of tasks to process a transaction. It facilitates easy-to-use data entry screens, programming languages, and interfaces.
- Data administration subsystem helps users manage the overall database environment by providing facilities for backup and recovery, security management, query optimization, concurrency control, and change management.
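The split between data definition, data manipulation and the data dictionary can be seen in a few statements run against SQLite through Python's standard `sqlite3` module. SQLite is used only as a convenient example; it is not organized into subsystems with these names, so the mapping in the comments is an illustrative assumption, and the `customer` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: describe the structure; SQLite records it in its catalog (sqlite_master).
conn.execute(
    "CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)

# Data manipulation: add, change, delete and query records.
conn.execute("INSERT INTO customer VALUES (1, 'Ada', 'London')")
conn.execute("INSERT INTO customer VALUES (2, 'Grace', 'Arlington')")
conn.execute("UPDATE customer SET city = 'New York' WHERE customer_no = 2")
conn.execute("DELETE FROM customer WHERE customer_no = 1")
rows = conn.execute("SELECT customer_no, name, city FROM customer").fetchall()

# The engine keeps the definitions in a data dictionary that can itself be queried.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()

print(rows)     # [(2, 'Grace', 'New York')]
print(catalog)  # [('table', 'customer')]
```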
Modeling language
A modeling language is a data modeling language used to define the schema of each database hosted in the DBMS, according to the DBMS database model. Database management systems (DBMS) are designed to use one of five database structures to provide simple access to information stored in databases. The five database structures are:
- the hierarchical model,
- the network model,
- the relational model,
- the multidimensional model, and
- the object model.
Inverted lists and other methods are also used. A given database management system may provide one or more of the five models. The optimal structure depends on the natural organization of the application's data, and on the application's requirements, which include transaction rate (speed), reliability, maintainability, scalability, and cost.
The hierarchical structure was used in early mainframe DBMSs. Records' relationships form a treelike model. This structure is simple but inflexible because the relationship is confined to a one-to-many relationship. IBM's IMS system and RDM Mobile are examples of hierarchical database systems with multiple hierarchies over the same data. RDM Mobile is a newly designed embedded database for mobile computer systems. Today the hierarchical structure is used primarily for storing geographic information and file systems.
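The one-to-many restriction of the hierarchical structure can be sketched with in-memory records in which every record holds a single parent pointer. This is a toy illustration, not how IMS or RDM Mobile store data; the `Record` class, the `path` helper and the record kinds are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    kind: str
    key: str
    # Exactly one parent (None only for the root); repr=False avoids printing the whole tree.
    parent: Optional["Record"] = field(default=None, repr=False)
    children: list = field(default_factory=list, repr=False)

    def add_child(self, kind, key):
        child = Record(kind, key, parent=self)
        self.children.append(child)
        return child


def path(record):
    """Walk the parent pointers up to the root: the single access path to a record."""
    parts = []
    while record is not None:
        parts.append(f"{record.kind}:{record.key}")
        record = record.parent
    return "/".join(reversed(parts))


root = Record("company", "acme")
sales = root.add_child("department", "sales")
alice = sales.add_child("employee", "alice")
print(path(alice))  # company:acme/department:sales/employee:alice
```

Because `parent` holds one record, an employee who works for two departments cannot be represented without duplicating the employee record, which is the inflexibility described above.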
The network structure supports more complex relationships. Unlike in the hierarchical structure, a record can be related to many other records, and records are accessed by following one of several paths. In other words, this structure allows for many-to-many relationships.
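A many-to-many relationship in the network structure can be sketched in the style of CODASYL owner/member sets: a link record belongs to two sets at once, so it can be reached from either owner. The set names `takes` and `has`, the `Record` class and the helper functions are hypothetical and only loosely modeled on that style.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Record:
    key: str
    owned: dict = field(default_factory=dict, repr=False)   # set name -> member records
    owners: dict = field(default_factory=dict, repr=False)  # set name -> owner record


def connect(owner, set_name, member):
    owner.owned.setdefault(set_name, []).append(member)
    member.owners[set_name] = owner


def enroll(student, course):
    # The link record is a member of two sets, which yields the many-to-many relationship.
    link = Record(f"{student.key}->{course.key}")
    connect(student, "takes", link)
    connect(course, "has", link)
    return link


def courses_of(student):
    # Navigate: student -> its "takes" links -> each link's owner in the "has" set.
    return [link.owners["has"].key for link in student.owned.get("takes", [])]


def students_in(course):
    return [link.owners["takes"].key for link in course.owned.get("has", [])]


ann, bob = Record("ann"), Record("bob")
databases, systems = Record("databases"), Record("operating-systems")
enroll(ann, databases)
enroll(ann, systems)
enroll(bob, databases)

print(courses_of(ann))         # ['databases', 'operating-systems']
print(students_in(databases))  # ['ann', 'bob']
```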
The relational structure is the most commonly used today. It is used by mainframe, midrange and microcomputer systems. It stores data in two-dimensional tables of rows and columns, and the tables of records can be connected by common key values. While working for IBM, E.F. Codd designed this structure in 1970. Queries can be difficult for end users to write, because they may require a complex combination of many tables.
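Connecting tables by common key values is what a SQL join does. The sketch below uses SQLite through Python's standard `sqlite3` module; the `customer` and `orders` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_no INTEGER PRIMARY KEY,
                         customer_no INTEGER REFERENCES customer(customer_no),
                         amount REAL);
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# The two tables are linked only by the shared key value customer_no.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customer AS c
    JOIN orders AS o ON o.customer_no = c.customer_no
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Ada', 65.0), ('Grace', 15.0)]
```

Even this small question needs two tables, a join condition and a grouping, which hints at why ad hoc relational queries can be hard for end users.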
The multidimensional structure is similar to the relational model. In its cube-like model, the dimensions identify each cell, and each cell holds data relating to the elements at those coordinates. This structure gives a spreadsheet-like view of data. It is easy to maintain because records are stored as fundamental attributes, in the same way they are viewed, and the structure is easy to understand. Its high performance has made it the most popular database structure for online analytical processing (OLAP).
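The cube-like view can be sketched with a dictionary keyed by coordinate tuples, plus the two typical OLAP operations: slicing (fixing one dimension) and rolling up (aggregating other dimensions away). Real OLAP engines use specialized storage; the dimensions, figures and the `roll_up` and `slice_cube` helpers are hypothetical.

```python
from collections import defaultdict

DIMS = ("product", "region", "quarter")

# Hypothetical sales cube: (product, region, quarter) -> units sold.
cube = {
    ("widget", "north", "Q1"): 10,
    ("widget", "south", "Q1"): 4,
    ("gadget", "north", "Q1"): 7,
    ("widget", "north", "Q2"): 12,
    ("gadget", "south", "Q2"): 5,
}


def roll_up(cells, keep):
    """Sum away every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for coords, units in cells.items():
        totals[tuple(coords[i] for i in idx)] += units
    return dict(totals)


def slice_cube(cells, **fixed):
    """Keep only the cells whose coordinates match the fixed dimension values."""
    return {
        coords: units
        for coords, units in cells.items()
        if all(coords[DIMS.index(d)] == value for d, value in fixed.items())
    }


print(roll_up(cube, ["product"]))                           # {('widget',): 26, ('gadget',): 12}
print(roll_up(slice_cube(cube, quarter="Q1"), ["region"]))  # {('north',): 17, ('south',): 4}
```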
The object-oriented structure can handle graphics, pictures, voice and text data without difficulty, unlike the other database structures. This structure is popular for multimedia Web-based applications. It was designed to work with object-oriented programming languages such as Java.
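The central idea of the object structure, storing program objects as they are rather than mapping them by hand to rows and columns, can be sketched with Python's standard `shelve` module. Note that `shelve` is only a simple persistent dictionary built on pickling, not a full object database; the `MediaItem` class and the placeholder payload bytes are hypothetical.

```python
import os
import shelve
import tempfile
from dataclasses import dataclass


@dataclass
class MediaItem:
    title: str
    kind: str       # e.g. "image" or "audio"
    payload: bytes  # raw binary content kept inside the object
    tags: list


path = os.path.join(tempfile.mkdtemp(), "media_store")

# Store the object directly; no table layout is designed for it.
with shelve.open(path) as store:
    store["logo"] = MediaItem("Logo", "image", b"\x89PNG placeholder", ["brand"])

# Reopen the store later and get back an equal object.
with shelve.open(path) as store:
    item = store["logo"]

print(item.kind, item.tags)  # image ['brand']
```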
The dominant model in use today is the ad hoc one embedded in SQL, despite the objections of purists who believe this model is a corruption of the relational model, since it violates several fundamental principles for the sake of practicality and performance. Many DBMSs also support the Open Database Connectivity (ODBC) API, which provides a standard way for programmers to access the DBMS.
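The text does not say which relational principles SQL violates. Two departures that are frequently cited are that SQL tables and query results may contain duplicate rows (a relation is a set, so it cannot), and that `NULL` introduces three-valued logic. The sketch below shows both in SQLite through Python's standard `sqlite3` module; it is an illustration, not an exhaustive list, and the `visit` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visit (city TEXT)")  # no key declared
conn.executemany("INSERT INTO visit VALUES (?)", [("Paris",), ("Paris",), (None,)])

# 1. SQL results are multisets: the same row can appear more than once.
all_rows = conn.execute("SELECT city FROM visit").fetchall()
distinct_rows = conn.execute("SELECT DISTINCT city FROM visit").fetchall()

# 2. Comparing NULL with NULL yields NULL (unknown), not true.
null_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]

print(len(all_rows), len(distinct_rows))  # 3 2
print(null_eq)                            # None
```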
Before the database management approach, organizations relied on file processing systems to organize, store, and process data files. End users criticized file processing because the data was stored in many different files, each organized in a different way, and each file was specialized for use with a specific application. File processing was bulky, costly and inflexible when it came to supplying needed data accurately and promptly. Data redundancy was an issue because the independent data files held duplicate data, so every update had to be applied to each separate file. Another issue was the lack of data integration, and the data was dependent on the programs used to organize and store it. Lastly, there was no consistency or standardization of the data in a file processing system, which made maintenance difficult. For these reasons, the database management approach was developed.
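The redundancy problem can be shown in a few lines: when two applications each keep their own copy of the same fact, an update applied to one file silently leaves the other stale, while a single shared record needs only one update. The file contents and accessor functions are hypothetical; plain dictionaries stand in for data files.

```python
# File processing: two application files each keep their own copy of the address.
billing_file = {"C042": {"name": "Ada", "address": "12 Old Road"}}
shipping_file = {"C042": {"name": "Ada", "address": "12 Old Road"}}

# The billing program updates its own file; the shipping copy is not touched.
billing_file["C042"]["address"] = "7 New Street"
inconsistent = billing_file["C042"]["address"] != shipping_file["C042"]["address"]

# Database approach: one shared record that every application reads.
customers = {"C042": {"name": "Ada", "address": "12 Old Road"}}


def billing_address(cid):
    return customers[cid]["address"]


def shipping_address(cid):
    return customers[cid]["address"]


customers["C042"]["address"] = "7 New Street"  # a single update
print(inconsistent, billing_address("C042") == shipping_address("C042"))  # True True
```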
Data structure
Data structures (fields, records, files and objects) optimized to deal with very large amounts of data stored on a permanent data storage device (which implies relatively slow access compared to volatile main memory).
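How fields, records and files fit together, and why layout matters on slow storage, can be sketched with fixed-length records: because every record has the same size, record *n* can be fetched with one seek instead of a scan of the whole file. The record layout and file name below are hypothetical; the code uses Python's standard `struct` module.

```python
import os
import struct
import tempfile

# Hypothetical layout: 4-byte int id, 20-byte name, 8-byte float balance (little-endian, no padding).
RECORD = struct.Struct("<i20sd")

path = os.path.join(tempfile.mkdtemp(), "accounts.dat")
with open(path, "wb") as f:
    for rec_id, name, balance in [(1, "Ada", 120.5), (2, "Grace", 75.0), (3, "Edgar", 300.0)]:
        f.write(RECORD.pack(rec_id, name.encode(), balance))  # name is null-padded to 20 bytes


def read_record(path, n):
    """Fetch record n directly: its byte offset is n * record size."""
    with open(path, "rb") as f:
        f.seek(n * RECORD.size)
        rec_id, raw_name, balance = RECORD.unpack(f.read(RECORD.size))
    return rec_id, raw_name.rstrip(b"\0").decode(), balance


print(RECORD.size)           # 32
print(read_record(path, 2))  # (3, 'Edgar', 300.0)
```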
Database query language
A database query language and report writer allow users to interactively interrogate the database, analyze its data and update it according to the users' privileges on data. They also control the security of the database. Data security prevents unauthorized users from viewing or updating the database. Using passwords, users are allowed access to the entire database or to subsets of it called subschemas. For example, an employee database can contain all the data about an individual employee, but one group of users may be authorized to view only payroll data, while others are allowed access only to work history and medical data.
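A subschema can be approximated in SQL with a view that exposes only some columns. SQLite, used here through Python's standard `sqlite3` module, has no user accounts or `GRANT` statement, so this sketch shows only the restricted view itself; in a multi-user DBMS, payroll staff would additionally be granted access to the view but not to the base table. The `employee` table and `payroll` view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name TEXT,
        salary REAL,
        work_history TEXT,
        medical_notes TEXT
    );
    INSERT INTO employee VALUES (1, 'Ada', 5000, 'Analyst since 2019', 'confidential');

    -- Subschema for payroll staff: only the columns they need.
    CREATE VIEW payroll AS SELECT emp_id, name, salary FROM employee;
""")

cur = conn.execute("SELECT * FROM payroll")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)  # ['emp_id', 'name', 'salary']
print(rows)     # [(1, 'Ada', 5000.0)]
```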
If the DBMS provides a way to interactively enter and update the database, as well as interrogate it, this capability allows for managing personal databases. However, it may not leave an audit trail of actions or provide the kinds of controls necessary in a multi-user organization. These controls are only available when a set of application programs is customized for each data entry and updating function.
Transaction mechanism
A database transaction mechanism ideally guarantees ACID properties in order to ensure data integrity despite concurrent user accesses (concurrency control) and faults (fault tolerance). It also maintains the integrity of the data in the database. The DBMS can maintain the integrity of the database by not allowing more than one user to update the same record at the same time. The DBMS can help prevent duplicate records via unique index constraints; for example, no two customers with the same customer number (key field) can be entered into the database. See ACID properties for more information.
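Two of these guarantees, atomicity and duplicate prevention through a unique constraint, can be demonstrated with SQLite through Python's standard `sqlite3` module, where using the connection as a context manager commits on success and rolls back if an exception escapes. The tables, the `transfer` function and the `CHECK` constraint used to force a failure are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_no INTEGER UNIQUE NOT NULL, name TEXT);
    CREATE TABLE account (customer_no INTEGER, balance REAL CHECK (balance >= 0));
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO account VALUES (1, 100.0), (2, 50.0);
""")

# Unique constraint on the key field: a second customer number 1 is rejected.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True


def transfer(conn, src, dst, amount):
    """Both updates commit together or neither does (atomicity)."""
    with conn:  # commit on success, roll back if an exception escapes
        # Credit first, so a failing debit shows the credit being undone.
        conn.execute("UPDATE account SET balance = balance + ? WHERE customer_no = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE customer_no = ?", (amount, src))


transfer(conn, 1, 2, 30.0)  # succeeds: 100 -> 70 and 50 -> 80
try:
    transfer(conn, 2, 1, 500.0)  # debit would make the balance negative and violates CHECK
except sqlite3.IntegrityError:
    pass  # the credit of 500 to account 1 was rolled back as well

balances = conn.execute("SELECT customer_no, balance FROM account ORDER BY customer_no").fetchall()
print(duplicate_rejected, balances)  # True [(1, 70.0), (2, 80.0)]
```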
External, logical and internal view
A DBMS provides the ability for many different users to share data and process resources. Because there can be many different users, there are many different database needs. The question is: how can a single, unified database meet the varying requirements of so many users?
A DBMS minimizes these problems by providing three views of the database data: an external view (or user view), logical view (or conceptual view) and physical (or internal) view. The user’s view of a database program represents data in a format that is meaningful to a user and to the software programs that process those data.
One strength of a DBMS is that while there is typically only one conceptual (or logical) and one physical (or internal) view of the data, there can be any number of different external views. This feature allows users to see database information in a more business-related way rather than from a technical, processing viewpoint. Thus the external views reflect the way users see the data, while the physical view refers to the way the data are physically stored and processed.
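The separation between views can be sketched in SQLite through Python's standard `sqlite3` module. Treating a table as the logical level, a view as an external view and an index as part of the internal level is an illustrative mapping, not a formal one; the `employee` table, `rnd_staff` view and index name are hypothetical. Adding the index changes how data is physically stored and searched, yet the external view and its results stay the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical (conceptual) level: one table describing all employees.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee (name, dept) VALUES (?, ?)",
    [("Ada", "R&D"), ("Grace", "Ops"), ("Edgar", "R&D")],
)

# External level: what one group of users sees.
conn.execute("CREATE VIEW rnd_staff AS SELECT name FROM employee WHERE dept = 'R&D'")

query = "SELECT name FROM rnd_staff ORDER BY name"
before = conn.execute(query).fetchall()

# Internal (physical) level change: add an index. Table, view and query are untouched.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
after = conn.execute(query).fetchall()

print(before == after, after)  # True [('Ada',), ('Edgar',)]
```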
Features and capabilities
Alternatively, and especially in connection with the relational model of database management, the relation between attributes drawn from a specified set of domains can be seen as being primary. For instance, the database might indicate that a car that was originally "red" might fade to "pink" in time, provided it was of some particular "make" with an inferior paint job. Such higher-arity relationships provide information on all of the underlying domains at the same time, with none of them being privileged above the others.
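The car example can be written as a ternary relation: a set of tuples over the domains make, original color and faded color. Because no attribute is privileged, the same relation answers questions starting from any of them. The makes and colors below are hypothetical, and the `select` helper is a minimal stand-in for the relational selection operator.

```python
from typing import NamedTuple


class Fade(NamedTuple):
    make: str
    original_color: str
    faded_color: str


# A relation: a set of tuples, with no tuple or attribute singled out as primary.
fades = {
    Fade("MakeA", "red", "pink"),
    Fade("MakeA", "blue", "grey"),
    Fade("MakeB", "red", "red"),
}


def select(relation, **conditions):
    """Relational selection: keep tuples whose attributes equal the given values."""
    return {t for t in relation if all(getattr(t, a) == v for a, v in conditions.items())}


# Which makes turn red paint pink?
print(sorted(t.make for t in select(fades, original_color="red", faded_color="pink")))  # ['MakeA']
# What do MakeA colors fade to?
print(sorted(t.faded_color for t in select(fades, make="MakeA")))  # ['grey', 'pink']
```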
Simple definition
A database management system is a system in which related data is stored in an efficient or compact manner. "Efficient" means that the data stored in the DBMS can be accessed quickly, and "compact" means that the data takes up very little space in the computer's memory. The phrase "related data" means that the stored data pertains to a particular topic.
Specialized databases have existed for scientific, imaging, document storage and similar uses. Functionality drawn from such applications has begun appearing in mainstream DBMSs as well. However, the main focus, at least when aimed at the commercial data processing market, is still on descriptive attributes on repetitive record structures.
Thus, the DBMSs of today roll together frequently needed services and features of attribute management. By externalizing such functionality to the DBMS, applications effectively share code with each other and are relieved of much internal complexity. Features commonly offered by database management systems include:
Query ability : Querying is the process of requesting attribute information from various perspectives and combinations of factors. Example: "How many 2-door cars in Texas are green?" A database query language and report writer allow users to interactively interrogate the database, analyze its data and update it according to the users' privileges on data.
Backup and replication : Copies of attributes need to be made regularly in case primary disks or other equipment fail. A periodic copy of attributes may also be created for a distant organization that cannot readily access the original. DBMSs usually provide utilities to facilitate the process of extracting and disseminating attribute sets. When data is replicated between database servers so that the information remains consistent throughout the database system, and users cannot tell which server in the DBMS they are using, the system is said to exhibit replication transparency.
Rule enforcement : Often one wants to apply rules to attributes so that the attributes are clean and reliable. For example, we may have a rule that says each car can have only one engine associated with it (identified by Engine Number). If somebody tries to associate a second engine with a given car, we want the DBMS to deny such a request and display an error message. However, with changes in the model specification such as, in this example, hybrid gas-electric cars, rules may need to change. Ideally such rules should be able to be added and removed as needed without significant data layout redesign.
Security : For security reasons, it is desirable to limit who can see or change specific attributes or groups of attributes. This may be managed directly on an individual basis, or by the assignment of individuals and privileges to groups, or (in the most elaborate models) through the assignment of individuals and groups to roles which are then granted entitlements.
Computation : Common computations requested on attributes are counting, summing, averaging, sorting, grouping, cross-referencing, and so on. Rather than implementing these from scratch, each computer application can rely on the DBMS to supply such calculations.
Change and access logging : This describes who accessed which attributes, what was changed, and when it was changed. Logging services allow this by keeping a record of access occurrences and changes.
Automated optimization : For frequently occurring usage patterns or requests, some DBMS can adjust themselves to improve the speed of those interactions. In some cases the DBMS will merely provide tools to monitor performance, allowing a human expert to make the necessary adjustments after reviewing the statistics collected.
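Several of the features listed above (query ability, computation, rule enforcement, and backup) can be tried in SQLite through Python's standard `sqlite3` module. The `car` and `engine` tables and their rows are hypothetical, and a `UNIQUE` constraint is used as one simple way to express the "one engine per car" rule; `Connection.backup` requires Python 3.7 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE car (car_id INTEGER PRIMARY KEY, doors INTEGER, state TEXT, color TEXT);
    -- Rule: UNIQUE(car_id) allows at most one engine per car.
    CREATE TABLE engine (engine_no TEXT PRIMARY KEY,
                         car_id INTEGER UNIQUE REFERENCES car(car_id));
    INSERT INTO car (doors, state, color) VALUES
        (2, 'Texas', 'green'), (4, 'Texas', 'green'), (2, 'Texas', 'green'),
        (2, 'Ohio', 'green'), (2, 'Texas', 'red');
    INSERT INTO engine VALUES ('E-100', 1);
""")

# Query ability: "How many 2-door cars in Texas are green?"
green_2door_tx = conn.execute(
    "SELECT COUNT(*) FROM car WHERE doors = 2 AND state = 'Texas' AND color = 'green'"
).fetchone()[0]

# Computation: counting and grouping done by the DBMS, not the application.
per_state = conn.execute(
    "SELECT state, COUNT(*) FROM car GROUP BY state ORDER BY state"
).fetchall()

# Rule enforcement: associating a second engine with car 1 is denied with an error.
try:
    conn.execute("INSERT INTO engine VALUES ('E-200', 1)")
    second_engine_allowed = True
except sqlite3.IntegrityError as err:
    second_engine_allowed = False
    print("rejected:", err)

# Backup: copy the whole database into another connection.
conn.commit()  # end any open transaction before copying
copy = sqlite3.connect(":memory:")
conn.backup(copy)
copied = copy.execute("SELECT COUNT(*) FROM car").fetchone()[0]

print(green_2door_tx, per_state, copied)  # 2 [('Ohio', 1), ('Texas', 4)] 5
```

Changing the rule later (for example, to allow hybrid cars with two engines) would mean altering the constraint rather than the application code, which is the flexibility the rule-enforcement paragraph asks for.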
Meta-data repository
Metadata is data describing data. For example, a listing that describes what attributes are allowed to be in data sets is called "meta-information".
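A DBMS keeps this kind of listing in its own catalog. In SQLite, reached here through Python's standard `sqlite3` module, `PRAGMA table_info` returns one row per allowed attribute; other DBMSs expose similar catalogs under different names (for example `information_schema`). The `car` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (car_id INTEGER PRIMARY KEY, color TEXT NOT NULL, doors INTEGER)")

# Each row: (cid, name, declared type, notnull flag, default value, primary-key position).
columns = conn.execute("PRAGMA table_info(car)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(name, col_type, "notnull" if notnull else "", "pk" if pk else "")

allowed = [row[1] for row in columns]
print(allowed)  # ['car_id', 'color', 'doors']
```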
Advanced DBMS
An example of an advanced DBMS is the Distributed Database Management System (DDBMS), which manages a distributed database: a collection of data that logically belong to the same system but are spread over the sites of a computer network. The two aspects of a distributed database are distribution and logical correlation:
- Distribution: The fact that the data are not resident at the same site, so that we can distinguish a distributed database from a single, centralized database.
- Logical Correlation: The fact that the data have some properties which tie them together, so that we can distinguish a distributed database from a set of local databases or files which are resident at different sites of a computer network.
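Both aspects can be sketched with two independent SQLite databases standing in for sites: the rows are split between them (distribution), while both share one schema and are queried as a single table (logical correlation). Real sites would be separate machines; the site names, the placement-by-region rule and the helper functions are hypothetical.

```python
import sqlite3

SCHEMA = "CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT, region TEXT)"

# Hypothetical sites, each holding one fragment of the same logical table.
sites = {"london": sqlite3.connect(":memory:"), "tokyo": sqlite3.connect(":memory:")}
for site in sites.values():
    site.execute(SCHEMA)  # logical correlation: every fragment shares one schema


def site_for(region):
    """Distribution rule assumed for this sketch: rows are placed by region."""
    return sites["london"] if region == "EU" else sites["tokyo"]


def insert_customer(no, name, region):
    with site_for(region) as conn:  # commits the insert at that site
        conn.execute("INSERT INTO customer VALUES (?, ?, ?)", (no, name, region))


def all_customers():
    """Global query: the user sees one table; the system visits every site."""
    rows = []
    for site in sites.values():
        rows.extend(site.execute("SELECT customer_no, name, region FROM customer"))
    return sorted(rows)


insert_customer(1, "Ada", "EU")
insert_customer(2, "Hiro", "APAC")
insert_customer(3, "Edgar", "EU")
print(all_customers())  # [(1, 'Ada', 'EU'), (2, 'Hiro', 'APAC'), (3, 'Edgar', 'EU')]
```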
See also
- Database
- Column-oriented DBMS
- Data warehouse
- Database-centric architecture
Further reading