Data model
A data model in software engineering is an abstract model that documents and organizes the business data for communication between team members and is used as a plan for developing applications, specifically how data is stored and accessed.

According to Hoberman (2009), "A data model is a wayfinding tool for both business and IT professionals, which uses a set of symbols and text to precisely explain a subset of real information to improve communication within the organization and thereby lead to a more flexible and stable application environment."

A data model explicitly determines the structure of data or structured data. Typical applications of data models include database models, design of information systems, and enabling exchange of data. Usually data models are specified in a data modeling language.

Communication and precision are the two key benefits that make a data model important to applications that use and exchange data. A data model is the medium through which project team members from different backgrounds and with different levels of experience can communicate with one another. Precision means that the terms and rules on a data model can be interpreted only one way and are not ambiguous.

A data model is sometimes referred to as a data structure, especially in the context of programming languages. Data models are often complemented by function models, especially in the context of enterprise models.

Overview

Managing large quantities of structured and unstructured data is a primary function of information systems. Data models describe structured data for storage in data management systems such as relational databases. They typically do not describe unstructured data, such as word processing documents, email messages, pictures, digital audio, and video.

The role of data models

The main aim of data models is to support the development of information systems by providing the definition and format of data. According to West and Fowler (1999), "if this is done consistently across systems then compatibility of data can be achieved. If the same data structures are used to store and access data then different applications can share data. The results of this are indicated above. However, systems and interfaces often cost more than they should, to build, operate, and maintain. They may also constrain the business rather than support it. A major cause is that the quality of the data models implemented in systems and interfaces is poor".
  • "Business rules, specific to how things are done in a particular place, are often fixed in the structure of a data model. This means that small changes in the way business is conducted lead to large changes in computer systems and interfaces".
  • "Entity types are often not identified, or incorrectly identified. This can lead to replication of data, data structure, and functionality, together with the attendant costs of that duplication in development and maintenance".
  • "Data models for different systems are arbitrarily different. The result of this is that complex interfaces are required between systems that share data. These interfaces can account for between 25-70% of the cost of current systems".
  • "Data cannot be shared electronically with customers and suppliers, because the structure and meaning of data has not been standardised. For example, engineering design data and drawings for process plant are still sometimes exchanged on paper".

The reason for these problems is a lack of standards that will ensure that data models will both meet business needs and be consistent.

Three perspectives

A data model instance may be one of three kinds according to ANSI in 1975:
  • Conceptual schema: describes the semantics of a domain, being the scope of the model. For example, it may be a model of the interest area of an organization or industry. This consists of entity classes, representing kinds of things of significance in the domain, and relationship assertions about associations between pairs of entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that sense, it defines the allowed expressions in an artificial 'language' with a scope that is limited by the scope of the model. The use of conceptual schemas has evolved to become a powerful communication tool with business users. Often called a subject area model (SAM) or high-level data model (HDM), this model is used to communicate core data concepts, rules, and definitions to a business user as part of an overall application development or enterprise initiative. The number of objects should be very small and focused on key concepts. The model should ideally fit on one page, although for extremely large organizations or complex projects it might span two or more pages.
  • Logical schema: describes the semantics, as represented by a particular data manipulation technology. This consists of descriptions of tables and columns, object-oriented classes, and XML tags, among other things.
  • Physical schema: describes the physical means by which data are stored. This is concerned with partitions, CPUs, tablespaces, and the like.


The significance of this approach, according to ANSI, is that it allows the three perspectives to be relatively independent of each other. Storage technology can change without affecting either the logical or the conceptual model. The table/column structure can change without (necessarily) affecting the conceptual model. In each case, of course, the structures must remain consistent with the other models. The table/column structure may be different from a direct translation of the entity classes and attributes, but it must ultimately carry out the objectives of the conceptual entity class structure. Early phases of many software development projects emphasize the design of a conceptual data model. Such a design can be detailed into a logical data model. In later stages, this model may be translated into a physical data model. However, it is also possible to implement a conceptual model directly.
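
As a rough illustration of the three perspectives, the sketch below expresses one small domain at each level. The `Customer` and `Order` entities, the table and column names, and the index are all hypothetical, and SQLite merely stands in for whatever DBMS a project actually uses.

```python
import sqlite3
from dataclasses import dataclass

# Conceptual level: entity classes and one relationship, no storage details.
@dataclass(frozen=True)
class Customer:
    name: str

@dataclass(frozen=True)
class Order:
    customer: Customer  # relationship: each order is placed by one customer
    total: float

# Logical level: the same concepts expressed as relational tables and columns.
LOGICAL_SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total       REAL NOT NULL
);
"""

# Physical level: a storage-oriented choice (here, only an index).
PHYSICAL_TUNING = "CREATE INDEX idx_order_customer ON customer_order(customer_id);"

def build_database() -> sqlite3.Connection:
    """Create an in-memory database from the logical and physical definitions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(LOGICAL_SCHEMA)
    conn.execute(PHYSICAL_TUNING)
    return conn

def save_order(conn: sqlite3.Connection, order: Order) -> None:
    """Map one conceptual Order (and its Customer) onto the logical tables."""
    cur = conn.execute("INSERT INTO customer (name) VALUES (?)", (order.customer.name,))
    conn.execute(
        "INSERT INTO customer_order (customer_id, total) VALUES (?, ?)",
        (cur.lastrowid, order.total),
    )

def orders_by_customer(conn: sqlite3.Connection) -> list:
    """Return (customer name, order total) pairs in insertion order."""
    rows = conn.execute(
        "SELECT c.name, o.total FROM customer c "
        "JOIN customer_order o ON o.customer_id = c.customer_id "
        "ORDER BY o.order_id"
    )
    return rows.fetchall()
```

In this sketch, dropping or changing the index alters only the physical level: the query in `orders_by_customer` and the two dataclasses stay the same, which is the kind of independence the three-perspective approach aims for.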

History

One of the earliest pioneering works in modelling information systems was done by Young and Kent (1958), who argued for "a precise and abstract way of specifying the informational and time characteristics of a data processing problem". They wanted to create "a notation that should enable the analyst to organize the problem around any piece of hardware". Their work was a first effort to create an abstract specification and invariant basis for designing different alternative implementations using different hardware components. A next step in IS modelling was taken by CODASYL, an IT industry consortium formed in 1959, who essentially aimed at the same thing as Young and Kent: the development of "a proper structure for machine independent problem definition language, at the system level of data processing". This led to the development of a specific IS information algebra.

In the 1960s data modeling gained more significance with the initiation of the management information system (MIS) concept. According to Leondes (2002), "during that time, the information system provided the data and information for management purposes. The first generation database system, called Integrated Data Store (IDS), was designed by Charles Bachman at General Electric. Two famous database models, the network data model and the hierarchical data model, were proposed during this period of time". Towards the end of the 1960s Edgar F. Codd worked out his theories of data arrangement, and proposed the relational model for database management based on first-order predicate logic.

In the 1970s entity relationship modeling emerged as a new type of conceptual data modeling, originally proposed in 1976 by Peter Chen. Entity relationship models were used in the first stage of information system design, during requirements analysis, to describe information needs or the type of information that is to be stored in a database. This technique can describe any ontology, i.e., an overview and classification of concepts and their relationships, for a certain area of interest.

In the 1970s G.M. Nijssen developed the "Natural Language Information Analysis Method" (NIAM), and developed this in the 1980s in cooperation with Terry Halpin into Object-Role Modeling (ORM).

Further in the 1980s, according to Jan L. Harrington (2000), "the development of the object-oriented paradigm brought about a fundamental change in the way we look at data and the procedures that operate on data. Traditionally, data and procedures have been stored separately: the data and their relationship in a database, the procedures in an application program. Object orientation, however, combined an entity's procedure with its data."

Database model

A database model is a theory or specification describing how a database is structured and used. Several such models have been suggested. Common models include:


  • Flat model: This may not strictly qualify as a data model. The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another.
  • Hierarchical model: In this model data is organized into a tree-like structure, implying a single upward link in each record to describe the nesting, and a sort field to keep the records in a particular order in each same-level list.
  • Network model: This model organizes data using two fundamental constructs, called records and sets. Records contain fields, and sets define one-to-many relationships between records: one owner, many members.
  • Relational model: A database model based on first-order predicate logic. Its core idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values.
  • Object-relational model: Similar to a relational database model, but objects, classes and inheritance are directly supported in database schemas and in the query language.
  • Star schema: The simplest style of data warehouse schema. The star schema consists of a few "fact tables" (possibly only one, justifying the name) referencing any number of "dimension tables". The star schema is considered an important special case of the snowflake schema.
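
To make the difference between the hierarchical and network models more concrete, here is a minimal Python sketch. The class names (`TreeRecord`, `Record`, `SetOccurrence`) and the sample data are assumptions made for illustration; they do not reproduce the structures of any particular database product.

```python
from dataclasses import dataclass, field

# Hierarchical model: every record has a single upward link to its parent,
# and siblings are kept in order by a sort field (here, the name).
@dataclass
class TreeRecord:
    name: str
    parent: "TreeRecord | None" = field(default=None, repr=False, compare=False)
    children: list = field(default_factory=list)

    def add_child(self, name: str) -> "TreeRecord":
        child = TreeRecord(name, parent=self)
        self.children.append(child)
        self.children.sort(key=lambda record: record.name)
        return child

    def path(self) -> list:
        """Follow the upward links from this record to the root."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))

# Network model: records contain fields; a set links one owner to many members.
# Unlike in a tree, the same record may be a member of several sets.
@dataclass
class Record:
    fields: dict

@dataclass
class SetOccurrence:
    set_type: str
    owner: Record
    members: list = field(default_factory=list)

def owners_of(record: Record, sets: list) -> list:
    """Names of all owners whose sets contain the given record."""
    return [
        s.owner.fields["name"]
        for s in sets
        if any(member is record for member in s.members)
    ]
```

In the tree, a record can be reached through exactly one parent, whereas `owners_of` may return several owners for the same member record.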

Data Structure Diagram

A data structure diagram (DSD) is a diagram and data model used to describe conceptual data models by providing graphical notations which document entities and their relationships, and the constraints that bind them. The basic graphic elements of DSDs are boxes, representing entities, and arrows, representing relationships. Data structure diagrams are most useful for documenting complex data entities.

Data structure diagrams are an extension of the entity-relationship model (ER model). In DSDs, attributes are specified inside the entity boxes rather than outside of them, while relationships are drawn as boxes composed of attributes which specify the constraints that bind entities together. The ER model, while robust, does not provide a way to specify the constraints between relationships, and becomes visually cumbersome when representing entities with several attributes. DSDs differ from the ER model in that the ER model focuses on the relationships between different entities, whereas DSDs focus on the relationships of the elements within an entity and enable users to fully see the links and relationships between each entity.

There are several styles for representing data structure diagrams, the notable difference being the manner of defining cardinality. The choices are between arrow heads, inverted arrow heads (crow's feet), or numerical representation of the cardinality.

Entity-relationship model

An entity-relationship model (ERM) is an abstract conceptual data model (or semantic data model) used in software engineering to represent structured data. There are several notations used for ERMs.
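
The ingredients of an ERM (entity types, their attributes, and relationship types with a cardinality) can be sketched in a few lines of Python. This is only one possible textual encoding, not a standard notation; the class names, the `describe` helper, and the `Customer`/`Order` example are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EntityType:
    name: str
    attributes: tuple  # attribute names only; types are omitted in this sketch

@dataclass(frozen=True)
class RelationshipType:
    name: str
    source: EntityType
    target: EntityType
    cardinality: str  # numerical form, read from source to target, e.g. "1:N"

def describe(rel: RelationshipType) -> str:
    """Render a relationship as a one-line textual stand-in for a diagram arrow."""
    return f"{rel.source.name} --{rel.name} ({rel.cardinality})--> {rel.target.name}"

# Hypothetical example: one customer places many orders.
customer = EntityType("Customer", ("customer_id", "name"))
order = EntityType("Order", ("order_id", "total"))
places = RelationshipType("places", customer, order, "1:N")
```

Calling `describe(places)` yields a line such as `Customer --places (1:N)--> Order`, which carries the same information a box-and-arrow diagram would.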

Geographic data model

A data model in geographic information systems is a mathematical construct for representing geographic objects or surfaces as data. For example,
  • the vector data model represents geography as collections of points, lines, and polygons;
  • the raster data model represents geography as cell matrices that store numeric values;
  • and the triangulated irregular network (TIN) data model represents geography as sets of contiguous, nonoverlapping triangles.
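
The three geographic data models listed above can be sketched as plain Python structures. The class and field names are assumptions chosen for illustration and do not correspond to any particular GIS package; coordinates are treated as simple planar (x, y) values, and no bounds checking is done.

```python
from dataclasses import dataclass

@dataclass
class VectorLayer:
    points: list    # each point is an (x, y) tuple
    lines: list     # each line is a list of (x, y) tuples
    polygons: list  # each polygon is a closed ring of (x, y) tuples

@dataclass
class Raster:
    origin: tuple     # (x, y) of the lower-left corner of the grid
    cell_size: float  # width and height of one square cell
    cells: list       # rows of numeric values; row 0 is the bottom row

    def value_at(self, x: float, y: float):
        """Look up the value of the cell that contains the point (x, y)."""
        col = int((x - self.origin[0]) // self.cell_size)
        row = int((y - self.origin[1]) // self.cell_size)
        return self.cells[row][col]

@dataclass
class Tin:
    vertices: list   # (x, y, z) tuples
    triangles: list  # triples of indices into `vertices`

    def triangle_area(self, index: int) -> float:
        """Planar (x, y) area of one triangle, ignoring elevation."""
        (x1, y1, _), (x2, y2, _), (x3, y3, _) = (
            self.vertices[k] for k in self.triangles[index]
        )
        return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2
```

The vector layer stores explicit geometry, the raster stores only values and derives location from the grid position, and the TIN shares vertices between adjacent triangles so that the surface stays contiguous.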



Generic data model

Generic data models are generalizations of conventional data models. They define standardised general relation types, together with the kinds of things that may be related by such a relation type. Generic data models are developed as an approach to solve some shortcomings of conventional data models. For example, different modelers usually produce different conventional data models of the same domain. This can lead to difficulty in bringing the models of different people together and is an obstacle for data exchange and data integration. Invariably, however, this difference is attributable to different levels of abstraction in the models and differences in the kinds of facts that can be instantiated (the semantic expression capabilities of the models). The modelers need to communicate and agree on certain elements which are to be rendered more concretely, in order to make the differences less significant.
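
One way to picture a generic data model is as a small set of standardised relation types, each stating which kinds of things it may relate, with all facts expressed as instances of those types. The sketch below assumes a deliberately simple encoding; the names `RelationType`, `Thing`, `Fact`, and `assert_fact`, as well as the example kinds, are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RelationType:
    name: str
    left_kind: str   # kind of thing allowed on the left-hand side
    right_kind: str  # kind of thing allowed on the right-hand side

@dataclass(frozen=True)
class Thing:
    name: str
    kind: str

@dataclass(frozen=True)
class Fact:
    left: Thing
    relation: RelationType
    right: Thing

def assert_fact(left: Thing, relation: RelationType, right: Thing) -> Fact:
    """Create a fact only if both things are of the kinds the relation type allows."""
    if left.kind != relation.left_kind or right.kind != relation.right_kind:
        raise TypeError(
            f"'{relation.name}' cannot relate {left.kind} to {right.kind}"
        )
    return Fact(left, relation, right)

# Hypothetical standard relation type shared by all modelers.
IS_PART_OF = RelationType("is part of", "physical object", "physical object")
```

Because every modeler records facts through the same relation types, two models of the same domain differ in their facts rather than in their structure, which is what makes exchange and integration easier.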

Semantic data model

A semantic data model in software engineering is a technique to define the meaning of data within the context of its interrelationships with other data. A semantic data model is an abstraction which defines how the stored symbols relate to the real world. A semantic data model is sometimes called a conceptual data model.

The logical data structure of a database management system (DBMS), whether hierarchical, network, or relational, cannot totally satisfy the requirements for a conceptual definition of data because it is limited in scope and biased toward the implementation strategy employed by the DBMS. Therefore, the need to define data from a conceptual view has led to the development of semantic data modeling techniques, that is, techniques to define the meaning of data within the context of its interrelationships with other data. As illustrated in the figure, the real world, in terms of resources, ideas, events, etc., is symbolically defined within physical data stores. A semantic data model is an abstraction which defines how the stored symbols relate to the real world. Thus, the model must be a true representation of the real world.

Data architecture

Data architecture is the design of data for use in defining the target state and the subsequent planning needed to achieve the target state. It is usually one of several architecture domains that form the pillars of an enterprise architecture or solution architecture.

A data architecture describes the data structures used by a business and/or its applications. There are descriptions of data in storage and data in motion; descriptions of data stores, data groups and data items; and mappings of those data artifacts to data qualities, applications, locations etc.

Essential to realizing the target state, data architecture describes how data is processed, stored, and utilized in a given system. It provides criteria for data processing operations that make it possible to design data flows and also control the flow of data in the system.

Data modeling

Data modeling in software engineering is the process of creating a data model by applying formal data model descriptions using data modeling techniques. Data modeling is a technique for defining business requirements for a database. It is sometimes called database modeling because a data model is eventually implemented in a database.

The figure illustrates the way data models are developed and used today. A conceptual data model is developed based on the data requirements for the application that is being developed, perhaps in the context of an activity model. The data model will normally consist of entity types, attributes, relationships, integrity rules, and the definitions of those objects. This is then used as the starting point for interface or database design.

Data properties

Some important properties of data for which requirements need to be met are:
  • definition-related properties
    • relevance: the usefulness of the data in the context of your business.
    • clarity: the availability of a clear and shared definition for the data.
    • consistency: the compatibility of the same type of data from different sources.

  • content-related properties
    • timeliness: the availability of data at the time required and how up to date that data is.
    • accuracy: how close to the truth the data is.
  • properties related to both definition and content
    • completeness: how much of the required data is available.
    • accessibility: where, how, and to whom the data is available or not available (e.g. security).
    • cost: the cost incurred in obtaining the data, and making it available for use.

Data organization

Another kind of data model describes how to organize data using a database management system or other data management technology. It describes, for example, relational tables and columns or object-oriented classes and attributes. Such a data model is sometimes referred to as the physical data model, but in the original ANSI three schema architecture, it is called "logical". In that architecture, the physical model describes the storage media (cylinders, tracks, and tablespaces). Ideally, this model is derived from the more conceptual data model described above. It may differ, however, to account for constraints like processing capacity and usage patterns.

While data analysis is a common term for data modeling, the activity actually has more in common with the ideas and methods of synthesis (inferring general concepts from particular instances) than it does with analysis (identifying component concepts from more general ones). (Presumably we call ourselves systems analysts because no one can say systems synthesists.) Data modeling strives to bring the data structures of interest together into a cohesive, inseparable whole by eliminating unnecessary data redundancies and by relating data structures with relationships.

A different approach is through the use of adaptive systems such as artificial neural networks that can autonomously create implicit models of data.

Data structure

A data structure is a way of storing data in a computer so that it can be used efficiently. It is an organization of mathematical and logical concepts of data. Often a carefully chosen data structure will allow the most efficient algorithm to be used. The choice of the data structure often begins from the choice of an abstract data type.

A data model describes the structure of the data within a given domain and, by implication, the underlying structure of that domain itself. This means that a data model in fact specifies a dedicated grammar for a dedicated artificial language for that domain. A data model represents classes of entities (kinds of things) about which a company wishes to hold information, the attributes of that information, and relationships among those entities and (often implicit) relationships among those attributes. The model describes the organization of the data to some extent irrespective of how data might be represented in a computer system.

The entities represented by a data model can be tangible entities, but models that include such concrete entity classes tend to change over time. Robust data models often identify abstractions of such entities. For example, a data model might include an entity class called "Person", representing all the people who interact with an organization. Such an abstract entity class is typically more appropriate than ones called "Vendor" or "Employee", which identify specific roles played by those people.
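
A minimal Python sketch of this design choice follows. It assumes a hypothetical Person class whose roles are stored as data rather than as separate entity classes; the field names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """Abstract entity class: anyone who interacts with the organization."""
    name: str
    roles: set[str] = field(default_factory=set)  # e.g. "Employee", "Vendor"


# The same person can take on several roles without changing the model.
alice = Person("Alice")
alice.roles.add("Employee")
alice.roles.add("Vendor")

# A role the modelers did not anticipate needs no new entity class.
bob = Person("Bob", roles={"Contractor"})
```

Had "Vendor" and "Employee" been separate entity classes, Alice would appear twice and the new "Contractor" role would require a change to the model itself.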


Data model theory

The term data model can have two meanings:
  1. A data model theory, i.e. a formal description of how data may be structured and accessed.
  2. A data model instance, i.e. applying a data model theory to create a practical data model instance for some particular application.


A data model theory has three main components:
  • The structural part: a collection of data structures which are used to create databases representing the entities or objects modeled by the database.
  • The integrity part: a collection of rules governing the constraints placed on these data structures to ensure structural integrity.
  • The manipulation part: a collection of operators which can be applied to the data structures, to update and query the data contained in the database.


For example, in the relational model, the structural part is based on a modified concept of the mathematical relation; the integrity part is expressed in first-order logic; and the manipulation part is expressed using the relational algebra, tuple calculus and domain calculus.
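
The three parts can be seen side by side in a small example. The sketch below uses SQL through Python's standard sqlite3 module as a practical approximation of the relational model; SQL is not relational algebra itself, and the department and employee tables are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

# Structural part: relations (tables) and their attributes (columns).
# Integrity part: the NOT NULL, UNIQUE, CHECK and REFERENCES constraints.
conn.execute(
    "CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute(
    """CREATE TABLE employee (
           id INTEGER PRIMARY KEY,
           name TEXT NOT NULL,
           salary REAL NOT NULL CHECK (salary >= 0),
           department_id INTEGER NOT NULL REFERENCES department(id)
       )"""
)

# Manipulation part: operators that update and query the data.
conn.execute("INSERT INTO department (id, name) VALUES (1, 'Research')")
conn.execute(
    "INSERT INTO employee (name, salary, department_id) VALUES ('Ada', 5000, 1)"
)


def is_rejected(sql: str) -> bool:
    """Return True if the statement violates an integrity rule."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


negative_salary_rejected = is_rejected(
    "INSERT INTO employee (name, salary, department_id) VALUES ('Bob', -1, 1)"
)
unknown_department_rejected = is_rejected(
    "INSERT INTO employee (name, salary, department_id) VALUES ('Cy', 100, 99)"
)

rows = conn.execute(
    "SELECT e.name, d.name FROM employee e JOIN department d ON d.id = e.department_id"
).fetchall()
```

Both invalid inserts are refused by the integrity rules, so the final query sees only the one valid employee row.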

A data model instance is created by applying a data model theory. This is typically done to solve some business enterprise requirement. Business requirements are normally captured by a semantic logical data model. This is transformed into a physical data model instance from which a physical database is generated. For example, a data modeler may use a data modeling tool to create an entity-relationship model of the corporate data repository of some business enterprise. This model is transformed into a relational model, which in turn generates a relational database.
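
The transformation from an entity-relationship model to a relational schema can be sketched as follows. This is a deliberately simplified illustration, not the algorithm of any real modeling tool: it assumes only one-to-many relationships, and the Entity, Relationship and to_ddl names, like the Customer and Purchase entities, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    attributes: dict[str, str]  # attribute name -> SQL column type


@dataclass
class Relationship:
    one: str    # entity on the "one" side
    many: str   # entity on the "many" side, which receives the foreign key


def to_ddl(entities: list[Entity], relationships: list[Relationship]) -> list[str]:
    """Turn each entity into a table and each relationship into a foreign key."""
    statements = []
    for entity in entities:
        table = entity.name.lower()
        columns = [f"{table}_id INTEGER PRIMARY KEY"]
        columns += [f"{name} {sql_type}" for name, sql_type in entity.attributes.items()]
        for rel in relationships:
            if rel.many == entity.name:
                parent = rel.one.lower()
                columns.append(
                    f"{parent}_id INTEGER NOT NULL REFERENCES {parent}({parent}_id)"
                )
        statements.append(f"CREATE TABLE {table} ({', '.join(columns)})")
    return statements


entities = [
    Entity("Customer", {"name": "TEXT"}),
    Entity("Purchase", {"total": "REAL"}),
]
relationships = [Relationship(one="Customer", many="Purchase")]
ddl = to_ddl(entities, relationships)
```

Running the resulting statements against a database engine would then produce the physical database described by the model.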

Data flow diagram

A data flow diagram (DFD) is a graphical representation of the "flow" of data through an information system. It differs from the flowchart as it shows the data flow instead of the control flow of the program. A data flow diagram can also be used for the visualization of data processing (structured design). Data flow diagrams were invented by Larry Constantine, the original developer of structured design, based on Martin and Estrin's "data flow graph" model of computation.

It is common practice to draw a context-level data flow diagram first, which shows the interaction between the system and outside entities. The DFD is designed to show how a system is divided into smaller portions and to highlight the flow of data between those parts. This context-level data flow diagram is then "exploded" to show more detail of the system being modeled.
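
The relationship between a context-level diagram and its exploded version can be sketched by treating each diagram as a set of labelled flows. The example below is an assumption-laden illustration: the Flow class, the collapse function and the order-handling processes are all invented here and are not part of any DFD notation standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str
    data: str
    target: str


# Context level: the whole system is one node talking to outside entities.
context = {
    Flow("Customer", "order", "Order system"),
    Flow("Order system", "invoice", "Customer"),
}

# Exploded level: the system is divided into smaller processes.
exploded = {
    Flow("Customer", "order", "Validate order"),
    Flow("Validate order", "valid order", "Bill customer"),
    Flow("Bill customer", "invoice", "Customer"),
}


def collapse(flows: set[Flow], processes: set[str], system: str) -> set[Flow]:
    """Merge the named processes back into a single system node."""
    result = set()
    for flow in flows:
        source = system if flow.source in processes else flow.source
        target = system if flow.target in processes else flow.target
        if source != target:  # flows internal to the system disappear
            result.add(Flow(source, flow.data, target))
    return result


collapsed = collapse(exploded, {"Validate order", "Bill customer"}, "Order system")
```

Here collapsing the detailed diagram yields exactly the flows of the context-level diagram, which is one simple way to check that the two levels describe the same system boundary.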

Information model

An information model is not a type of data model, but more or less an alternative model. Within the field of software engineering, both a data model and an information model can be abstract, formal representations of entity types that include their properties, relationships and the operations that can be performed on them. The entity types in the model may be kinds of real-world objects, such as devices in a network, or they may themselves be abstract, such as the entities used in a billing system. Typically, they are used to model a constrained domain that can be described by a closed set of entity types, properties, relationships and operations.

According to Lee (1999), an information model is a representation of concepts, relationships, constraints, rules, and operations to specify data semantics for a chosen domain of discourse. It can provide sharable, stable, and organized structure of information requirements for the domain context. More generally, the term information model is used for models of individual things, such as facilities, buildings, process plants, etc. In those cases the concept is specialised to Facility Information Model, Building Information Model, Plant Information Model, etc. Such an information model is an integration of a model of the facility with the data and documents about the facility.

An information model provides formalism to the description of a problem domain without constraining how that description is mapped to an actual implementation in software. There may be many mappings of the information model. Such mappings are called data models, irrespective of whether they are object models (e.g. using UML), entity relationship models or XML schemas.

Object model

An object model in computer science is a collection of objects or classes through which a program can examine and manipulate some specific parts of its world. In other words, it is the object-oriented interface to some service or system. Such an interface is said to be the object model of the represented service or system. For example, the Document Object Model (DOM, http://www.w3.org/DOM/) is a collection of objects that represent a page in a web browser, used by script programs to examine and dynamically change the page. There is a Microsoft Excel object model for controlling Microsoft Excel from another program, and the ASCOM Telescope Driver is an object model for controlling an astronomical telescope.
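
To make the DOM example concrete, the sketch below uses Python's standard xml.dom.minidom module, which implements a subset of the DOM interfaces outside a browser. The sample markup is invented for illustration; in a browser, a script would use the same kind of calls on the live page.

```python
from xml.dom.minidom import parseString

# Parse a small document into a tree of DOM objects.
doc = parseString("<html><body><p id='greeting'>Hello</p></body></html>")

# Examine the page through the object model.
paragraph = doc.getElementsByTagName("p")[0]
original_text = paragraph.firstChild.data
paragraph_id = paragraph.getAttribute("id")

# Change the page through the same object model.
paragraph.firstChild.data = "Hello, world"
note = doc.createElement("p")
note.appendChild(doc.createTextNode("Added by a script"))
doc.getElementsByTagName("body")[0].appendChild(note)

paragraph_count = len(doc.getElementsByTagName("p"))
```

The program never handles the markup as text after parsing; it works entirely with element and text node objects, which is what makes the DOM an object model.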

In computing the term object model has a distinct second meaning of the general properties of objects in a specific computer programming language, technology, notation or methodology that uses them. Examples are the Java object model, the COM object model, and the object model of OMT. Such object models are usually defined using concepts such as class, message, inheritance, polymorphism, and encapsulation. There is an extensive literature on formalized object models as a subset of the formal semantics of programming languages.
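
The concepts named above can be shown in a short example written against Python's own object model. The Shape, Rectangle and Circle classes are hypothetical, and other languages express the same concepts differently (for instance, Python marks encapsulated state only by naming convention).

```python
import math


class Shape:
    """Class: a blueprint from which objects are created."""

    def __init__(self, name: str):
        self._name = name  # encapsulation by convention: internal state

    def area(self) -> float:
        raise NotImplementedError

    def describe(self) -> str:
        # Sending the "area" message; the receiver decides how to answer.
        return f"{self._name} with area {self.area():.2f}"


class Rectangle(Shape):  # inheritance: Rectangle reuses Shape's behaviour
    def __init__(self, width: float, height: float):
        super().__init__("rectangle")
        self._width, self._height = width, height

    def area(self) -> float:
        return self._width * self._height


class Circle(Shape):
    def __init__(self, radius: float):
        super().__init__("circle")
        self._radius = radius

    def area(self) -> float:
        return math.pi * self._radius ** 2


# Polymorphism: the same call works on objects of different classes.
descriptions = [shape.describe() for shape in (Rectangle(2, 3), Circle(1))]
```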

Object-Role Model

Object-Role Modeling (ORM) is a method for conceptual modeling, and can be used as a tool for information and rules analysis.

Object-Role Modeling is a fact-oriented method for performing systems analysis at the conceptual level. The quality of a database application depends critically on its design. To help ensure correctness, clarity, adaptability and productivity, information systems are best specified first at the conceptual level, using concepts and language that people can readily understand.

The conceptual design may include data, process and behavioral perspectives, and the actual DBMS used to implement the design might be based on one of many logical data models (relational, hierarchic, network, object-oriented etc.).

Unified Modeling Language models

The Unified Modeling Language (UML) is a standardized general-purpose modeling language in the field of software engineering. It is a graphical language for visualizing, specifying, constructing, and documenting the artifacts of a software-intensive system. The Unified Modeling Language offers a standard way to write a system's blueprints, including:
  • Conceptual things such as business processes and system functions
  • Concrete things such as programming language statements and database schemas
  • Reusable software components

UML offers a mix of functional models, data models, and database models.
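As a rough illustration of how one model can relate a class description to a database schema, the sketch below describes a class with typed attributes and derives a table definition from it. It is not UML tooling and follows no UML standard mapping. The names Attribute, ClassModel, and to_create_table, and the type mapping in SQL_TYPES, are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List

# Assumed mapping from model-level types to SQL column types (illustrative).
SQL_TYPES = {"int": "INTEGER", "str": "TEXT", "float": "REAL"}


@dataclass
class Attribute:
    """Hypothetical attribute of a modeled class."""
    name: str
    type_name: str
    is_key: bool = False


@dataclass
class ClassModel:
    """Hypothetical class description: a name and a list of attributes."""
    name: str
    attributes: List[Attribute]


def to_create_table(model: ClassModel) -> str:
    """Derive a CREATE TABLE statement from the class description."""
    columns = []
    for attr in model.attributes:
        column = f"{attr.name} {SQL_TYPES[attr.type_name]}"
        if attr.is_key:
            column += " PRIMARY KEY"
        columns.append(column)
    return f"CREATE TABLE {model.name.lower()} ({', '.join(columns)});"


customer = ClassModel(
    "Customer",
    [Attribute("id", "int", is_key=True), Attribute("name", "str")],
)
print(to_create_table(customer))
# prints: CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
```

The same class description could just as well drive generated source code or documentation, which is the sense in which one blueprint can cover both programming language statements and database schemas.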

See also

  • Database design
  • Business process model
  • Core Architecture Data Model
  • Database system
  • Data dictionary
  • Diagram
  • Enterprise model
  • Entity-Relationship Model
  • Function model
  • IDEF1X
  • Information model
  • Information system
  • JC3IEDM
  • Ontology
  • Process model
  • XML schema
  • Data Format Description Language (DFDL)

Further reading

  • David C. Hay (1996). Data Model Patterns: Conventions of Thought. New York: Dorset House Publishers, Inc.
  • Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE).
  • Len Silverston (2001). The Data Model Resource Book Volume 1/2. John Wiley & Sons.
  • RFC 3444 - On the Difference between Information Models and Data Models
  • Len Silverston & Paul Agnew (2008). The Data Model Resource Book: Universal Patterns for Data Modeling Volume 3. John Wiley & Sons.
  • Steve Hoberman, Donna Burbank, & Chris Bradley (2009). Data Modeling for the Business. Technics Publications, LLC.
  • Andy Graham (2010). The Enterprise Data Model: A Framework for Enterprise Data Architecture.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 