Federated database system
Encyclopedia
A federated database system is a type of meta-database management system
Database management system
A database management system is a software package with computer programs that control the creation, maintenance, and use of a database. It allows organizations to conveniently develop databases for various applications by database administrators and other specialists. A database is an integrated...

 (DBMS), which transparently integrates multiple autonomous database systems
Database management system
A database management system is a software package with computer programs that control the creation, maintenance, and use of a database. It allows organizations to conveniently develop databases for various applications by database administrators and other specialists. A database is an integrated...

 into a single federated database. The constituent database
Database
A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...

s are interconnected via a computer network
Computer network
A computer network, often simply referred to as a network, is a collection of hardware components and computers interconnected by communication channels that allow sharing of resources and information....

 and may be geographically decentralized. Since the constituent database systems remain autonomous, a federated database system is a contrastable alternative to the (sometimes daunting) task of merging together several disparate databases. A federated database, or virtual database, is the fully integrated, logical composite of all constituent databases in a federated database system.

McLeod and Heimbigner were among the first to define a federated database system, as one which "define[s] the architecture and interconnect[s] databases that minimize central authority yet support partial sharing and coordination among database systems".

Through data abstraction, federated database systems can provide a uniform user interface
User interface
The user interface, in the industrial design field of human–machine interaction, is the space where interaction between humans and machines occurs. The goal of interaction between a human and a machine at the user interface is effective operation and control of the machine, and feedback from the...

, enabling users
User (computing)
A user is an agent, either a human agent or software agent, who uses a computer or network service. A user often has a user account and is identified by a username , screen name , nickname , or handle, which is derived from the identical Citizen's Band radio term.Users are...

 and clients
Client (computing)
A client is an application or system that accesses a service made available by a server. The server is often on another computer system, in which case the client accesses the service by way of a network....

 to store and retrieve data
Data
The term data refers to qualitative or quantitative attributes of a variable or set of variables. Data are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which...

 in multiple noncontiguous database
Database
A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...

s with a single query
Information retrieval
Information retrieval is the area of study concerned with searching for documents, for information within documents, and for metadata about documents, as well as that of searching structured storage, relational databases, and the World Wide Web...

 -- even if the constituent databases are heterogeneous. To this end, a federated database system must be able to decompose the query into subqueries for submission to the relevant constituent DBMS's
Database management system
A database management system is a software package with computer programs that control the creation, maintenance, and use of a database. It allows organizations to conveniently develop databases for various applications by database administrators and other specialists. A database is an integrated...

, after which the system must composite the result set
Result set
An SQL result set is a set of rows from a database, as well as meta-information about the query such as the column names, and the types and sizes of each column. Depending on the database system, the number of rows in the result set may or may not be known. Usually, this number is not known up...

s of the subqueries. Because various database management systems employ different query language
Query language
Query languages are computer languages used to make queries into databases and information systems.Broadly, query languages can be classified according to whether they are database query languages or information retrieval query languages...

s, federated database systems can apply wrappers
Wrapper function
A wrapper function is a function in a computer program whose main purpose is to call a second function with little or no additional computation. This is also known as method delegation. Wrapper functions can be used for a number of purposes....

 to the subqueries to translate them into the appropriate query language
Query language
Query languages are computer languages used to make queries into databases and information systems.Broadly, query languages can be classified according to whether they are database query languages or information retrieval query languages...

s.
  • Note: this description of federated databases does not accurately reflect the McLeod/Heimbigner definition of a federated database. Rather, this description fits what McLeod/Heimbinger called a composite database. McLeod/Heimbigner's federated database is a collection of autonomous components that make their data available to other members of the federation through the publication of an export schema and access operations; there is no unified, central schema that encompasses the information available from the members of the federation.


Among other surveys, defines a Federated Database as a collection of cooperating component systems which are autonomous and are possibly heterogeneous
Heterogeneous Database System
A Heterogeneous Database System is an automated system for the integration of heterogeneous, disparate database management systems to present a user with a single, unified query interface....

. The three important components of an FDBS as pointed out in are autonomy, heterogeneity
Heterogeneous Database System
A Heterogeneous Database System is an automated system for the integration of heterogeneous, disparate database management systems to present a user with a single, unified query interface....

 and distribution. Another dimension which has also been considered is the Networking Environment Computer Network
Computer network
A computer network, often simply referred to as a network, is a collection of hardware components and computers interconnected by communication channels that allow sharing of resources and information....

, e.g., many DBSs over a LAN
Local area network
A local area network is a computer network that interconnects computers in a limited area such as a home, school, computer laboratory, or office building...

 or many DBSs over a WAN
Wide area network
A wide area network is a telecommunication network that covers a broad area . Business and government entities utilize WANs to relay data among employees, clients, buyers, and suppliers from various geographical locations...

 update related functions of participating DBSs (e.g., no updates, nonatomic transitions, atomic updates). Master data management
Master Data Management
In computing, master data management comprises a set of processes and tools that consistently defines and manages the non-transactional data entities of an organization...

 or MDM
Master Data Management
In computing, master data management comprises a set of processes and tools that consistently defines and manages the non-transactional data entities of an organization...

 is a relatively new term for similar practices.

FDBS Architecture

A DBMS
Database management system
A database management system is a software package with computer programs that control the creation, maintenance, and use of a database. It allows organizations to conveniently develop databases for various applications by database administrators and other specialists. A database is an integrated...

 can be classified as either centralized or distributed. A centralized system manages a single database while distributed manages multiple databases. A component DBS
Database
A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...

 in a DBMS may be centralized or distributed. A multiple DBS (MDBS) can be classified into two types depending on the autonomy of the component DBS as federated and non federated. A nonfederated database system is an integration of component DBMS
Database management system
A database management system is a software package with computer programs that control the creation, maintenance, and use of a database. It allows organizations to conveniently develop databases for various applications by database administrators and other specialists. A database is an integrated...

 that are not autonomous.
A federated database system consists of component DBS
Database
A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...

 that are autonomous yet participate in a federation to allow partial and controlled sharing of their data.

Federated architectures differ based on levels of integration with the component database systems and the extent of services offered by the federation. A FDBS can be categorized as loosely or tightly coupled systems.
  • Loosely Coupled require component databases to construct their own federated schema
    Database schema
    A database schema of a database system is its structure described in a formal language supported by the database management system and refers to the organization of data to create a blueprint of how a database will be constructed...

    . A user will typically access other component database systems by using a multidatabase language but this removes any levels of location transparency, forcing the user to have direct knowledge of the federated schema. A user imports the data they require from other component databases and integrates it with their own to form a federated schema.
  • Tightly coupled system consists of component systems that use independent processes to construct and publicize an integrated federated schema.


Multiple DBS of which FDBS are a specific type can be characterized along three dimensions: Distribution, Heterogeneity and Autonomy. Another characterization could be based on the dimension of networking For e.g. single databases or multiple databases in a LAN
Local area network
A local area network is a computer network that interconnects computers in a limited area such as a home, school, computer laboratory, or office building...

 or WAN
Wide area network
A wide area network is a telecommunication network that covers a broad area . Business and government entities utilize WANs to relay data among employees, clients, buyers, and suppliers from various geographical locations...

.

Distribution

Distribution of data in an FDBS is due to the existence of a multiple DBS before an FDBS is built. Data can be distributed among multiple DB which could be stored in a single computer or multiple computers. These computers could be geographically located in different places but interconnected by a network. The benefits of data distribution help in increased availability and reliability as well as improved access times.

Heterogeneous Database System|Heterogeneity

Heterogeneities
Heterogeneous Database System
A Heterogeneous Database System is an automated system for the integration of heterogeneous, disparate database management systems to present a user with a single, unified query interface....

 in databases arise due to several factors. Some of them occur due to differences in
structures, semantics of data, the constraints supported or query
Query language
Query languages are computer languages used to make queries into databases and information systems.Broadly, query languages can be classified according to whether they are database query languages or information retrieval query languages...

 language. Differences in structure occur when two data model
Data model
A data model in software engineering is an abstract model, that documents and organizes the business data for communication between team members and is used as a plan for developing applications, specifically how data is stored and accessed....

s provide different primitives such as object oriented (OO) models
Object-Oriented Modeling
Object-oriented modeling , also called object-oriented programming is a modeling paradigm mainly used in computer programming. Prior to the rise of OOM, the dominant paradigm was procedural programming, which emphasized the use of discrete reusable code blocks that could stand on their own, take...

 that support specialization and inheritance and relational model
Relational model
The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F...

s that do not. Differences due to constraints occur when two models support two different constraints. For example the set type in CODASYL
CODASYL
CODASYL is an acronym for "Conference on Data Systems Languages". This was a consortium formed in 1959 to guide the development of a standard programming language that could be used on many computers...

 schema
Database schema
A database schema of a database system is its structure described in a formal language supported by the database management system and refers to the organization of data to create a blueprint of how a database will be constructed...

 may be partially modeled as a referential integrity constraint in a relationship schema. CODASYL
CODASYL
CODASYL is an acronym for "Conference on Data Systems Languages". This was a consortium formed in 1959 to guide the development of a standard programming language that could be used on many computers...

 supports insertion and retention that are not captured by referential integrity alone. The query language supported by a DBMSs
Database management system
A database management system is a software package with computer programs that control the creation, maintenance, and use of a database. It allows organizations to conveniently develop databases for various applications by database administrators and other specialists. A database is an integrated...

 can also contribute to heterogeneity
Heterogeneous Database System
A Heterogeneous Database System is an automated system for the integration of heterogeneous, disparate database management systems to present a user with a single, unified query interface....

 between other component DBMSs
Database management system
A database management system is a software package with computer programs that control the creation, maintenance, and use of a database. It allows organizations to conveniently develop databases for various applications by database administrators and other specialists. A database is an integrated...

. For example differences in query languages with same data model
Data model
A data model in software engineering is an abstract model, that documents and organizes the business data for communication between team members and is used as a plan for developing applications, specifically how data is stored and accessed....

s or different versions of query languages could contribute heterogeneity
Heterogeneous Database System
A Heterogeneous Database System is an automated system for the integration of heterogeneous, disparate database management systems to present a user with a single, unified query interface....

.

Semantic heterogeneities arise when there is a disagreement about meaning, interpretation or intended use of data
Data
The term data refers to qualitative or quantitative attributes of a variable or set of variables. Data are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which...

. At the schema and data level, some of the possible classification of Heterogeneities that occur are
  • Naming Conflicts e.g. Database
    Database
    A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...

    s using different names to represent the same concept.
  • Domain Conflicts or Data
    Data
    The term data refers to qualitative or quantitative attributes of a variable or set of variables. Data are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which...

     Representation conflicts e.g. Database
    Database
    A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...

    s using different values to represent same concept.
  • Precision Conflicts e.g. Database
    Database
    A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...

    s using same data values from domains of different cardinalities for same data
    Data
    The term data refers to qualitative or quantitative attributes of a variable or set of variables. Data are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which...

    .
  • Metadata
    Metadata
    The term metadata is an ambiguous term which is used for two fundamentally different concepts . Although the expression "data about data" is often used, it does not apply to both in the same way. Structural metadata, the design and specification of data structures, cannot be about data, because at...

     Conflicts e.g. same concepts are represented at schema
    Database schema
    A database schema of a database system is its structure described in a formal language supported by the database management system and refers to the organization of data to create a blueprint of how a database will be constructed...

     level and instance level.
  • Data
    Data
    The term data refers to qualitative or quantitative attributes of a variable or set of variables. Data are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which...

     Conflicts e.g. Missing attributes
  • Schema
    Database schema
    A database schema of a database system is its structure described in a formal language supported by the database management system and refers to the organization of data to create a blueprint of how a database will be constructed...

     Conflicts e.g. Table versus table conflict which includes naming conflicts, data conflicts etc.


In creating a federated schema, one has to resolve such heterogeneities before integrating the component DB schemas.

Schema matching, schema mapping

Dealing with incompatible data types or query syntax is not the only obstacle to a concrete implementation of an FDBS. In systems that are not planned top-down, a generic problem lies in matching semantically equivalent
Semantic equivalence
In computer metadata, semantic equivalence is a declaration that two data elements from different vocabularies contain data that has similar meaning...

, but differently named parts from different schemas
Logical schema
A Logical Schema is a data model of a specific problem domain expressed in terms of a particular data management technology. Without being specific to a particular database management product, it is in terms of either relational tables and columns, object-oriented classes, or XML tags...

 (=data models) (tables, attributes). A pairwise mapping between n attributes would result in mapping rules (given equivalence mappings) - a number that quickly gets too large for practical purposes. A common way out is to provide a global schema that comprises the relevant parts of all member schemas and provide mappings in the form of database views. Two principal solutions can be realized, depending on the direction of the mapping:
  1. Global as View (GaV): the global schema is defined in terms of the underlying schemas
  2. Local as View (LaV): the local schemas are defined in terms of the global schema

Both are explained in more detail in the article Data integration
Data integration
Data integration involves combining data residing in different sources and providing users with a unified view of these data.This process becomes significant in a variety of situations, which include both commercial and scientific domains...

.
Alternate approaches to the schema matching problem and a classification of the same are explained in more detail in the article Schema Matching
Schema matching
The terms schema matching and mapping are often used interchangeably. For this article, we differentiate the two as follows: Schema matching is the process of identifying that two objects are semantically related while mapping refers to the transformations between the objects...


Autonomy

Fundamental to the difference between an MDBS and an FDBS is the concept of autonomy. It is important to understand the aspects of autonomy for component databases and how they can be addressed when a component DBS participates in an FDBS.
There are four kinds of autonomies addressed
  • Design Autonomy which refers to ability to choose its design irrespective of data, query language or conceptualization, functionality of the system implementation.

Heterogeneities
Heterogeneous Database System
A Heterogeneous Database System is an automated system for the integration of heterogeneous, disparate database management systems to present a user with a single, unified query interface....

 in an FDBS are primarily due to design autonomy.
  • Communication autonomy refers to the general operation of the DBMS to communicate with other DBMS
    Database management system
    A database management system is a software package with computer programs that control the creation, maintenance, and use of a database. It allows organizations to conveniently develop databases for various applications by database administrators and other specialists. A database is an integrated...

     or not.
  • Execution autonomy allows a component DBMS to control the operations requested by local and external operations.
  • Association autonomy gives a power to component DBS to disassociate itself from a federation which means FDBS can operate independently of any single DBS
    Database
    A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...

    .


The ANSI/X3/SPARC Study Group outlined a three level data description architecture, the components of which are the conceptual schema, internal schema and external schema of databases. The three level architecture is however inadequate to describing the architectures of an FDBS. It was therefore extended to support the three dimensions of the FDBS namely Distribution, Autonomy and Heterogeneity. The five level schema architecture is explained below.

Concurrency control

The Heterogeneity and Autonomy requirements pose special challenges concerning concurrency control
Concurrency control
In information technology and computer science, especially in the fields of computer programming , operating systems , multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible.Computer...

 in an FDBS, which is crucial for the correct execution of its concurrent transactions
Database transaction
A transaction comprises a unit of work performed within a database management system against a database, and treated in a coherent and reliable way independent of other transactions...

 (see also Global concurrency control
Global concurrency control
Global concurrency control typically pertains to the concurrency control of a system comprising several components, each with its own concurrency control. The overall concurrency control of the whole system, the Global concurrency control, is determined by the concurrency control of its components,...

). Achieving global serializability
Global serializability
In concurrency control of databases, transaction processing , and other transactional distributed applications, Global serializability is a property of a global schedule of transactions...

, the major correctness criterion, under these requirements has been characterized as very difficult and unsolved. Commitment ordering
Commitment ordering
In concurrency control of databases, transaction processing , and related applications, Commitment ordering is a class of interoperable Serializability techniques, both centralized and distributed. It allows optimistic implementations...

, introduced in 1991, has provided a general solution for this issue (See Global serializability
Global serializability
In concurrency control of databases, transaction processing , and other transactional distributed applications, Global serializability is a property of a global schedule of transactions...

; See Commitment ordering
Commitment ordering
In concurrency control of databases, transaction processing , and related applications, Commitment ordering is a class of interoperable Serializability techniques, both centralized and distributed. It allows optimistic implementations...

 also for the architectural aspects of the solution).

Five Level Schema Architecture for FDBSs

The five level schema architecture includes the following:
  • Local Schema is the conceptual concept expressed in primary data model of component DBMS.
  • Component Schema is derived by translating local schema into a model called the canonical data model or common data model. They are useful when semantics missed in local schema are incorporated in the component. They help in integration of data for tightly coupled FDBS.
  • Export Schema represents a subset of a component schema that is available to the FDBS. It may include access control information regarding its use by specific federation user. The export schema help in managing flow of control of data.
  • Federated Schema is an integration of multiple export schema. It includes information on data distribution that is generated when integrating export schemas.
  • External Schema defines a schema for a user/applications or a class of users/applications.


While accurately representing the state of the art in data integration, the Five Level Schema Architecture above does suffer from a major drawback, namely IT imposed look and feel. Modern data users demand control over how data is presented; their needs are somewhat in conflict with such bottom-up approaches to data integration.

External links


See also

  • Enterprise Information Integration
    Enterprise Information Integration
    Enterprise Information Integration , is a process of information integration, using data abstraction to provide a unified interface for viewing all the data within an organization, and a single set of structures and naming conventions to represent this data; the goal of EII is to get a large set of...

     (EII)
  • Data Virtualization
    Data virtualization
    Data virtualization describes the process of abstracting disparate data sources through a single data access layer ....

  • Master data management
    Master Data Management
    In computing, master data management comprises a set of processes and tools that consistently defines and manages the non-transactional data entities of an organization...

     (MDM)
  • Schema Matching
    Schema matching
    The terms schema matching and mapping are often used interchangeably. For this article, we differentiate the two as follows: Schema matching is the process of identifying that two objects are semantically related while mapping refers to the transformations between the objects...

  • Universal relation assumption
    Universal relation assumption
    In relational databases, the universal relation assumption states that one can place all data attributes into a table, which may then be decomposed into smaller tables as needed....

  • Linked Data
    Linked Data
    In computing, linked data describes a method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a...

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK