Data migration
Encyclopedia
Data migration is the process of transferring data
Data
The term data refers to qualitative or quantitative attributes of a variable or set of variables. Data are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which...

 between storage
Computer storage
Computer data storage, often called storage or memory, refers to computer components and recording media that retain digital data. Data storage is one of the core functions and fundamental components of computers....

 types, formats, or computer systems. Data migration is usually performed programmatically to achieve an automated migration, freeing up human resources from tedious tasks. It is required when organizations or individuals change computer systems or upgrade to new systems, or when systems merge (such as when the organizations that use them undergo a merger or takeover).

To achieve an effective data migration procedure, data on the old system is mapped to the new system providing a design for data extraction
Data extraction
Data extraction is the act or process of retrieving data out of data sources for further data processing or data storage...

 and data loading. The design relates old data formats to the new system's formats and requirements. Programmatic data migration may involve many phases but it minimally includes data extraction where data is read from the old system and data loading where data is written to the new system.

If a decision has been made to provide a set input file specification for loading data onto the target system, this allows a pre-load 'data validation' step to be put in place, interrupting the standard E(T)L process. Such a data validation process can be designed to interrogate the data to be transferred, to ensure that it meets the predefined criteria of the target environment, and the input file specification. An alternative strategy is to have on-the-fly data validation occurring at the point of loading, which can be designed to report on load rejection errors as the load progresses. However, in the event that the extracted and transformed data elements are highly 'integrated' with one another, and the presence of all extracted data in the target system is essential to system functionality, this strategy can have detrimental, and not easily quantifiable effects.

After loading into the new system, results are subjected to data verification
Data verification
Data Verification is a process wherein the data is checked for accuracy and inconsistencies after data migration is done.It helps to determine whether data was accurately translated when data is transported from one source to another, is complete, and supports processes in the new system...

 to determine whether data was accurately translated, is complete, and supports processes in the new system. During verification, there may be a need for a parallel run of both systems to identify areas of disparity and forestall erroneous data loss
Data loss
Data loss is an error condition in information systems in which information is destroyed by failures or neglect in storage, transmission, or processing. Information systems implement backup and disaster recovery equipment and processes to prevent data loss or restore lost data.Data loss is...

.

Automated and manual data cleaning is commonly performed in migration to improve data quality, eliminate redundant or obsolete information, and match the requirements of the new system.

Data migration phases (design, extraction, cleansing, load, verification) for applications of moderate to high complexity are commonly repeated several times before the new system is deployed.

Categories

Data is stored on various media in files
Computer file
A computer file is a block of arbitrary information, or resource for storing information, which is available to a computer program and is usually based on some kind of durable storage. A file is durable in the sense that it remains available for programs to use after the current program has finished...

 or databases, and is generated and consumed by software applications which in turn support business processes. The need to transfer and convert data can be driven by multiple business requirements and the approach taken to the migration depends on those requirements. Four major migration categories are proposed on this basis.

Storage migration

A business may choose to rationalize the physical media to take advantage of more efficient storage technologies. This will result in having to move physical blocks of data from one tape or disk to another, often using virtualization
Storage Virtualization
Storage virtualization or storage virtualisation is a concept and term used within computer science. Specifically, storage systems may use virtualization concepts as a tool to enable better functionality and more advanced features within the storage system.Broadly speaking, a 'storage system' is...

 techniques. The data format and content itself will not usually be changed in the process and can normally be achieved with minimal or no impact to the layers above.

Database migration

Similarly, it may be necessary to move from one database
Database
A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...

 vendor to another, or to upgrade the version of database software being used. The latter case is less likely to require a physical data migration, but this can happen with major upgrades. In these cases a physical transformation process may be required since the underlying data format can change significantly. This may or may not affect behaviour in the applications layer, depending largely on whether the data manipulation language or protocol has changed – but modern applications are written to be agnostic to the database technology so that a change from Sybase
Sybase
Sybase, an SAP company, is an enterprise software and services company offering software to manage, analyze, and mobilize information, using relational databases, analytics and data warehousing solutions and mobile applications development platforms....

, MySQL
MySQL
MySQL officially, but also commonly "My Sequel") is a relational database management system that runs as a server providing multi-user access to a number of databases. It is named after developer Michael Widenius' daughter, My...

, DB2
IBM DB2
The IBM DB2 Enterprise Server Edition is a relational model database server developed by IBM. It primarily runs on Unix , Linux, IBM i , z/OS and Windows servers. DB2 also powers the different IBM InfoSphere Warehouse editions...

 or SQL Server
Microsoft SQL Server
Microsoft SQL Server is a relational database server, developed by Microsoft: It is a software product whose primary function is to store and retrieve data as requested by other software applications, be it those on the same computer or those running on another computer across a network...

 to Oracle
Oracle Database
The Oracle Database is an object-relational database management system produced and marketed by Oracle Corporation....

 should only require a testing cycle to be confident that both functional and non-functional performance has not been adversely affected.

Application migration

Changing application vendor – for instance a new CRM
Customer relationship management
Customer relationship management is a widely implemented strategy for managing a company’s interactions with customers, clients and sales prospects. It involves using technology to organize, automate, and synchronize business processes—principally sales activities, but also those for marketing,...

 or ERP
Enterprise resource planning
Enterprise resource planning systems integrate internal and external management information across an entire organization, embracing finance/accounting, manufacturing, sales and service, customer relationship management, etc. ERP systems automate this activity with an integrated software application...

 platform – will inevitably involve substantial transformation as almost every application or suite operates on its own specific data model. Further, to allow the application to be sold to the widest possible market, commercial off-the-shelf packages are generally configured for each customer using metadata
Metadata
The term metadata is an ambiguous term which is used for two fundamentally different concepts . Although the expression "data about data" is often used, it does not apply to both in the same way. Structural metadata, the design and specification of data structures, cannot be about data, because at...

. Application programming interfaces (APIs) are supplied to protect the integrity of the data they have to handle. Use of the API is normally a condition of the software warranty, although a waiver may be allowed if the vendor's own or certified partner professional services and all tools are used.

Business process migration

Business processes operate through a combination of human and application systems actions, often orchestrated by business process management tools. When these change they can require the movement of data from one store, database or application to another to reflect the changes to the organization and information about customers, products and operations. Examples of such migration drivers are mergers and acquisitions, business optimization and reorganization to attack new markets or respond to competitive threat.

The first two categories of migration are usually routine operational activities that the IT department takes care of without the involvement of the rest of the business. The last two categories directly affect the operational users of processes and applications, are necessarily complex, and delivering them without significant business downtime can be challenging. A highly adaptive approach, concurrent synchronization, a business-oriented audit capability and clear visibility of the migration for stakeholders are likely to be key requirements in such migrations.

Project versus process

Further, it is helpful to distinguish between data migration and data integration
Data integration
Data integration involves combining data residing in different sources and providing users with a unified view of these data.This process becomes significant in a variety of situations, which include both commercial and scientific domains...

 activities. Data migration is generally the term used to describe a project where data will be moved or copied from one environment to another, and removed or decommissioned in the source. During the migration (which can take place over months or even years), data can flow in multiple directions, and there may be multiple migrations taking place simultaneously. The Extract, transform, load
Extract, transform, load
Extract, transform and load is a process in database usage and especially in data warehousing that involves:* Extracting data from outside sources* Transforming it to fit operational needs...

 actions will be necessary, although the means of achieving these may not be those traditionally associated with the ETL
Extract, transform, load
Extract, transform and load is a process in database usage and especially in data warehousing that involves:* Extracting data from outside sources* Transforming it to fit operational needs...

 acronym.

Data integration by contrast is a permanent part of the IT architecture, and is responsible for the way data flows between the various applications and data stores - and is a process rather than a project activity. Standard ETL technologies designed to supply data from operational systems to data warehouses would fit within the latter category.

Migration as a form of digital preservation

Migration, which focuses on the digital object itself, is the act of transferring, or rewriting data from an out-of-date medium to a current medium and has for many years been considered the only viable approach to long-term preservation of digital objects. Reproducing brittle newspapers onto microfilm
Microform
Microforms are any forms, either films or paper, containing microreproductions of documents for transmission, storage, reading, and printing. Microform images are commonly reduced to about one twenty-fifth of the original document size...

 is an example of such migration.

Disadvantages

  • Migration addresses the possible obsolescence of the data carrier, but does not address the fact that certain technologies which run the data may be abandoned altogether, leaving migration useless.
  • Time consuming – migration is a continual process, which must be repeated every time a medium reaches obsolescence, for all data objects stored on a certain media.
  • Costly - an institution must purchase additional data storage media at each migration.


As a result of the disadvantages listed above, technology professionals have begun to develop alternatives to migration, such as emulation
Emulator
In computing, an emulator is hardware or software or both that duplicates the functions of a first computer system in a different second computer system, so that the behavior of the second system closely resembles the behavior of the first system...

.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK