Data proliferation
Encyclopedia
Data proliferation refers to the prodigious amount of data
Data
The term data refers to qualitative or quantitative attributes of a variable or set of variables. Data are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which...

, structured and unstructured, that businesses and governments continue to generate at an unprecedented rate and the usability
Usability
Usability is the ease of use and learnability of a human-made object. The object of use can be a software application, website, book, tool, machine, process, or anything a human interacts with. A usability study may be conducted as a primary job function by a usability analyst or as a secondary job...

 problems that result from attempting to store and manage that data. While originally pertaining to problems associated with paper documentation
Documentation
Documentation is a term used in several different ways. Generally, documentation refers to the process of providing evidence.Modules of Documentation are Helpful...

, data proliferation has become a major problem in primary and secondary data storage
Data storage device
thumb|200px|right|A reel-to-reel tape recorder .The magnetic tape is a data storage medium. The recorder is data storage equipment using a portable medium to store the data....

 on computers.

While digital storage has become cheaper, the associated costs, from raw power to maintenance and from metadata to search engines, have not kept up with the proliferation of data. Although the power required to maintain a unit of data has fallen, the cost of facilities which house the digital storage has tended to rise.
Data proliferation has been documented as a problem for the U.S. military since August of 1971, in particular regarding the excessive documentation submitted during the acquisition of major weapon systems. Efforts to mitigate data proliferation and the problems associated with it are ongoing.

Problems caused

The problem of data proliferation is affecting all areas of commerce as the result of the availability of relatively inexpensive data storage devices. This has made it very easy to dump data into secondary storage immediately after its window of usability has passed. This masks problems that could gravely affect the profitability of businesses and the efficient functioning of health services, police and security forces, local and national governments, and many other types of organization. Data proliferation is problematic for several reasons:
  • Difficulty when trying to find and retrieve information. At Xerox
    Xerox
    Xerox Corporation is an American multinational document management corporation that produced and sells a range of color and black-and-white printers, multifunction systems, photo copiers, digital production printing presses, and related consulting services and supplies...

    , on average it takes employees more than one hour per week to find
    Document retrieval
    Document retrieval is defined as the matching of some stated user query against a set of free-text records. These records could be any type of mainly unstructured text, such as newspaper articles, real estate records or paragraphs in a manual...

     hard-copy documents, costing $2,152 a year to manage and store them. For businesses with more than 10 employees, this increases to almost two hours per week at $5,760 per year. In large networks of primary and secondary data storage, problems finding electronic data are analogous to problems finding hard copy data.
  • Data loss
    Data loss
    Data loss is an error condition in information systems in which information is destroyed by failures or neglect in storage, transmission, or processing. Information systems implement backup and disaster recovery equipment and processes to prevent data loss or restore lost data.Data loss is...

     and legal liability when data is disorganized, not properly replicated, or cannot be found in a timely manner. In April 2005, Ameritrade Holding Corporation told 200,000 current and past customers that a tape
    Magnetic tape data storage
    Magnetic tape data storage uses digital recording on to magnetic tape to store digital information. Modern magnetic tape is most commonly packaged in cartridges and cassettes. The device that performs actual writing or reading of data is a tape drive...

     containing confidential information had been lost or destroyed in transit. In May of the same year, Time Warner Incorporated reported that 40 tapes containing personal data on 600,000 current and former employees had been lost en route to a storage facility. In March 2005, a Florida judge hearing a $2.7 billion lawsuit against Morgan Stanley issued an "adverse inference
    Adverse inference
    Adverse inference is a legal inference, adverse to the concerned party, drawn from silence or absence of requested evidence. It is part of evidence codes based on common law in various countries....

     order" against the company for "willful and gross abuse of its discovery obligations." The judge cited Morgan Stanley for repeatedly finding misplaced tapes of e-mail messages long after the company had claimed that it had turned over all such tapes to the court.
  • Increased manpower requirements to manage increasingly chaotic data storage resources.
  • Slower networks and application performance due to excess traffic as users search and search again for the material they need.
  • High cost in terms of the energy resources required to operate storage hardware. A 100 terabyte system will cost up to $35,040 a year to run—not counting cooling costs.

Proposed solutions

  • Applications that better utilize modern technology
  • Reductions in duplicate data (especially as caused by data movement)
  • Improvement of metadata
    Metadata
    The term metadata is an ambiguous term which is used for two fundamentally different concepts . Although the expression "data about data" is often used, it does not apply to both in the same way. Structural metadata, the design and specification of data structures, cannot be about data, because at...

     structures
  • Improvement of file and storage transfer structures
  • User education and discipline
  • The implementation of Information Lifecycle Management
    Information Lifecycle Management
    Information Lifecycle Management refers to a wide-ranging set of strategies for administering storage systems on computing devices. Specifically, four categories of storage strategies may be considered under the auspices of ILM.-Policy:...

     solutions to eliminate low-value information as early as possible before putting the rest into actively managed long-term storage in which it can be quickly and cheaply accessed.

See also

  • Backup
    Backup
    In information technology, a backup or the process of backing up is making copies of data which may be used to restore the original after a data loss event. The verb form is back up in two words, whereas the noun is backup....

  • Digital Asset Management
    Digital asset management
    Digital asset management consists of management tasks and decisions surrounding the ingestion, annotation, cataloguing, storage, retrieval and distribution of digital assets...

  • Disk storage
    Disk storage
    Disk storage or disc storage is a general category of storage mechanisms, in which data are digitally recorded by various electronic, magnetic, optical, or mechanical methods on a surface layer deposited of one or more planar, round and rotating disks...

  • Document management system
    Document management system
    A document management system is a computer system used to track and store electronic documents and/or images of paper documents. It is usually also capable of keeping track of the different versions created by different users . The term has some overlap with the concepts of content management...

  • Hierarchical storage management
    Hierarchical storage management
    Hierarchical storage management is a data storage technique which automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as hard disk drive arrays, are more expensive than slower devices, such as optical discs and magnetic...

  • Information Lifecycle Management
    Information Lifecycle Management
    Information Lifecycle Management refers to a wide-ranging set of strategies for administering storage systems on computing devices. Specifically, four categories of storage strategies may be considered under the auspices of ILM.-Policy:...

  • Information repository
    Information repository
    An information repository is an easy way to deploy a secondary tier of data storage that can comprise multiple, networked data storage technologies running on diverse operating systems, where data that no longer needs to be in primary storage is protected, classified according to captured metadata,...

  • Magnetic tape data storage
    Magnetic tape data storage
    Magnetic tape data storage uses digital recording on to magnetic tape to store digital information. Modern magnetic tape is most commonly packaged in cartridges and cassettes. The device that performs actual writing or reading of data is a tape drive...

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK