Dirty Data
Dirty data is a term used by information technology (IT) professionals to refer to inaccurate information (data) collected from data capture forms. It is also used to refer to data that has not yet been committed to the database and is currently held in memory.

Dirty data can be misleading, incorrect, inconsistently formatted, incorrectly spelled or punctuated, entered into the wrong field, or duplicated. Dirty data can be prevented using input masks or validation rules, but completely removing such data from a source can be impossible or impractical.
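
As an illustration of how an input mask and a validation rule might reject malformed entries before they are stored, the following is a minimal Python sketch. The field names (name, phone, age), the NNN-NNN-NNNN phone mask, and the 1 to 120 age range are assumptions chosen for the example and do not come from any particular system.

```python
import re

# Hypothetical input mask: a phone number entered as NNN-NNN-NNNN.
PHONE_MASK = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}")


def validate_record(record):
    """Return a list of validation errors for one form submission."""
    errors = []

    # Validation rule: the name field must not be empty or whitespace only.
    if not record.get("name", "").strip():
        errors.append("name is required")

    # Input mask: the whole phone field must match the template.
    if not PHONE_MASK.fullmatch(record.get("phone", "")):
        errors.append("phone must match NNN-NNN-NNNN")

    # Validation rule: age must be a whole number in an assumed plausible range.
    age = record.get("age", "")
    if not (re.fullmatch(r"[0-9]{1,3}", age) and 1 <= int(age) <= 120):
        errors.append("age must be a whole number between 1 and 120")

    return errors
```

A check of this kind only catches entries that break the expected format. A fictional but well-formed value passes it unchanged, which is one reason such rules cannot remove all dirty data.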

There are several causes of dirty data. In some cases, the information is deliberately distorted: a person may enter misleading or fictional personal information that appears real. Such dirty data may not be picked up by an administrator or a validation routine because it appears legitimate. Duplicate data can be caused by repeat submissions, user error, or incorrect data joining. There can also be formatting issues or typographical errors. A common formatting issue arises from variations in how users prefer to enter phone numbers.
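
The phone number and duplicate-record problems described above can be sketched in Python as follows. The record layout (a dict with name and phone keys) is hypothetical, and treating a leading 1 on an 11-digit number as a country code is an assumption that holds only for North American style numbers. This is one possible cleanup approach rather than a general solution.

```python
import re


def normalize_phone(raw):
    """Reduce a phone number to its digits so format variants compare equal."""
    digits = re.sub(r"[^0-9]", "", raw)
    # Assumption: a leading "1" on an 11-digit number is a country code.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def deduplicate(records):
    """Keep the first record seen for each (name, normalized phone) pair."""
    seen = set()
    unique = []
    for record in records:
        key = (record["name"].strip().lower(), normalize_phone(record["phone"]))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


records = [
    {"name": "Ada Lovelace", "phone": "(555) 123-4567"},
    {"name": "ada lovelace ", "phone": "555.123.4567"},  # repeat submission
    {"name": "Alan Turing", "phone": "+1 555 987 6543"},
]
cleaned = deduplicate(records)
```

Here the first two records collapse into one because they share a name and a normalized phone number, so `cleaned` holds two records. Matching on normalized keys finds only exact duplicates after cleanup. Records that differ by a typographical error in the name would still be kept as separate entries.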
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 