The term data refers to qualitative or quantitative attributes of a variable or set of variables. Data (plural of "datum") are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which information and then knowledge are derived.
Raw data, i.e. unprocessed data, refers to a collection of numbers, characters, images or other outputs from devices that collect information to convert physical quantities into symbols.
The word data (ˈdætə or ˈdɑːtə) is the Latin plural of datum, the neuter past participle of dare, "to give", hence "something given". In discussions of problems in geometry, mathematics, engineering, and so on, the terms givens and data are used interchangeably. Data can also be understood as a representation of facts, figures, and ideas. Such usage is the origin of data as a concept in computer science: data are numbers, words, images, etc., accepted as they stand.
Usage in English
In English, the word datum is still used in the general sense of "an item given". In cartography, geography, nuclear magnetic resonance and technical drawing it is often used to refer to a single specific reference datum from which distances to all other data are measured. Any measurement or result is a datum, but data point is more usual, albeit tautological. Both datums (see usage in datum article) and the originally Latin plural data are used as the plural of datum in English, but data is commonly treated as a mass noun and used with a verb in the singular form, especially in day-to-day usage. For example, "This is all the data from the experiment." This usage is inconsistent with the rules of Latin grammar and traditional English ("These are all the data from the experiment."). Even when a very small quantity of data is referenced (one number, for example), the phrase piece of data is often used, as opposed to datum. The debate over appropriate usage is ongoing.
The IEEE Computer Society allows usage of data as either a mass noun or plural based on author preference. Other professional organizations and style guides require that authors treat data as a plural noun. For example, the Air Force Flight Test Center specifically states that the word data is always plural, never singular.
Data is accepted as a singular mass noun in everyday educated usage. Some major newspapers such as The New York Times use it either in the singular or plural. In the New York Times the phrases "the survey data are still being analyzed" and "the first year for which data is available" have appeared within one day.
In scientific writing data is often treated as a plural, as in "These data do not support the conclusions", but it is also used as a singular mass entity like information. British usage now widely accepts treating data as singular in standard English, including everyday newspaper usage, at least in non-scientific use. UK scientific publishing still prefers treating it as a plural. Some UK university style guides recommend using data for both singular and plural use and some recommend treating it only as a singular in connection with computers.
Meaning of data, information and knowledge
The terms data, information and knowledge are frequently used for overlapping concepts. The main difference is in the level of abstraction being considered. Data is the lowest level of abstraction, information is the next level, and knowledge is the highest level of the three. Data on its own carries no meaning. For data to become information, it must be interpreted and take on a meaning. For example, the height of Mt. Everest is generally considered "data", a book on Mt. Everest's geological characteristics may be considered "information", and a report containing practical information on the best way to reach Mt. Everest's peak may be considered "knowledge".
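The point that data must be interpreted before it carries meaning can be illustrated with a minimal Python sketch. The two byte values are arbitrary and chosen only for illustration; they do not come from any source discussed here. The same bytes yield different information depending on the interpretation applied to them.

```python
# Two bytes with no meaning attached yet (values chosen arbitrarily for illustration).
raw = bytes([0x41, 0x42])

# Interpretation 1: read the bytes as ASCII characters.
as_text = raw.decode("ascii")

# Interpretation 2: read the same bytes as one big-endian unsigned integer.
as_number = int.from_bytes(raw, "big")

print(as_text)    # AB
print(as_number)  # 16706
```

Neither reading is more correct than the other; the interpretation is supplied by whoever, or whatever, consumes the bytes.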
Information as a concept bears a diversity of meanings, from everyday usage to technical settings. Generally speaking, the concept of information is closely related to notions of constraint, communication, control, data, form, instruction, knowledge, meaning, mental stimulus, pattern, perception, and representation.
Beynon-Davies uses the concept of a sign to distinguish between data and information; data are symbols while information occurs when symbols are used to refer to something.
It is people and computers who collect data and impose patterns on it. These patterns are seen as information which can be used to enhance knowledge. These patterns can be interpreted as truth, and are authorized as aesthetic and ethical criteria. Events that leave behind perceivable physical or virtual remains can be traced back through data. Marks are no longer considered data once the link between the mark and observation is broken.
Raw data refers to an unprocessed collection of numbers, characters, images or other outputs from devices that convert physical quantities into symbols. Such data is typically further processed by a human or input into a computer, stored and processed there, or transmitted (output) to another human or computer (possibly through a data cable).
Raw data is a relative term; data processing commonly occurs by stages, and the "processed data" from one stage may be considered the "raw data" of the next.
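The relative nature of raw data can be sketched as a two-stage pipeline in Python. The function names and sample readings below are hypothetical and serve only as an illustration: the output of the first stage is "processed" relative to the device text, yet it is the "raw" input of the second stage.

```python
# Hypothetical two-stage pipeline; names and sample readings are illustrative only.

def parse_readings(raw_lines):
    """Stage 1: turn raw text lines from a device into floats, skipping blank lines."""
    return [float(line) for line in raw_lines if line.strip()]

def summarize(values):
    """Stage 2: treat stage 1's output as its own raw data and reduce it to a summary."""
    return {"count": len(values), "mean": sum(values) / len(values)}

raw = ["20.0", "21.0", "", "19.0"]   # raw data for stage 1
stage1 = parse_readings(raw)         # processed relative to `raw`
stage2 = summarize(stage1)           # `stage1` acts as raw data here

print(stage1)  # [20.0, 21.0, 19.0]
print(stage2)  # {'count': 3, 'mean': 20.0}
```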
Mechanical computing devices are classified according to the means by which they represent data. An analog computer represents a datum as a voltage, distance, position, or other physical quantity. A digital computer represents a datum as a sequence of symbols drawn from a fixed alphabet. The most common digital computers use a binary alphabet, that is, an alphabet of two characters, typically denoted "0" and "1". More familiar representations, such as numbers or letters, are then constructed from the binary alphabet.
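How numbers and letters can be built from a binary alphabet may be sketched in Python as follows. The 8-bit width and the UTF-8 text encoding are assumptions of this sketch, not conventions stated above, and the helper names are hypothetical.

```python
def to_bits(value, width=8):
    """Write a non-negative integer as a fixed-width string of '0' and '1'."""
    return format(value, f"0{width}b")

def text_to_bits(text):
    """Encode text as UTF-8 bytes, then spell each byte out in binary."""
    return [to_bits(byte) for byte in text.encode("utf-8")]

def bits_to_text(chunks):
    """Reverse text_to_bits: rebuild the bytes and decode them as UTF-8."""
    return bytes(int(chunk, 2) for chunk in chunks).decode("utf-8")

print(to_bits(5))                        # 00000101
print(text_to_bits("Hi"))                # ['01001000', '01101001']
print(bits_to_text(text_to_bits("Hi")))  # Hi
```

The round trip in the last line shows that nothing is lost: the letters are fully recoverable from the sequence of two-character symbols.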
Some special forms of data are distinguished. A computer program is a collection of data, which can be interpreted as instructions. Most computer languages make a distinction between programs and the other data on which programs operate, but in some languages, notably Lisp and similar languages, programs are essentially indistinguishable from other data. It is also useful to distinguish metadata, that is, a description of other data. A similar yet earlier term for metadata is "ancillary data." The prototypical example of metadata is the library catalog, which is a description of the contents of books.
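The idea that a program can be ordinary data, as in Lisp, can be imitated in Python with nested lists. This is only a rough sketch: the tiny evaluator, its two operators, and the sample expression are hypothetical and do not reproduce any real Lisp implementation.

```python
import operator

# Hypothetical operator table for this sketch; only two operators are supported.
OPS = {"+": operator.add, "*": operator.mul}

def evaluate(expr):
    """Evaluate a nested list such as ["+", 1, ["*", 2, 3]]."""
    if not isinstance(expr, list):
        return expr                      # a plain number evaluates to itself
    op, left, right = expr
    return OPS[op](evaluate(left), evaluate(right))

program = ["+", 1, ["*", 2, 3]]

print(len(program))         # 3  (handled as data: an ordinary list)
print(evaluate(program))    # 7  (handled as a program: 1 + 2 * 3)

# Because the program is a list, it can be rewritten like any other data.
rewritten = ["*"] + program[1:]
print(evaluate(rewritten))  # 6  (1 * (2 * 3))
```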
Experimental data refers to data generated within the context of a scientific investigation by observation and recording. Field data refers to raw data collected in an uncontrolled in situ environment.
See also
- Biological data
- Data acquisition
- Data analysis
- Data cable
- Data domain
- Data element
- Data farming
- Data governance
- Data integrity
- Data maintenance
- Data management
- Data mining
- Data modeling
- Computer data processing
- Data remanence
- Data set
- Data warehouse
- Database
- Datasheet
- Environmental data rescue
- Fieldwork
- Metadata
- Scientific data archiving
- Statistics
- Data structure
External links