Validity (statistics)

 

In psychology, validity has two distinct fields of application. The first involves test validity, a concept that has evolved with the field of psychometrics: "Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests". The second involves research design. Here the term refers to the degree to which a study supports the intended conclusion drawn from the results. In the Campbellian tradition, this latter sense divides into four aspects, including support for the conclusion that the causal variable caused the effect (internal validity).

Introduction

An early definition of test validity identified it with the degree of correlation between the test and a criterion. Under this definition, one can show that the reliability of the test and of the criterion places an upper limit on the possible correlation between them (the so-called validity coefficient). Intuitively, this reflects the fact that reliability involves freedom from random error, and random errors do not correlate with one another. Thus, the less random error in the variables, the higher the possible correlation between them. Under these definitions, a test cannot have high validity unless it also has high reliability. However, the concept of validity has expanded substantially beyond this early definition, and the classical relationship between reliability and validity need not hold for alternative conceptions of reliability and validity. Within classical test theory, predictive or concurrent validity (the correlation between the predictor and the predicted) cannot exceed the square root of the correlation between two versions of the same measure; that is, reliability limits validity.
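
The bound described above can be illustrated numerically. The following is a minimal sketch that assumes the usual classical-test-theory attenuation inequality, r_xy <= sqrt(r_xx * r_yy), where r_xx and r_yy are the reliabilities of the test and the criterion. The function name max_validity and the example reliabilities are illustrative only.

```python
import math


def max_validity(reliability_test: float, reliability_criterion: float = 1.0) -> float:
    """Upper bound on the validity coefficient under classical test theory.

    Assumes the attenuation inequality r_xy <= sqrt(r_xx * r_yy), where
    r_xx and r_yy are the reliabilities of the test and the criterion.
    """
    for r in (reliability_test, reliability_criterion):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliabilities must lie in [0, 1]")
    return math.sqrt(reliability_test * reliability_criterion)


# Hypothetical reliabilities chosen only for illustration.
print(round(max_validity(0.81), 2))        # 0.9  (perfectly reliable criterion)
print(round(max_validity(0.81, 0.64), 2))  # 0.72 (both measures contain random error)
```

With a perfectly reliable criterion the bound reduces to the square root of the test's own reliability, which is the form stated in the paragraph above.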

Test validity can be assessed in a number of ways, and thorough test validation typically involves more than one line of evidence in support of the validity of an assessment method (e.g. structured interview, personality survey, etc.). The current Standards for Educational and Psychological Testing follow Samuel Messick in discussing various types of validity evidence for a single summative validity judgment. These include construct-related evidence, content-related evidence, and criterion-related evidence, which breaks down into two subtypes (concurrent and predictive) according to the timing of the data collection.

Construct-related evidence involves the empirical and theoretical support for the interpretation of the construct. Such lines of evidence include statistical analyses of the internal structure of the test, including the relationships between responses to different test items. They also include relationships between the test and measures of other constructs. As currently understood, construct validity is not distinct from the support for the substantive theory of the construct that the test is designed to measure. As such, experiments designed to reveal aspects of the causal role of the construct also contribute to construct validity evidence.

Content-related evidence involves the degree to which the content of the test matches a content domain associated with the construct. For example, a test of the ability to add two-digit numbers should cover the full range of combinations of digits. A test with only one-digit numbers, or only even numbers, would not have good coverage of the content domain. Content-related evidence typically involves subject matter experts (SMEs) evaluating test items against the test specifications.
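
The addition example can be made concrete with a rough coverage check. This is only a sketch of one possible operationalization, not an established procedure: it assumes the content domain is the set of all ones-digit pairs (0-9 by 0-9) that can occur when adding two two-digit numbers, and the function name digit_pair_coverage is hypothetical.

```python
from itertools import product


def digit_pair_coverage(items):
    """Fraction of ones-digit pairs (0-9 x 0-9) exercised by two-digit addition items.

    Items whose operands are not both two-digit numbers fall outside the
    assumed content domain and contribute nothing to coverage.
    """
    domain = set(product(range(10), repeat=2))
    covered = {
        (a % 10, b % 10)
        for a, b in items
        if 10 <= a <= 99 and 10 <= b <= 99
    }
    return len(covered & domain) / len(domain)


all_items = list(product(range(10, 100), repeat=2))
even_items = [(a, b) for a, b in all_items if a % 2 == 0 and b % 2 == 0]
one_digit_items = list(product(range(10), repeat=2))

print(digit_pair_coverage(all_items))        # 1.0
print(digit_pair_coverage(even_items))       # 0.25
print(digit_pair_coverage(one_digit_items))  # 0.0
```

In practice this judgment is made by subject matter experts against the test specification; a count like this could at most support that review.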

Criterion-related evidence involves the correlation between the test and a criterion variable (or variables) taken as representative of the construct. For example, employee selection tests are often validated against measures of job performance. Measures of risk of recidivism among those convicted of a crime can be validated against measures of recidivism. If the test data and criterion data are collected at the same time, this is referred to as concurrent validity evidence. If the test data are collected first in order to predict criterion data collected at a later point in time, this is referred to as predictive validity evidence.
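
A validity coefficient of this kind is usually computed as a Pearson correlation between test scores and criterion scores. The sketch below assumes that choice of statistic; the data are made up for illustration, and whether the result counts as concurrent or predictive evidence depends only on when the criterion was measured.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up data: selection-test scores and job-performance ratings.
test_scores = [52, 61, 70, 75, 83, 90]
performance = [2.9, 3.1, 3.0, 3.8, 4.1, 4.4]

validity_coefficient = pearson_r(test_scores, performance)
print(f"validity coefficient r = {validity_coefficient:.2f}")
```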

Face validity is an estimate of whether a test appears to measure a certain criterion; it does not guarantee that the test actually measures phenomena in that domain. Indeed, when a test is subject to faking (malingering), low face validity might make the test more valid.

In contrast to test validity, assessment of the validity of a research design generally does not involve data collection or statistical analysis but rather evaluation of the design in relation to the desired conclusion on the basis of prevailing standards and theory of research design.

Types


Internal validity
 

Internal validity is an inductive estimate of the degree to which conclusions about causal relations are likely to be true, in view of the measures used, the research setting, and the whole research design. Good experimental techniques, in which the effect of an independent variable on a dependent variable is studied under highly controlled conditions, usually allow for higher degrees of internal validity than, for example, single-case designs.

External validity
 

External validity concerns the extent to which one may safely generalize the (internally valid) causal inference (a) from the sample studied to the defined target population and (b) to other populations (i.e. across time and space).

Ecological validity
 

This issue is closely related to external validity and concerns the degree to which experimental findings mirror what can be observed in the real world (ecology being the science of the interaction between an organism and its environment). In other words, ecological validity is whether the results can be applied to real-life situations. Research typically falls into two domains: passive-observational and active-experimental. The purpose of experimental designs is to test causality, so that one can infer that A causes B or that B causes A. Sometimes, however, ethical or methodological restrictions prevent an experiment from being conducted (e.g. how does isolation influence a child's cognitive functioning?). Research is then still possible, but it is correlational rather than causal: it can show only that A occurs together with B. Both techniques have their strengths and weaknesses. An experimental design requires control of all interfering variables, which is why experiments are often conducted in a laboratory setting. While this gains internal validity (interfering variables are excluded by keeping them constant), it loses ecological validity because the laboratory setting is artificial. Observational research, on the other hand, cannot control for interfering variables (low internal validity), but it can measure behavior in the natural (ecological) environment where it occurs.

Population validity


Construct validity
 

Construct validity refers to the totality of evidence about whether a particular operationalization of a construct adequately represents what is intended by the theoretical account of the construct being measured. (One demonstrates that an element is valid by relating it to another element that is supposedly valid.) There are two approaches to construct validity, sometimes referred to as 'convergent validity' and 'divergent validity' (the latter also called discriminant validity).

Intentional validity

Intentional validity asks, "Do the constructs we chose adequately represent what we intend to study?" Constructs must be specific enough to distinguish.
 

Validity proves no bias

Representation validity or translation validity

Representation validity is concerned with how well the constructs or abstractions translate into observable measures.


Content validity
 

This is a non-statistical type of validity that involves “the systematic examination of the test content to determine whether it covers a representative sample of the behaviour domain to be measured” (Anastasi & Urbina, 1997, p. 114).

A test has content validity built into it by careful selection of which items to include (Anastasi & Urbina, 1997). Items are chosen so that they comply with the test specification, which is drawn up through a thorough examination of the subject domain. Foxcroft et al. (2004, p. 49) note that the content validity of a test can be improved by using a panel of experts to review the test specifications and the selection of items. The experts can review the items and comment on whether they cover a representative sample of the behaviour domain.

Face validity
 

Face validity is very closely related to content validity. Content validity depends on a theoretical basis for judging whether a test assesses all domains of a certain criterion (e.g. does assessing addition skills yield a good measure of mathematical skills? Answering this requires knowing which kinds of arithmetic skills mathematical skills include). Face validity, by contrast, relates to whether a test appears to be a good measure. This judgment is made on the "face" of the test, so it can also be made by an amateur.

Observation validity


Criterion validity
 

Criterion-related validity reflects the success of measures used for prediction or estimation. There are two types of criterion-related validity: concurrent and predictive validity. A good example of criterion-related validity is the validation of employee selection tests; in this case, scores on a test or battery of tests are correlated with employee performance scores.

Concurrent validity
 
Concurrent validity refers to the degree to which the operationalization correlates with other measures of the same construct that are measured at the same time. Going back to the selection test example, this would mean that the tests are administered to current employees and then correlated with their scores on performance reviews.

Predictive validity
 
Predictive validity refers to the degree to which the operationalization can predict (or correlate with) other measures of the same construct that are measured at some time in the future. Again, with the selection test example, this would mean that the tests are administered to applicants, all applicants are hired, their performance is reviewed at a later time, and then their scores on the two measures are correlated.

Convergent validity
 

Convergent validity refers to the degree to which a measure is correlated with other measures that it is theoretically predicted to correlate with.

Discriminant validity
 

Discriminant validity describes the degree to which the operationalization does not correlate with other operationalizations that it theoretically should not correlate with.
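
Convergent and discriminant evidence are often examined side by side by comparing correlations. The sketch below uses made-up scores: a new mathematics test, an established mathematics test (expected to correlate with it), and an extraversion scale (assumed here, purely for illustration, to be theoretically unrelated). The variable names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up scores for six test-takers.
new_math_test = [12, 15, 19, 22, 27, 30]
established_math_test = [14, 16, 18, 25, 26, 31]
extraversion_scale = [30, 22, 27, 21, 29, 24]

r_convergent = pearson_r(new_math_test, established_math_test)
r_discriminant = pearson_r(new_math_test, extraversion_scale)

# Convergent evidence: a strong correlation with a measure of the same construct.
# Discriminant evidence: a weak correlation with a measure of an unrelated construct.
print(f"convergent r = {r_convergent:.2f}, discriminant r = {r_discriminant:.2f}")
```

How strong or weak a correlation must be to count as evidence is a matter of judgment and theory; no fixed cutoff is implied here.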

Social validity


Statistical conclusion validity

Statistical conclusion validity establishes the existence and strength of the co-variation between the cause and effect variables. This type of validity involves ensuring adequate sampling procedures, appropriate statistical tests, and reliable measurement procedures.
 


Factors jeopardizing validity


Campbell and Stanley (1963) define internal validity as the basic requirements for an experiment to be interpretable — did the experiment make a difference in this instance? External validity addresses the question of generalizability — to whom can we generalize this experiment's findings?

Internal validity
Eight extraneous variables can interfere with internal validity:

1. History, the specific events occurring between the first and second measurements in addition to the experimental variables

2. Maturation, processes within the participants as a function of the passage of time (not specific to particular events), e.g., growing older, hungrier, more tired, and so on.

3. Testing, the effects of taking a test upon the scores of a second testing.

4. Instrumentation, changes in calibration of a measurement tool or changes in the observers or scorers may produce changes in the obtained measurements.

5. Statistical regression, operating where groups have been selected on the basis of their extreme scores.

6. Selection, biases resulting from differential selection of respondents for the comparison groups.

7. Experimental mortality, or differential loss of respondents from the comparison groups.

8. Selection-maturation interaction and similar interactions, e.g., in multiple-group quasi-experimental designs.
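
Item 5 in the list above, statistical regression, can be illustrated with a small simulation. This is a sketch under stated assumptions: each observed score is modeled as a stable true score plus independent random error, the distribution parameters are arbitrary, and no treatment is applied between the two measurements.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

N = 10_000
true_ability = [random.gauss(100, 10) for _ in range(N)]
# Each observed score = stable true score + independent random error.
pretest = [t + random.gauss(0, 10) for t in true_ability]
posttest = [t + random.gauss(0, 10) for t in true_ability]

# Select the top 10% on the pretest, with NO treatment applied in between.
cutoff = sorted(pretest)[int(0.9 * N)]
selected = [i for i in range(N) if pretest[i] >= cutoff]

pre_mean = mean(pretest[i] for i in selected)
post_mean = mean(posttest[i] for i in selected)
print(f"selected group: pretest mean {pre_mean:.1f}, posttest mean {post_mean:.1f}")
```

The selected group's posttest mean falls back toward the overall mean even though nothing was done to the participants, which is why a change observed in a group chosen for extreme scores cannot by itself be attributed to a treatment.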

External validity
Four factors jeopardizing external validity or representativeness are:

9. Reactive or interaction effect of testing, a pretest might increase the scores on a posttest

10. Interaction effects of selection biases and the experimental variable.

11. Reactive effects of experimental arrangements, which would preclude generalization about the effect of the experimental variable upon persons being exposed to it in non-experimental settings

12. Multiple-treatment interference, where effects of earlier treatments are not erasable.

See also


  • Validity (logic)

