





 

Reliability (statistics)



 
 








In statistics, reliability is the consistency of a set of measurements or of a measuring instrument, and the term is often used to describe a test. It can refer to whether repeated measurements with the same instrument give, or are likely to give, the same result (test-retest reliability) or, for more subjective instruments such as personality or trait inventories, to whether two independent assessors give similar scores (inter-rater reliability). Reliability is inversely related to random error.

Reliability does not imply validity. That is, a reliable measure measures something consistently, but not necessarily what it is supposed to measure. For example, while there are many reliable tests of specific abilities, not all of them would be valid for predicting, say, job performance. In terms of accuracy and precision, reliability corresponds to precision, while validity corresponds to accuracy.

In the experimental sciences, reliability is the extent to which the measurements of a test remain consistent over repeated tests of the same subject under identical conditions. An experiment is reliable if it yields consistent results of the same measure, and unreliable if repeated measurements give different results. Reliability can also be interpreted as the lack of random error in measurement.

In engineering, reliability is the ability of a system or component to perform its required functions under stated conditions for a specified period of time. It is often reported as a probability. Evaluations of reliability involve the use of many statistical tools. See Reliability engineering for further discussion.

An example often used to illustrate the difference between reliability and validity in the experimental sciences involves a common bathroom scale. If someone who weighs 200 lb steps on the scale 10 times and it reads "200" each time, then the measurement is both reliable and valid. If the scale consistently reads "150", then it is not valid, but it is still reliable because the measurement is very consistent. If the readings vary widely around 200 (190, 205, 192, 209, etc.), then the scale could be considered valid but not reliable.
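The bathroom-scale example can be put in numbers with a minimal sketch. It assumes that validity is summarized by the bias of the readings (their mean minus the true weight) and reliability by their spread (the sample standard deviation); the function name `summarize` and the variable names are illustrative only.

```python
from statistics import mean, stdev

TRUE_WEIGHT = 200  # lb, the person's actual weight in the example


def summarize(readings, true_value=TRUE_WEIGHT):
    """Return (bias, spread) for a list of scale readings.

    bias   = mean reading minus the true value (small bias suggests validity)
    spread = sample standard deviation (small spread suggests reliability)
    """
    bias = mean(readings) - true_value
    spread = stdev(readings) if len(readings) > 1 else 0.0
    return bias, spread


scales = {
    "reads 200 every time": [200] * 10,
    "reads 150 every time": [150] * 10,
    "varies around 200": [190, 205, 192, 209],
}

for label, readings in scales.items():
    bias, spread = summarize(readings)
    print(f"{label}: bias = {bias:.1f} lb, spread = {spread:.1f} lb")
```

The second scale shows a large bias with no spread (reliable but not valid), while the third shows a small bias with a large spread (roughly valid on average but not reliable).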

Estimation

Reliability may be estimated through a variety of methods that fall into two types: single-administration and multiple-administration. Multiple-administration methods require that two assessments be administered. In the test-retest method, reliability is estimated as the Pearson product-moment correlation coefficient between two administrations of the same measure. In the alternate forms method, reliability is estimated by the Pearson product-moment correlation coefficient of two different forms of a measure, usually administered together.

Single-administration methods include split-half and internal consistency. The split-half method treats the two halves of a measure as alternate forms. This "halves reliability" estimate is then stepped up to the full test length using the Spearman-Brown prediction formula. The most common internal consistency measure is Cronbach's alpha, which is usually interpreted as the mean of all possible split-half coefficients. Cronbach's alpha is a generalization of an earlier form of estimating internal consistency, Kuder-Richardson Formula 20.
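The estimators described above can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: the score matrix and retest totals are made-up data, the split into odd- and even-numbered items is only one of many possible ways to halve a test, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r, n=2):
    """Predicted reliability when a test with reliability r is lengthened n-fold."""
    return n * r / (1 + (n - 1) * r)


def split_half_reliability(scores):
    """Correlate odd-item and even-item half scores, then step up to full length."""
    first_half = [sum(row[0::2]) for row in scores]
    second_half = [sum(row[1::2]) for row in scores]
    return spearman_brown(pearson(first_half, second_half))


def cronbach_alpha(scores):
    """Cronbach's alpha from a people-by-items score matrix."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 6 people x 4 items, each item scored 0-5.
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [2, 1, 2, 1],
]
retest_totals = [17, 10, 18, 7, 12, 6]  # hypothetical second administration

totals = [sum(row) for row in scores]
print("test-retest r:", round(pearson(totals, retest_totals), 3))
print("split-half, stepped up:", round(split_half_reliability(scores), 3))
print("Cronbach's alpha:", round(cronbach_alpha(scores), 3))
```

The three estimates generally differ on the same data, since each is sensitive to a different source of error.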

These measures of reliability differ in their sensitivity to different sources of error and so need not be equal. Also, reliability is a property of the scores of a measure rather than of the measure itself, and reliability estimates are thus said to be sample dependent. Estimates from one sample might differ from those of a second sample (beyond what might be expected due to sampling variation) if the second sample is drawn from a different population, because the true reliability is different in this second population. (This is true of measures of all types: yardsticks might measure houses well yet have poor reliability when used to measure the lengths of insects.)

Reliability may be improved by clarity of expression (for written assessments), lengthening the measure, and other informal means. However, formal psychometric analysis, called item analysis, is considered the most effective way to increase reliability. This analysis consists of computing item difficulties and item discrimination indices, the latter involving correlations between each item and the sum of the item scores of the entire test. If items that are too difficult, too easy, or have near-zero or negative discrimination are replaced with better items, the reliability of the measure will increase.
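A basic item analysis might be sketched as follows. The sketch assumes dichotomous (0/1) items, takes item difficulty to be the proportion of correct responses (so higher values mean easier items), and takes discrimination to be the correlation between the item and the total score; the response matrix and the function name `item_analysis` are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns None when either variable has no variance."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


def item_analysis(responses):
    """Return difficulty and item-total discrimination for each item."""
    totals = [sum(row) for row in responses]
    report = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        report.append({
            "item": j,
            "difficulty": mean(item),                  # proportion correct
            "discrimination": pearson(item, totals),   # item-total correlation
        })
    return report


# Hypothetical responses: 6 test-takers x 4 items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
]

for row in item_analysis(responses):
    print(row)
```

In this made-up data every test-taker answers the first item correctly, so its discrimination is undefined (reported as `None`), which marks it as a candidate for replacement.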

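  • $R(t) = e^{-\lambda t}$ (where $R(t)$ is the probability of operating without failure up to time $t$ and $\lambda$ is the failure rate)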


Classical test theory


In classical test theory, reliability is defined mathematically as the ratio of the variance of the true score to the variance of the observed score or, equivalently, one minus the ratio of the variance of the error score to the variance of the observed score:

$$\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}$$

where $\rho_{XX'}$ is the symbol for the reliability of the observed score, X; $\sigma_X^2$, $\sigma_T^2$, and $\sigma_E^2$ are the variances of the measured, true, and error scores, respectively. Unfortunately, there is no way to directly observe or calculate the true score, so a variety of methods are used to estimate the reliability of a test.

Some examples of the methods to estimate reliability include test-retest reliability, internal consistency reliability, and parallel-test reliability. Each method comes at the problem of figuring out the source of error in the test somewhat differently.
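The classical definition can be illustrated with a small simulation in which, unlike in practice, the true scores are known. The sketch assumes normally distributed true scores and errors that are independent of each other; the means, standard deviations, sample size, and seed are arbitrary choices for illustration.

```python
import random
from statistics import pvariance


def simulate_scores(n=20000, true_sd=2.0, error_sd=1.0, seed=0):
    """Simulate observed = true + error with independent normal components."""
    rng = random.Random(seed)
    true = [rng.gauss(50.0, true_sd) for _ in range(n)]
    error = [rng.gauss(0.0, error_sd) for _ in range(n)]
    observed = [t + e for t, e in zip(true, error)]
    return true, error, observed


true, error, observed = simulate_scores()
var_t, var_e, var_x = pvariance(true), pvariance(error), pvariance(observed)

# The two forms of the definition; they agree up to sampling noise
# because the simulated true scores and errors are (nearly) uncorrelated.
rho_from_true = var_t / var_x
rho_from_error = 1 - var_e / var_x

print("true-score variance / observed variance:", round(rho_from_true, 3))
print("1 - error variance / observed variance: ", round(rho_from_error, 3))
```

With a true-score standard deviation of 2 and an error standard deviation of 1, the theoretical reliability is 4 / (4 + 1) = 0.8, and both printed values should fall close to it.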

Item response theory


It was well known to classical test theorists that measurement precision is not uniform across the scale of measurement. Tests tend to distinguish better among test-takers with moderate trait levels and worse among high- and low-scoring test-takers. Item response theory extends the concept of reliability from a single index to a function called the information function. The IRT information function is the inverse of the square of the conditional observed score standard error at any given test score. Higher levels of IRT information indicate higher precision and thus greater reliability.
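As a rough illustration of an information function, the following sketch assumes a two-parameter logistic (2PL) model, which the text above does not specify. Under that model an item with discrimination a and difficulty b has information a² · P(θ) · (1 − P(θ)), test information is the sum over items, and the conditional standard error is 1/√I(θ). The item parameters are hypothetical.

```python
from math import exp, sqrt


def p_correct(theta, a, b):
    """2PL probability of a correct response at trait level theta."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of one 2PL item at trait level theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def total_information(theta, items):
    """Test information: the sum of the item informations."""
    return sum(item_information(theta, a, b) for a, b in items)


def standard_error(theta, items):
    """Conditional standard error of measurement, 1 / sqrt(information)."""
    return 1.0 / sqrt(total_information(theta, items))


# Hypothetical (discrimination, difficulty) pairs for a three-item test.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]

for theta in (-3.0, -1.5, 0.0, 1.5, 3.0):
    info = total_information(theta, items)
    se = standard_error(theta, items)
    print(f"theta = {theta:+.1f}  information = {info:.3f}  SE = {se:.3f}")
```

For these made-up items the information peaks near the middle of the trait range and falls off at the extremes, matching the observation that tests measure moderate trait levels more precisely.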

See also


  • Accuracy
  • Bayesian inference
  • Censoring (statistics)
  • Coefficient of variation
  • Homogeneity (statistics)
  • Internal consistency
  • Levels of measurement
  • Precision
  • Proportional reduction in loss
  • Reliability theory
  • Reliability engineering
  • Reproducibility
  • Scientific method
  • Statistics
  • Validity (statistics)


External links