Equating

Equating

Discussion

Encyclopedia
Test equating traditionally refers to the statistical process of determining comparable scores on different forms of an exam. It can be accomplished using either classical test theory
Classical test theory
Classical test theory is a body of related psychometric theory that predict outcomes of psychological testing such as the difficulty of items or the ability of test-takers. Generally speaking, the aim of classical test theory is to understand and improve the reliability of psychological...

or item response theory
Item response theory
In psychometrics, item response theory also known as latent trait theory, strong true score theory, or modern mental test theory, is a paradigm for the design, analysis, and scoring of tests, questionnaires, and similar instruments measuring abilities, attitudes, or other variables. It is based...

.

In item response theory, equating is the process of equating the units and origins of two scales on which the abilities of students have been estimated from results on different tests. The process is analogous to equating degrees Fahrenheit with degrees Celsius by converting measurements from one scale to the other. The determination of comparable scores is a by-product of equating that results from equating the scales obtained from test results.

Why is equating necessary?

Suppose that Dick and Jane both take a test to become licensed in a certain profession. Because the high stakes
High-stakes testing
A high-stakes test is a test with important consequences for the test taker. Passing has important benefits, such as a high school diploma, a scholarship, or a license to practice a profession...

(you get to practice the profession if you pass the test) may create a temptation to cheat, the organization that oversees the test creates two forms. If we know that Dick scored 60% on form A and Jane score 70% on form B, do we know for sure which one has a better grasp of the material? What if form A is composed of very difficult items, while form B is relatively easy? Equating analyses are performed to address this very issue, so that scores are as fair as possible.

Equating in item response theory

In item response theory
Item response theory
In psychometrics, item response theory also known as latent trait theory, strong true score theory, or modern mental test theory, is a paradigm for the design, analysis, and scoring of tests, questionnaires, and similar instruments measuring abilities, attitudes, or other variables. It is based...

, person locations are estimated on a scale; i.e. locations are estimated in relation to a unit and origin. It is common in educational assessment to employ tests in order to assess different groups of students with the intention of establishing a common scale by equating the origins, and sometimes units, of the scales obtained from response data from the different tests. The process is referred to as equating or test equating.

In item response theory, two different kinds of equating are horizontal and vertical equating. Vertical equating refers to the process of equating tests administered to groups of students with different abilities, such as students in different grades (years of schooling). Horizontal equating refers the equating of tests administered to groups with similar abilities; for example, two tests administered students in the same grade in two consecutive calendar years. Different tests are used to avoid practice effects.

In terms of item response theory, equating is just a special case of the more general process of scaling, applicable when more than one test is used. In practice, though, scaling is often implemented separately for different tests and then the scales subsequently equated.

A distinction is often made between two methods of equating; common person and common item equating. Common person equating involves the administration of two tests to a common group of persons. The mean and standard deviation of the scale locations of the groups on the two tests are equated using a linear transformation. Common item equating involves the use of a set of common items referred to as the anchor test
Anchor test
In psychometrics, an anchor test is a common set of test items administered in combination with two or more alternative forms of the test with the aim of establishing the equivalence of the test scores on the alternative forms. The purpose of the anchor test is to provide a baseline for an...

embedded in two different tests. The mean item location of the common items is equated.

Classical approaches to equating

In classical test theory, mean equating simply adjusts the distribution
Frequency distribution
In statistics, a frequency distribution is an arrangement of the values that one or more variables take in a sample. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval, and in this way, the table summarizes the distribution of...

of scores so that the mean of one form is comparable to the mean of the other form. While mean equating is attractive because of its simplicity, it lacks flexibility, namely accounting for the possibility that the standard deviations of the forms differ.

Linear equating adjusts so that the two forms have a comparable mean
Mean
In statistics, mean has two related meanings:* the arithmetic mean .* the expected value of a random variable, which is also called the population mean....

and standard deviation
Standard deviation
Standard deviation is a widely used measure of variability or diversity used in statistics and probability theory. It shows how much variation or "dispersion" there is from the average...

. There are several types of linear equating that differ in the assumptions and mathematics used to estimate parameters. The Tucker and Levine Observed Score methods estimate the relationship between observed scores on the two forms, while the Levine True Score method estimates the relationship between true scores on the two forms.

Equipercentile equating determines the equating relationship as one where a score could have an equivalent percentile
Percentile
In statistics, a percentile is the value of a variable below which a certain percent of observations fall. For example, the 20th percentile is the value below which 20 percent of the observations may be found...

on either form. This relationship can be nonlinear.

Unlike with item response theory, equating based on classical test theory is somewhat distinct from scaling. Equating is a raw-to-raw transformation in that it estimates a raw score on Form B that is equivalent to each raw score on the base Form A. Any scaling transformation used is then applied on top of, or with, the equating.