Errors-in-variables models - AbsoluteAstronomy.com

In statistics and econometrics, errors-in-variables models or measurement error models are regression models that account for measurement errors in the independent variables. In contrast, standard regression models assume that those regressors have been measured exactly, or observed without error; as such, those models account only for errors in the dependent variables, or responses.

In the case when some regressors have been measured with errors, estimation based on the standard assumption leads to inconsistent estimates, meaning that the parameter estimates do not tend to the true values even in very large samples. For simple linear regression the effect is an underestimate of the coefficient, known as the attenuation bias. In non-linear models the direction of the bias is likely to be more complicated.

Motivational example

Consider a simple linear regression model of the form

$$y_t = \alpha + \beta x_t^* + \varepsilon_t, \qquad t = 1, \ldots, T,$$

where x* denotes the true but unobserved value of the regressor. Instead we observe this value with an error:

$$x_t = x_t^* + \eta_t,$$

where the measurement error η_t is assumed to be independent from the true value x*_t.

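If the y_t′s are simply regressed on the x_t′s (see simple linear regression), then the estimator for the slope coefficient is

$$\hat\beta = \frac{\frac{1}{T}\sum_{t=1}^{T}(x_t-\bar x)(y_t-\bar y)}{\frac{1}{T}\sum_{t=1}^{T}(x_t-\bar x)^2},$$

which converges as the sample size T increases without bound:

$$\hat\beta \xrightarrow{p} \frac{\operatorname{Cov}[x_t, y_t]}{\operatorname{Var}[x_t]} = \frac{\beta\sigma_*^2}{\sigma_*^2+\sigma_\eta^2} = \frac{\beta}{1+\sigma_\eta^2/\sigma_*^2},$$

where σ²_∗ and σ²_η are the variances of x*_t and η_t.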

The two variances here are positive, so that in the limit the estimate is smaller in magnitude than the true value of β, an effect which statisticians call attenuation or regression dilution. Thus the “naïve” least squares estimator is inconsistent in this setting. However, the estimator is a consistent estimator of the parameter required for a best linear predictor of y given x: in some applications this may be what is required, rather than an estimate of the "true" regression coefficient, although that would assume that the variance of the errors in observing x* remains fixed.
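The attenuation effect can be checked numerically. The following is a minimal simulation sketch, not part of the original text: the parameter values, the normal distributions, and the helper name `ols_slope` are all assumptions chosen for illustration. It compares the naive least-squares slope with the probability limit β / (1 + σ²_η/σ²_∗).

```python
import numpy as np

# Illustrative values (assumptions, not taken from the article).
rng = np.random.default_rng(0)
T = 200_000
alpha, beta = 1.0, 2.0
sigma_star, sigma_eta, sigma_eps = 1.0, 0.5, 1.0

x_star = rng.normal(0.0, sigma_star, T)                     # latent regressor x*
y = alpha + beta * x_star + rng.normal(0.0, sigma_eps, T)   # response
x = x_star + rng.normal(0.0, sigma_eta, T)                  # observed regressor

def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

beta_naive = ols_slope(x, y)
# Probability limit of the naive slope under classical measurement error.
beta_limit = beta / (1.0 + sigma_eta**2 / sigma_star**2)

print(f"naive slope: {beta_naive:.3f}, predicted limit: {beta_limit:.3f}, true beta: {beta}")
```

With these values the predicted limit is 2 / 1.25 = 1.6, so the naive slope should settle near 1.6 rather than near the true value 2, no matter how large T is.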

It can be argued that almost all existing data sets contain errors of different nature and magnitude, so that attenuation bias is extremely frequent (although in multivariate regression the direction of bias is ambiguous). Jerry Hausman sees this as an iron law of econometrics: “The magnitude of the estimate is usually smaller than expected.”

Specification

Usually measurement error models are described using the latent variables approach. If y is the response variable and x are observed values of the regressors, then we assume there exist some latent variables y* and x* which follow the model's “true” functional relationship g, and such that the observed quantities are their noisy observations:

$$y^* = g(x^*, w \mid \theta), \qquad y = y^* + \varepsilon, \qquad x = x^* + \eta,$$

where θ is the model's parameter and w are those regressors which are assumed to be error-free (for example when linear regression contains an intercept, the regressor which corresponds to the constant certainly has no “measurement errors”). Depending on the specification these error-free regressors may or may not be treated separately; in the latter case it is simply assumed that corresponding entries in the variance matrix of ηs are zero.

The variables y, x, w are all observed, meaning that the statistician possesses a data set of n statistical units which follow the data generating process described above; the latent variables x*, y*, ε, and η are not observed, however.

This specification does not encompass all the existing EiV models. For example in some of them function g may be non-parametric or semi-parametric. Other approaches model the relationship between y* and x* as distributional instead of functional, that is they assume that y* conditionally on x* follows a certain (usually parametric) distribution.

Terminology and assumptions

The observed variable x may be called the manifest, indicator, or proxy variable.
The unobserved variable x* may be called the latent or true variable. It may be regarded either as an unknown constant (in which case the model is called a functional model), or as a random variable (correspondingly a structural model).
The relationship between the measurement error η and the latent variable x* can be modeled in different ways:
- Classical errors: the errors are independent from the latent variable. This is the most common assumption; it implies that the errors are introduced by the measuring device and that their magnitude does not depend on the value being measured.
- Mean-independence: the errors are mean-zero for every value of the latent regressor. This is a less restrictive assumption than the classical one, as it allows for the presence of heteroscedasticity or other effects in the measurement errors.
- Berkson’s errors: the errors are independent from the observed regressor x. This assumption has very limited applicability. One example is round-off errors: for example if a person’s age* is a continuous random variable, whereas the observed age is truncated to the next smallest integer, then the truncation error is approximately independent from the observed age. Another possibility is with the fixed design experiment: for example if a scientist decides to make a measurement at a certain predetermined moment of time x, say at x = 10 s, then the real measurement may occur at some other value of x* (for example due to her finite reaction time) and such measurement error will be generally independent from the “observed” value of the regressor.
- Misclassification errors: special case used for the dummy regressors. If x* is an indicator of a certain event or condition (such as person is male/female, some medical treatment given/not, etc.), then the measurement error in such a regressor will correspond to incorrect classification, similar to type I and type II errors in statistical testing. In this case the error η may take only 3 possible values, and its distribution conditional on x* is modeled with two parameters: α = Pr[η=−1 | x*=1], and β = Pr[η=1 | x*=0]. The necessary condition for identification is that α+β<1, that is, misclassification should not happen “too often”. (This idea can be generalized to discrete variables with more than two possible values.)
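To make the misclassification case concrete, here is a small simulation sketch. Everything in it is assumed for illustration: a linear outcome equation, the value Pr[x*=1] = 0.5, the misclassification rates, and the names `a_mis` and `b_mis`, which stand for the α and β of the text. It checks that η takes only the values −1, 0 and 1, and that the naive slope is scaled by Cov[x, x*]/Var[x], which equals π(1−π)(1−α−β) / (p(1−p)) with π = Pr[x*=1] and p = Pr[x=1]. The factor is positive only when α+β<1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
pi = 0.5                   # Pr[x* = 1] (assumed)
a_mis, b_mis = 0.1, 0.2    # alpha = Pr[eta=-1 | x*=1], beta = Pr[eta=1 | x*=0]
slope = 3.0                # true effect of the dummy regressor (assumed)

x_star = (rng.random(n) < pi).astype(float)
u = rng.random(n)
# A true 1 is recorded as 0 with probability a_mis; a true 0 as 1 with probability b_mis.
x = np.where(x_star == 1.0, (u >= a_mis).astype(float), (u < b_mis).astype(float))
eta = x - x_star
y = 1.0 + slope * x_star + rng.normal(0.0, 1.0, n)

xc = x - x.mean()
slope_naive = float(xc @ (y - y.mean()) / (xc @ xc))

p = pi * (1 - a_mis) + (1 - pi) * b_mis                        # Pr[x = 1]
factor = pi * (1 - pi) * (1 - a_mis - b_mis) / (p * (1 - p))   # Cov[x, x*] / Var[x]

print(sorted(set(eta.tolist())))
print(f"naive slope: {slope_naive:.3f}, predicted: {slope * factor:.3f}, true: {slope}")
```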

Linear model

Linear errors-in-variables models were studied first, probably because linear models were so widely used and they are easier than non-linear ones. Unlike standard least squares regression (OLS), extending errors-in-variables regression (EiV) from the simple to the multivariate case is not straightforward.

Simple linear model

The simple linear errors-in-variables model was already presented in the “motivation” section:

$$y_t = \alpha + \beta x_t^* + \varepsilon_t, \qquad x_t = x_t^* + \eta_t,$$

where all variables are scalar. Here α and β are the parameters of interest, whereas σ_ε and σ_η — standard deviations of the error terms — are the nuisance parameters. The “true” regressor x* is treated as a random variable (structural model), independent from the measurement error η (classic assumption).

This model is identifiable in two cases: (1) either the latent regressor x* is not normally distributed, (2) or x* has normal distribution, but neither ε_t nor η_t are divisible by a normal distribution. That is, the parameters α, β can be consistently estimated from the data set (x_t, y_t), t = 1, …, T, without any additional information, provided the latent regressor is not Gaussian.

Before this identifiability result was established, statisticians attempted to apply the maximum likelihood technique by assuming that all variables are normal, and then concluded that the model is not identified. The suggested remedy was to assume that some of the parameters of the model are known or can be estimated from an outside source. Such estimation methods include:

Deming regression: assumes that the ratio δ = σ²_ε/σ²_η is known. This could be appropriate, for example, when errors in y and x are both caused by measurements, and the accuracy of the measuring devices or procedures is known. The case when δ = 1 is also known as orthogonal regression.
Regression with known reliability ratio λ = σ²_∗ / (σ²_η + σ²_∗), where σ²_∗ is the variance of the latent regressor. Such an approach may be applicable, for example, when repeated measurements of the same unit are available, or when the reliability ratio is known from an independent study. In this case the consistent estimate of the slope is equal to the least-squares estimate divided by λ.
Regression with known σ²_η may occur when the source of the errors in the x’s is known and their variance can be calculated. This could include rounding errors, or errors introduced by the measuring device. When σ²_η is known we can compute the reliability ratio as λ = (σ²_x − σ²_η) / σ²_x and reduce the problem to the previous case.
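The three corrections above can be sketched in a few lines. This is an illustrative sketch rather than a reference implementation: the function names and simulated values are assumptions, and the Deming slope uses the usual closed form (s_yy − δ s_xx + √((s_yy − δ s_xx)² + 4δ s_xy²)) / (2 s_xy), which is not written out in the text above.

```python
import numpy as np

def sample_cov(a, b):
    """Sample covariance with 1/T normalisation."""
    return float(np.mean((a - a.mean()) * (b - b.mean())))

def deming_slope(x, y, delta):
    """Deming slope; delta = var(eps) / var(eta) is assumed known."""
    sxx, syy, sxy = sample_cov(x, x), sample_cov(y, y), sample_cov(x, y)
    d = syy - delta * sxx
    return (d + np.sqrt(d**2 + 4.0 * delta * sxy**2)) / (2.0 * sxy)

def slope_known_reliability(x, y, lam):
    """Least-squares slope divided by a known reliability ratio lam."""
    return sample_cov(x, y) / sample_cov(x, x) / lam

def slope_known_error_variance(x, y, var_eta):
    """Reliability ratio computed from a known var(eta), then the same correction."""
    var_x = sample_cov(x, x)
    return slope_known_reliability(x, y, (var_x - var_eta) / var_x)

# Simulated data with assumed parameter values.
rng = np.random.default_rng(2)
T, beta = 200_000, 2.0
sigma_star, sigma_eta, sigma_eps = 1.0, 0.5, 1.0
x_star = rng.normal(0.0, sigma_star, T)
x = x_star + rng.normal(0.0, sigma_eta, T)
y = 1.0 + beta * x_star + rng.normal(0.0, sigma_eps, T)

beta_deming = deming_slope(x, y, delta=sigma_eps**2 / sigma_eta**2)
beta_reliab = slope_known_reliability(x, y, lam=sigma_star**2 / (sigma_star**2 + sigma_eta**2))
beta_vareta = slope_known_error_variance(x, y, var_eta=sigma_eta**2)
print(beta_deming, beta_reliab, beta_vareta)
```

All three estimates should land close to the true slope of 2, whereas the uncorrected least-squares slope converges to 1.6 for these values.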

Newer estimation methods that do not assume knowledge of some of the parameters of the model include:

Method of moments: the GMM estimator based on the third- (or higher-) order joint cumulants of observable variables. The slope coefficient can be estimated from

$$\hat\beta = \frac{\hat K(n_1, n_2+1)}{\hat K(n_1+1, n_2)}, \qquad n_1, n_2 > 0,$$

where (n₁,n₂) are such that K(n₁+1,n₂), the joint cumulant of (x,y), is not zero. In the case when the third central moment of the latent regressor x* is non-zero, the formula reduces to

$$\hat\beta = \frac{\frac{1}{T}\sum_{t=1}^{T}(x_t-\bar x)(y_t-\bar y)^2}{\frac{1}{T}\sum_{t=1}^{T}(x_t-\bar x)^2(y_t-\bar y)}.$$

Instrumental variables: a regression which requires that certain additional data variables z, called instruments, are available. These variables should be uncorrelated with the errors in the equation for the dependent variable, and they should also be correlated (relevant) with the true regressors x*. If such variables can be found then the estimator takes the form

$$\hat\beta = \frac{\frac{1}{T}\sum_{t=1}^{T}(z_t-\bar z)(y_t-\bar y)}{\frac{1}{T}\sum_{t=1}^{T}(z_t-\bar z)(x_t-\bar x)}.$$
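As a rough illustration of these two estimators, the sketch below simulates a skewed latent regressor and compares the third-moment slope (the ratio of the sample means of (x − x̄)(y − ȳ)² and (x − x̄)²(y − ȳ)) with the simple instrumental-variables slope. The exponential instrument, all parameter values, and the function names are assumptions made for the example.

```python
import numpy as np

def third_moment_slope(x, y):
    """Ratio of third-order mixed sample moments of the de-meaned data."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.mean(xc * yc**2) / np.mean(xc**2 * yc))

def iv_slope(x, y, z):
    """Simple IV slope: sample Cov(z, y) / Cov(z, x)."""
    zc = z - z.mean()
    return float(zc @ (y - y.mean()) / (zc @ (x - x.mean())))

def ols_slope(x, y):
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

rng = np.random.default_rng(3)
T, beta = 400_000, 2.0
z = rng.exponential(1.0, T)                  # instrument, skewed (assumed)
x_star = z + rng.normal(0.0, 0.5, T)         # latent regressor inherits the skewness of z
x = x_star + rng.normal(0.0, 0.7, T)         # mismeasured regressor
y = 1.0 + beta * x_star + rng.normal(0.0, 1.0, T)

beta_mom = third_moment_slope(x, y)
beta_iv = iv_slope(x, y, z)
beta_ols = ols_slope(x, y)
print(f"moments: {beta_mom:.3f}, IV: {beta_iv:.3f}, naive OLS: {beta_ols:.3f}")
```

The third-moment estimator needs no extra data but relies on sample moments of high order, so it is typically much noisier than the IV estimate at the same sample size.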

Multivariate linear model

The multivariate model looks exactly like the simple linear model, only this time β, η_t, x_t and x*_t are k×1 vectors:

$$y_t = \alpha + \beta' x_t^* + \varepsilon_t, \qquad x_t = x_t^* + \eta_t.$$

The general identifiability condition for this model remains an open question. It is known however that in the case when (ε,η) are independent and jointly normal, the parameter β is identified if and only if it is impossible to find a non-singular k×k block matrix [a A] (where a is a k×1 vector) such that a′x* is distributed normally and independently from A′x*.

Some of the estimation methods for multivariate linear models are:

Total least squares is an extension of Deming regression to the multivariate setting. When all the k+1 components of the vector (ε,η) have equal variances and are independent, this is equivalent to running the orthogonal regression of y on the vector x, that is, the regression which minimizes the sum of squared distances between points (y_t,x_t) and the k-dimensional hyperplane of “best fit”.
The method of moments estimator can be constructed based on the moment conditions E[z_t·(y_t − α − β'x_t)] = 0, where the (5k+3)-dimensional vector of instruments z_t is defined as

where * designates the Hadamard product of matrices, and variables x_t, y_t have been de-meaned beforehand. The authors of the method suggest using Fuller’s modified IV estimator.

This method can be extended to use moments higher than the third order, if necessary, and to accommodate variables measured without error.

The instrumental variables approach requires finding additional data variables z_t which would serve as instruments for the mismeasured regressors x_t. This method is the simplest from the implementation point of view; however, its disadvantage is that it requires collecting additional data, which may be costly or even impossible. When the instruments can be found, the estimator takes the standard form

$$\hat\beta = \big(X'Z(Z'Z)^{-1}Z'X\big)^{-1}X'Z(Z'Z)^{-1}Z'y,$$

where X, Z and y stack the (de-meaned) observations x_t′, z_t′ and y_t by rows.
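Two of the multivariate methods listed above are easy to sketch numerically. The block below is an illustration under stated assumptions, not a reference implementation: total least squares is computed from the singular value decomposition of the de-meaned data matrix [X y] (valid for the equal-variance case described above), and the IV estimator uses the standard two-stage least squares formula. The function names, the simulated design, and the construction of the instruments are all assumed for the example.

```python
import numpy as np

def tls_slopes(X, y):
    """Orthogonal (total least squares) slopes from the SVD of de-meaned [X y]."""
    M = np.column_stack([X - X.mean(axis=0), y - y.mean()])
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    v = Vt[-1]                 # normal vector of the best-fit hyperplane
    return -v[:-1] / v[-1]

def iv_slopes(X, Z, y):
    """Two-stage least squares slopes on de-meaned data."""
    Xc, Zc, yc = X - X.mean(axis=0), Z - Z.mean(axis=0), y - y.mean()
    X_fit = Zc @ np.linalg.solve(Zc.T @ Zc, Zc.T @ Xc)   # projection of X on Z
    return np.linalg.solve(X_fit.T @ Xc, X_fit.T @ yc)

rng = np.random.default_rng(4)
T, k = 100_000, 2
beta = np.array([1.0, -2.0])
x_star = rng.normal(size=(T, k)) @ np.array([[1.0, 0.5], [0.0, 1.0]])
sigma = 0.5                                       # common error s.d. (TLS assumption)
X = x_star + rng.normal(0.0, sigma, (T, k))       # mismeasured regressors
y = 0.3 + x_star @ beta + rng.normal(0.0, sigma, T)
Z = x_star + rng.normal(0.0, 1.0, (T, k))         # instruments: independent noisy copies

beta_tls = tls_slopes(X, y)
beta_iv = iv_slopes(X, Z, y)
beta_ols = np.linalg.lstsq(np.column_stack([np.ones(T), X]), y, rcond=None)[0][1:]
print(beta_tls, beta_iv, beta_ols)
```

In this design both corrected estimators recover values near (1, −2), while ordinary least squares is visibly biased in both coordinates.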

Non-linear models

A generic non-linear measurement error model takes the form

$$y_t = g(x_t^*) + \varepsilon_t, \qquad x_t = x_t^* + \eta_t.$$

Here function g can be either parametric or non-parametric. When function g is parametric it will be written as g(x*, β).

For a general vector-valued regressor x* the conditions for model identifiability are not known. However in the case of scalar x* the model is identified unless the function g is of the “log-exponential” form

and the latent regressor x* has density

where constants A,B,C,D,E,F may depend on a,b,c,d.

Despite this optimistic result, as of now no methods exist for estimating non-linear errors-in-variables models without any extraneous information. However there are several techniques which make use of some additional data: either the instrumental variables, or repeated observations.

Instrumental variables methods

Newey’s simulated moments method for parametric models requires that there is an additional set of observed predictor variables z_t, such that the true regressor can be expressed as

$$x_t^* = \pi_0' z_t + \sigma_0 \zeta_t,$$

where π₀ and σ₀ are (unknown) constant matrices, and ζ_t ⊥ z_t. The coefficient π₀ can be estimated using standard least squares regression of x on z. The distribution of ζ_t is unknown; however, we can model it as belonging to a flexible parametric family, the Edgeworth series:

where ϕ is the standard normal distribution.

Simulated moments can be computed using the importance sampling algorithm: first we generate several random variables {v_ts ~ ϕ, s = 1,…,S, t = 1,…,T} from the standard normal distribution, then we compute the moments at t-th observation as

where θ = (β, σ, γ), A is just some function of the instrumental variables z, and H is a two-component vector of moments

With moment functions m_t one can apply standard GMM technique to estimate the unknown parameter θ.

Repeated observations

In this approach two (or maybe more) repeated observations of the regressor x* are available. Both observations contain their own measurement errors; however, those errors are required to be independent:

$$x_{1t} = x_t^* + \eta_{1t}, \qquad x_{2t} = x_t^* + \eta_{2t},$$

where x* ⊥ η₁ ⊥ η₂. Variables η₁, η₂ need not be identically distributed (although if they are, the efficiency of the estimator can be slightly improved). With only these two observations it is possible to consistently estimate the density function of x* using Kotlarski’s deconvolution technique.

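Li’s conditional density method for parametric models. The regression equation can be written in terms of the observable variables as

$$\operatorname{E}[\,y_t \mid x_t\,] = \int g(x_t^*, \beta)\, f_{x^*|x}(x_t^* \mid x_t)\, dx_t^*,$$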

where it would be possible to compute the integral if we knew the conditional density function ƒ_x*|x. If this function could be known or estimated, then the problem turns into standard non-linear regression, which can be estimated for example using the NLLS method.

Assuming for simplicity that η₁, η₂ are identically distributed, this conditional density can be computed as

where with slight abuse of notation x_j denotes the j-th component of a vector.

All densities in this formula can be estimated using inversion of the empirical characteristic functions. In particular,

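In order to invert these characteristic functions one has to apply the inverse Fourier transform, with a trimming parameter C needed to ensure numerical stability. For example:

$$\hat f_x(x) = \frac{1}{(2\pi)^k}\int_{-C}^{C}\cdots\int_{-C}^{C} e^{-iu'x}\,\hat\varphi_x(u)\,du,$$

where φ̂_x denotes the empirical characteristic function of x.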
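The inversion step with trimming can be sketched for a scalar variable as follows. This shows only the inversion of an empirical characteristic function on [−C, C] by a trapezoid rule, not the full deconvolution estimator. The sample, the value of C, the number of grid points, and the function names are assumptions for the example.

```python
import numpy as np

def ecf(sample, u):
    """Empirical characteristic function: mean over t of exp(i * u * x_t)."""
    return np.exp(1j * np.outer(u, sample)).mean(axis=1)

def density_by_ecf_inversion(sample, grid, C, n_u=301):
    """f(x) ~ (1 / 2 pi) * integral over [-C, C] of exp(-i u x) * ecf(u) du."""
    u = np.linspace(-C, C, n_u)
    du = u[1] - u[0]
    weights = np.full(n_u, du)
    weights[[0, -1]] = du / 2.0              # trapezoid rule end-point weights
    phi = ecf(sample, u)
    integrand = np.exp(-1j * np.outer(grid, u)) * phi
    return (integrand * weights).sum(axis=1).real / (2.0 * np.pi)

rng = np.random.default_rng(6)
sample = rng.normal(0.0, 1.0, 5_000)         # assumed test sample: standard normal
grid = np.array([-1.0, 0.0, 1.0])

f_hat = density_by_ecf_inversion(sample, grid, C=3.0)
f_true = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)
print(np.round(f_hat, 3), np.round(f_true, 3))
```

A larger C reduces the truncation bias but lets more of the noisy high-frequency part of the empirical characteristic function into the estimate, which is the stability trade-off the trimming parameter controls.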

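Schennach’s estimator for a parametric linear-in-parameters nonlinear-in-variables model. This is a model of the form

$$y_t = \sum_{j=1}^{k} \beta_j g_j(x_t^*) + \sum_{j=1}^{\ell} \beta_{k+j} w_{jt} + \varepsilon_t,$$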

where w_t represents variables measured without errors. The regressor x* here is scalar (the method can be extended to the case of vector x* as well).

If not for the measurement errors, this would have been a standard linear model with the estimator

$$\hat\beta = \big(\hat{\operatorname{E}}[\,\xi_t \xi_t'\,]\big)^{-1}\, \hat{\operatorname{E}}[\,\xi_t y_t\,],$$

where

$$\xi_t' = \big(g_1(x_t^*), \ldots, g_k(x_t^*),\; w_{1t}, \ldots, w_{\ell t}\big).$$

It turns out that all the expected values in this formula are estimable using the same deconvolution trick. In particular, for a generic observable w_t (which could be 1, w_1t, …, w_{ℓ t}, or y_t) and some function h (which could represent any g_j or g_ig_j) we have

where φ_h is the Fourier transform of h(x*), but using the same convention as for the characteristic functions,

,

and

The resulting estimator is consistent and asymptotically normal.

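Schennach’s estimator for a nonparametric model. The standard Nadaraya–Watson estimator for a nonparametric model takes the form

$$\hat g(x) = \frac{\hat{\operatorname{E}}[\,y_t K_h(x_t^* - x)\,]}{\hat{\operatorname{E}}[\,K_h(x_t^* - x)\,]},$$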

for a suitable choice of the kernel K and the bandwidth h. Both expectations here can be estimated using the same technique as in the previous method.
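For reference, the standard Nadaraya–Watson estimator is easy to write down when the regressor is observed without error. The sketch below covers only that error-free version; in the measurement-error setting the two sample averages would have to be replaced by the deconvolution-based estimates described above. The Gaussian kernel, the bandwidth, the test function, and the function name are all assumptions.

```python
import numpy as np

def nadaraya_watson(x_obs, y_obs, x_eval, h):
    """Kernel regression: ratio of kernel-weighted averages with a Gaussian kernel."""
    u = (x_eval[:, None] - x_obs[None, :]) / h
    w = np.exp(-0.5 * u**2)          # normalising constants cancel in the ratio
    return (w @ y_obs) / w.sum(axis=1)

rng = np.random.default_rng(7)
n = 5_000
x_obs = rng.uniform(-2.0, 2.0, n)                   # error-free regressor (assumed)
y_obs = np.sin(x_obs) + rng.normal(0.0, 0.3, n)     # assumed regression function g = sin

x_eval = np.array([-1.0, 0.0, 1.0])
g_hat = nadaraya_watson(x_obs, y_obs, x_eval, h=0.15)
print(np.round(g_hat, 3), np.round(np.sin(x_eval), 3))
```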
