Probit model
In statistics, a probit model is a type of regression where the dependent variable can only take two values, for example married or not married.

A probit model is a popular specification for an ordinal or a binary response model that employs a probit link function. This model is most often estimated using the standard maximum likelihood procedure; such an estimation is called a probit regression.

Probit models were introduced by Chester Bliss in 1935, and a fast method for computing maximum likelihood estimates for them was proposed by Ronald Fisher in an appendix to the same article.

Introduction

Suppose the response variable Y is binary, that is, it can have only two possible outcomes, which we will denote as 1 and 0. For example, Y may represent presence/absence of a certain condition, success/failure of some device, a yes/no answer on a survey, etc. We also have a vector of regressors X, which are assumed to influence the outcome Y. Specifically, we assume that the model takes the form

$$\Pr(Y = 1 \mid X) = \Phi(X'\beta),$$

where Pr denotes probability, and Φ is the cumulative distribution function (CDF) of the standard normal distribution. The parameters β are typically estimated by maximum likelihood.

It is also possible to motivate the probit model as a latent variable model. Suppose there exists an auxiliary random variable

$$Y^* = X'\beta + \varepsilon,$$

where ε ~ N(0, 1). Then Y can be viewed as an indicator for whether this latent variable is positive:

$$Y = \mathbf{1}\{Y^* > 0\} = \begin{cases} 1 & \text{if } X'\beta + \varepsilon > 0, \\ 0 & \text{otherwise.} \end{cases}$$
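
The latent variable formulation gives the same model because the normal distribution is symmetric: Pr(Y = 1 | X) = Pr(ε > −X'β) = Pr(ε < X'β) = Φ(X'β). The following Python sketch checks this by simulation. The regressor values, coefficients, seed, and function names are illustrative assumptions, not part of the article.

```python
import math
import random

def std_normal_cdf(z):
    """Phi(z), the standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulate_latent_probit(x, beta, n_draws, rng):
    """Share of draws in which the latent variable Y* = x'beta + eps is positive."""
    index = sum(xj * bj for xj, bj in zip(x, beta))
    hits = sum(1 for _ in range(n_draws) if index + rng.gauss(0.0, 1.0) > 0.0)
    return hits / n_draws

rng = random.Random(0)   # fixed seed so the run is repeatable
x = [1.0, 0.5]           # hypothetical regressors (first entry is a constant)
beta = [-0.3, 0.8]       # hypothetical coefficients

empirical = simulate_latent_probit(x, beta, 200_000, rng)
theoretical = std_normal_cdf(sum(xj * bj for xj, bj in zip(x, beta)))
print("simulated Pr(Y=1|x):", round(empirical, 4))
print("Phi(x'beta):        ", round(theoretical, 4))
```

The two printed numbers should agree up to simulation noise, which shrinks as the number of draws grows.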

Maximum likelihood estimation

Suppose the data set $\{y_i, x_i\}_{i=1}^n$ contains n independent statistical units corresponding to the model above. Then their joint log-likelihood function is

$$\ln\mathcal{L}(\beta) = \sum_{i=1}^n \Big( y_i \ln\Phi(x_i'\beta) + (1 - y_i)\ln\big(1 - \Phi(x_i'\beta)\big) \Big).$$

The estimator $\hat\beta$ which maximizes this function will be consistent, asymptotically normal and efficient provided that E[XX'] exists and is not singular. It can be shown that this log-likelihood function is globally concave in β, and therefore standard numerical algorithms for optimization will converge rapidly to the unique maximum.

The asymptotic distribution of the maximum likelihood estimator $\hat\beta$ is given by

$$\sqrt{n}\,(\hat\beta - \beta) \xrightarrow{d} \mathcal{N}(0, \Omega^{-1}),$$

where

$$\Omega = \operatorname{E}\left[ \frac{\varphi^2(X'\beta)}{\Phi(X'\beta)\,\big(1 - \Phi(X'\beta)\big)}\, XX' \right],$$

and φ = Φ' is the probability density function (PDF) of the standard normal distribution.
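
As an illustration of the estimation step, the following Python sketch fits a probit model with an intercept and one regressor by Fisher scoring. This is a Newton-type iteration that uses the expected information, the sum over observations of w_i x_i x_i' with w_i = φ²(x_i'β) / (Φ(x_i'β)(1 − Φ(x_i'β))). The inverse of that matrix at the optimum is the sample counterpart of Ω⁻¹/n. The simulated data, true coefficients, function names, and iteration cap are assumptions made for the example, and the sketch is not claimed to reproduce the method in Fisher's appendix.

```python
import math
import random

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fit_probit(xs, ys, max_iter=50, tol=1e-10):
    """Fisher scoring for Pr(Y=1|x) = Phi(b0 + b1*x).

    Returns ((b0, b1), cov), where cov is the inverse of the estimated
    information matrix, evaluated at the last iterate before the final step.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        s0 = s1 = 0.0        # score vector
        a = b = c = 0.0      # information matrix [[a, b], [b, c]]
        for x, y in zip(xs, ys):
            z = b0 + b1 * x
            # Clamp the probability away from 0 and 1 to avoid division by zero.
            p = min(max(norm_cdf(z), 1e-12), 1.0 - 1e-12)
            d = norm_pdf(z)
            g = d * (y - p) / (p * (1.0 - p))   # score contribution
            w = d * d / (p * (1.0 - p))         # information weight
            s0 += g
            s1 += g * x
            a += w
            b += w * x
            c += w * x * x
        det = a * c - b * b
        # Solve the 2x2 system (information) * step = score.
        step0 = (c * s0 - b * s1) / det
        step1 = (a * s1 - b * s0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    cov = [[c / det, -b / det], [-b / det, a / det]]
    return (b0, b1), cov

# Simulated data from the latent variable form, with hypothetical true values.
rng = random.Random(1)
true_b0, true_b1 = -0.3, 0.8
xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ys = [1 if true_b0 + true_b1 * x + rng.gauss(0.0, 1.0) > 0.0 else 0 for x in xs]

(b0_hat, b1_hat), cov_hat = fit_probit(xs, ys)
print("estimates:", round(b0_hat, 3), round(b1_hat, 3))
print("standard errors:", round(math.sqrt(cov_hat[0][0]), 3),
      round(math.sqrt(cov_hat[1][1]), 3))
```

Because the log-likelihood is globally concave, the iteration started at zero typically converges in a handful of steps on data like this.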

Berkson's minimum chi-square method

This method can be applied only when there are many observations of the response variable sharing the same value of the vector of regressors (such a situation may be referred to as “many observations per cell”). More specifically, the model can be formulated as follows.

Suppose among n observations $\{y_i, x_i\}_{i=1}^n$ there are only T distinct values of the regressors, which can be denoted as $\{x_{(1)}, \ldots, x_{(T)}\}$. Let $n_t$ be the number of observations with $x_i = x_{(t)}$, and $r_t$ the number of observations with $x_i = x_{(t)}$ and $y_i = 1$. We assume that there are indeed “many” observations per each “cell”: for each group t, $n_t/n \to c_t > 0$ as n→∞.

Denote

$$\hat{p}_t = \frac{r_t}{n_t}, \qquad \hat\sigma_t^2 = \frac{1}{n_t}\,\frac{\hat{p}_t\,(1 - \hat{p}_t)}{\varphi^2\big(\Phi^{-1}(\hat{p}_t)\big)}.$$

Then Berkson's minimum chi-square estimator is a generalized least squares estimator in a regression of $\Phi^{-1}(\hat{p}_t)$ on $x_{(t)}$ with weights $\hat\sigma_t^{-2}$:

$$\hat\beta = \Big( \sum_{t=1}^T \hat\sigma_t^{-2}\, x_{(t)} x_{(t)}' \Big)^{-1} \sum_{t=1}^T \hat\sigma_t^{-2}\, x_{(t)}\, \Phi^{-1}(\hat{p}_t).$$

It can be shown that this estimator is consistent (as n→∞ with T fixed), asymptotically normal and efficient. Its advantage is the presence of a closed-form formula for the estimator. However, it is only meaningful to carry out this analysis when individual observations are not available, only their aggregated counts $r_t$, $n_t$, and $x_{(t)}$ (for example in the analysis of voting behavior).
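
A minimal Python sketch of the closed-form computation follows, for the special case of an intercept plus one scalar regressor. Each cell contributes the transformed proportion Φ⁻¹(r_t/n_t), weighted by the inverse of its estimated variance, and the weighted normal equations are solved directly. The cell values, cell sizes, true coefficients, and function name are hypothetical, and the counts are simulated rather than taken from any real data set.

```python
import random
from statistics import NormalDist

_std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def berkson_probit(x_cells, n_cells, r_cells):
    """Weighted least squares of Phi^{-1}(r_t / n_t) on (1, x_t).

    Requires 0 < r_t < n_t in every cell, otherwise inv_cdf raises an error.
    Returns the pair (intercept, slope).
    """
    a = b = c = 0.0   # weighted cross-products [[a, b], [b, c]]
    u = v = 0.0       # weighted cross-products with the transformed response
    for x, n, r in zip(x_cells, n_cells, r_cells):
        p = r / n
        q = _std_normal.inv_cdf(p)                        # probit transform
        var = p * (1.0 - p) / (n * _std_normal.pdf(q) ** 2)
        w = 1.0 / var                                     # GLS weight
        a += w
        b += w * x
        c += w * x * x
        u += w * q
        v += w * x * q
    det = a * c - b * b
    return (c * u - b * v) / det, (a * v - b * u) / det

# Simulate grouped counts from the latent variable form (hypothetical values).
rng = random.Random(2)
true_b0, true_b1 = -0.3, 0.8
x_cells = [-1.5, -0.5, 0.5, 1.5]
n_cells = [2000, 2000, 2000, 2000]
r_cells = [
    sum(1 for _ in range(n) if true_b0 + true_b1 * x + rng.gauss(0.0, 1.0) > 0.0)
    for x, n in zip(x_cells, n_cells)
]

b0_mcs, b1_mcs = berkson_probit(x_cells, n_cells, r_cells)
print("cell counts:", r_cells)
print("minimum chi-square estimates:", round(b0_mcs, 3), round(b1_mcs, 3))
```

No iteration is needed, which is the practical appeal of the method when only aggregated counts are available.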

See also

  • Generalized linear model
  • Logit model
  • Limited dependent variable
  • Multivariate probit models
  • Ordered probit and ordered logit model
  • Separation (statistics)

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 