# Generalized linear model

In statistics, the generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.

Generalized linear models were formulated by John Nelder and Robert Wedderburn as a way of unifying various other statistical models, including linear regression, logistic regression and Poisson regression. They proposed an iteratively reweighted least squares method for maximum likelihood estimation of the model parameters. Maximum-likelihood estimation remains popular and is the default method on many statistical computing packages. Other approaches, including Bayesian approaches and least squares fits to variance-stabilized responses, have been developed.

## Overview

In a GLM, each outcome of the dependent variable, Y, is assumed to be generated from a particular distribution in the exponential family, a large range of probability distributions that includes the normal, binomial and Poisson distributions, among others. The mean, μ, of the distribution depends on the independent variables, X, through:

$$\operatorname{E}(Y) = \mu = g^{-1}(X\beta)$$

where E(Y) is the expected value of Y; Xβ is the linear predictor, a linear combination of unknown parameters, β; g is the link function.

In this framework, the variance is typically a function, V, of the mean:

$$\operatorname{Var}(Y) = V(\mu) = V\left(g^{-1}(X\beta)\right)$$

It is convenient if V follows from the exponential family distribution, but it may simply be that the variance is a function of the predicted value.

The unknown parameters, β, are typically estimated with maximum likelihood, maximum quasi-likelihood, or Bayesian techniques.

## Model components

The GLM consists of three elements:
1. A probability distribution from the exponential family.
2. A linear predictor η = Xβ.
3. A link function g such that E(Y) = μ = g⁻¹(η).
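
The three elements can be made concrete with a small simulation. The following is a minimal sketch, assuming a Poisson response with a log link; the design matrix, the coefficient values and the variable names are illustrative choices, not taken from any data set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design matrix: an intercept column plus one covariate.
n = 1000
x = rng.uniform(-1.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])

# Illustrative coefficients (assumed, not estimated from real data).
beta = np.array([0.5, 1.2])

# Element 2, the linear predictor: eta = X beta.
eta = X @ beta

# Element 3, the link function: g = log, so mu = g^{-1}(eta) = exp(eta).
mu = np.exp(eta)

# Element 1, the probability distribution: Y given X is Poisson with mean mu.
y = rng.poisson(mu)
```

Swapping the distribution or the link in this sketch (for example a binomial draw with a logistic mean function) gives a different member of the GLM family while the linear predictor stays the same.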

### Probability distribution

The overdispersed exponential family of distributions is a generalization of the exponential family and exponential dispersion model of distributions and includes those probability distributions, parameterized by θ and τ, whose density functions f (or probability mass function, for the case of a discrete distribution) can be expressed in the form

$$f_Y(y \mid \theta, \tau) = h(y, \tau) \exp\left(\frac{b(\theta)^{\mathsf T} T(y) - A(\theta)}{d(\tau)}\right).$$

τ, called the dispersion parameter, typically is known and is usually related to the variance of the distribution. The functions h(y, τ), b(θ), T(y), A(θ), and d(τ) are known. Many, although not all, common distributions are in this family.

For scalar Y and θ, this reduces to

$$f_Y(y \mid \theta, \tau) = h(y, \tau) \exp\left(\frac{b(\theta)\, T(y) - A(\theta)}{d(\tau)}\right).$$

θ is related to the mean of the distribution. If b(θ) is the identity function, then the distribution is said to be in canonical form (or natural form). Note that any distribution can be converted to canonical form by rewriting θ as θ′ and then applying the transformation θ = b(θ′). It is always possible to convert A(θ′) in terms of the new parametrization, even if b(θ′) is not a one-to-one function; see comments in the page on the exponential family. If, in addition, T(y) is the identity and τ is known, then θ is called the canonical parameter (or natural parameter) and is related to the mean through

$$\mu = \operatorname{E}(Y) = \nabla A(\theta).$$

For scalar Y and θ, this reduces to

$$\mu = \operatorname{E}(Y) = A'(\theta).$$

Under this scenario, the variance of the distribution can be shown to be

$$\operatorname{Var}(Y) = \nabla \nabla^{\mathsf T} A(\theta)\, d(\tau).$$

For scalar Y and θ, this reduces to

$$\operatorname{Var}(Y) = A''(\theta)\, d(\tau).$$
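
As a worked illustration, the Poisson distribution can be put into this form. The derivation below is a sketch that assumes the scalar family is written as f(y | θ, τ) = h(y, τ) exp((b(θ) T(y) − A(θ)) / d(τ)), with A the function whose first and second derivatives give the mean and the variance; the symbol names are a notational assumption.

$$
\begin{aligned}
f(y \mid \mu) &= \frac{\mu^{y} e^{-\mu}}{y!} = \frac{1}{y!}\exp\left(y \ln\mu - \mu\right), \\
\theta &= \ln\mu, \quad h(y,\tau) = \frac{1}{y!}, \quad b(\theta) = \theta, \quad T(y) = y, \quad A(\theta) = e^{\theta}, \quad d(\tau) = 1, \\
\operatorname{E}(Y) &= A'(\theta) = e^{\theta} = \mu, \\
\operatorname{Var}(Y) &= A''(\theta)\, d(\tau) = e^{\theta} = \mu.
\end{aligned}
$$

Both b and T are the identity here, so θ = ln μ is the canonical parameter, and the mean and variance both equal μ, as expected for a Poisson variable.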

### Linear predictor

The linear predictor is the quantity which incorporates the information about the independent variables into the model. The symbol η (Greek "eta") is typically used to denote a linear predictor. It is related to the expected value of the data (thus, "predictor") through the link function.

η is expressed as linear combinations (thus, "linear") of unknown parameters β. The coefficients of the linear combination are represented as the matrix of independent variables X. η can thus be expressed as

$$\eta = X\beta.$$

The elements of X are either measured by the experimenters or stipulated by them in the modeling design process.

### Link function

The link function provides the relationship between the linear predictor and the mean of the distribution function. There are many commonly used link functions, and their choice can be somewhat arbitrary. It can be convenient to match the domain of the link function to the range of the distribution function's mean.

When using a distribution function with a canonical parameter θ, the canonical link function is the function that expresses θ in terms of μ, i.e. θ = b(μ). For the most common distributions, the mean μ is one of the parameters in the standard form of the distribution's density function, and then b(μ) is the function as defined above that maps the density function into its canonical form. When using the canonical link function, b(μ) = θ = Xβ, which allows XᵀY to be a sufficient statistic for β.

Following is a table of canonical link functions and their inverses (sometimes referred to as the mean function, as done here) used for several distributions in the exponential family.

| Distribution | Link name | Link function | Mean function |
|---|---|---|---|
| Normal | Identity | Xβ = μ | μ = Xβ |
| Exponential | Inverse | Xβ = μ⁻¹ | μ = (Xβ)⁻¹ |
| Gamma | Inverse | Xβ = μ⁻¹ | μ = (Xβ)⁻¹ |
| Inverse Gaussian | Inverse squared | Xβ = μ⁻² | μ = (Xβ)^(−1/2) |
| Poisson | Log | Xβ = ln(μ) | μ = exp(Xβ) |
| Binomial | Logit | Xβ = ln(μ / (1 − μ)) | μ = 1 / (1 + exp(−Xβ)) |
| Multinomial | Logit | Xβ = ln(μ / (1 − μ)) | μ = 1 / (1 + exp(−Xβ)) |

In the cases of the exponential and gamma distributions, the domain of the canonical link function is not the same as the permitted range of the mean. In particular, the linear predictor may be negative, which would give an impossible negative mean. When maximizing the likelihood, precautions must be taken to avoid this. An alternative is to use a noncanonical link function.
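
The canonical links and their inverses can be written down directly. The following is a minimal sketch; the dictionary name `CANONICAL_LINKS` and its keys are hypothetical, and the check simply confirms that each mean function undoes its link. The last lines illustrate the difficulty described above: with the inverse link, a negative linear predictor maps to a negative mean.

```python
import numpy as np

# Each entry is (link g, mean function g^{-1}) for the named distribution.
CANONICAL_LINKS = {
    "normal": (lambda mu: mu, lambda eta: eta),
    "exponential": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "gamma": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "inverse_gaussian": (lambda mu: mu ** -2.0, lambda eta: eta ** -0.5),
    "poisson": (np.log, np.exp),
    "binomial": (
        lambda mu: np.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + np.exp(-eta)),
    ),
}

# Round trip: applying the mean function to g(mu) should return mu.
mu = np.array([0.1, 0.5, 0.9])
round_trip_error = {
    name: float(np.max(np.abs(mean_fn(link(mu)) - mu)))
    for name, (link, mean_fn) in CANONICAL_LINKS.items()
}

# With the inverse link, a negative linear predictor gives an impossible mean.
_, gamma_mean = CANONICAL_LINKS["gamma"]
bad_mu = gamma_mean(np.array([-2.0]))  # negative "mean" for a gamma response
```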

### Maximum likelihood

The maximum likelihood estimates can be found using an iteratively reweighted least squares algorithm using either a Newton–Raphson method with updates of the form:

$$\beta^{(t+1)} = \beta^{(t)} + J^{-1}\left(\beta^{(t)}\right) u\left(\beta^{(t)}\right),$$

where J(β⁽ᵗ⁾) is the observed information matrix (the negative of the Hessian matrix) and u(β⁽ᵗ⁾) is the score function; or a Fisher's scoring method:

$$\beta^{(t+1)} = \beta^{(t)} + I^{-1}\left(\beta^{(t)}\right) u\left(\beta^{(t)}\right),$$

where I(β⁽ᵗ⁾) is the Fisher information matrix. Note that if the canonical link function is used, then the two methods are the same.
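
The scoring update can be sketched for one concrete case. The code below assumes a Poisson response with the canonical log link, for which the score is Xᵀ(y − μ) and the Fisher information is XᵀWX with W = diag(μ); the function name `fit_poisson_irls`, the zero starting value and the simulated data are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def fit_poisson_irls(X, y, max_iter=25, tol=1e-10):
    """Fisher scoring for a Poisson GLM with log link (a sketch)."""
    beta = np.zeros(X.shape[1])          # assumed starting value
    for _ in range(max_iter):
        mu = np.exp(X @ beta)            # mean function g^{-1}(X beta)
        score = X.T @ (y - mu)           # u(beta)
        info = X.T @ (X * mu[:, None])   # Fisher information X^T W X, W = diag(mu)
        step = np.linalg.solve(info, score)
        beta = beta + step               # beta^(t+1) = beta^(t) + I^{-1} u
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data with assumed true coefficients, for illustration only.
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, size=n)])
beta_true = np.array([0.5, 1.2])
y = rng.poisson(np.exp(X @ beta_true))

beta_hat = fit_poisson_irls(X, y)
```

Because the log link is canonical for the Poisson distribution, the observed and expected information coincide, so this loop is also the Newton–Raphson iteration. Practical implementations usually add safeguards such as step halving and a data-based starting value.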

### Bayesian methods

In general, the posterior distribution cannot be found in closed form and so must be approximated, usually using Laplace approximations or some type of Markov chain Monte Carlo method such as Gibbs sampling.
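
A Laplace approximation replaces the posterior with a normal distribution centred at the posterior mode, with covariance equal to the inverse of the negative Hessian of the log posterior at that mode. The following is a minimal sketch for a binary response with a logit link, assuming independent N(0, `prior_var`) priors on the coefficients; the function name `laplace_logistic`, the prior and the simulated data are assumptions made for the illustration.

```python
import numpy as np

def laplace_logistic(X, y, prior_var=10.0, n_iter=50):
    """Return (mode, covariance) of a normal approximation to the posterior."""
    p = X.shape[1]
    beta = np.zeros(p)

    def grad_and_neg_hess(b):
        mu = 1.0 / (1.0 + np.exp(-(X @ b)))
        # Gradient of log-likelihood plus log-prior.
        grad = X.T @ (y - mu) - b / prior_var
        # Negative Hessian of the log posterior.
        neg_hess = X.T @ (X * (mu * (1.0 - mu))[:, None]) + np.eye(p) / prior_var
        return grad, neg_hess

    # Newton iterations to locate the posterior mode.
    for _ in range(n_iter):
        grad, neg_hess = grad_and_neg_hess(beta)
        beta = beta + np.linalg.solve(neg_hess, grad)

    _, neg_hess = grad_and_neg_hess(beta)
    return beta, np.linalg.inv(neg_hess)

# Simulated binary data with assumed true coefficients.
rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.3, 1.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true))))

mode, cov = laplace_logistic(X, y)
```

The square roots of the diagonal of `cov` serve as approximate posterior standard deviations. A Markov chain Monte Carlo method would instead draw samples from the posterior and is not shown here.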

### General linear models

A possible point of confusion has to do with the distinction between generalized linear models and the general linear model, two broad statistical models. The general linear model may be viewed as a case of the generalized linear model with identity link. As most exact results of interest are obtained only for the general linear model, the general linear model has undergone a somewhat longer historical development. Results for the generalized linear model with non-identity link are asymptotic (tending to work well with large samples).

### Linear regression

A simple, very important example of a generalized linear model (also an example of a general linear model) is linear regression. In linear regression, the use of the least-squares estimator is justified by the Gauss–Markov theorem, which does not assume that the distribution is normal.

From the perspective of generalized linear models, however, it is useful to suppose that the distribution function is the normal distribution with constant variance and the link function is the identity, which is the canonical link if the variance is known.

For the normal distribution, the generalized linear model has a closed-form expression for the maximum-likelihood estimates, which is convenient. Most other GLMs lack closed-form estimates.
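
For the normal distribution with identity link, the maximum-likelihood estimate of β coincides with the least-squares solution of the normal equations XᵀXβ = Xᵀy. The following is a minimal sketch on simulated data; the coefficient values and noise level are assumed for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([2.0, -1.0])            # assumed values
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Closed-form estimate: solve the normal equations X^T X beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The same estimate from a least-squares routine, which is usually
# preferred numerically when X is poorly conditioned.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

No iteration is needed here, in contrast to GLMs with other distributions or links.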

### Binomial data

When the response data, Y, are binary (taking on only values 0 and 1), the distribution function is generally chosen to be the binomial distribution and the interpretation of μi is then the probability, p, of Yi taking on the value one.

There are several popular link functions for binomial data; the most typical is the canonical logit link:

$$g(p) = \ln\left(\frac{p}{1 - p}\right).$$

GLMs with this setup are logistic regression models.

In addition, the inverse of any continuous cumulative distribution function (CDF) can be used for the link since the CDF's range is [0, 1], the range of the binomial mean. The normal CDF Φ is a popular choice and yields the probit model, whose link is g(p) = Φ⁻¹(p).

The complementary log-log function, g(p) = ln(−ln(1 − p)), may also be used. This link function is asymmetric and will often produce different results from the probit and logit link functions.
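
The three links can be compared through their inverses, which map a linear predictor η to a probability. The following is a minimal sketch using only the standard library; the function names are illustrative.

```python
import math

def inv_logit(eta):
    """Inverse of the logit link: the logistic function."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Inverse of the probit link: the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):
    """Inverse of the complementary log-log link."""
    return 1.0 - math.exp(-math.exp(eta))

# Probabilities implied by the same linear predictor under each link.
for eta in (-2.0, 0.0, 2.0):
    print(eta, inv_logit(eta), inv_probit(eta), inv_cloglog(eta))
```

The logit and probit mean functions are symmetric about η = 0, where both give a probability of 0.5. The complementary log-log mean function is not: at η = 0 it gives 1 − e⁻¹, and it approaches 1 faster than it approaches 0.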

The identity link is also sometimes used for binomial data to yield the linear probability model, but a drawback of this model is that the predicted probabilities can be greater than one or less than zero. In implementation it is possible to fix the nonsensical probabilities outside of [0, 1], but interpreting the coefficients can be difficult. The model's primary merit is that near p = 0.5 it is approximately a linear transformation of the probit and logit; econometricians sometimes call this the Harvard model.

The variance function for binomial data is given by:

$$\operatorname{Var}(Y_i) = \tau \mu_i (1 - \mu_i)$$

where the dispersion parameter τ is typically fixed at exactly one. When it is not, the resulting quasi-likelihood model is often described as binomial with overdispersion or quasibinomial.

### Count data

Another example of a generalized linear model is Poisson regression, which models count data using the Poisson distribution. The link is typically the logarithm, the canonical link. The variance function is proportional to the mean:

$$\operatorname{Var}(Y_i) = \tau \mu_i$$

where the dispersion parameter τ is typically fixed at exactly one. When it is not, the resulting quasi-likelihood model is often described as Poisson with overdispersion or quasipoisson.
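
One common way to judge whether τ = 1 is plausible is the Pearson estimate, the sum of squared Pearson residuals divided by the residual degrees of freedom. The following is a minimal sketch assuming the variance function Var(Yᵢ) = τμᵢ and an intercept-only model, whose fitted mean is the sample mean; the function name `pearson_dispersion` and the simulated counts are illustrative.

```python
import numpy as np

def pearson_dispersion(y, mu, n_params):
    """Pearson estimate of tau under Var(Y_i) = tau * mu_i."""
    return np.sum((y - mu) ** 2 / mu) / (len(y) - n_params)

rng = np.random.default_rng(2)
n = 5000

# Counts that really are Poisson with mean 5.
y_pois = rng.poisson(5.0, size=n)

# Overdispersed counts: Poisson with a gamma-distributed rate.
# The mean is still 5, but the variance is 5 + 12.5 = 17.5.
rate = rng.gamma(shape=2.0, scale=2.5, size=n)
y_over = rng.poisson(rate)

# Intercept-only fits: the fitted mean is the sample mean.
tau_pois = pearson_dispersion(y_pois, np.full(n, y_pois.mean()), n_params=1)
tau_over = pearson_dispersion(y_over, np.full(n, y_over.mean()), n_params=1)
```

An estimate near one is consistent with the Poisson variance function, while an estimate well above one suggests a quasipoisson treatment.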

### Correlated or clustered data

The standard GLM assumes that the observations are uncorrelated. Extensions have been developed to allow for correlation between observations, as occurs for example in longitudinal studies and clustered designs:
• Generalized estimating equations (GEEs) allow for the correlation between observations without the use of an explicit probability model for the origin of the correlations, so there is no explicit likelihood. They are suitable when the random effects and their variances are not of inherent interest, as they allow for the correlation without explaining its origin. The focus is on estimating the average response over the population ("population-averaged" effects) rather than the regression parameters that would enable prediction of the effect of changing one or more components of X on a given individual. GEEs are usually used in conjunction with Huber-White standard errors.
• Generalized linear mixed models (GLMMs) are an extension to GLMs that includes random effects in the linear predictor, giving an explicit probability model that explains the origin of the correlations. The resulting "subject-specific" parameter estimates are suitable when the focus is on estimating the effect of changing one or more components of X on a given individual. GLMMs are a particular type of multilevel model (mixed model). In general, fitting GLMMs is more computationally complex and intensive than fitting GEEs.
• Hierarchical generalized linear models (HGLMs) are similar to GLMMs apart from two distinctions:
1. The random effects can have any distribution in the exponential family, whereas current GLMMs nearly always have normal random effects;
2. They are not as computationally intensive, as instead of integrating out the random effects they are based on a modified form of likelihood known as the hierarchical likelihood or h-likelihood.

The theoretical basis and accuracy of the methods used in HGLMs have been the subject of some debate in the statistical literature.

### Generalized additive models

Generalized additive models (GAMs) are another extension to GLMs in which the linear predictor η is not restricted to be linear in the covariates X but is the sum of smoothing functions applied to the xᵢ:

$$\eta = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots$$

The smoothing functions fᵢ are estimated from the data. In general this requires a large number of data points and is computationally intensive.

### Multinomial regression

The binomial case may be easily extended to allow for a multinomial distribution as the response (also a generalized linear model for counts, with a constrained total). There are two ways in which this is usually done:

#### Ordered response

If the response variable is an ordinal measurement, then one may fit a model function of the form:

$$g(\mu_m) = \eta_m = \beta_0 + X_1 \beta_1 + \cdots + X_p \beta_p + \gamma_2 + \cdots + \gamma_m = \eta_1 + \gamma_2 + \cdots + \gamma_m, \quad \text{where } \mu_m = \operatorname{P}(Y \le m),$$

for m > 2. Different links g lead to proportional odds models or ordered probit models.
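
With a logit link, a cumulative model of this kind turns one linear predictor and a set of increments into a full set of category probabilities. The following is a minimal sketch assuming M ordered categories, cumulative probabilities P(Y ≤ m) given by the logistic function of η₁ + γ₂ + … + γ_m, and non-negative increments so that the cumulative probabilities do not decrease; the function name and the numerical values are illustrative.

```python
import math

def ordered_logit_probs(eta1, gammas):
    """Category probabilities P(Y = m), m = 1..M, for a cumulative logit model.

    eta1   : linear predictor for the first cumulative probability P(Y <= 1)
    gammas : non-negative increments gamma_2, ..., gamma_{M-1}
    """
    cumulative = []
    eta = eta1
    for increment in [0.0] + list(gammas):
        eta += increment
        cumulative.append(1.0 / (1.0 + math.exp(-eta)))  # P(Y <= m)
    cumulative.append(1.0)                               # P(Y <= M) = 1
    # Differences of consecutive cumulative probabilities give P(Y = m).
    return [cumulative[0]] + [
        cumulative[m] - cumulative[m - 1] for m in range(1, len(cumulative))
    ]

# Four ordered categories with assumed values for eta1 and the increments.
probs = ordered_logit_probs(eta1=-1.0, gammas=[1.0, 1.5])
```

Replacing the logistic function with the standard normal CDF in this sketch gives the corresponding ordered probit probabilities.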

#### Unordered response

If the response variable is a nominal measurement, or the data do not satisfy the assumptions of an ordered model, one may fit a model of the following form:

$$g(\mu_m) = \eta_m = \beta_{m,0} + X_1 \beta_{m,1} + \cdots + X_p \beta_{m,p}, \quad \text{where } \mu_m = \operatorname{P}(Y = m \mid Y \in \{1, m\}),$$

for m > 2. Different links g lead to multinomial logit or multinomial probit models. These are less efficient than the ordered response models, as more parameters are estimated.

## Confusion with general linear models

The term "generalized linear model", and especially its abbreviation GLM, can be confused with general linear model. John Nelder expressed regret about this terminology in a conversation with Senn:

Senn: I must confess to having some confusion when I was a young statistician between general linear models and generalized linear models. Do you regret the terminology?

Nelder: I think probably I do. I suspect we should have found some more fancy name for it that would have stuck and not been confused with the general linear model, although general and generalized are not quite the same. I can see why it might have been better to have thought of something else.

## See also

• Comparison of general and generalized linear models
• Generalized linear array model
• Tweedie distributions
• GLIM (software)