In statistics, **Poisson regression** is a form of regression analysis used to model count data and contingency tables. Poisson regression assumes the response variable *Y* has a Poisson distribution, and assumes the logarithm of its expected value can be modeled by a linear combination of unknown parameters. A Poisson regression model is sometimes known as a **log-linear model**, especially when used to model contingency tables.

If $x \in \mathbb{R}^n$ is a vector of independent variables, then the model takes the form

$$\log(\operatorname{E}(Y \mid x)) = a + b' x,$$

where $a \in \mathbb{R}$ and $b \in \mathbb{R}^n$. Sometimes this is written more compactly as

$$\log(\operatorname{E}(Y \mid x)) = \theta' x,$$

where *x* is now an (*n*+1)-dimensional vector consisting of *n* independent variables concatenated to some constant, usually 1. Here *θ* is simply *a* concatenated to *b*.

Thus, when given a Poisson regression model *θ* and an input vector *x*, the predicted mean of the associated Poisson distribution is given by

$$\operatorname{E}(Y \mid x) = e^{\theta' x}.$$
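As a minimal sketch of this prediction step, the following Python computes the exponential of the linear predictor. The function name `predicted_mean` and the numeric values are illustrative assumptions, and the input vector is assumed to carry the constant 1 in its first position so that the first parameter acts as the intercept.

```python
import math

def predicted_mean(theta, x):
    """Return exp(theta' x), the predicted mean of the Poisson distribution.

    Assumes x already includes the leading constant 1, so theta[0]
    plays the role of the intercept a and theta[1:] the role of b.
    """
    if len(theta) != len(x):
        raise ValueError("theta and x must have the same length")
    linear_predictor = sum(t * v for t, v in zip(theta, x))
    return math.exp(linear_predictor)

# Hypothetical model: intercept 0.5 and a single slope of 0.2.
theta = [0.5, 0.2]
x = [1.0, 3.0]  # constant 1 followed by one predictor value
mean = predicted_mean(theta, x)  # exp(0.5 + 0.2 * 3.0)
```

Because the mean is an exponential of the linear predictor, it is always positive, whatever values the parameters take.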

If *Y*_{i} are independent observations with corresponding values *x*_{i} of the predictor variable, then *θ* can be estimated by maximum likelihood. The maximum-likelihood estimates lack a closed-form expression and must be found by numerical methods. The log-likelihood for Poisson regression is always concave (equivalently, the negative log-likelihood is convex), making Newton-Raphson or other gradient-based methods appropriate estimation techniques.

Poisson regression models are generalized linear models with the logarithm as the (canonical) link function, and the Poisson distribution as the assumed probability distribution of the response.

## Maximum likelihood-based parameter estimation

Given a set of parameters *θ* and an input vector *x*, the mean of the predicted Poisson distribution, as stated above, is given by

$$\operatorname{E}(Y \mid x) = e^{\theta' x},$$

and thus, the Poisson distribution's probability mass function is given by

$$p(y \mid x; \theta) = \frac{[\operatorname{E}(Y \mid x)]^{y} \, e^{-\operatorname{E}(Y \mid x)}}{y!} = \frac{e^{y \theta' x} \, e^{-e^{\theta' x}}}{y!}.$$

Now suppose we are given a data set consisting of *m* vectors $x_i \in \mathbb{R}^{n+1}$, $i = 1, \ldots, m$, along with a set of *m* values $y_1, \ldots, y_m$. Then, for a given set of parameters *θ*, the probability of attaining this particular set of data is given by

$$p(y_1, \ldots, y_m \mid x_1, \ldots, x_m; \theta) = \prod_{i=1}^{m} \frac{e^{y_i \theta' x_i} \, e^{-e^{\theta' x_i}}}{y_i!}.$$

By the method of maximum likelihood, we wish to find the set of parameters *θ* that makes this probability as large as possible. To do this, the equation is first rewritten as a likelihood function in terms of *θ*:

$$L(\theta \mid X, Y) = \prod_{i=1}^{m} \frac{e^{y_i \theta' x_i} \, e^{-e^{\theta' x_i}}}{y_i!}.$$

Note that the expression on the right-hand side has not actually changed. A formula in this form is typically difficult to work with; instead, one uses the *log-likelihood*:

$$\ell(\theta \mid X, Y) = \log L(\theta \mid X, Y) = \sum_{i=1}^{m} \left( y_i \theta' x_i - e^{\theta' x_i} - \log(y_i!) \right).$$

Notice that the parameters *θ* appear only in the first two terms of each summand. Therefore, given that we are only interested in finding the best value for *θ*, we may drop the $\log(y_i!)$ term and simply write

$$\ell(\theta \mid X, Y) = \sum_{i=1}^{m} \left( y_i \theta' x_i - e^{\theta' x_i} \right).$$
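The effect of dropping the factorial term can be checked numerically. The sketch below is an illustration only: the function name `log_likelihood`, the `drop_constant` flag, and the small data set are assumptions made for the example. It evaluates both forms of the log-likelihood and shows that they differ by an amount that does not depend on the parameters.

```python
import math

def log_likelihood(theta, xs, ys, drop_constant=False):
    """Poisson regression log-likelihood for rows xs and counts ys.

    Each summand is y_i * theta'x_i - exp(theta'x_i) - log(y_i!).
    With drop_constant=True the log(y_i!) term is omitted.
    """
    total = 0.0
    for x, y in zip(xs, ys):
        eta = sum(t * v for t, v in zip(theta, x))  # linear predictor theta'x_i
        total += y * eta - math.exp(eta)
        if not drop_constant:
            total -= math.lgamma(y + 1)  # lgamma(y + 1) equals log(y!)
    return total

# Hypothetical data: a constant 1 plus one predictor per row.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
ys = [1, 3, 4]

theta_a = [0.0, 0.5]
theta_b = [0.3, -0.1]

# The gap between the two forms is sum_i log(y_i!) for any theta.
gap_a = log_likelihood(theta_a, xs, ys, True) - log_likelihood(theta_a, xs, ys)
gap_b = log_likelihood(theta_b, xs, ys, True) - log_likelihood(theta_b, xs, ys)
```

Since the gap is the same constant for every parameter vector, both forms are maximized by the same value of the parameters.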

To find a maximum, we need to solve the equation $\frac{\partial \ell(\theta \mid X, Y)}{\partial \theta} = 0$, obtained by setting the gradient of the log-likelihood to zero, which has no closed-form solution. However, the negative log-likelihood, $-\ell(\theta \mid X, Y)$, is a convex function, and so standard convex optimization or gradient descent techniques can be applied to find the optimal value of *θ*.
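One way to carry out this optimization is Newton-Raphson iteration, sketched below in plain Python. This is a minimal illustration under stated assumptions, not a production fitter: the names `solve_linear` and `fit_poisson` are invented for the example, the iteration starts from zero, and there are no safeguards such as step halving. It uses the gradient $\sum_i (y_i - e^{\theta' x_i}) x_i$ and the negative Hessian $\sum_i e^{\theta' x_i} x_i x_i'$ of the log-likelihood.

```python
import math

def solve_linear(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z

def fit_poisson(xs, ys, max_iterations=50, tol=1e-10):
    """Estimate theta by Newton-Raphson on the Poisson log-likelihood."""
    k = len(xs[0])
    theta = [0.0] * k
    for _ in range(max_iterations):
        mus = [math.exp(sum(t * v for t, v in zip(theta, x))) for x in xs]
        # Gradient of the log-likelihood: sum_i (y_i - mu_i) x_i
        grad = [sum((y - mu) * x[j] for x, y, mu in zip(xs, ys, mus))
                for j in range(k)]
        # Negative Hessian: sum_i mu_i x_i x_i'
        neg_hess = [[sum(mu * x[j] * x[l] for x, mu in zip(xs, mus))
                     for l in range(k)] for j in range(k)]
        step = solve_linear(neg_hess, grad)
        theta = [t + s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return theta

# Hypothetical two-group data: constant 1 plus a 0/1 group indicator.
xs = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
ys = [2, 4, 5, 7]
theta_hat = fit_poisson(xs, ys)
```

The two-group data set was chosen because its estimate can be verified by hand: the fitted means must equal the group averages 3 and 6, so the intercept should approach log 3 and the slope log 2.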

## Poisson regression in practice

Poisson regression is appropriate when the dependent variable is a count, for instance of events such as the arrival of a telephone call at a call centre. The events must be independent in the sense that the arrival of one call will not make another more or less likely, but the probability per unit time of events is understood to be related to covariates such as time of day.

### "Exposure" and offset

Poisson regression is also appropriate for rate data, where the rate is a count of events occurring to a particular unit of observation, divided by some measure of that unit's *exposure*. For example, biologists may count the number of tree species in a forest, and the rate would be the number of species per square kilometre. Demographers may model death rates in geographic areas as the count of deaths divided by person-years. More generally, event rates can be calculated as events per unit time, which allows the observation window to vary for each unit. In these examples, exposure is respectively unit area, person-years and unit time. In Poisson regression this is handled as an **offset**, where the logarithm of the exposure variable enters on the right-hand side of the equation, but with its parameter estimate constrained to 1:

$$\log(\operatorname{E}(Y \mid x)) = \log(\text{exposure}) + \theta' x,$$

which implies

$$\log(\operatorname{E}(Y \mid x)) - \log(\text{exposure}) = \log\left(\frac{\operatorname{E}(Y \mid x)}{\text{exposure}}\right) = \theta' x.$$
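The role of the offset can be illustrated with a short sketch. The function names `expected_count` and `expected_rate` and the parameter values are hypothetical. Because log(exposure) enters the linear predictor with a coefficient fixed at 1, the expected count scales in proportion to exposure while the modeled rate stays the same.

```python
import math

def expected_rate(theta, x):
    """Modeled rate per unit of exposure: exp(theta' x)."""
    return math.exp(sum(t * v for t, v in zip(theta, x)))

def expected_count(theta, x, exposure):
    """Expected count with an offset: exp(log(exposure) + theta' x)."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    eta = math.log(exposure) + sum(t * v for t, v in zip(theta, x))
    return math.exp(eta)

# Hypothetical parameters and covariates (constant 1 plus one predictor).
theta = [-2.0, 0.3]
x = [1.0, 1.0]

count_10 = expected_count(theta, x, exposure=10.0)  # e.g. 10 person-years
count_20 = expected_count(theta, x, exposure=20.0)  # e.g. 20 person-years
rate = expected_rate(theta, x)
```

Doubling the exposure doubles the expected count, and dividing either count by its exposure recovers the same rate.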

### Overdispersion

A characteristic of the Poisson distribution is that its mean is equal to its variance. In certain circumstances, it will be found that the observed variance is greater than the mean; this is known as overdispersion and indicates that the model is not appropriate. A common reason is the omission of relevant explanatory variables. Under some circumstances, the problem of overdispersion can be solved by using a negative binomial distribution instead.
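A common informal check for overdispersion, offered here as a sketch and not as a formal test, is the Pearson dispersion statistic: the sum of squared residuals scaled by the fitted means, divided by the residual degrees of freedom. Under a well-specified Poisson model it should be close to 1. The function name `pearson_dispersion` and the data are illustrative assumptions.

```python
def pearson_dispersion(ys, mus, n_params):
    """Pearson chi-square statistic divided by residual degrees of freedom.

    ys are observed counts, mus are the fitted Poisson means, and
    n_params is the number of estimated regression parameters.
    """
    dof = len(ys) - n_params
    if dof <= 0:
        raise ValueError("need more observations than parameters")
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(ys, mus))
    return chi_square / dof

# Hypothetical counts with an intercept-only fit, whose fitted mean
# is the sample average (24 / 6 = 4) for every observation.
ys = [0, 1, 9, 2, 12, 0]
mus = [4.0] * len(ys)
dispersion = pearson_dispersion(ys, mus, n_params=1)
```

For these made-up counts the statistic is well above 1, the pattern that would suggest the variance exceeds what a Poisson model implies.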

Another common problem with Poisson regression is excess zeros: if there are two processes at work, one determining whether there are zero events or any events, and a Poisson process determining how many events there are, there will be more zeros than a Poisson regression would predict. An example would be the distribution of cigarettes smoked in an hour by members of a group where some individuals are non-smokers.

Other generalized linear models such as the negative binomial model may function better in these cases.

### Use in survival analysis

Poisson regression creates proportional hazards models, one class of survival analysis: see proportional hazards models for descriptions of Cox models.

### Regularized Poisson regression

When estimating the parameters for Poisson regression, one typically tries to find values for *θ* that maximize a log-likelihood of the form

$$\sum_{i=1}^{m} \log\left(p(y_i; e^{\theta' x_i})\right),$$

where *m* is the number of examples in the data set, and $p(y_i; e^{\theta' x_i})$ is the probability mass function of the Poisson distribution with the mean set to $e^{\theta' x_i}$. Regularization can be added to this optimization problem by instead maximizing

$$\sum_{i=1}^{m} \log\left(p(y_i; e^{\theta' x_i})\right) - \lambda \left\|\theta\right\|_2^2,$$

for some positive constant $\lambda$. This technique, similar to ridge regression, can reduce overfitting.
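The penalized objective can be written out directly, as in the following sketch. It assumes the penalty is a positive constant times the squared Euclidean norm of the full parameter vector, as in ridge regression, and that the intercept is penalized along with the other parameters. The names `poisson_log_pmf`, `penalized_objective`, and `lam` are illustrative.

```python
import math

def poisson_log_pmf(y, eta):
    """Log of the Poisson pmf at count y when the mean is exp(eta)."""
    return y * eta - math.exp(eta) - math.lgamma(y + 1)

def penalized_objective(theta, xs, ys, lam):
    """Log-likelihood minus lam times the squared Euclidean norm of theta."""
    log_lik = 0.0
    for x, y in zip(xs, ys):
        eta = sum(t * v for t, v in zip(theta, x))  # linear predictor theta'x_i
        log_lik += poisson_log_pmf(y, eta)
    penalty = lam * sum(t * t for t in theta)
    return log_lik - penalty

# Hypothetical data and parameters.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
ys = [1, 3, 4]
theta = [1.0, -2.0]

unpenalized = penalized_objective(theta, xs, ys, lam=0.0)
penalized = penalized_objective(theta, xs, ys, lam=0.5)
```

Larger values of the constant pull the maximizing parameters toward zero. Whether to exempt the intercept from the penalty is a design choice that this sketch does not address.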

## Implementations

Some statistics packages include implementations of Poisson regression.

- MATLAB Statistics Toolbox: Poisson regression can be performed using the "glmfit" and "glmval" functions.
- Microsoft Excel: Excel is not capable of doing Poisson regression by default. One of the Excel add-ins for Poisson regression is XPost.
- R: The function for fitting a generalized linear model in R is glm, which can be used for Poisson regression.
- SAS: Poisson regression in SAS is done by using GENMOD.
- SPSS: In SPSS, Poisson regression is done by using the GENLIN command.
- Stata: Stata has a procedure for Poisson regression named "poisson".