Dummy variable



 
 








In regression analysis, a dummy variable (also known as an indicator variable or just a dummy) is one that takes the value 0 or 1 to indicate the absence or presence of some categorical effect that may be expected to shift the outcome. For example, in econometric time series analysis, dummy variables may be used to indicate the occurrence of wars or major strikes. Adding dummy variables usually increases model fit (the coefficient of determination), but at the cost of fewer degrees of freedom and a loss of generality of the model. Too many dummy variables result in a model that does not provide any general conclusions.
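As a minimal sketch of how a 0/1 dummy shifts the outcome, the Python fragment below fits a made-up output series on a constant and a strike dummy. The data and the names `strike`, `output`, and `ols_with_dummy` are illustrative assumptions, not drawn from any real study. With a single dummy and a constant, the least-squares intercept is the mean of the observations where the dummy is 0, and the dummy's coefficient is the difference between the two group means.

```python
def ols_with_dummy(dummy, y):
    """Least-squares fit of y = b0 + b1 * dummy for a single 0/1 regressor."""
    n = len(y)
    mean_d = sum(dummy) / n
    mean_y = sum(y) / n
    # Slope = (co-movement of dummy and y) / (variation in dummy).
    cov = sum((d - mean_d) * (v - mean_y) for d, v in zip(dummy, y))
    var = sum((d - mean_d) ** 2 for d in dummy)
    b1 = cov / var
    b0 = mean_y - b1 * mean_d
    return b0, b1


# Hypothetical yearly output figures; strike = 1 marks a year with a major strike.
strike = [0, 0, 1, 0, 1, 0]
output = [10.0, 12.0, 7.0, 11.0, 6.0, 11.0]

b0, b1 = ols_with_dummy(strike, output)

# Group means, for comparison with the fitted coefficients.
mean_no_strike = sum(v for d, v in zip(strike, output) if d == 0) / strike.count(0)
mean_strike = sum(v for d, v in zip(strike, output) if d == 1) / strike.count(1)

print("intercept:", b0, "dummy coefficient:", b1)
print("group means:", mean_no_strike, mean_strike)
```

In this toy setting the dummy coefficient can be read directly as the average shift in the outcome associated with a strike year, relative to the non-strike years.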

Dummy variables may be extended to more complex cases. For example, seasonal effects may be captured by creating dummy variables for each of the seasons. In the panel data fixed effects estimator, dummies are created for each of the units in the cross-sectional data (e.g. firms or countries) or for each of the periods in a pooled time series. However, in such regressions either the constant term or one of the dummies has to be removed.
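The seasonal case can be sketched in a few lines of Python. The helper `make_dummies` and the `seasons` data are hypothetical names introduced only for illustration; the optional `base` argument stands for the category whose dummy is left out when a constant term is kept in the regression.

```python
def make_dummies(labels, base=None):
    """Return {category: 0/1 column} for each category found in labels.

    If base is given, that category gets no column and serves as the
    reference against which the remaining categories are compared.
    """
    categories = sorted(set(labels))
    if base is not None and base not in categories:
        raise ValueError(f"base category {base!r} not found in labels")
    return {
        cat: [1 if label == cat else 0 for label in labels]
        for cat in categories
        if cat != base
    }


# Hypothetical observations labelled by season.
seasons = ["winter", "spring", "summer", "autumn", "winter", "spring"]

all_dummies = make_dummies(seasons)                    # one column per season
seasonal_dummies = make_dummies(seasons, base="winter")  # winter is the base

for name, column in seasonal_dummies.items():
    print(name, column)
```

The same helper would apply to unit dummies in panel data, with firm or country identifiers in place of season labels.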

When a dummy is included for every category, so that each observation has exactly one dummy equal to 1, the constant term has to be excluded. If a constant term is included in the regression, it is important to exclude one of the dummy variables, making it the base category against which the others are assessed. If all the dummy variables are included, their sum is equal to 1 for every observation, which is identical to the variable X0 attached to the constant term B0, resulting in perfect multicollinearity. This is referred to as the dummy variable trap.
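The trap can be checked numerically. The sketch below is an illustration under assumed data: the design matrix `trap` and the helper `matrix_rank` are hypothetical, and the rank is computed by plain Gaussian elimination with exact fractions rather than by any particular statistics library. A matrix with a constant column and a full set of dummies has fewer independent columns than it has columns, while dropping either one dummy or the constant restores full column rank.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below position `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical design matrix rows: [constant, d_a, d_b, d_c] for three categories.
trap = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]

# Remedy 1: drop d_a, making category "a" the base category.
dropped_dummy = [[row[0]] + row[2:] for row in trap]

# Remedy 2: drop the constant and keep all three dummies.
no_constant = [row[1:] for row in trap]

print("columns:", len(trap[0]), "rank:", matrix_rank(trap))
print("columns:", len(dropped_dummy[0]), "rank:", matrix_rank(dropped_dummy))
print("columns:", len(no_constant[0]), "rank:", matrix_rank(no_constant))
```

In the first matrix the three dummy columns add up to the constant column in every row, which is the linear dependence behind the perfect multicollinearity.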

See also

  • Indicator function
  • Interaction variable
  • Calculation of glass properties (dummy variables for detection of systematic errors)