CHAID
CHAID is a type of decision tree technique based upon adjusted significance testing (Bonferroni testing). The technique was developed in South Africa and was published in 1980 by Gordon V. Kass, who had completed a PhD thesis on this topic. CHAID can be used for prediction (in a similar fashion to regression analysis, this version of CHAID being originally known as XAID) as well as for classification and for detecting interaction between variables. CHAID stands for CHi-squared Automatic Interaction Detector; it is a formal extension of the US AID (Automatic Interaction Detector) and THAID (THeta Automatic Interaction Detector) procedures of the 1960s and 1970s, which in turn were extensions of earlier research, including work performed in the UK in the 1950s.
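
The adjusted significance testing mentioned above can be illustrated with a minimal sketch. It computes a Pearson chi-squared test of independence between one categorical predictor and a categorical target, then applies a plain Bonferroni correction. The function names, the toy data and the choice of three candidate predictors are assumptions made for illustration; the multiplier used in Kass's procedure depends on how categories may be merged and is not reproduced here.

```python
import math
from collections import Counter

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2, then step df upward by 2.
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return min(1.0, q)

def chi2_independence(pairs):
    """Pearson chi-squared test on (predictor_category, target_category) pairs."""
    pairs = list(pairs)
    n = len(pairs)
    observed = Counter(pairs)
    row_totals = Counter(r for r, _ in pairs)
    col_totals = Counter(c for _, c in pairs)
    stat = 0.0
    for r, rt in row_totals.items():
        for c, ct in col_totals.items():
            expected = rt * ct / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    p = chi2_sf(stat, df) if df > 0 else 1.0
    return stat, df, p

def bonferroni(p, n_tests):
    """Simple Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    return min(1.0, p * n_tests)

# Hypothetical toy data: age band versus a yes/no response.
rows = ([("young", "yes")] * 10 + [("young", "no")] * 20
        + [("old", "yes")] * 20 + [("old", "no")] * 10)
stat, df, p = chi2_independence(rows)
adjusted = bonferroni(p, n_tests=3)  # assumes 3 candidate predictors were tested
print(f"chi2={stat:.3f}, df={df}, p={p:.4f}, adjusted p={adjusted:.4f}")
```

In a tree-growing loop, a test of this kind would be run for each candidate predictor at a node, and the predictor with the smallest adjusted p-value would be chosen for the split, provided it falls below a chosen threshold.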

In practice, CHAID is often used in direct marketing to select groups of consumers and predict how their responses to some variables affect other variables, although other early applications were in medical and psychiatric research.

As with other decision trees, CHAID's advantages are that its output is highly visual and easy to interpret. Because it uses multiway splits by default, it needs rather large sample sizes to work effectively: with small samples, the respondent groups can quickly become too small for reliable analysis.
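
The sample-size point can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that every split divides a node into equally sized groups; real splits are uneven, so some groups shrink even faster. The function name and the figure of 1000 respondents are hypothetical.

```python
def average_node_size(n_samples, branches_per_split, depth):
    """Average respondents per node after `depth` levels of equal-sized splits."""
    return n_samples / branches_per_split ** depth

# Compare binary splits with four-way splits on 1000 respondents.
for branches in (2, 4):
    sizes = [average_node_size(1000, branches, d) for d in range(4)]
    print(f"{branches}-way splits, depth 0-3: {sizes}")
```

Under these assumptions, three levels of four-way splits leave about 16 respondents per group on average, against 125 for binary splits.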

CHAID detects interaction between variables in the data set. Using this technique, it is possible to establish relationships between a ‘dependent variable’ (for example, readership of a certain newspaper) and other explanatory variables such as price, size and supplements. CHAID does this by identifying discrete groups of respondents and using their responses to the explanatory variables to predict the impact on the dependent variable.
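
As a rough illustration of grouping respondents by an explanatory variable, the sketch below computes the share of readers within each category of one predictor. The field names (`price`, `reads_paper`) and the records are invented for the example, and this is only the descriptive step: CHAID itself additionally decides which groups to keep or merge using significance tests.

```python
from collections import defaultdict

def response_rates(records, predictor, target):
    """Share of positive target values within each category of the predictor."""
    counts = defaultdict(lambda: [0, 0])  # category -> [positives, total]
    for rec in records:
        cell = counts[rec[predictor]]
        cell[0] += 1 if rec[target] else 0
        cell[1] += 1
    return {cat: pos / total for cat, (pos, total) in counts.items()}

# Hypothetical survey records: price band and whether the respondent reads the paper.
respondents = (
    [{"price": "low", "reads_paper": True}] * 3
    + [{"price": "low", "reads_paper": False}]
    + [{"price": "high", "reads_paper": True}]
    + [{"price": "high", "reads_paper": False}] * 3
)
rates = response_rates(respondents, "price", "reads_paper")
print(rates)
```

Groups whose rates differ markedly are the kind of segments a tree would separate; groups with similar rates are candidates for merging.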

CHAID is often used as an exploratory technique and is an alternative to multiple linear regression and logistic regression, especially when the data set is not well-suited to regression analysis.

See also

  • Chi-squared distribution
  • Latent class model
  • Structural equation modeling
  • Market segment
  • Decision tree learning
  • Multiple comparisons

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.