Interaction (statistics)
In statistics, an interaction may arise when considering the relationship among three or more variables, and describes a situation in which the simultaneous influence of two variables on a third is not additive. Most commonly, interactions are considered in the context of regression analyses.

The presence of interactions can have important implications for the interpretation of statistical models. If two variables of interest interact, the relationship between each of the interacting variables and a third "dependent variable" depends on the value of the other interacting variable. In practice, this makes it more difficult to predict the consequences of changing the value of a variable, particularly if the variables it interacts with are hard to measure or difficult to control.

The notion of "interaction" is closely related to that of "moderation" that is common in social and health science research: the interaction between an explanatory variable and an environmental variable suggests that the effect of the explanatory variable has been moderated or modified by the environmental variable.

Introduction

An "interaction variable" is a variable constructed from an original set of variables to try to represent either all of the interaction present or some part of it. In exploratory statistical analyses it is common to use products of original variables as the basis of testing whether interaction is present with the possibility of substituting other more realistic interaction variables at a later stage. When there are more than two explanatory variables, several interaction variables are constructed, with pairwise-products representing pairwise-interactions and higher order products representing higher order interactions.
Thus, for a response Y and two variables x1 and x2, an additive model would be:

    Y = c + a·x1 + b·x2 + error

In contrast to this,

    Y = c + a·x1 + b·x2 + d·(x1 × x2) + error

is an example of a model with an interaction between variables x1 and x2 ("error" refers to the random variable whose value is that by which Y differs from the expected value of Y; see errors and residuals in statistics).
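
The difference between the two models can be illustrated with a small least-squares sketch. It assumes the additive model has the form Y = c + a·x1 + b·x2 + error and that the interaction model adds a product term d·(x1 × x2). The data, the coefficient values, and the helper name `fit` are all hypothetical, chosen only for illustration, and the error term is set to zero so that the comparison is exact.

```python
import numpy as np

# Hypothetical noise-free data on a 3 x 3 grid of (x1, x2) values.
x1, x2 = np.meshgrid([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
x1, x2 = x1.ravel(), x2.ravel()

# Illustrative coefficients (assumed, not taken from the article).
c, a, b, d = 1.0, 2.0, -1.0, 0.5
y = c + a * x1 + b * x2 + d * x1 * x2


def fit(columns, y):
    """Least-squares fit of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y)] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))  # residual sum of squares
    return coef, rss


coef_add, rss_add = fit([x1, x2], y)           # additive model
coef_int, rss_int = fit([x1, x2, x1 * x2], y)  # model with product term

print("additive model RSS:   ", round(rss_add, 6))
print("interaction model RSS:", round(rss_int, 6))
print("interaction model coefficients (c, a, b, d):", np.round(coef_int, 6))
```

In this sketch the additive model cannot reproduce the response, while the model with the product term recovers the assumed coefficients. With real data the comparison would have to account for the error term, for example through a significance test on d.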

Interactions in ANOVA

A simple setting in which interactions can arise is a two-factor experiment analyzed using Analysis of Variance (ANOVA). Suppose we have two binary factors A and B. For example, these factors might indicate whether each of two treatments was administered to a patient, with the treatments applied either singly or in combination. We can then consider the average treatment response (e.g. the symptom levels following treatment) as a function of the treatment combination that was administered. The following table shows one possible situation:
         B = 0   B = 1
A = 0      6       7
A = 1      4       5


In this example, there is no interaction between the two treatments: their effects are additive. The reason for this is that the difference in mean response between those subjects receiving treatment A and those not receiving treatment A is −2 regardless of whether treatment B is administered (−2 = 5 − 7) or not (−2 = 4 − 6). Note that it automatically follows that the difference in mean response between those subjects receiving treatment B and those not receiving treatment B is the same regardless of whether treatment A is administered (7 − 6 = 5 − 4).

In contrast, if the following average responses are observed
         B = 0   B = 1
A = 0      1       4
A = 1      7       6


then there is an interaction between the treatments: their effects are not additive. Supposing that greater numbers correspond to a better response, in this situation treatment B is helpful on average if the subject is not also receiving treatment A (4 versus 1), but is detrimental on average if given in combination with treatment A (6 versus 7). Treatment A is helpful on average regardless of whether treatment B is also administered, but it is more helpful in both absolute and relative terms if given alone, rather than in combination with treatment B.
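
The additivity check used in both tables amounts to a "difference of differences". The following is a minimal sketch of that arithmetic; the function name `interaction_contrast` is an illustrative choice, and the two tables are the ones given above.

```python
import numpy as np


def interaction_contrast(means):
    """Difference of differences for a 2 x 2 table of cell means.

    Rows are A = 0, 1 and columns are B = 0, 1. The result is zero
    exactly when the effects of A and B are additive.
    """
    m = np.asarray(means, dtype=float)
    effect_of_a_without_b = m[1, 0] - m[0, 0]
    effect_of_a_with_b = m[1, 1] - m[0, 1]
    return effect_of_a_with_b - effect_of_a_without_b


additive_table = [[6, 7], [4, 5]]     # first table above
interacting_table = [[1, 4], [7, 6]]  # second table above

print(interaction_contrast(additive_table))     # 0.0
print(interaction_contrast(interacting_table))  # -4.0
```

The same value is obtained if the roles of A and B are exchanged, which reflects the fact that additivity does not depend on the order in which the variables are considered.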

Qualitative and quantitative interactions

In many applications it is useful to distinguish between qualitative and quantitative interactions. A quantitative interaction between A and B refers to a situation where the magnitude of the effect of B depends on the value of A, but the direction of the effect of B is constant for all A. A qualitative interaction between A and B refers to a situation where both the magnitude and direction of each variable's effect can depend on the value of the other variable.

The table of means on the left, below, shows a quantitative interaction — treatment A is beneficial both when B is given, and when B is not given, but the benefit is greater when B is not given (i.e. when A is given alone). The table of means on the right shows a qualitative interaction. A is harmful when B is given, but it is beneficial when B is not given. Note that the same interpretation would hold if we consider the benefit of B based on whether A is given.
         B = 0   B = 1                 B = 0   B = 1
A = 0      2       1          A = 0      2       6
A = 1      5       3          A = 1      5       3


The distinction between qualitative and quantitative interactions depends on the order in which the variables are considered (in contrast, the property of additivity is invariant to the order of the variables). In the following table, if we focus on the effect of treatment A, there is a quantitative interaction: giving treatment A will improve the outcome on average regardless of whether treatment B is or is not already being given (although the benefit is greater if treatment A is given alone). However, if we focus on the effect of treatment B, there is a qualitative interaction: giving treatment B to a subject who is already receiving treatment A will (on average) make things worse, whereas giving treatment B to a subject who is not receiving treatment A will improve the outcome on average.
         B = 0   B = 1
A = 0      1       4
A = 1      7       6
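
The dependence on the variable of focus can be made concrete with a short sketch. The function `classify` and its labels are hypothetical helpers, not standard terminology from any library. One assumption is made beyond the text: an effect that is exactly zero at one level of the other variable is treated as a qualitative interaction.

```python
def classify(means, focus):
    """Classify the interaction in a 2 x 2 table of means.

    means[a][b] is the mean response at A = a, B = b.
    focus is "A" or "B": the variable whose effect is examined.
    """
    if focus == "A":
        effects = [means[1][b] - means[0][b] for b in (0, 1)]
    elif focus == "B":
        effects = [means[a][1] - means[a][0] for a in (0, 1)]
    else:
        raise ValueError("focus must be 'A' or 'B'")
    if effects[0] == effects[1]:
        return "additive"
    # Assumption: a sign change, or one effect vanishing, counts as qualitative.
    if effects[0] * effects[1] <= 0:
        return "qualitative"
    return "quantitative"


table = [[1, 4], [7, 6]]     # the table above
print(classify(table, "A"))  # quantitative
print(classify(table, "B"))  # qualitative
```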

Unit treatment additivity

In its simplest form, the assumption of unit treatment additivity states that the observed response y_ij from experimental unit i when receiving treatment j can be written as the sum y_ij = y_i + t_j. The assumption of unit treatment additivity implies that every treatment has exactly the same additive effect on each experimental unit. Since any given experimental unit can only undergo one of the treatments, the assumption of unit treatment additivity is a hypothesis that is not directly falsifiable, according to Cox and Kempthorne.

However, many consequences of unit treatment additivity can be falsified. For a randomized experiment, the assumption of unit treatment additivity implies that the variance is constant for all treatments. Therefore, by contraposition, a necessary condition for unit treatment additivity is that the variance is constant.

The property of unit treatment additivity is not invariant under a change of scale, so statisticians often use transformations to achieve unit treatment additivity. If the response variable is expected to follow a parametric family of probability distributions, then the statistician may specify (in the protocol for the experiment or observational study) that the responses be transformed to stabilize the variance. In many cases, a statistician may specify that logarithmic transforms be applied to the responses, which are believed to follow a multiplicative model.
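
The effect of a change of scale can be sketched numerically. The example below assumes a purely multiplicative model in which each response is a unit baseline times a treatment factor; the baselines, the factors, and the variable names are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical unit baselines and multiplicative treatment factors.
unit_baseline = np.array([2.0, 5.0, 10.0])
treatment_factor = np.array([1.0, 1.5])

# Multiplicative model: response[i, j] = unit_baseline[i] * treatment_factor[j]
response = np.outer(unit_baseline, treatment_factor)

# Treatment effect for each unit on the raw scale and on the log scale.
raw_effect = response[:, 1] - response[:, 0]
log_effect = np.log(response[:, 1]) - np.log(response[:, 0])

print("raw-scale effects:", raw_effect)  # differ from unit to unit
print("log-scale effects:", log_effect)  # the same for every unit
```

On the raw scale the treatment effect grows with the baseline, so unit treatment additivity fails. After the logarithmic transform the effect is the same for every unit, which is the sense in which the transform restores additivity under this assumed model.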

The assumption of unit treatment additivity was enunciated in experimental design by Kempthorne and Cox. Kempthorne's use of unit treatment additivity and randomization is similar to the design-based analysis of finite population survey sampling.

In recent years, it has become common to use the terminology of Donald Rubin, which uses counterfactuals. Suppose we are comparing two groups of people with respect to some attribute y. For example, the first group might consist of people who are given a standard treatment for a medical condition, with the second group consisting of people who receive a new treatment with unknown effect. Taking a "counterfactual" perspective, we can consider an individual whose attribute has value y if that individual belongs to the first group, and whose attribute has value τ(y) if the individual belongs to the second group. The assumption of "unit treatment additivity" is that τ(y) = y + τ, that is, the "treatment effect" τ(y) − y does not depend on y. Since we cannot observe both y and τ(y) for a given individual, this is not testable at the individual level. However, unit treatment additivity implies that the cumulative distribution functions F1 and F2 for the two groups satisfy F2(y) = F1(y − τ), as long as the assignment of individuals to groups 1 and 2 is independent of all other factors influencing y (i.e. there are no confounders). Lack of unit treatment additivity can be viewed as a form of interaction between the treatment assignment (e.g. to groups 1 or 2), and the baseline, or untreated value of y.
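
The shift relation F2(y) = F1(y − τ) can be checked on empirical distribution functions. The sketch below is an idealized illustration, not a test procedure: it assumes that the same hypothetical units are observed under both conditions, which is exactly what cannot happen in practice, and that the treatment adds a constant τ to every baseline value. The helper `ecdf` and all numbers are made up for the example.

```python
import numpy as np


def ecdf(sample, points):
    """Empirical CDF of `sample` evaluated at each value in `points`."""
    sample = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(sample, points, side="right") / sample.size


tau = 1.5                                          # assumed constant effect
baseline = np.array([0.25, 1.0, 1.75, 2.5, 4.0])   # hypothetical untreated values
treated = baseline + tau                           # same units under treatment

grid = np.arange(0.0, 7.0, 0.5)
f1_shifted = ecdf(baseline, grid - tau)  # F1(y - tau)
f2 = ecdf(treated, grid)                 # F2(y)

print(np.array_equal(f1_shifted, f2))
```

With two separately sampled groups the equality would hold only approximately, and a departure from it beyond sampling variation would be evidence against unit treatment additivity.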

Categorical variables

Sometimes the interacting variables are categorical variables rather than real numbers and the study might then be dealt with as an analysis of variance problem. For example, members of a population may be classified by religion and by occupation. If one wishes to predict a person's height based only on the person's religion and occupation, a simple additive model, i.e., a model without interaction, would add to an overall average height an adjustment for a particular religion and another for a particular occupation. A model with interaction, unlike an additive model, could add a further adjustment for the "interaction" between that religion and that occupation. This example may cause one to suspect that the word interaction is something of a misnomer.
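
The adjustments described above can be written out for a table of cell means. The sketch below uses a hypothetical 2 x 3 table of mean heights and assumes that every cell carries equal weight (a balanced layout); the numbers and variable names are illustrative only.

```python
import numpy as np

# Hypothetical mean heights: rows are levels of one factor,
# columns are levels of the other. Equal cell weights are assumed.
cell_means = np.array([[170.0, 172.0, 171.0],
                       [168.0, 173.0, 169.0]])

overall = cell_means.mean()
row_adjust = cell_means.mean(axis=1, keepdims=True) - overall
col_adjust = cell_means.mean(axis=0, keepdims=True) - overall

# Additive model: overall average plus one adjustment per factor.
additive_fit = overall + row_adjust + col_adjust

# Whatever the additive model leaves over is the interaction adjustment.
interaction_adjust = cell_means - additive_fit
print(np.round(interaction_adjust, 3))
```

The interaction adjustments sum to zero along every row and every column, so they capture only the part of the cell means that the two separate adjustments cannot explain.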

Statistically, the presence of an interaction between categorical variables is generally tested using a form of analysis of variance (ANOVA). If one or more of the variables is continuous in nature, however, it would typically be tested using moderated multiple regression. The method is so called because a moderator is a variable that affects the strength of a relationship between two other variables.

Designed experiments

Interactions have been extensively discussed in the context of analyzing designed experiments, particularly using response surface methodology. Genichi Taguchi contended that interactions could be eliminated from a system by appropriate choice of response variable and transformation. However, George Box and others have argued that this is not the case in general.

Examples

Real-world examples of interaction include:
  • Interaction between adding sugar to coffee and stirring the coffee. Neither of the two individual variables has much effect on sweetness but a combination of the two does.
  • Interaction between adding carbon to steel and quenching. Neither of the two individually has much effect on strength but a combination of the two has a dramatic effect.
  • Interaction between smoking and inhaling asbestos fibres: both raise lung carcinoma risk, but exposure to asbestos multiplies the cancer risk in smokers and non-smokers.
  • Interaction between genetic risk factors for type 2 diabetes and diet (specifically, a "western" dietary pattern). The western dietary pattern was shown to increase diabetes risk for subjects with a high "genetic risk score", but not for other subjects.

See also

  • Analysis of variance
  • Factorial experiment
  • Generalized randomized block design
  • Linear model
  • Main effect
  • Interaction
  • Tukey's test of additivity

External links

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 