All Topics  
Odds ratio

 

   Email Print
   Bookmark   Link






 

Odds ratio



 
 
The odds ratio is a measure of effect size
Effect size

In statistics, effect size is a measure of the strength of the relationship between two variables. In scientific experiments, it is often useful to know not only whether an experiment has a statistical significance effect, but also the size of any observed effects....
, describing the strength of association or non-independence between two binary data values. It is used as a descriptive statistic
Descriptive statistics

Descriptive Statistics are used to describe the basic features of the data gathered from an experimental study in various ways. A descriptive Statistics is distinguished from inductive statistics....
, and plays an important role in logistic regression
Logistic regression

In statistics, logistic regression is a model used for prediction of the probability of occurrence of an event by fitting data to a logistic curve....
.

Definition
Definition in terms of group-wise odds
The odds ratio can be defined as the ratio of the odds
Odds

In probability theory and statistics the odds in favour of an event or a proposition are the quantity , where p is the probability of the event or proposition....
 of an event occurring in one group to the odds of it occurring in another group, or to a sample-based estimate of that ratio.






Discussion
Ask a question about 'Odds ratio'
Start a new discussion about 'Odds ratio'
Answer questions from other users
Full Discussion Forum



Encyclopedia


The odds ratio is a measure of effect size
Effect size

In statistics, effect size is a measure of the strength of the relationship between two variables. In scientific experiments, it is often useful to know not only whether an experiment has a statistical significance effect, but also the size of any observed effects....
, describing the strength of association or non-independence between two binary data values. It is used as a descriptive statistic
Descriptive statistics

Descriptive Statistics are used to describe the basic features of the data gathered from an experimental study in various ways. A descriptive Statistics is distinguished from inductive statistics....
, and plays an important role in logistic regression
Logistic regression

In statistics, logistic regression is a model used for prediction of the probability of occurrence of an event by fitting data to a logistic curve....
.

Definition


Definition in terms of group-wise odds


The odds ratio can be defined as the ratio of the odds
Odds

In probability theory and statistics the odds in favour of an event or a proposition are the quantity , where p is the probability of the event or proposition....
 of an event occurring in one group to the odds of it occurring in another group, or to a sample-based estimate of that ratio. These groups might be men and women, an experimental group and a control group, or any other dichotomous
Dichotomy

A dichotomy is any splitting of a whole into exactly two non-overlapping parts.In other words, it is a partition of a set of a whole into two parts that are:...
 classification. If the probabilities of the event in each of the groups are p1 (first group) and p2 (second group), then the odds ratio is:

where qx = 1 − px. An odds ratio of 1 indicates that the condition or event under study is equally likely in both groups. An odds ratio greater than 1 indicates that the condition or event is more likely in the first group. And an odds ratio less than 1 indicates that the condition or event is less likely in the first group. The odds ratio must be greater than or equal to zero if it is defined. It is undefined if p2q1 equals zero.

Definition in terms of joint and conditional probabilities


The odds ratio can also be defined in terms of the joint probability distribution
Probability distribution

In probability theory and statistics, a probability distribution identifies either the probability of each value of an unidentified random variable , or the probability of the value falling within a particular interval ....
 of two binary random variables. The joint distribution of binary random variables X and Y can be written

Y = 1 Y = 0
X = 1
X = 0


where , , and are non-negative "cell probabilities" that sum to one. The odds for Y within the two subpopulations defined by X = 1 and X = 0 are defined in terms of the conditional probabilities given X:

Y = 1 Y = 0
X = 1
X = 0


Thus the odds ratio is

The simple expression on the right, above, is easy to remember as the product of the probabilities of the "concordant cells" (X = Y) divided by the product of the probabilities of the "discordant cells" (X ? Y). However note that in some applications the labeling of categories as zero and one is arbitrary, so there is nothing special about concordant versus discordant values in these applications.

Symmetry


If we had calculated the odds ratio based on the conditional probabilities given Y,

Y = 1 Y = 0
X = 1
X = 0


we would have gotten the same result

Other measures of effect size for binary data such as the relative risk
Relative risk

In statistics and mathematical epidemiology, relative risk is the risk of an event relative to exposure. Relative risk is a ratio of the probability of the event occurring in the exposed group versus a non-exposed group....
 do not have this symmetry property.

Relation to statistical independence


If X and Y are independent, their joint probabilities can be expressed in terms of their marginal probabilities and , as follows

Y = 1 Y = 0
X = 1
X = 0


In this case, the odds ratio equals one, and conversely the odds ratio can only equal one if the joint probabilities can be factored in this way. Thus the odds ratio equals one if and only if X and Y are independent
Statistical independence

In probability theory, to say that two event s are independent intuitively means that the occurrence of one event makes it neither more nor less probable that the other occurs....
.

Example


Suppose that in a sample of 100 men, 90 have drunk wine in the previous week, while in a sample of 100 women only 20 have drunk wine in the same period. The odds of a man drinking wine are 90 to 10, or 9:1, while the odds of a woman drinking wine are only 20 to 80, or 1:4 = 0.25:1. The odds ratio is thus 9/0.25, or 36, showing that men are much more likely to drink wine than women. Using the above formula for the calculation yields the same result:

The above example also shows how odds ratios are sometimes sensitive in stating relative positions: in this sample men are 90/20 = 4.5 times more likely to have drunk wine than women, but have 36 times the odds. The logarithm of the odds ratio, the difference of the logit
Logit

The logit function is the inverse of the "sigmoid", or logistic function used in mathematics, especially in statistics. The logit of a number p between 0 and 1 is given by the formula:...
s of the probabilities
Probability

Probability, or wikt:chance, is a way of expressing knowledge or belief that an Event will occur or has occurred. In mathematics the concept has been given an exact meaning in probability theory, that is used extensively in such areas of study as mathematics, statistics, finance, gambling, science, and philosophy to draw conclusions about t...
, tempers this effect, and also makes the measure symmetric
Symmetry

Symmetry generally conveys two primary meanings. The first is an imprecise sense of harmonious or aesthetically-pleasing proportionality and balance; such that it reflects beauty or perfection....
 with respect to the ordering of groups. For example, using natural logarithms, an odds ratio of 36/1 maps to 3.584, and an odds ratio of 1/36 maps to −3.584.

Statistical inference


Confidence intervals and hypothesis tests relating to odds ratios are constructed in terms of the "log odds ratio," which is the natural logarithm
Natural logarithm

The natural logarithm, formerly known as the hyperbolic logarithm, is the logarithm to the base e , where e is an irrational number constant approximately equal to 2.718281828....
 of the odds ratio. If we use the joint probability notation defined above, the population log odds ratio is

If we observe data in the form of a contingency table
Contingency table

In statistics, contingency tables are used to record and analyse the relationship between two or more variables, most usually categorical variables....


Y = 1 Y = 0
X = 1
X = 0


then the probabilities in the joint distribution can be estimated as

Y = 1 Y = 0
X = 1
X = 0


where , with being the sum of all four cell counts. The sample log odds ratio is

.

The standard error
Standard error (statistics)

The standard error of a method of measurement or estimation is the standard deviation of the sampling distribution associated with the estimation method....
 for the log odds ratio is approximately

.

This is an asymptotic approximation, and will not give a meaningful result if any of the cell counts are very small. If is the sample log odds ratio, an approximate 95% confidence interval
Confidence interval

In statistics, a confidence interval is an interval estimation of a population parameter. Instead of estimating the parameter by a single value, an interval likely to include the parameter is given....
 for the population log odds ratio is . This can be mapped to to obtain a 95% confidence interval for the odds ratio. If we wish to test the hypothesis that the population odds ratio equals one, the two-sided p-value
P-value

In statistics hypothesis testing, the p-value is the probability of obtaining a result at least as extreme as the one that was actually observed, assuming that the null hypothesis is true....
 is , where denotes a probability, and denotes a standard normal random variable.

Role in logistic regression


Logistic regression
Logistic regression

In statistics, logistic regression is a model used for prediction of the probability of occurrence of an event by fitting data to a logistic curve....
 is one way to generalize the odds ratio beyond two binary variables. Suppose we have a binary response variable Y and a binary predictor variable X, and in addition we have other variables Z1, ..., Zp that may or may not be binary. If we use multiple logistic regression to regress Y on X, Z1, ..., Zp, then the estimated coefficient for X is related to a conditional odds ratio. Specifically, at the population level

so is an estimate of this conditional odds ratio. The interpretation of is as an estimate of the odds ratio between Y and X when the values of Z1, ..., Zp are held fixed.

Use in quantitative research


Due to the widespread use of logistic regression
Logistic regression

In statistics, logistic regression is a model used for prediction of the probability of occurrence of an event by fitting data to a logistic curve....
 in medical and social science research, the odds ratio is commonly used as a means of expressing the results in some forms of clinical trial
Clinical trial

In health care, clinical trials are conducted to allow safety and efficacy data to be collected for new drugs or devices. These trials can only take place once satisfactory information has been gathered on the quality of the product and its non-clinical safety, and Institutional review board approval is granted in the country where the trial...
s, in survey research
Survey research

a research method involving the use of questionnaires and/or statistical surveys to gather data about people and their thoughts and behaviours....
, and in epidemiology
Epidemiology

Epidemiology is the study of factors affecting the health and illness of populations, and serves as the foundation and logic of interventions made in the interest of public health and preventive medicine....
, such as in case-control studies. It is often abbreviated "OR" in reports. When data from multiple surveys is combined, it will often be expressed as "pooled OR". The odds ratio, while in itself difficult to interpret, is in such cases used as an estimate of the relative risk
Relative risk

In statistics and mathematical epidemiology, relative risk is the risk of an event relative to exposure. Relative risk is a ratio of the probability of the event occurring in the exposed group versus a non-exposed group....
. This, however, is only valid when dealing with low-probability events: The formula for the odds
Odds

In probability theory and statistics the odds in favour of an event or a proposition are the quantity , where p is the probability of the event or proposition....
 is defined as p / (1 − p), so when p moves towards zero, (1 − p) moves towards 1, meaning that as p approaches zero, the odds approaches the risk, and the odds ratio approaches the relative risk.

See also

  • Relative risk
    Relative risk

    In statistics and mathematical epidemiology, relative risk is the risk of an event relative to exposure. Relative risk is a ratio of the probability of the event occurring in the exposed group versus a non-exposed group....
  • Hazard ratio
    Hazard ratio

    The hazard ratio in survival analysis is the effect of an explanatory variable on the hazard or risk of an event. For a less technical definition than is provided here, consider hazard ratio to be an estimate of relative risk and see the explanation on that page....


External links