Design of experiments



 
 








Design of experiments, or experimental design, is the design of all information-gathering exercises where variation is present, whether under the full control of the experimenter or not. (The latter situation is usually called an observational study.) Often the experimenter is interested in the effect of some process or intervention (the "treatment") on some objects (the "experimental units"), which may be people. Design of experiments is thus a discipline that has very broad application across all the natural and social sciences.

Early example of experimental design


In 1747, while serving as surgeon on HM Bark Salisbury, James Lind carried out a controlled experiment to develop a cure for scurvy.

Lind selected 12 men from the ship, all suffering from scurvy, and divided them into six pairs, giving each pair different additions to their basic diet for a period of two weeks. The treatments were all remedies that had been proposed at one time or another. They were:
  • A quart of cider every day
  • Twenty-five gutts (drops) of elixir vitriol three times a day upon an empty stomach
  • One half-pint of seawater every day
  • A mixture of garlic, mustard, and horseradish in a lump the size of a nutmeg
  • Two spoonfuls of vinegar three times a day
  • Two oranges and one lemon every day.


The men who had been given citrus fruits recovered dramatically within a week. One of them returned to duty after 6 days and the other became nurse to the rest. The others experienced some improvement, but nothing was comparable to the citrus fruits, which were proved to be substantially superior to the other treatments.

In this study Lind's cases "were as similar as I could have them"; that is, he applied strict entry requirements to reduce extraneous variation. The men were paired, which provided replication. From a modern perspective, the main thing that is missing is randomized allocation of subjects to treatments.

A formal mathematical theory

The first statistician to consider a formal mathematical methodology for designing experiments was Sir Ronald A. Fisher, in his landmark The Design of Experiments. As an example, he described how to test the hypothesis that a certain lady could distinguish by flavour alone whether the milk or the tea was first placed in the cup. While this sounds like a frivolous application, it allowed him to illustrate the most important ideas of experimental design:

1.) Comparison

In many fields of study it is hard to reproduce measured results exactly. Comparisons between treatments are much more reproducible and are usually preferable. Often one compares against a standard or traditional treatment that acts as baseline.

2.) Randomization

There is an extensive body of mathematical theory that explores the consequences of making the allocation of units to treatments by means of some random mechanism such as tables of random numbers, or the use of randomization devices such as playing cards or dice. Provided the sample size is adequate, the risks associated with random allocation (such as failing to obtain a representative sample in a survey, or having a serious imbalance in a key characteristic between a treatment group and a control group) are calculable and hence can be managed down to an acceptable level. Random does not mean haphazard, and great care must be taken that appropriate random methods are used.
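As a minimal sketch of random allocation by a "random mechanism", the following Python snippet shuffles a list of units with a seeded generator and splits it into equal-sized groups. The function name `allocate`, the twelve numbered units, and the two group labels are illustrative assumptions, not part of the original text.

```python
import random

def allocate(units, treatments, seed=None):
    """Randomly allocate units to treatments in equal-sized groups."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)      # a seeded generator makes the allocation reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)          # uniformly random permutation of the units
    size = len(shuffled) // len(treatments)
    return {
        t: sorted(shuffled[i * size:(i + 1) * size])
        for i, t in enumerate(treatments)
    }

units = list(range(1, 13))         # twelve hypothetical subjects
allocation = allocate(units, ["control", "treatment"], seed=2024)
for name, group in allocation.items():
    print(name, group)
```

Recording the seed alongside the allocation is one way to show that the assignment was produced by a defined random procedure rather than chosen haphazardly.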

3.) Replication

Measurements are usually subject to variation, both between repeated measurements and between replicated items or processes. Multiple measurements of replicated items are necessary so the variation can be estimated.
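A small sketch of how replicated measurements let the variation be estimated: the sample standard deviation describes the spread of individual measurements, and the standard error of the mean shrinks as the number of replicates grows. The numbers below are made up purely for illustration.

```python
import math
import statistics

# Hypothetical repeated measurements of one experimental condition.
replicates = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3]

n = len(replicates)
mean = statistics.mean(replicates)
sample_sd = statistics.stdev(replicates)     # sample standard deviation (n - 1 denominator)
standard_error = sample_sd / math.sqrt(n)    # uncertainty of the mean, decreases with n

print(f"n = {n}, mean = {mean:.3f}, sd = {sample_sd:.3f}, se = {standard_error:.3f}")
```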

4.) Blocking

Blocking is the arrangement of experimental units into groups (blocks) that are similar to one another. Blocking reduces known but irrelevant sources of variation between units and thus allows greater precision in the estimation of the source of variation under study.
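One common way to combine blocking with randomization is a randomized complete block layout, in which every treatment appears once in each block and the order is randomized separately within each block. The sketch below assumes hypothetical block labels, unit names, and treatment names.

```python
import random

def randomized_complete_block(blocks, treatments, seed=None):
    """Assign every treatment once per block, in a random order within each block.

    blocks maps a block label to a list of unit ids; each block must contain
    exactly as many units as there are treatments.
    """
    rng = random.Random(seed)
    plan = {}
    for label, members in blocks.items():
        if len(members) != len(treatments):
            raise ValueError(f"block {label!r} needs exactly {len(treatments)} units")
        order = list(treatments)
        rng.shuffle(order)                     # independent randomization per block
        for unit, treatment in zip(members, order):
            plan[unit] = (label, treatment)
    return plan

blocks = {"field A": ["A1", "A2", "A3"], "field B": ["B1", "B2", "B3"]}
treatments = ["T1", "T2", "T3"]
plan = randomized_complete_block(blocks, treatments, seed=7)
for unit, (block, treatment) in plan.items():
    print(unit, block, treatment)
```

Because each block receives every treatment, differences between blocks cancel out of the treatment comparisons.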

5.) Orthogonality

Orthogonality concerns the forms of comparison (contrasts) that can be legitimately and efficiently carried out. Contrasts can be represented by vectors, and sets of orthogonal contrasts are uncorrelated and independently distributed if the data are normal. Because of this independence, each orthogonal contrast provides different information from the others. If there are T treatments and T - 1 orthogonal contrasts, all the information that can be captured from the experiment is obtainable from the set of contrasts.
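To make the idea of T - 1 orthogonal contrasts concrete, the sketch below builds one standard family (Helmert-type contrasts, chosen here only as an illustration) for T = 4 treatments and checks that each contrast sums to zero and that every pair has zero dot product. The helper names are assumptions of this sketch.

```python
from itertools import combinations

def helmert_contrasts(t):
    """Return t - 1 mutually orthogonal contrast vectors for t treatments."""
    contrasts = []
    for k in range(1, t):
        # Compare the mean of the first k treatments with treatment k + 1.
        contrasts.append([1] * k + [-k] + [0] * (t - k - 1))
    return contrasts

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

T = 4
contrasts = helmert_contrasts(T)
for c in contrasts:
    print(c, "sum =", sum(c))

# Every pair of distinct contrasts should be orthogonal.
pairwise = [dot(u, v) for u, v in combinations(contrasts, 2)]
print("pairwise dot products:", pairwise)
```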

6.) Use of factorial experiments

Factorial experiments are used instead of the one-factor-at-a-time method. These are efficient at evaluating the effects and possible interactions of several factors (independent variables).
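As a rough illustration of a factorial layout, the sketch below enumerates all runs of a two-factor, two-level design and estimates both main effects and the interaction from the same four runs. The factor names and response values are invented for the example.

```python
from itertools import product

# Two hypothetical factors, each at a coded low (-1) and high (+1) level.
factors = {"temperature": [-1, 1], "pressure": [-1, 1]}
runs = list(product(*factors.values()))      # every combination of levels

# Made-up responses, one per run, keyed by (temperature, pressure).
response = {(-1, -1): 20.0, (-1, 1): 30.0, (1, -1): 40.0, (1, 1): 52.0}

def effect(sign):
    """Mean response where sign(run) == +1 minus mean response where it is -1."""
    high = [response[r] for r in runs if sign(r) == 1]
    low = [response[r] for r in runs if sign(r) == -1]
    return sum(high) / len(high) - sum(low) / len(low)

temperature_effect = effect(lambda r: r[0])
pressure_effect = effect(lambda r: r[1])
interaction = effect(lambda r: r[0] * r[1])  # product column carries the interaction

print(temperature_effect, pressure_effect, interaction)
```

Every run contributes to every estimate, which is the source of the efficiency compared with varying one factor at a time.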

Analysis of the design of experiments was built on the foundation of the analysis of variance, a collection of models in which the observed variance is partitioned into components due to different factors which are estimated and/or tested.
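The partition of variation can be sketched for the simplest case, a one-way layout: the total sum of squares splits exactly into a between-group part and a within-group part, and their ratio (after dividing by degrees of freedom) gives the F statistic. The three groups and their values are made up for illustration.

```python
# Hypothetical responses for three treatment groups.
groups = {"A": [4.0, 5.0, 6.0], "B": [7.0, 8.0, 9.0], "C": [1.0, 2.0, 3.0]}

def mean(xs):
    return sum(xs) / len(xs)

all_values = [x for xs in groups.values() for x in xs]
grand_mean = mean(all_values)

# Total variation around the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)
# Variation of group means around the grand mean (due to the factor).
ss_between = sum(len(xs) * (mean(xs) - grand_mean) ** 2 for xs in groups.values())
# Variation of observations around their own group mean (residual).
ss_within = sum((x - mean(xs)) ** 2 for xs in groups.values() for x in xs)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_statistic = (ss_between / df_between) / (ss_within / df_within)

print(ss_total, ss_between, ss_within, f_statistic)
```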

Some efficient designs for estimating several main effects simultaneously were found by Raj Chandra Bose and K. Kishen in 1940 at the Indian Statistical Institute, but remained little known until the Plackett-Burman designs were published in Biometrika in 1946. About the same time, C. R. Rao introduced the concept of orthogonal arrays as experimental designs. This concept played a central role in the development of Taguchi methods by Genichi Taguchi, which took place during his visit to the Indian Statistical Institute in the early 1950s. His methods were successfully applied and adopted by Japanese and Indian industries and were subsequently also embraced by US industry, albeit with some reservations.

In 1950, Gertrude Mary Cox and William Gemmell Cochran published the book Experimental Designs, which became the major reference work on the design of experiments for statisticians for years afterwards.

Developments of the theory of linear models have encompassed and surpassed the cases that concerned early writers. Today, the theory rests on advanced topics in linear algebra, abstract algebra and combinatorics.

As with all other branches of statistics, there is both classical and Bayesian experimental design.

Notable contributors to the field of experimental design include R. A. Fisher, R. C. Bose, C. R. Rao, J. Kiefer, J. N. Srivastava, S. S. Shrikhande, Genichi Taguchi, D. Raghavarao, D. Montgomery, and R. Myers.

Example


This example is attributed to Harold Hotelling. It conveys some of the flavor of those aspects of the subject that involve combinatorial designs.

The weights of eight objects are to be measured using a pan balance and a set of standard weights. Each weighing measures the weight difference between the objects placed in the left pan and any objects placed in the right pan. Each measurement has a random error. The average error is zero; the standard deviation of the probability distribution of the errors is the same number σ on different weighings; and errors on different weighings are independent. Denote the true weights by

\[ \theta_1, \dots, \theta_8. \]

We consider two different experiments:

  1. Weigh each object in one pan, with the other pan empty. Let Xi be the measured weight of the ith object, for i = 1, ..., 8.
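  2. Do the eight weighings according to the following schedule and let Yi be the measured difference for i = 1, ..., 8:

                     left pan           right pan
     1st weighing:   1 2 3 4 5 6 7 8    (empty)
     2nd weighing:   1 2 3 8            4 5 6 7
     3rd weighing:   1 4 5 8            2 3 6 7
     4th weighing:   1 6 7 8            2 3 4 5
     5th weighing:   2 4 6 8            1 3 5 7
     6th weighing:   2 5 7 8            1 3 4 6
     7th weighing:   3 4 7 8            1 2 5 6
     8th weighing:   3 5 6 8            1 2 4 7

Then the estimated value of the weight θ1 is

\[ \hat{\theta}_1 = \frac{Y_1 + Y_2 + Y_3 + Y_4 - Y_5 - Y_6 - Y_7 - Y_8}{8}. \]

Similar estimates can be found for the weights of the other items. For example

\[ \hat{\theta}_2 = \frac{Y_1 + Y_2 - Y_3 - Y_4 + Y_5 + Y_6 - Y_7 - Y_8}{8}. \]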
The question of design of experiments is: which experiment is better?

The variance of the estimate X1 of θ1 is σ² if we use the first experiment. But if we use the second experiment, the variance of the estimate given above is σ²/8. Thus the second experiment gives us 8 times as much precision for the estimate of a single item, and estimates all items simultaneously, with the same precision. What is achieved with 8 weighings in the second experiment would require 64 weighings if items are weighed separately. However, note that every estimate obtained in the second experiment depends on the same eight measurement errors, so a single faulty weighing affects all of them; because the weighing schedule is orthogonal, the errors of the estimates are nonetheless uncorrelated.
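The variance claim can be checked numerically. The sketch below assumes one particular orthogonal (Hadamard-type) left-pan/right-pan schedule; the schedule, the variable names, and the "true" weights are assumptions of this sketch rather than values taken from the text. It verifies that the design columns are mutually orthogonal, that each estimate has variance factor 1/8, and that noise-free weighings recover the weights exactly.

```python
N = 8

# Assumed schedule: objects in the left pan on each weighing; all others go right.
LEFT_PAN = [
    {1, 2, 3, 4, 5, 6, 7, 8},
    {1, 2, 3, 8},
    {1, 4, 5, 8},
    {1, 6, 7, 8},
    {2, 4, 6, 8},
    {2, 5, 7, 8},
    {3, 4, 7, 8},
    {3, 5, 6, 8},
]

# design[i][j] is +1 if object j+1 is in the left pan on weighing i+1, else -1.
design = [[1 if obj in left else -1 for obj in range(1, N + 1)] for left in LEFT_PAN]

# Gram matrix X^T X; for an orthogonal design it equals 8 times the identity.
gram = [[sum(design[i][a] * design[i][b] for i in range(N)) for b in range(N)]
        for a in range(N)]

def estimate(y):
    """Least-squares estimate X^T y / 8, valid because X^T X = 8 I."""
    return [sum(design[i][j] * y[i] for i in range(N)) / N for j in range(N)]

# Var(estimate_j) = sigma^2 * sum_i (design[i][j] / 8)^2, so this factor multiplies sigma^2.
variance_factors = [sum((design[i][j] / N) ** 2 for i in range(N)) for j in range(N)]

# Noise-free check with hypothetical true weights.
true_weights = [3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0]
y = [sum(design[i][j] * true_weights[j] for j in range(N)) for i in range(N)]
recovered = estimate(y)

print(variance_factors)
print(recovered)
```

The zero off-diagonal entries of the Gram matrix are what make the eight estimates uncorrelated with one another.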

Many problems of the design of experiments involve combinatorial designs, as in this example.

Statistical control

It is best for a process to be in reasonable statistical control prior to conducting designed experiments. When this is not possible, proper blocking, replication, and randomization allow for the careful conduct of designed experiments.
