Statistical theory
Encyclopedia
The theory of statistics provides a basis for the whole range of techniques, in both study design
Study design
Clinical study design is the formulation of trials and experiments in medical and epidemiological research, sometimes known as clinical trials. Many of the considerations here are shared under the more general topic of design of experiments but there can be others, in particular related to patient...

 and data analysis
Data analysis
Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making...

, that are used within applications of statistics
Statistics
Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments....

. The theory covers approaches to statistical-decision problems and to statistical inference
Statistical inference
In statistics, statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation...

, and the actions and deductions that satisfy the basic principles stated for these different approaches. Within a given approach, statistical theory gives ways of comparing statistical procedures; it can find a best possible procedure within a given context for given statistical problems, or can provide guidance on the choice between alternative procedures.

Apart from philosophical considerations about how to make statistical inferences and decisions, much of statistical theory consists of mathematical statistics
Mathematical statistics
Mathematical statistics is the study of statistics from a mathematical standpoint, using probability theory as well as other branches of mathematics such as linear algebra and analysis...

, and is closely linked to probability theory
Probability theory
Probability theory is the branch of mathematics concerned with analysis of random phenomena. The central objects of probability theory are random variables, stochastic processes, and events: mathematical abstractions of non-deterministic events or measured quantities that may either be single...

, to utility theory, and to optimization.

Scope

Statistical theory provides an underlying rationale and provides a consistent basis for the choice of methodology used in applied statistics.

Modelling

Statistical model
Statistical model
A statistical model is a formalization of relationships between variables in the form of mathematical equations. A statistical model describes how one or more random variables are related to one or more random variables. The model is statistical as the variables are not deterministically but...

s describe the sources of data and can have different types of formulation corresponding to these sources and to the problem being studied. Such problems can be of various kinds:
  • Sampling
    Survey sampling
    In statistics, survey sampling describes the process of selecting a sample of elements from a target population in order to conduct a survey.A survey may refer to many different types or techniques of observation, but in the context of survey sampling it most often involves a questionnaire used to...

     from a finite population
  • Measuring observational error
    Observational error
    Observational error is the difference between a measured value of quantity and its true value. In statistics, an error is not a "mistake". Variability is an inherent part of things being measured and of the measurement process.-Science and experiments:...

     and refining procedures
  • Studying statistical relations
    Multivariate statistics
    Multivariate statistics is a form of statistics encompassing the simultaneous observation and analysis of more than one statistical variable. The application of multivariate statistics is multivariate analysis...


Statistical models, once specified, can be tested to see whether they provide useful inferences for new data sets. Testing a hypothesis using the data that was used to specify the model is a fallacy, according to the natural science of Bacon and the scientific method of Peirce.

Data collection

Statistical theory provides a guide to comparing methods of data collection
Data collection
Data collection is a term used to describe a process of preparing and collecting data, for example, as part of a process improvement or similar project. The purpose of data collection is to obtain information to keep on record, to make decisions about important issues, to pass information on to...

, where the problem is to generate informative data using optimization
Optimization (mathematics)
In mathematics, computational science, or management science, mathematical optimization refers to the selection of a best element from some set of available alternatives....

 and randomization
Randomization
Randomization is the process of making something random; this means:* Generating a random permutation of a sequence .* Selecting a random sample of a population ....

 while measuring and controlling for observational error
Observational error
Observational error is the difference between a measured value of quantity and its true value. In statistics, an error is not a "mistake". Variability is an inherent part of things being measured and of the measurement process.-Science and experiments:...

. Optimization of data collection reduces the cost of data while satisfying statistical goals, while randomization
Randomization
Randomization is the process of making something random; this means:* Generating a random permutation of a sequence .* Selecting a random sample of a population ....

 allows reliable inferences. Statistical theory provides a basis for good data collection and the structuring of investigations in the topics of:
  • Design of experiments
    Design of experiments
    In general usage, design of experiments or experimental design is the design of any information-gathering exercises where variation is present, whether under the full control of the experimenter or not. However, in statistics, these terms are usually used for controlled experiments...

     to estimate treatment effects, to test hypotheses, and to optimize responses.
  • Survey sampling
    Survey sampling
    In statistics, survey sampling describes the process of selecting a sample of elements from a target population in order to conduct a survey.A survey may refer to many different types or techniques of observation, but in the context of survey sampling it most often involves a questionnaire used to...

     to describe populations
    Statistical population
    A statistical population is a set of entities concerning which statistical inferences are to be drawn, often based on a random sample taken from the population. For example, if we were interested in generalizations about crows, then we would describe the set of crows that is of interest...


Summarising data

The task of summarising statistical data in conventional forms (also known as descriptive statistics
Descriptive statistics
Descriptive statistics quantitatively describe the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics , in that descriptive statistics aim to summarize a data set, rather than use the data to learn about the population that the data are...

) is considered in theoretical statistics as a problem of defining what aspects of statistical samples need to be described and how well they can be described from a typically limited sample of data. Thus the problems theoretical statistics considers include:
  • Choosing summary statistics
    Summary statistics
    In descriptive statistics, summary statistics are used to summarize a set of observations, in order to communicate the largest amount as simply as possible...

     to describe a sample
  • Summarising probability distribution
    Probability distribution
    In probability theory, a probability mass, probability density, or probability distribution is a function that describes the probability of a random variable taking certain values....

    s of sample data while making limited assumptions about the form of distribution that may be met
  • Summarising the relationships between different quantities measured on the same items with a sample

Interpeting data

Besides the philosophy underlying statistical inference
Statistical inference
In statistics, statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation...

, statistical theory has the task of considering the types of questions that data analysts might want to ask about the problems they are studying and of providing data analytic techniques for answering them. Some of these tasks are:
  • Summarising populations in the form of a fitted distribution or probability density function
    Probability density function
    In probability theory, a probability density function , or density of a continuous random variable is a function that describes the relative likelihood for this random variable to occur at a given point. The probability for the random variable to fall within a particular region is given by the...

  • Summarising the relationship between variables using some type of regression analysis
    Regression analysis
    In statistics, regression analysis includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables...

  • Providing ways of predicting the outcome of a random quantity given other related variables
  • Examining the possibility of reducing the number of variables being considered within a problem (the task of Dimension reduction)

When a statistical procedure has been specified in the study protocol, then statistical theory provides well-defined probability statements for the method when applied to all populations that could have arisen from the randomization used to generate the data. This provides an objective way of estimating parameters, estimating confidence intervals, testing hypotheses, and selecting the best. Even for observational data, statistical theory provides a way of calculating a value that can be used to interpret a sample of data from a population, it can provide a means of indicating how well that value is determined by the sample, and thus a means of saying corresponding values derived for different populations are as different as they might seem; however, the reliability of inferences from post-hoc observational data is often worse than for planned randomized generation of data.

Applied statistical inference

Statistical theory provides the basis for a number of data analytic methods that are common across scientific and social research. Some of these are:
Interpreting data is an important objective of statistical research:
  • Estimating parameters
    Estimation theory
    Estimation theory is a branch of statistics and signal processing that deals with estimating the values of parameters based on measured/empirical data that has a random component. The parameters describe an underlying physical setting in such a way that their value affects the distribution of the...

  • Testing statistical hypotheses
    Statistical hypothesis testing
    A statistical hypothesis test is a method of making decisions using data, whether from a controlled experiment or an observational study . In statistics, a result is called statistically significant if it is unlikely to have occurred by chance alone, according to a pre-determined threshold...

  • Providing a range of values
    Interval estimation
    In statistics, interval estimation is the use of sample data to calculate an interval of possible values of an unknown population parameter, in contrast to point estimation, which is a single number. Neyman identified interval estimation as distinct from point estimation...

     instead of a point estimate
  • Regression analysis
    Regression analysis
    In statistics, regression analysis includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables...


Many of the standard methods for these tasks rely on certain statistical assumptions (made in the derivation of the methodology) actually holding in practice. Statistical theory studies the consequences of departures from these assumptions. In addition it provides a range of robust statistical techniques that are less dependent on assumptions, and it provides methods checking whether particular assumptions are reasonable for a give data-set.

Further reading

  • Davidson, A,C. (2003) Statistical Models. Cambridge University Press. ISBN 0521773393
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK