Outline of regression analysis

In statistics, regression analysis includes any technique for learning about the relationship between one or more dependent variables Y and one or more independent variables X.

The following outline is an overview of and guide to the variety of topics included within the subject of regression analysis.

Non-statistical articles related to regression

  • Least squares: the standard approach to the approximate solution of overdetermined systems, i.e., sets of equations with more equations than unknowns; "least squares" means that the overall solution minimizes the sum of the squares of the errors made in solving every equation.
  • Linear least squares (mathematics)
  • Non-linear least squares: the form of least squares analysis used to fit a set of m observations with a model that is non-linear in n unknown parameters; used in some forms of non-linear regression, it works by repeatedly approximating the model by a linear one.
  • Least absolute deviations: also known as least absolute errors, least absolute value, or the L1-norm problem; a mathematical optimization technique, similar to least squares, that seeks a function which closely approximates a set of data.
  • Curve fitting: the process of constructing a curve, or mathematical function, that best fits a series of data points, possibly subject to constraints; it can involve either interpolation, where an exact fit to the data is required, or smoothing.
  • Smoothing: creating an approximating function that attempts to capture important patterns in a data set while leaving out noise and other fine-scale structures or rapid phenomena.
  • Cross-sectional study: a class of research methods involving observation of all of a population, or a representative subset, at one specific point in time.
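The least-squares idea above can be sketched numerically. This is a minimal illustration, not part of the outline: the data are hypothetical points chosen to lie near the line y = 2x + 1, and NumPy's `lstsq` solves the resulting overdetermined system.

```python
import numpy as np

# Four equations, two unknowns (slope, intercept): an overdetermined system.
# The points are hypothetical, scattered around the line y = 2x + 1.
X = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0]])          # columns: x value, constant term
y = np.array([1.0, 3.1, 4.9, 7.0])

# np.linalg.lstsq returns the beta that minimizes ||X @ beta - y||^2
beta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = beta             # close to 2 and 1, respectively
```

Since no exact solution exists, the fitted slope and intercept are the compromise that minimizes the total squared error across all four equations.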


Basic statistical ideas related to regression

  • Conditional expectation: the expected value of a real random variable with respect to a conditional probability distribution.
  • Correlation: any of a broad class of statistical relationships involving dependence between two random variables or two sets of data.
  • Correlation coefficient (Pearson product-moment): a measure of the correlation between two variables X and Y, giving a value between +1 and −1 inclusive.
  • Mean square error
  • Residual sum of squares: the sum of squares of residuals, also known as the sum of squared residuals or the sum of squared errors of prediction; a measure of the discrepancy between the data and an estimation model.
  • Explained sum of squares: a quantity used in describing how well a model, often a regression model, represents the data being modelled.
  • Total sum of squares: a quantity that appears as part of a standard way of presenting results of statistical data analyses.
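The three sums of squares above are linked: for an ordinary least squares fit that includes an intercept, the total sum of squares decomposes exactly into explained plus residual parts. A minimal sketch with hypothetical data:

```python
import numpy as np

# Hypothetical data; an OLS fit with an intercept guarantees TSS = ESS + RSS.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.5, 3.9, 4.2, 5.4])

X = np.column_stack([np.ones_like(x), x])     # design matrix: intercept + x
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ beta                               # fitted values

tss = np.sum((y - y.mean()) ** 2)             # total sum of squares
ess = np.sum((yhat - y.mean()) ** 2)          # explained sum of squares
rss = np.sum((y - yhat) ** 2)                 # residual sum of squares
r2 = ess / tss                                # coefficient of determination
```

The decomposition gives the coefficient of determination its usual interpretation as the fraction of total variability explained by the model.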


Linear regression based on least squares

  • General linear model: a statistical linear model that may be written as Y = XB + U, where Y is a matrix with series of multivariate measurements, X is a matrix that might be a design matrix, B is a matrix containing parameters that are usually to be estimated, and U is a matrix containing errors.
  • Ordinary least squares: a method for estimating the unknown parameters in a linear regression model by minimizing the sum of squared vertical distances between the observed responses in the dataset and the responses predicted by the linear model.
  • Generalized least squares: a technique for estimating the unknown parameters in a linear regression model, applied when the variances of the observations are unequal or when there is a certain degree of correlation between the observations.
  • Simple linear regression: the least squares estimator of a linear regression model with a single explanatory variable; in other words, it fits a straight line through a set of n points so that the sum of squared residuals of the model is as small as possible.
  • Trend estimation: a statistical technique to aid interpretation of data; when a series of measurements of a process is treated as a time series, trend estimation can be used to make and justify statements about tendencies in the data.
  • Ridge regression
  • Polynomial regression: a form of linear regression in which the relationship between the independent variable x and the dependent variable y is modeled as an nth-order polynomial.
  • Segmented regression: a method in which the independent variable is partitioned into intervals and a separate line segment is fit to each interval; segmented or piecewise regression can also be performed on multivariate data by partitioning the various independent variables.
  • Nonlinear regression: a form of regression analysis in which observational data are modeled by a function that is a nonlinear combination of the model parameters and depends on one or more independent variables.
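For the single-predictor case listed above, simple linear regression has a closed form: the slope is the ratio of the cross-deviation sum to the squared-deviation sum of x, and the intercept follows from the means. A sketch with hypothetical data:

```python
import numpy as np

# Closed-form OLS for one predictor:
#   slope = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
#   intercept = ybar - slope * xbar
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])      # hypothetical observations

slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()
```

These formulas agree with what a general least squares solver (e.g. `np.polyfit(x, y, 1)`) would return for the same data.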


Generalized linear models

  • Generalized linear models
  • Logistic regression: used to predict the probability of occurrence of an event by fitting data to a logistic (logit) curve; a generalized linear model used for binomial regression.
  • Ordered logit: a regression model for ordinal dependent variables.
  • Probit model: a type of regression where the dependent variable can take only two values, for example married or not married.
  • Ordered probit: a generalization of probit analysis to the case of more than two outcomes of an ordinal dependent variable; the logit method similarly has an ordered logit counterpart.
  • Poisson regression: a form of regression analysis used to model count data and contingency tables; it assumes the response variable Y has a Poisson distribution and that the logarithm of its expected value can be modeled by a linear combination of unknown parameters.
  • Maximum likelihood: a method of estimating the parameters of a statistical model; applied to a data set and a statistical model, maximum-likelihood estimation provides estimates for the model's parameters.
  • Cochrane–Orcutt estimation
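Logistic regression and maximum likelihood come together in practice: the coefficients are chosen to maximize the binomial log-likelihood. A minimal sketch, assuming toy binary data and a generic numerical optimizer (not any particular package's GLM routine):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical binary outcomes that tend toward 1 as x grows
# (deliberately not perfectly separated, so the MLE is finite).
x = np.array([-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0])
yobs = np.array([0, 0, 1, 0, 0, 1, 1, 1])
X = np.column_stack([np.ones_like(x), x])   # intercept + slope

def neg_log_lik(beta):
    # Logit-model log-likelihood: sum_i [ y_i*z_i - log(1 + e^{z_i}) ]
    z = X @ beta
    return -(np.sum(yobs * z) - np.sum(np.log1p(np.exp(z))))

res = minimize(neg_log_lik, x0=np.zeros(2), method="BFGS")
beta_hat = res.x                             # (intercept, slope) estimates
```

The fitted slope is positive, reflecting the higher probability of a 1 outcome at larger x; the optimizer's result is the maximum-likelihood estimate for this toy data set.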

Inference for regression models

  • F-test: any statistical test in which the test statistic has an F-distribution under the null hypothesis; most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled.
  • t-test
  • Lack-of-fit sum of squares: one of the components of a partition of the sum of squares in an analysis of variance, used in the numerator of an F-test of the null hypothesis that a proposed model fits well.
  • Confidence band: used to represent the uncertainty in an estimate of a curve or function based on limited or noisy data; often part of the graphical presentation of results of a statistical analysis.
  • Coefficient of determination: R2, the proportion of variability in a data set that is accounted for by the statistical model; used in the context of models whose main purpose is the prediction of future outcomes on the basis of other related information.
  • Multiple correlation: a linear relationship among more than two variables, measured by the coefficient of multiple determination R2, a measure of the fit of a linear regression.
  • Scheffé's method: named after Henry Scheffé, a method for adjusting significance levels in a linear regression analysis to account for multiple comparisons.
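The F-test for comparing nested regression models can be sketched directly from residual sums of squares. This is a hypothetical example, simulating data in which the extra predictor x2 has no true effect:

```python
import numpy as np
from scipy.stats import f as f_dist

# Simulated data: y depends on x1 but not on x2.
rng = np.random.default_rng(0)
n = 30
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.5, size=n)

def rss(X, yv):
    beta, *_ = np.linalg.lstsq(X, yv, rcond=None)
    return np.sum((yv - X @ beta) ** 2)

X_r = np.column_stack([np.ones(n), x1])        # restricted model
X_f = np.column_stack([np.ones(n), x1, x2])    # full model (adds x2)
rss_r, rss_f = rss(X_r, y), rss(X_f, y)

q = 1                                          # number of restrictions tested
p = X_f.shape[1]                               # parameters in the full model
F = ((rss_r - rss_f) / q) / (rss_f / (n - p))
p_value = f_dist.sf(F, q, n - p)               # upper tail of F(q, n - p)
```

The full model's residual sum of squares can never exceed the restricted model's; the F-statistic asks whether the improvement is larger than chance would explain.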


Challenges to regression modeling

  • Autocorrelation: the cross-correlation of a signal with itself; informally, the similarity between observations as a function of the time separation between them.
  • Cointegration: a statistical property of time series variables; two or more time series are cointegrated if they share a common stochastic drift.
  • Multicollinearity: a phenomenon in which two or more predictor variables in a multiple regression model are highly correlated; in this situation the coefficient estimates may change erratically in response to small changes in the model or the data.
  • Homoscedasticity and heteroscedasticity
  • Lack of fit (goodness of fit): how well a statistical model fits a set of observations; measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question, and can be used in statistical hypothesis testing.
  • Non-normality of errors (normality tests): used to determine whether a data set is well modeled by a normal distribution, or to compute how likely an underlying random variable is to be normally distributed.
  • Outliers: an outlier is an observation that is numerically distant from the rest of the data; Grubbs defined it as an observation that appears to deviate markedly from other members of the sample in which it occurs.
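The multicollinearity entry above can be made concrete: when one predictor is nearly a copy of another, the design matrix is ill-conditioned and individual coefficients are poorly determined, even though the combination the data do identify remains stable. A sketch with simulated (hypothetical) data:

```python
import numpy as np

# x2 is nearly identical to x1, so the columns of X are almost dependent.
rng = np.random.default_rng(7)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)       # nearly collinear with x1
y = x1 + x2 + rng.normal(scale=0.5, size=n)    # true combined effect: 2

X = np.column_stack([np.ones(n), x1, x2])
cond = np.linalg.cond(X)                       # large => multicollinearity
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
combined = beta[1] + beta[2]                   # well identified despite cond
```

The individual estimates beta[1] and beta[2] can swing erratically with small data changes, but their sum stays close to the true combined effect of 2.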

Diagnostics for regression models

  • Regression model validation
  • Studentized residual: the quotient resulting from dividing a residual by an estimate of its standard deviation; the standard deviations of residuals in a sample typically vary greatly from one data point to another, even when the errors all have the same standard deviation.
  • Cook's distance: a commonly used estimate of the influence of a data point in least squares regression analysis; in practice it can be used to indicate data points that are particularly worth checking.
  • Variance inflation factor: quantifies the severity of multicollinearity in an ordinary least squares regression analysis.
  • DFFITS
  • Partial residual plot: a graphical technique that attempts to show the relationship between a given independent variable and the response variable, given that other independent variables are also in the model.
  • Partial regression plot: attempts to show the effect of adding an additional variable to the model.
  • Leverage: used in analyses aimed at identifying those observations that are far away from the corresponding average predictor values.
  • Durbin–Watson statistic
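Several of these diagnostics (leverage, studentized residuals, Cook's distance) fall out of the hat matrix. A minimal sketch with hypothetical data containing one unusual point at x = 10:

```python
import numpy as np

# Hypothetical data: the last point has an extreme x (high leverage)
# and a y that falls well below the trend of the other points.
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
y = np.array([1.1, 2.0, 2.9, 4.2, 3.0])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Hat matrix H = X (X'X)^{-1} X'; its diagonal entries are the leverages.
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)

s2 = np.sum(resid ** 2) / (n - p)              # residual variance estimate
student = resid / np.sqrt(s2 * (1 - h))        # internally studentized residuals
cooks = student ** 2 * h / ((1 - h) * p)       # Cook's distance per point
```

The leverages always sum to the number of parameters p, and the point at x = 10 has by far the highest leverage, flagging it for closer inspection.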

Formal aids to model selection

  • Model selection: the task of selecting a statistical model from a set of candidate models, given data; in the simplest cases, a pre-existing set of data is considered.
  • Mallows' Cp: named for Colin L. Mallows, used to assess the fit of a regression model that has been estimated using ordinary least squares; applied in model selection, where a number of predictor variables are available for predicting some outcome.
  • Akaike information criterion: a measure of the relative goodness of fit of a statistical model, developed by Hirotsugu Akaike under the name "an information criterion" and first published by Akaike in 1974.
  • Bayesian information criterion
  • Hannan–Quinn information criterion
  • Cross validation
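For a Gaussian linear model, AIC and BIC can be computed (up to additive constants) from the residual sum of squares, making the selection criteria above easy to sketch. The data here are simulated and hypothetical:

```python
import numpy as np

# For Gaussian errors, up to constants shared by all candidate models:
#   AIC = n*ln(RSS/n) + 2k,   BIC = n*ln(RSS/n) + k*ln(n)
# where k is the number of estimated regression parameters.
def info_criteria(X, y):
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return aic, bic

# Simulated data with a genuine linear effect of x.
rng = np.random.default_rng(1)
n = 40
x = rng.normal(size=n)
y = 0.5 + 1.5 * x + rng.normal(scale=0.3, size=n)

aic1, bic1 = info_criteria(np.ones((n, 1)), y)                    # intercept only
aic2, bic2 = info_criteria(np.column_stack([np.ones(n), x]), y)   # intercept + x
```

Both criteria prefer (give a lower value to) the model that includes x, since its fit improvement far outweighs the penalty for the extra parameter.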

Terminology

  • Linear model (on the meaning of "linear"): the term is used in different ways according to context; most commonly it is taken as synonymous with linear regression model, but it is also used, with a different meaning, in time series analysis.
  • Dependent and independent variables: terms used in similar but subtly different ways in mathematics and statistics as part of the standard terminology in those subjects.
  • Errors and residuals in statistics: two closely related and easily confused measures of the deviation of a sample from its "theoretical value".
  • Hat matrix: the matrix H that maps the vector of observed values to the vector of fitted values; it describes the influence each observed value has on each fitted value.
  • Trend stationary
  • Cross-sectional data: a type of one-dimensional data set in statistics and econometrics, collected by observing many subjects at the same point in time, or without regard to differences in time.
  • Time series: a sequence of data points, typically measured at successive times spaced at uniform intervals; an example is the daily closing value of the Dow Jones index.


Methods for dependent data

  • Mixed model: a statistical model containing both fixed effects and random effects, that is, mixed effects; useful in a wide variety of disciplines in the physical, biological and social sciences.
  • Random effects model
  • Hierarchical linear models

Other forms of regression
  • Total least squares regression
  • Deming regression: named after W. Edwards Deming, an errors-in-variables model that tries to find the line of best fit for a two-dimensional dataset.
  • Errors-in-variables model (total least squares): also known as errors in variables, rigorous least squares, or orthogonal regression; a least squares data modeling technique in which observational errors on both dependent and independent variables are taken into account.
  • Instrumental variables regression
  • Quantile regression: whereas the method of least squares yields estimates approximating the conditional mean of the response variable given values of the predictor variables, quantile regression yields estimates approximating its conditional quantiles.
  • Generalized additive model: a statistical model developed by Trevor Hastie and Rob Tibshirani for blending properties of generalized linear models with additive models.
  • Autoregressive model: a type of random process often used to model and predict various types of natural phenomena.
  • Moving average model: a common approach for modeling univariate time series; the notation MA(q) refers to the moving average model of order q.
  • Autoregressive moving average model: ARMA models, sometimes called Box–Jenkins models after the iterative Box–Jenkins methodology usually used to estimate them, are typically applied to autocorrelated time series data.
  • Autoregressive integrated moving average: a generalization of the autoregressive moving average model; ARIMA models are fitted to time series data either to better understand the data or to predict future points in the series.
  • Autoregressive conditional heteroskedasticity: ARCH models are used to characterize and model observed time series whenever there is reason to believe that, at any point in the series, the terms will have a characteristic size or variance.
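The autoregressive model above connects directly back to regression: an AR(1) coefficient can be estimated by regressing each observation on its predecessor. A minimal sketch on simulated (hypothetical) data:

```python
import numpy as np

# Simulate an AR(1) process x_t = phi * x_{t-1} + eps_t ...
rng = np.random.default_rng(42)
phi = 0.7
n = 2000
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

# ... then recover phi by least squares regression of x_t on x_{t-1}
# (no intercept, since the process has mean zero).
phi_hat = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)
```

With a long enough series, the conditional least squares estimate lands close to the true coefficient of 0.7.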


See also
  • Prediction: a statement about the way things will happen in the future, often but not always based on experience or knowledge.
  • Design of experiments: the design of any information-gathering exercise where variation is present, whether or not under the full control of the experimenter; in statistics, the terms are usually used for controlled experiments.
  • Data transformation (statistics): the application of a deterministic mathematical function to each point in a data set; that is, each data point zi is replaced with the transformed value yi = f(zi), where f is a function.
  • Box–Cox transformation (power transform): a family of functions applied to create a rank-preserving transformation of data using power functions; a useful data processing technique for stabilizing variance and making data more normal-distribution-like.
  • Machine learning: a branch of artificial intelligence concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as sensor data or databases.
  • Analysis of variance
  • Causal inference (causality): the relationship between an event and a second event, where the second event is understood as a consequence of the first.

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 