Simple linear regression



 
 
A simple linear regression is a linear regression in which there is only one covariate (predictor variable). It is the special case of multiple regression with a single predictor.

Simple linear regression is used to evaluate the linear relationship between two variables. One example could be the relationship between muscle strength and lean body mass. Put another way, simple linear regression is used to develop an equation by which we can predict or estimate a dependent variable given an independent variable. Given a sample $(Y_i, X_i)$, $i = 1, \ldots, n$, the regression model is given by

$$Y_i = a + b X_i + \varepsilon_i$$

where $Y_i$ is the dependent variable, $a$ is the y-intercept, $b$ is the gradient or slope of the line, $X_i$ is the independent variable and $\varepsilon_i$ is a random term associated with each observation. The strength of the linear relationship between the two variables (i.e. dependent and independent) can be measured using a correlation coefficient, e.g. the Pearson product moment correlation coefficient.
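As a rough illustration of measuring a linear relationship, the sketch below computes the Pearson product moment correlation coefficient for a small sample. The function name `pearson_r` and the lean-mass and strength values are invented for illustration only; they are not real measurements.

```python
import math

def pearson_r(xs, ys):
    """Pearson product moment correlation of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: lean body mass (kg) and muscle strength (arbitrary units).
lean_mass = [40.0, 45.0, 50.0, 55.0, 60.0]
strength = [70.0, 82.0, 88.0, 101.0, 109.0]

r = pearson_r(lean_mass, strength)
print(round(r, 3))  # close to 1 for this nearly linear made-up sample
```

A value of r near +1 or -1 indicates a strong linear relationship; a value near 0 indicates a weak one.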

Estimating the regression line


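The parameters of the linear regression model, $a$ and $b$, can be estimated using the method of ordinary least squares. This method finds the line that minimizes the sum of the squares of errors, $\sum_{i=1}^{n} (Y_i - a - b X_i)^2$.

The minimization problem can be solved using calculus, producing the following formulas for the estimates of the regression parameters:

$$\hat{b} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad \hat{a} = \bar{Y} - \hat{b}\,\bar{X}$$

where $\bar{X}$ and $\bar{Y}$ are the sample means of the $X_i$ and $Y_i$.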
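A minimal sketch of the ordinary least squares estimates for one predictor might look as follows. The function name `fit_simple_ols` and the sample values are assumptions made for illustration; the slope is the sum of cross products of deviations divided by the sum of squared x-deviations, and the intercept then follows from the sample means.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least squares line for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # All x-values equal: the slope is not identified.
        raise ValueError("x-values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up points lying exactly on the line y = 1 + 2x.
a_hat, b_hat = fit_simple_ols([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a_hat, b_hat)  # prints: 1.0 2.0
```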

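The ordinary least squares fit has the following properties:

1. The line goes through the point $(\bar{X}, \bar{Y})$. This is easily seen by rearranging the expression $\hat{a} = \bar{Y} - \hat{b}\,\bar{X}$ as $\bar{Y} = \hat{a} + \hat{b}\,\bar{X}$, which shows that the point $(\bar{X}, \bar{Y})$ satisfies the fitted regression equation.

2. The sum of the residuals is equal to zero, if the model includes a constant. To see why, minimize $\sum_{i=1}^{n} (Y_i - a - b X_i)^2$ with respect to $a$, taking the following partial derivative:

$$\frac{\partial}{\partial a} \sum_{i=1}^{n} (Y_i - a - b X_i)^2 = -2 \sum_{i=1}^{n} (Y_i - a - b X_i)$$

Setting this partial derivative to zero and noting that $\hat{\varepsilon}_i = Y_i - \hat{a} - \hat{b} X_i$ yields $\sum_{i=1}^{n} \hat{\varepsilon}_i = 0$ as desired.

3. The linear combination of the residuals in which the coefficients are the x-values is equal to zero: $\sum_{i=1}^{n} X_i \hat{\varepsilon}_i = 0$.

4. The estimates are unbiased, provided the error term has mean zero given the x-values.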
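The first three properties can be checked numerically. The sketch below does so on a small made-up sample; the helper name `fit_line` and the data are assumptions for illustration, and the checked quantities are only expected to be zero up to floating-point rounding.

```python
def fit_line(xs, ys):
    """Least squares (intercept, slope) for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up sample that does not lie on a straight line.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 3.0, 3.5, 8.0]

a_hat, b_hat = fit_line(xs, ys)
residuals = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Property 1: the fitted line passes through the point of means.
gap_at_means = (a_hat + b_hat * mean_x) - mean_y
# Property 2: the residuals sum to zero.
residual_sum = sum(residuals)
# Property 3: the residuals weighted by the x-values sum to zero.
weighted_residual_sum = sum(x * e for x, e in zip(xs, residuals))

print(abs(gap_at_means) < 1e-9,
      abs(residual_sum) < 1e-9,
      abs(weighted_residual_sum) < 1e-9)
```

Property 4 concerns the sampling distribution of the estimates and cannot be confirmed from a single sample in this way.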

Alternative formulas for the slope coefficient

There are alternative (and simpler) formulas for calculating $\hat{b}$:

$$\hat{b} = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2} = r \, \frac{s_y}{s_x}$$

Here, $r$ is the correlation coefficient of X and Y, $s_x$ is the sample standard deviation of X and $s_y$ is the sample standard deviation of Y.
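The following sketch compares three ways of computing the slope on a made-up sample: from deviations about the means, from raw sums, and from the correlation coefficient times the ratio of the sample standard deviations. The data and variable names are assumptions for illustration; `statistics.stdev` is the standard library's sample standard deviation (divisor n - 1).

```python
import math
import statistics

# Made-up sample for illustration.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 3.0, 3.5, 8.0]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Slope from deviations about the means.
slope_deviations = sxy / sxx

# Slope from raw sums.
slope_raw_sums = (sum(x * y for x, y in zip(xs, ys)) - n * mean_x * mean_y) / (
    sum(x * x for x in xs) - n * mean_x ** 2
)

# Slope from the correlation coefficient and sample standard deviations.
r = sxy / math.sqrt(sxx * syy)
slope_from_r = r * statistics.stdev(ys) / statistics.stdev(xs)

print(slope_deviations, slope_raw_sums, slope_from_r)
```

The three values agree up to rounding. The raw-sums form can lose precision when the x-values are large relative to their spread, so the deviation form is often the safer choice in floating-point arithmetic.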

Inference


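Under the assumption that the error term is normally distributed, the estimate of the slope coefficient has a normal distribution with mean equal to $b$ and standard error given by:

$$s_{\hat{b}} = \sqrt{\frac{\frac{1}{n-2} \sum_{i=1}^{n} \hat{\varepsilon}_i^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$

where $\hat{\varepsilon}_i = Y_i - \hat{a} - \hat{b} X_i$ are the residuals and $n$ is the sample size.

A confidence interval for $b$ can be created using a t-distribution with $n - 2$ degrees of freedom:

$$\left[\hat{b} - s_{\hat{b}}\, t^{*}_{n-2},\ \hat{b} + s_{\hat{b}}\, t^{*}_{n-2}\right]$$

where $t^{*}_{n-2}$ is the critical value of that t-distribution for the chosen confidence level (the 97.5th percentile for a 95% interval).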

Numerical example

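Suppose we have the sample of points $\{(1, -1), (2, 4), (6, 3)\}$. The mean of X is 3 and the mean of Y is 2. The slope coefficient estimate is given by:

$$\hat{b} = \frac{(1-3)(-1-2) + (2-3)(4-2) + (6-3)(3-2)}{(1-3)^2 + (2-3)^2 + (6-3)^2} = \frac{7}{14} = 0.5$$

The standard error of the coefficient is 0.866. Using the critical value 12.7062 of the t-distribution with 3 − 2 = 1 degree of freedom, a 95% confidence interval is given by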

[0.5 − 0.866 × 12.7062, 0.5 + 0.866 × 12.7062] = [−10.504, 11.504].
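The arithmetic of this example can be retraced with a short script. This is a sketch under two assumptions: the sample is taken to be the three points (1, -1), (2, 4), (6, 3), which reproduce the stated means, slope and standard error, and the critical value 12.7062 is copied from the text rather than computed.

```python
import math

# Assumed sample: three points consistent with the stated means (3 and 2).
points = [(1.0, -1.0), (2.0, 4.0), (6.0, 3.0)]
T_CRIT = 12.7062  # t critical value for 1 degree of freedom, taken from the text

n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n

sxx = sum((x - mean_x) ** 2 for x, _ in points)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual sum of squares and standard error of the slope (n - 2 degrees of freedom).
ssr = sum((y - intercept - slope * x) ** 2 for x, y in points)
std_err = math.sqrt(ssr / (n - 2) / sxx)

lower = slope - T_CRIT * std_err
upper = slope + T_CRIT * std_err
print(round(slope, 3), round(std_err, 3), round(lower, 3), round(upper, 3))
```

With only three points there is a single degree of freedom, which is why the interval is so wide that it includes zero.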

Mathematical derivation of the least squares estimates


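Assume that $Y_i = a + b X_i + \varepsilon_i$ is a stochastic simple regression model and let $(x_1, y_1), \ldots, (x_n, y_n)$ be a sample of size n. Here the sample is seen as observable nonrandom variables, but the calculations don't change when assuming that the sample is represented by random variables $(X_1, Y_1), \ldots, (X_n, Y_n)$.

Let Q be the sum of squared errors:

$$Q(a, b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$

Then taking partial derivatives with respect to $a$ and $b$:

$$\frac{\partial Q}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i), \qquad \frac{\partial Q}{\partial b} = -2 \sum_{i=1}^{n} (y_i - a - b x_i)\, x_i$$

Setting $\partial Q / \partial a$ and $\partial Q / \partial b$ to zero yields

$$n \hat{a} + \hat{b} \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$

$$\hat{a} \sum_{i=1}^{n} x_i + \hat{b} \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$

which are known as the normal equations and can be written in matrix notation as

$$\begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix} \begin{pmatrix} \hat{a} \\ \hat{b} \end{pmatrix} = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix}$$

Using Cramer's rule we get

$$\hat{a} = \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}, \qquad \hat{b} = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$

Dividing the numerator and denominator of the last expression by n:

$$\hat{b} = \frac{\sum x_i y_i - n \bar{x} \bar{y}}{\sum x_i^2 - n \bar{x}^2}$$

Isolating $\hat{a}$ from the first normal equation yields

$$\hat{a} = \bar{y} - \hat{b}\, \bar{x}$$

which is a common formula for $\hat{a}$ in terms of $\hat{b}$ and the sample means.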
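As an illustration of the Cramer's rule step, the sketch below builds the two normal equations from raw sums and solves the resulting 2×2 system by determinants. The function name `solve_normal_equations` and the sample are assumptions for illustration.

```python
def solve_normal_equations(xs, ys):
    """Solve the 2x2 normal equations for (intercept, slope) by Cramer's rule."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Determinant of the coefficient matrix [[n, sum_x], [sum_x, sum_xx]].
    det = n * sum_xx - sum_x * sum_x
    if det == 0:
        raise ValueError("x-values must not all be equal")

    # Replace the first column by the right-hand side for the intercept,
    # and the second column for the slope.
    intercept = (sum_y * sum_xx - sum_x * sum_xy) / det
    slope = (n * sum_xy - sum_x * sum_y) / det
    return intercept, slope

# Made-up points lying exactly on the line y = 1 + 2x.
intercept, slope = solve_normal_equations([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(intercept, slope)  # prints: 1.0 2.0
```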

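$\hat{b}$ may also be written as

$$\hat{b} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

using the following equalities:

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n \bar{x}^2, \qquad \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}$$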
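The following calculation shows that $(\hat{a}, \hat{b})$ is a minimum.

$$\frac{\partial^2 Q}{\partial a^2} = 2n, \qquad \frac{\partial^2 Q}{\partial a\, \partial b} = 2 \sum_{i=1}^{n} x_i, \qquad \frac{\partial^2 Q}{\partial b^2} = 2 \sum_{i=1}^{n} x_i^2$$

Hence the Hessian matrix of Q is given by

$$D = \begin{pmatrix} 2n & 2 \sum x_i \\ 2 \sum x_i & 2 \sum x_i^2 \end{pmatrix}$$

Since $|D| = 4n \sum x_i^2 - 4 \left(\sum x_i\right)^2 = 4n \sum (x_i - \bar{x})^2 > 0$ (provided the $x_i$ are not all equal) and $2n > 0$, $D$ is positive definite for all $(a, b)$ and $(\hat{a}, \hat{b})$ is a minimum.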
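The positive-definiteness argument can be illustrated numerically. The sketch below builds the 2×2 Hessian of the sum of squared errors from the x-values alone and applies the leading-minor test; the function names and sample values are assumptions for illustration.

```python
def hessian_of_q(xs):
    """Hessian of the sum of squared errors; it depends only on the x-values."""
    n = len(xs)
    sum_x = sum(xs)
    sum_xx = sum(x * x for x in xs)
    return [[2 * n, 2 * sum_x], [2 * sum_x, 2 * sum_xx]]

def is_positive_definite_2x2(h):
    """A symmetric 2x2 matrix is positive definite iff both leading minors are > 0."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    return h[0][0] > 0 and det > 0

xs = [1.0, 2.0, 6.0]  # made-up x-values
n = len(xs)
mean_x = sum(xs) / n

h = hessian_of_q(xs)
det = h[0][0] * h[1][1] - h[0][1] * h[1][0]

# The determinant equals 4n times the sum of squared deviations of x.
det_from_deviations = 4 * n * sum((x - mean_x) ** 2 for x in xs)

print(det, det_from_deviations, is_positive_definite_2x2(h))

# If all x-values coincide, the determinant is zero and the test fails.
print(is_positive_definite_2x2(hessian_of_q([2.0, 2.0, 2.0])))
```

The second call shows why the x-values must not all be equal: the determinant is then zero, the Hessian is only positive semidefinite, and the slope is not uniquely determined.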