Kernel regression
Kernel regression is a non-parametric technique in statistics for estimating the conditional expectation of a random variable. The objective is to find a non-linear relation between a pair of random variables X and Y.

In any nonparametric regression, the conditional expectation of a variable Y relative to a variable X may be written:

$\operatorname{E}(Y \mid X) = m(X)$

where $m$ is an unknown function.

Nadaraya-Watson kernel regression

Nadaraya (1964) and Watson (1964) proposed to estimate $m$ as a locally weighted average, using a kernel as a weighting function. The Nadaraya-Watson estimator is:

$\hat{m}_h(x) = \frac{\sum_{i=1}^{n} K_h(x - x_i)\, y_i}{\sum_{j=1}^{n} K_h(x - x_j)}$

where $K_h(u) = h^{-1} K(u/h)$ is a kernel $K$ with a bandwidth $h$. The fraction $K_h(x - x_i) / \sum_{j} K_h(x - x_j)$ is a weighting term with sum 1.
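The locally weighted average described above can be sketched in a few lines. This is a minimal illustration only, assuming a Gaussian kernel; the function names (`gaussian_kernel`, `nw_weights`, `nadaraya_watson`) and the toy data are made up for this example and do not come from any package.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nw_weights(x, xs, h):
    """Normalised weights K((x - x_i)/h) / sum_j K((x - x_j)/h).

    The 1/h factor of the scaled kernel cancels in the ratio, so it is omitted.
    """
    raw = [gaussian_kernel((x - xi) / h) for xi in xs]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; try a larger h")
    return [r / total for r in raw]

def nadaraya_watson(x, xs, ys, h):
    """Locally weighted average of ys, evaluated at the point x."""
    return sum(w * y for w, y in zip(nw_weights(x, xs, h), ys))

# Toy data, invented for illustration: y = 2x on a coarse grid.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 2.0, 4.0, 6.0, 8.0]
print(sum(nw_weights(2.0, xs, 1.0)))      # the weights sum to 1 up to rounding
print(nadaraya_watson(2.0, xs, ys, 1.0))  # symmetric design, so close to 4
```

Because the weights are normalised, a constant response is reproduced exactly for any bandwidth; the bandwidth only controls how quickly the influence of distant observations decays.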

Derivation

$\operatorname{E}(Y \mid X = x) = \int y f(y \mid x)\, dy = \int y \frac{f(x, y)}{f(x)}\, dy$

Using the kernel density estimation for the joint distribution f(x,y) and f(x) with a kernel K,

$\hat{f}(x, y) = n^{-1} h^{-2} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) K\left(\frac{y - y_i}{h}\right)$,

$\hat{f}(x) = n^{-1} h^{-1} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$,

we obtain the Nadaraya-Watson estimator.
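The substitution step can be written out explicitly. The following is a sketch, assuming the scaled-kernel notation $K_h(u) = h^{-1} K(u/h)$ and a kernel that integrates to one and has mean zero (both hold for the Gaussian kernel); the factors $n^{-1}$ cancel between numerator and denominator.

```latex
\begin{aligned}
\hat{\operatorname{E}}(Y \mid X = x)
  &= \int y \, \frac{\hat{f}(x, y)}{\hat{f}(x)} \, dy
   = \int y \, \frac{\sum_{i=1}^{n} K_h(x - x_i)\, K_h(y - y_i)}
                    {\sum_{j=1}^{n} K_h(x - x_j)} \, dy \\
  &= \frac{\sum_{i=1}^{n} K_h(x - x_i) \int y \, K_h(y - y_i) \, dy}
          {\sum_{j=1}^{n} K_h(x - x_j)}
   = \frac{\sum_{i=1}^{n} K_h(x - x_i)\, y_i}
          {\sum_{j=1}^{n} K_h(x - x_j)}
\end{aligned}
```

The last equality uses $\int y \, K_h(y - y_i)\, dy = y_i$, which follows from the change of variable $u = (y - y_i)/h$ together with $\int K(u)\, du = 1$ and $\int u K(u)\, du = 0$.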

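Priestley-Chao kernel estimator

$\hat{m}_{PC}(x) = h^{-1} \sum_{i=2}^{n} (x_i - x_{i-1}) K\left(\frac{x - x_i}{h}\right) y_i$

Gasser-Müller kernel estimator

$\hat{m}_{GM}(x) = h^{-1} \sum_{i=1}^{n} \left[ \int_{s_{i-1}}^{s_i} K\left(\frac{x - u}{h}\right) du \right] y_i$

where $s_i = \frac{x_{i-1} + x_i}{2}$.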
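As an illustration of the Priestley-Chao estimator, the sketch below takes it in its usual form, h^{-1} Σ (x_i − x_{i−1}) K((x − x_i)/h) y_i, summed over observations sorted by x. The Gaussian kernel, the function names and the synthetic data are assumptions made for this example.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def priestley_chao(x, xs, ys, h):
    """Priestley-Chao estimate at x; xs must be sorted in ascending order."""
    if any(b < a for a, b in zip(xs, xs[1:])):
        raise ValueError("xs must be sorted in ascending order")
    total = 0.0
    for i in range(1, len(xs)):
        spacing = xs[i] - xs[i - 1]  # design-point spacing replaces the normalising sum
        total += spacing * gaussian_kernel((x - xs[i]) / h) * ys[i]
    return total / h

# Synthetic data, invented for illustration: constant response on a fine grid.
xs = [i / 100.0 for i in range(1001)]    # 0.00, 0.01, ..., 10.00
ys = [3.0] * len(xs)
print(priestley_chao(5.0, xs, ys, 0.5))  # close to 3 in the interior
print(priestley_chao(0.0, xs, ys, 0.5))  # roughly half of that at the boundary
```

Unlike the Nadaraya-Watson weights, these weights are not renormalised, so they sum to approximately one only where the kernel window lies inside the range of the data.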

Example

This example is based upon Canadian cross-section wage data consisting
of a random sample taken from the 1971 Canadian Census Public Use
Tapes for male individuals having common education (grade 13). There
are 205 observations in total.

We consider estimating the unknown regression function using
Nadaraya-Watson kernel regression via the
R np package
that uses automatic (data-driven) bandwidth selection; see the np vignette for an introduction to the np package.
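To give a sense of what data-driven bandwidth selection involves, the sketch below picks a bandwidth by leave-one-out least-squares cross-validation over a small grid. It is an illustration under stated assumptions (Gaussian kernel, a hypothetical candidate grid, synthetic data rather than the wage sample) and does not reproduce the np package's own procedure.

```python
import math

def nw_loo(i, xs, ys, h):
    """Nadaraya-Watson prediction at xs[i], computed without observation i."""
    num = den = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        if j == i:
            continue
        # Gaussian kernel; normalising constants cancel in the ratio.
        w = math.exp(-0.5 * ((xs[i] - xj) / h) ** 2)
        num += w * yj
        den += w
    return num / den if den > 0.0 else float("nan")

def cv_score(xs, ys, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    errs = [(ys[i] - nw_loo(i, xs, ys, h)) ** 2 for i in range(len(xs))]
    return sum(errs) / len(errs)

def select_bandwidth(xs, ys, candidates):
    """Return the candidate bandwidth with the smallest finite CV score."""
    scored = [(cv_score(xs, ys, h), h) for h in candidates]
    scored = [(s, h) for s, h in scored if not math.isnan(s)]
    return min(scored)[1]

# Synthetic data, invented for illustration: a sine curve plus a deterministic wiggle.
xs = [0.1 * i for i in range(60)]
ys = [math.sin(x) + 0.1 * math.cos(37.0 * x) for x in xs]
candidates = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]  # hypothetical grid
h_best = select_bandwidth(xs, ys, candidates)
print(h_best, cv_score(xs, ys, h_best))
```

Leaving each observation out of its own prediction matters: without it, the criterion would always favour the smallest bandwidth, since each point would then predict itself almost perfectly.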

The figure below shows the estimated regression function using a
second order Gaussian kernel along with asymptotic variability bounds.

Figure: Estimated Regression Function.

Script for example

The following commands of the R programming language use the
npreg function to deliver optimal smoothing and to create
the figure given above. These commands can be entered at the command
prompt via cut and paste.

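library(np)  # nonparametric kernel methods
data(cps71)
attach(cps71)

# Nadaraya-Watson regression; the bandwidth is selected automatically by npreg
m <- npreg(logwage~age)

# plot the fit with asymptotic variability bands
plot(m,plot.errors.method="asymptotic",
     plot.errors.style="band",
     ylim=c(11,15.2))

# overlay the raw observations
points(age,logwage,cex=.25)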

Related

According to Salsburg (2002, pp. 290-291), the algorithms used in kernel regression were independently developed and used in fuzzy systems: "Coming up with almost exactly the same computer algorithm, fuzzy systems and kernel density-based regressions appear to have been developed completely independently of one another."

Statistical implementation

  • Stata: kernreg2

    kernreg2 y x, bwidth(.5) kercode(3) npoint(500) gen(kernelprediction gridofpoints)
  • R: npreg (package np)
  • GNU Octave mathematical program package

External links

  • Scale-adaptive kernel regression (with Matlab software).
  • An online kernel regression demonstration (requires .NET 3.0 or later).
  • The np package: an R package that provides a variety of nonparametric and semiparametric kernel methods that seamlessly handle a mix of continuous, unordered, and ordered factor data types.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 