All Topics  
Curve fitting

 

   Email Print
   Bookmark   Link






 

Curve fitting



 
 
Curve fitting is finding a curve which has the best fit to a series of data points and possibly other constraints. This section is an introduction to both interpolation
Interpolation

In the mathematics subfield of numerical analysis, interpolation is a method of constructing new data points within the range of a discrete set of known data points....
 (where an exact fit to constraints is expected) and regression analysis
Regression analysis

In statistics, regression analysis is a collective name for techniques for the modeling and analysis of numerical data consisting of values of a dependent variable and of one or more independent variables ....
. Both are sometimes used for extrapolation
Extrapolation

In mathematics, extrapolation is the process of constructing new data points outside a discrete set of known data points. It is similar to the process of interpolation, which constructs new points between known points, but the results of extrapolations are often less meaningful, and are subject to greater uncertainty....
. Regression analysis allows for an approximate fit by minimizing the difference between the data points and the curve.

is a line with slope
Slope

Slope is used to describe the steepness, incline, gradient, or grade of a line . A higher slope value indicates a steeper incline. The slope is defined as the ratio of the "rise" divided by the "run" between two points on a line, or in other words, the ratio of the altitude change to the horizontal distance between any two point...
 a.






Discussion
Ask a question about 'Curve fitting'
Start a new discussion about 'Curve fitting'
Answer questions from other users
Full Discussion Forum



Encyclopedia


Curve fitting is finding a curve which has the best fit to a series of data points and possibly other constraints. This section is an introduction to both interpolation
Interpolation

In the mathematics subfield of numerical analysis, interpolation is a method of constructing new data points within the range of a discrete set of known data points....
 (where an exact fit to constraints is expected) and regression analysis
Regression analysis

In statistics, regression analysis is a collective name for techniques for the modeling and analysis of numerical data consisting of values of a dependent variable and of one or more independent variables ....
. Both are sometimes used for extrapolation
Extrapolation

In mathematics, extrapolation is the process of constructing new data points outside a discrete set of known data points. It is similar to the process of interpolation, which constructs new points between known points, but the results of extrapolations are often less meaningful, and are subject to greater uncertainty....
. Regression analysis allows for an approximate fit by minimizing the difference between the data points and the curve.

Different types of curve fitting


Fitting lines and polynomial curves to data points


Let's start with a first degree polynomial
Polynomial

In mathematics, a polynomial is an expression constructed from variables and constants, using the operations of addition, subtraction, multiplication, and constant non-negative whole number exponents....
 equation:

This is a line with slope
Slope

Slope is used to describe the steepness, incline, gradient, or grade of a line . A higher slope value indicates a steeper incline. The slope is defined as the ratio of the "rise" divided by the "run" between two points on a line, or in other words, the ratio of the altitude change to the horizontal distance between any two point...
 a. We know that a line will connect any two points. So, a first degree polynomial equation is an exact fit through any two points.

If we increase the order of the equation to a second degree polynomial, we get:

This will exactly fit three points.

If we increase the order of the equation to a third degree polynomial, we get:

This will exactly fit four points.

A more general statement would be to say it will exactly fit four constraints. Each constraint can be a point, angle
Angle

In geometry and trigonometry, an angle is the figure formed by two Ray sharing a common endpoint, called the vertex of the angle . The magnitude of the angle is the "amount of rotation" that separates the two rays, and can be measured by considering the length of circular arc swept out when one ray is rotated about the vertex to coincide...
, or curvature
Curvature

In mathematics, curvature refers to any of a number of loosely related concepts in different areas of geometry. Intuitively, curvature is the amount by which a geometric object deviates from being flat, or straight in the case of a line , but this is defined in different ways depending on the context....
 (which is the reciprocal of the radius of an osculating circle
Osculating circle

In differential geometry of curves, the osculating circle of a sufficiently smooth plane curve at a given point on the curve is the circle whose center lies on the inner normal line and whose curvature is the same as that of the given curve at that point....
). Angle and curvature constraints are most often added to the ends of a curve, and in such cases are called end conditions. Identical end conditions are frequently used to ensure a smooth transition between polynomial curves contained within a single spline
Spline (mathematics)

In mathematics, a spline is a special Function defined piecewise by polynomials.In interpolation problems, spline interpolation is often preferred to polynomial interpolation because it yields similar results, even when using low degree polynomials, while avoiding Runge's phenomenon for higher degrees....
. Higher-order constraints, such as "the change in the rate of curvature", could also be added. This, for example, would be useful in highway cloverleaf
Cloverleaf interchange

A cloverleaf interchange is a two-level interchange in which left turns are handled by loop roads . To go left , vehicles first pass either over or under the other road, then turn right onto a one-way three-fourths loop ramp and merge onto the intersecting road....
 design to understand the forces applied to a car, as it follows the cloverleaf, and to set reasonable speed limits, accordingly.

Bearing this in mind, the first degree polynomial equation could also be an exact fit for a single point and an angle while the third degree polynomial equation could also be an exact fit for two points, an angle constraint, and a curvature constraint. Many other combinations of constraints are possible for these and for higher order polynomial equations.

If we have more than n + 1 constraints (n being the degree of the polynomial), we can still run the polynomial curve through those constraints. An exact fit to all the constraints is not certain (but might happen, for example, in the case of a first degree polynomial exactly fitting three collinear points). In general, however, some method is then needed to evaluate each approximation. The least squares
Least squares

The method of least squares or ordinary least squares is used to solve overdetermined systems. Least squares is often applied in statistical contexts, particularly regression analysis....
 method is one way to compare the deviations.

Now, you might wonder why we would ever want to get an approximate fit when we could just increase the degree of the polynomial equation and get an exact match. There are several reasons:

  • Even if an exact match exists, it does not necessarily follow that we can find it. Depending on the algorithm used, we may encounter a divergent case, where the exact fit cannot be calculated, or it might take too much computer time to find the solution. Either way, you might end up having to accept an approximate solution.


  • We may actually prefer the effect of averaging out questionable data points in a sample, rather than distorting the curve to fit them exactly.


  • High order polynomials can be highly oscillatory. If we run a curve through two points A and B, we would expect the curve to run somewhat near the midpoint of A and B, as well. This may not happen with high-order polynomial curves, they may even have values that are very large in positive or negative magnitude
    Magnitude (mathematics)

    The magnitude of a mathematical object is its size: a property by which it can be larger or smaller than other objects of the same kind; in technical terms, an ordering of the class of objects to which it belongs....
    . With low-order polynomials, the curve is more likely to fall near the midpoint (it's even guaranteed to exactly run through the midpoint on a first degree polynomial).


  • Low-order polynomials tend to be smooth and high order polynomial curves tend to be "lumpy". To define this more precisely, the maximum number of ogee/inflection point
    Inflection point

    In differential calculus, an inflection point, or point of inflection is a point on a curve at which the curvature changes Negative and non-negative numbers....
    s possible in a polynomial curve is n-2, where n is the order of the polynomial equation. An inflection point is a location on the curve where it switches from a positive radius to negative. We can also say this is where it transitions from "holding water" to "shedding water". Note that it is only "possible" that high order polynomials will be lumpy, they could also be smooth, but there is no guarantee of this, unlike with low order polynomial curves. A fifteenth degree polynomial could have, at most, thirteen inflection points, but could also have twelve, eleven, or any number down to zero.


Now that we have talked about using a degree too low for an exact fit, let's also discuss what happens if the degree of the polynomial curve is higher than needed for an exact fit. This is bad for all the reasons listed previously for high order polynomials, but also leads to a case where there are an infinite number of solutions. For example, a first degree polynomial (a line) constrained by only a single point, instead of the usual two, would give us an infinite number of solutions. This brings up the problem of how to compare and choose just one solution, which can be a problem for software and for humans, as well. For this reason, it is usually best to choose as low a degree as possible for an exact match on all constraints, and perhaps an even lower degree, if an approximate fit is acceptable.

For more details, see the polynomial interpolation
Polynomial interpolation

In the mathematics subfield of numerical analysis, polynomial interpolation is the interpolation of a given data set by a polynomial. In other words, given some data points , the aim is to find a polynomial which goes exactly through these points....
 article.


Fitting other curves to data points


Other types of curves, such as conic sections (circular, elliptical, parabolic, and hyperbolic arcs) or trigonometric functions (such as sine and cosine), may also be used, in certain cases. For example, trajectories of objects under the influence of gravity follow a parabolic path, when air resistance is ignored. Hence, matching trajectory data points to a parabolic curve would make sense. Tides follow sinusoidal patterns, hence tidal data points should be matched to a sine wave, or the sum of two sine waves of different periods, if the effects of the Moon and Sun are both considered.

Algebraic fit versus geometric fit for curves


For algebraic analysis of data, "fitting" usually means trying to find the curve that minimizes the vertical (i.e. y-axis) displacement of a point from the curve (e.g. ordinary least squares). However for graphical and image applications geometric fitting seeks to provide the best visual fit; which usually means trying to minimize the orthogonal distance to the curve (e.g. total least squares), or to otherwise include both axes of displacement of a point from the curve. Geometric fits are not popular because they usually require non-linear and/or iterative calculations, although they have the advantage of a more aesthetic and geometrically accurate result.

Fitting a circle by geometric fit


Coope approaches the problem of trying to find the best visual fit of circle to a set of 2D data points. The method elegantly transforms the ordinarily non-linear problem into a linear problem that can be solved without using iterative numerical methods, and is hence an order of magnitude faster than previous techniques.

Fitting an ellipse by geometric fit


The above technique is extended to general ellipses by adding a non-linear step, resulting in a method that is fast, yet finds visually pleasing ellipses of arbitrary orientation and displacement.

Application to surfaces


Note that while this discussion was in terms of 2D curves, much of this logic also extends to 3D surfaces, each patch of which is defined by a net of curves in two parametric directions, typically called u and v. A surface may be composed of one or more surface patches in each direction.

For more details, see the computer representation of surfaces
Computer representation of surfaces

In technical applications of 3D computer graphics such as computer-aided design and computer-aided manufacturing, surfaces are one way of representing objects....
 article.


Software


Many statistical packages
List of statistical packages

A statistical package is a suite of computer programs that are specialised for statistics. It enables people to obtain the results of standard statistical procedures and statistical significance tests, without requiring low-level numerical programming....
 such as R
R (programming language)

In computing, R is a programming language and software environment for statistics computing and graphics. It is an implementation of the S programming language with lexical scoping semantics inspired by Scheme ....
 and numerical software
List of numerical analysis software

Listed here are a number of computer programs used for performing numerical analysis calculations:* ADMB is a software suite for non-linear statistical modeling based on C++ which uses automatic differentiation....
 such as the GNU Scientific Library
GNU Scientific Library

In computing, the GNU Scientific Library is a software library written in the C for numerical calculations in applied mathematics and science....
 and SciPy
SciPy

SciPy is an open source library of algorithms and mathematical tools for the Python .SciPy contains modules for Optimization , Linear algebra, Integral, Interpolation Special functions, Fast Fourier transform, signal processing and , ordinary differential equation solvers and other tasks common in science and engineering....
 include commands for doing curve fitting in a variety of scenarios. There are also programs specifically written to do curve fitting, such as TableCurve, Fityk
Fityk

Fityk is a curve fitting and data analysis application, predominantly used to fit analytical,bell-shaped functions to experimental data.It is positioned to fill the gap between general :Category:plotting software...
 and so on; see the external links section for more details.

See also

  • Smoothing
    Smoothing

    In statistics and , to smooth a data set is to create an approximating function that attempts to capture important patterns in the data, while leaving out noise or other fine-scale structures/rapid phenomena....
  • Levenberg–Marquardt algorithm
  • Nonlinear regression
    Nonlinear regression

    In statistics, nonlinear regression is a form of regression analysis in which observational data are modeled by a function which is a nonlinear combination of the model parameters and depends on one or more independent variables....
  • Total least squares


External links


Implementations


Online textbooks

  • from GraphPad Software


Online calculators, applications and demos

  • Fit a set of data points to a linear combination of specified functions
  • Online curve and surface fitting application
  • by Theodore Gray
    Theodore Gray

    Theodore W. Gray is one of the founders of Wolfram Research and is currently Wolfram's Director of User Interface Technology.He created a wooden periodic table with compartments for each of the elements....
    , The Wolfram Demonstrations Project.