In calculus, the **chain rule** is a formula for computing the derivative of the composition of two or more functions. That is, if *f* is a function and *g* is a function, then the chain rule expresses the derivative of the composite function *f* ∘ *g* in terms of the derivatives of *f* and *g*.

In integration, the counterpart to the chain rule is the substitution rule.

## History

The chain rule seems to have first been used by Leibniz. He used it to calculate the derivative of $\sqrt{a + bz + cz^2}$ as the composite of the square root function and the function $a + bz + cz^2$. However, he did not state it as a separate rule, nor did anyone else for a long time. L'Hôpital uses the chain rule implicitly in his *Analyse des infiniment petits* but also does not state it explicitly. The chain rule does not appear in any of Leonhard Euler's analysis books, even though they were written over a hundred years after Leibniz's discovery. The first explicit and modern statement of the chain rule was in Lagrange's 1797 *Théorie des fonctions analytiques*.

## The chain rule in one dimension

*For an explanation of notation used in this section, see Function composition.*

### First example

Suppose that a skydiver jumps from an aircraft. Assume that *t* seconds after his jump, his height above sea level in meters is given by $g(t) = 4000 - 4.9t^2$. One model for the atmospheric pressure at a height *h* is $f(h) = 101325\,e^{-0.0001h}$. These two equations can be differentiated and combined in various ways to produce the following data:

- $g'(t) = -9.8t$ is the velocity of the skydiver at time *t*.
- $f'(h) = -10.1325\,e^{-0.0001h}$ is the rate of change in atmospheric pressure with respect to height at the height *h* and is proportional to the buoyant force on the skydiver at *h* meters above sea level. (The true buoyant force depends on the volume of the skydiver.)
- $(f \circ g)(t)$ is the atmospheric pressure the skydiver experiences *t* seconds after his jump.
- $(f \circ g)'(t)$ is the rate of change in atmospheric pressure with respect to time at *t* seconds after the skydiver's jump and is proportional to the buoyant force on the skydiver at *t* seconds after his jump.

The chain rule gives a method for computing (*f* ∘ *g*)′(*t*) in terms of *f*′ and *g*′. While it is always possible to directly apply the definition of the derivative to compute the derivative of a composite function, this is usually very difficult. The utility of the chain rule is that it turns a complicated derivative into several easy derivatives.

The chain rule states that, under appropriate conditions,

$(f \circ g)'(t) = f'(g(t))\,g'(t).$

In this example, this equals

$(f \circ g)'(t) = \big(-10.1325\,e^{-0.0001(4000 - 4.9t^2)}\big) \cdot \big(-9.8t\big).$

In the statement of the chain rule, *f* and *g* play slightly different roles because *f*′ is evaluated at *g*(*t*) whereas *g*′ is evaluated at *t*. This is necessary to make the units work out correctly. For example, suppose that we want to compute the rate of change in atmospheric pressure ten seconds after the skydiver jumps. This is (*f* ∘ *g*)′(10) and has units of pascals per second. The factor *g*′(10) in the chain rule is the velocity of the skydiver ten seconds after his jump, and it is expressed in meters per second. The factor *f*′(*g*(10)) is the change in pressure with respect to height at the height *g*(10) and is expressed in pascals per meter. The product of *f*′(*g*(10)) and *g*′(10) therefore has the correct units of pascals per second.

It is not possible to evaluate *f*′ anywhere else. For instance, because the 10 in the problem represents ten seconds, the expression *f*′(10) represents the change in pressure at a height of ten seconds, which is nonsense. Similarly, because *g*′(10) = −98 meters per second, the expression *f*′(*g*′(10)) represents the change in pressure at a height of −98 meters per second, which is also nonsense. However, *g*(10) = 4000 − 4.9 · 10² = 3510 meters above sea level, the height of the skydiver ten seconds after his jump. This has the correct units for an input to *f*.
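The skydiver computation can be checked numerically. The following is a minimal sketch, assuming the two model functions given in this example; the helper names (`chain_rule_rate`, `central_difference`) are illustrative and not part of the original text. It evaluates *f*′(*g*(10)) · *g*′(10) and compares the result with a finite-difference estimate of the derivative of the composite.

```python
import math

def g(t):
    """Height above sea level in meters, t seconds after the jump."""
    return 4000 - 4.9 * t**2

def g_prime(t):
    """Velocity of the skydiver in meters per second."""
    return -9.8 * t

def f(h):
    """Atmospheric pressure in pascals at a height of h meters."""
    return 101325 * math.exp(-0.0001 * h)

def f_prime(h):
    """Rate of change of pressure with height, in pascals per meter."""
    return -10.1325 * math.exp(-0.0001 * h)

def chain_rule_rate(t):
    """(f o g)'(t) = f'(g(t)) * g'(t), in pascals per second."""
    return f_prime(g(t)) * g_prime(t)

def central_difference(func, x, step=1e-5):
    """Symmetric finite-difference estimate of func'(x)."""
    return (func(x + step) - func(x - step)) / (2 * step)

t = 10.0
by_chain_rule = chain_rule_rate(t)
by_difference = central_difference(lambda s: f(g(s)), t)

print("height g(10) in meters:", g(t))
print("chain rule, Pa/s:", by_chain_rule)
print("finite difference, Pa/s:", by_difference)
```

Note that `f_prime` receives the height `g(t)`, not the time `t` or the velocity `g_prime(t)`, which mirrors the unit argument above. The result is positive because the pressure rises as the skydiver falls.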

### Statement of the rule

The simplest form of the chain rule is for real-valued functions of one real variable. It says that if *g* is a function that is differentiable at a point *c* (i.e. the derivative *g*′(*c*) exists) and *f* is a function that is differentiable at *g*(*c*), then the composite function *f* ∘ *g* is differentiable at *c*, and the derivative is

$(f \circ g)'(c) = f'(g(c)) \cdot g'(c).$

The rule is sometimes abbreviated as

$(f \circ g)' = (f' \circ g) \cdot g'.$

If *y* = *f*(*u*) and *u* = *g*(*x*), then this abbreviated form is written in Leibniz notation as:

$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}.$

The points where the derivatives are evaluated may also be stated explicitly:

$\left.\frac{dy}{dx}\right|_{x=c} = \left.\frac{dy}{du}\right|_{u = g(c)} \cdot \left.\frac{du}{dx}\right|_{x=c}.$

#### The chain rule in the absence of formulas

It may be possible to apply the chain rule even when there are no formulas for the functions being differentiated. This can happen when the derivatives are measured directly. Suppose that a car is driving up a tall mountain. The car's speedometer measures its speed directly. If the grade is known, then the rate of ascent can be calculated using trigonometry. Suppose that the car is ascending at 2.5 km/h. Standard models for the Earth's atmosphere imply that the temperature drops about 6.5 °C per kilometer ascended (see lapse rate). To find the temperature drop per hour, we apply the chain rule. Let the function *g*(*t*) be the altitude of the car at time *t*, and let the function *f*(*h*) be the temperature *h* kilometers above sea level. The functions *f* and *g* are not known exactly: for example, the altitude where the car starts is not known and the temperature on the mountain is not known. However, their derivatives are known: *f*′ is −6.5 °C/km, and *g*′ is 2.5 km/h. The chain rule says that the derivative of the composite function is the product of the derivative of *f* and the derivative of *g*. This is −6.5 °C/km · 2.5 km/h = −16.25 °C/h.

One of the reasons why this computation is possible is that *f*′ is a constant function. This is because the above model is very simple. A more accurate description of how the temperature near the car varies over time would require an accurate model of how the temperature varies at different altitudes. This model may not have a constant derivative. To compute the temperature change in such a model, it would be necessary to know *g* and not just *g*′, because without knowing *g* it is not possible to know where to evaluate *f*′.
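The difference between the constant and the non-constant case can be sketched in a few lines. The first function reproduces the computation above. The second uses a purely hypothetical lapse-rate model, `f_prime_variable`, which is an assumption made for illustration and does not come from any atmospheric standard. It shows that once *f*′ depends on altitude, the altitude *g*(*t*) itself must be supplied.

```python
def temperature_rate(lapse_per_km, ascent_km_per_h):
    """Chain rule with constant derivatives: (f o g)' = f' * g'."""
    return lapse_per_km * ascent_km_per_h

# Values from the text: -6.5 degrees C per km and 2.5 km per hour.
constant_rate = temperature_rate(-6.5, 2.5)
print("constant model, degrees C per hour:", constant_rate)

def f_prime_variable(h_km):
    """HYPOTHETICAL lapse rate that weakens linearly with altitude.

    The slope 0.2 is an illustrative assumption, not a measured value.
    """
    return -6.5 + 0.2 * h_km

def rate_with_altitude(h_km, ascent_km_per_h):
    """(f o g)'(t) = f'(g(t)) * g'(t); h_km plays the role of g(t)."""
    return f_prime_variable(h_km) * ascent_km_per_h

# The same ascent rate gives different answers at different altitudes.
print(rate_with_altitude(1.0, 2.5), rate_with_altitude(2.0, 2.5))
```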

#### Composites of more than two functions

The chain rule can be applied to composites of more than two functions. To take the derivative of a composite of more than two functions, notice that the composite of *f*, *g*, and *h* (in that order) is the composite of *f* with *g* ∘ *h*. The chain rule says that to compute the derivative of *f* ∘ *g* ∘ *h*, it is sufficient to compute the derivative of *f* and the derivative of *g* ∘ *h*. The derivative of *f* can be calculated directly, and the derivative of *g* ∘ *h* can be calculated by applying the chain rule again.

For concreteness, consider the function

$y = e^{\sin(x^2)}.$

This can be decomposed as the composite of three functions:

$\begin{aligned} y &= f(u) = e^u, \\ u &= g(v) = \sin v, \\ v &= h(x) = x^2. \end{aligned}$

Their derivatives are:

$\begin{aligned} \frac{dy}{du} &= f'(u) = e^u, \\ \frac{du}{dv} &= g'(v) = \cos v, \\ \frac{dv}{dx} &= h'(x) = 2x. \end{aligned}$

The chain rule says that the derivative of their composite at the point *x* = *a* is:

$(f \circ g \circ h)'(a) = f'((g \circ h)(a))\,(g \circ h)'(a) = f'((g \circ h)(a))\,g'(h(a))\,h'(a).$

In Leibniz notation, this is:

$\frac{dy}{dx} = \left.\frac{dy}{du}\right|_{u=g(h(a))} \cdot \left.\frac{du}{dv}\right|_{v=h(a)} \cdot \left.\frac{dv}{dx}\right|_{x=a},$

or for short,

$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dv} \cdot \frac{dv}{dx}.$

The derivative function is therefore:

$\frac{dy}{dx} = e^{\sin(x^2)} \cdot \cos(x^2) \cdot 2x.$

Another way of computing this derivative is to view the composite function *f* ∘ *g* ∘ *h* as the composite of *f* ∘ *g* and *h*. Applying the chain rule to this situation gives:

$(f \circ g \circ h)'(a) = (f \circ g)'(h(a))\,h'(a) = f'(g(h(a)))\,g'(h(a))\,h'(a).$

This is the same as what was computed above. This should be expected because (*f* ∘ *g*) ∘ *h* = *f* ∘ (*g* ∘ *h*).
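Both groupings can be compared numerically for the example $y = e^{\sin(x^2)}$. The sketch below is illustrative only: the function names are chosen here and the test point 0.7 is arbitrary. It evaluates the derivative with each grouping, with the closed form, and with a finite difference.

```python
import math

# The three functions of the example and their derivatives.
def h(x): return x ** 2
def g(v): return math.sin(v)
def f(u): return math.exp(u)

def h_prime(x): return 2 * x
def g_prime(v): return math.cos(v)
def f_prime(u): return math.exp(u)

def derivative_outer_first(a):
    """Treat the composite as f o (g o h) and expand (g o h)' again."""
    inner = g_prime(h(a)) * h_prime(a)          # (g o h)'(a)
    return f_prime(g(h(a))) * inner

def derivative_inner_first(a):
    """Treat the composite as (f o g) o h and expand (f o g)' again."""
    def fg_prime(v):                            # (f o g)'(v)
        return f_prime(g(v)) * g_prime(v)
    return fg_prime(h(a)) * h_prime(a)

def closed_form(x):
    """dy/dx = e^{sin(x^2)} * cos(x^2) * 2x."""
    return math.exp(math.sin(x ** 2)) * math.cos(x ** 2) * 2 * x

def composite(x):
    return f(g(h(x)))

def central_difference(func, x, step=1e-6):
    """Symmetric finite-difference estimate of func'(x)."""
    return (func(x + step) - func(x - step)) / (2 * step)

a = 0.7
print(derivative_outer_first(a), derivative_inner_first(a),
      closed_form(a), central_difference(composite, a))
```

The first three values agree up to rounding, and the finite-difference estimate agrees to within its own approximation error.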

#### The quotient rule

*See also: Quotient rule*

The chain rule can be used to derive some well-known differentiation rules. For example, the quotient rule is a consequence of the chain rule and the product rule. To see this, write the function *f*(*x*)/*g*(*x*) as the product *f*(*x*) · 1/*g*(*x*). First apply the product rule:

$\begin{aligned} \frac{d}{dx}\left(\frac{f(x)}{g(x)}\right) &= \frac{d}{dx}\left(f(x) \cdot \frac{1}{g(x)}\right) \\ &= f'(x) \cdot \frac{1}{g(x)} + f(x) \cdot \frac{d}{dx}\left(\frac{1}{g(x)}\right). \end{aligned}$

To compute the derivative of 1/*g*(*x*), notice that it is the composite of *g* with the reciprocal function, that is, the function that sends *x* to 1/*x*. The derivative of the reciprocal function is $-1/x^2$. By applying the chain rule, the last expression becomes:

$f'(x) \cdot \frac{1}{g(x)} + f(x) \cdot \left(-\frac{1}{g(x)^2} \cdot g'(x)\right) = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2},$

which is the usual formula for the quotient rule.

#### Derivatives of inverse functions

Suppose that *y* = *g*(*x*) has an inverse function. Call its inverse function *f* so that we have *x* = *f*(*y*). There is a formula for the derivative of *f* in terms of the derivative of *g*. To see this, note that *f* and *g* satisfy the formula

$f(g(x)) = x.$

Because the functions *f*(*g*(*x*)) and *x* are equal, their derivatives must be equal. The derivative of *x* is the constant function with value 1, and the derivative of *f*(*g*(*x*)) is determined by the chain rule. Therefore we have:

$f'(g(x))\,g'(x) = 1.$

To express *f*′ as a function of an independent variable *y*, we substitute *f*(*y*) for *x* wherever it appears. Then we can solve for *f*′.

$\begin{aligned} f'(g(f(y)))\,g'(f(y)) &= 1 \\ f'(y)\,g'(f(y)) &= 1 \\ f'(y) &= \frac{1}{g'(f(y))}. \end{aligned}$

For example, consider the function $g(x) = e^x$. It has an inverse which is denoted *f*(*y*) = ln *y*. Because $g'(x) = e^x$, the above formula says that

$\frac{d}{dy}\ln y = \frac{1}{e^{\ln y}} = \frac{1}{y}.$

This formula is true whenever *g* is differentiable and its inverse *f* is also differentiable. This formula can fail when one of these conditions is not true. For example, consider $g(x) = x^3$. Its inverse is $f(y) = y^{1/3}$, which is not differentiable at zero. If we attempt to use the above formula to compute the derivative of *f* at zero, then we must evaluate 1/*g*′(*f*(0)). Since *f*(0) = 0 and *g*′(0) = 0, we must evaluate 1/0, which is undefined. Therefore the formula fails in this case. This is not surprising because *f* is not differentiable at zero.
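The inverse-function formula and its failure case can both be sketched in code. The helper `inverse_derivative` below is a hypothetical name introduced for illustration. It evaluates 1/*g*′(*f*(*y*)) and raises an error when *g*′(*f*(*y*)) is zero, as happens for the cube root at zero.

```python
import math

def inverse_derivative(g_prime, f, y):
    """Return f'(y) = 1 / g'(f(y)), where f is the inverse of g."""
    slope = g_prime(f(y))
    if slope == 0:
        raise ZeroDivisionError("g'(f(y)) is zero; the formula does not apply")
    return 1.0 / slope

# g(x) = e^x with inverse f(y) = ln y, so g'(x) = e^x.
log_slope = inverse_derivative(math.exp, math.log, 2.0)
print("d/dy ln y at y = 2:", log_slope)

# g(x) = x^3 with inverse f(y) = y^(1/3), so g'(x) = 3x^2.
def cube_prime(x):
    return 3 * x ** 2

def cube_root(y):
    """Real cube root, valid for negative y as well."""
    return math.copysign(abs(y) ** (1.0 / 3.0), y)

print("derivative of the cube root at 8:",
      inverse_derivative(cube_prime, cube_root, 8.0))

try:
    inverse_derivative(cube_prime, cube_root, 0.0)
except ZeroDivisionError as error:
    print("at zero:", error)
```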

### Higher derivatives

Faà di Bruno's formula generalizes the chain rule to higher derivatives. The first few derivatives are

$\frac{d(f \circ g)}{dx} = \frac{df}{dg}\frac{dg}{dx}$

$\frac{d^2(f \circ g)}{dx^2} = \frac{d^2 f}{dg^2}\left(\frac{dg}{dx}\right)^2 + \frac{df}{dg}\frac{d^2 g}{dx^2}$

$\frac{d^3(f \circ g)}{dx^3} = \frac{d^3 f}{dg^3}\left(\frac{dg}{dx}\right)^3 + 3\frac{d^2 f}{dg^2}\frac{dg}{dx}\frac{d^2 g}{dx^2} + \frac{df}{dg}\frac{d^3 g}{dx^3}$

$\frac{d^4(f \circ g)}{dx^4} = \frac{d^4 f}{dg^4}\left(\frac{dg}{dx}\right)^4 + 6\frac{d^3 f}{dg^3}\left(\frac{dg}{dx}\right)^2\frac{d^2 g}{dx^2} + \frac{d^2 f}{dg^2}\left\{4\frac{dg}{dx}\frac{d^3 g}{dx^3} + 3\left(\frac{d^2 g}{dx^2}\right)^2\right\} + \frac{df}{dg}\frac{d^4 g}{dx^4}.$

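As a rough numerical check of the second-derivative formula, the sketch below uses an illustrative choice of functions, *f* = exp and *g* = sin, which is not taken from the text. The function names are likewise assumptions. It compares $f''(g(x))\,g'(x)^2 + f'(g(x))\,g''(x)$ with a second-order finite difference of the composite.

```python
import math

def composite_second_derivative(f1, f2, g, g1, g2, x):
    """Second derivative of f o g at x.

    f1, f2 are f' and f''; g1, g2 are g' and g''.
    Implements f''(g(x)) * g'(x)**2 + f'(g(x)) * g''(x).
    """
    return f2(g(x)) * g1(x) ** 2 + f1(g(x)) * g2(x)

def second_difference(func, x, step=1e-4):
    """Central finite-difference estimate of func''(x)."""
    return (func(x + step) - 2.0 * func(x) + func(x - step)) / step ** 2

x0 = 0.9
by_formula = composite_second_derivative(
    math.exp, math.exp,          # f' and f'' for f = exp
    math.sin, math.cos,          # g and g'
    lambda x: -math.sin(x),      # g''
    x0,
)
by_difference = second_difference(lambda x: math.exp(math.sin(x)), x0)

print("formula:", by_formula)
print("finite difference:", by_difference)
```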
### Proofs of the chain rule

#### First proof

One proof of the chain rule begins with the definition of the derivative:

$(f \circ g)'(a) = \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{x - a}.$

Assume for the moment that *g*(*x*) does not equal *g*(*a*) for any *x* near *a*. Then the previous expression is equal to the product of two factors:

$\lim_{x \to a} \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \cdot \frac{g(x) - g(a)}{x - a}.$

When *g* oscillates near *a*, then it might happen that no matter how close one gets to *a*, there is always an even closer *x* such that *g*(*x*) equals *g*(*a*). For example, this happens for $g(x) = x^2 \sin(1/x)$ (with *g*(0) = 0) near the point *a* = 0. Whenever this happens, the above expression is undefined because it involves division by zero. To work around this, introduce a function *Q* as follows:

$Q(y) = \begin{cases} \dfrac{f(y) - f(g(a))}{y - g(a)}, & y \neq g(a), \\ f'(g(a)), & y = g(a). \end{cases}$

We will show that the difference quotient for *f* ∘ *g* is always equal to:

$Q(g(x)) \cdot \frac{g(x) - g(a)}{x - a}.$

Whenever *g*(*x*) is not equal to *g*(*a*), this is clear because the factors of *g*(*x*) − *g*(*a*) cancel. When *g*(*x*) equals *g*(*a*), then the difference quotient for *f* ∘ *g* is zero because *f*(*g*(*x*)) equals *f*(*g*(*a*)), and the above product is zero because it equals *f*′(*g*(*a*)) times zero. So the above product is always equal to the difference quotient, and to show that the derivative of *f* ∘ *g* at *a* exists and to determine its value, we need only show that the limit as *x* goes to *a* of the above product exists and determine its value.

To do this, recall that the limit of a product exists if the limits of its factors exist. When this happens, the limit of the product of these two factors will equal the product of the limits of the factors. The two factors are *Q*(*g*(*x*)) and (*g*(*x*) − *g*(*a*)) / (*x* − *a*). The latter is the difference quotient for *g* at *a*, and because *g* is differentiable at *a* by assumption, its limit as *x* tends to *a* exists and equals *g*′(*a*).

It remains to study *Q*(*g*(*x*)). The function *Q* is defined wherever *f* is. Furthermore, because *f* is differentiable at *g*(*a*) by assumption, *Q* is continuous at *g*(*a*). The function *g* is continuous at *a* because it is differentiable at *a*, and therefore *Q* ∘ *g* is continuous at *a*. So its limit as *x* goes to *a* exists and equals *Q*(*g*(*a*)), which is *f*′(*g*(*a*)).

This shows that the limits of both factors exist and that they equal *f*′(*g*(*a*)) and *g*′(*a*), respectively. Therefore the derivative of *f* ∘ *g* at *a* exists and equals *f*′(*g*(*a*)) *g*′(*a*).

#### Second proof

Another way of proving the chain rule is to measure the error in the linear approximation determined by the derivative. This proof has the advantage that it generalizes to several variables. It relies on the following equivalent definition of differentiability at a point: a function *g* is differentiable at *a* if there exists a real number *g*′(*a*) and a function ε(*h*) that tends to zero as *h* tends to zero, such that

$g(a + h) - g(a) = g'(a)h + \varepsilon(h)h.$

Here the left-hand side represents the true difference between the value of *g* at *a* and at *a* + *h*, whereas the right-hand side represents the approximation determined by the derivative plus an error term.

In the situation of the chain rule, such a function ε exists because *g* is assumed to be differentiable at *a*. Again by assumption, a similar function also exists for *f* at *g*(*a*). Calling this function η, we have

$f(g(a) + k) - f(g(a)) = f'(g(a))k + \eta(k)k.$

The above definition imposes no constraints on η(0), even though it is assumed that η(*k*) tends to zero as *k* tends to zero. If we set η(0) = 0, then η is continuous at 0.

Proving the theorem requires studying the difference *f*(*g*(*a* + *h*)) − *f*(*g*(*a*)) as *h* tends to zero. The first step is to substitute for *g*(*a* + *h*) using the definition of differentiability of *g* at *a*:

$f(g(a + h)) - f(g(a)) = f(g(a) + g'(a)h + \varepsilon(h)h) - f(g(a)).$

The next step is to use the definition of differentiability of *f* at *g*(*a*). This requires a term of the form *f*(*g*(*a*) + *k*) for some *k*. In the above equation, the correct *k* varies with *h*. Set $k_h = g'(a)h + \varepsilon(h)h$ and the right-hand side becomes $f(g(a) + k_h) - f(g(a))$. Applying the definition of the derivative gives:

$f(g(a) + k_h) - f(g(a)) = f'(g(a))k_h + \eta(k_h)k_h.$

To study the behavior of this expression as *h* tends to zero, expand $k_h$. After regrouping the terms, the right-hand side becomes:

$f'(g(a))g'(a)h + [f'(g(a))\varepsilon(h) + \eta(k_h)g'(a) + \eta(k_h)\varepsilon(h)]h.$

Because $\varepsilon(h)$ tends to zero as *h* tends to zero, and $\eta(k_h)$ does as well (since $k_h$ tends to zero with *h* and η is continuous at 0), the bracketed terms tend to zero as *h* tends to zero. Because the above expression is equal to the difference *f*(*g*(*a* + *h*)) − *f*(*g*(*a*)), by the definition of the derivative *f* ∘ *g* is differentiable at *a* and its derivative is *f*′(*g*(*a*)) *g*′(*a*).

The role of *Q* in the first proof is played by η in this proof. They are related by the equation:

$Q(y) = f'(g(a)) + \eta(y - g(a)).$

The need to define *Q* at *g*(*a*) is analogous to the need to define η at zero. However, the proofs are not exactly equivalent. The first proof relies on a theorem about products of limits to show that the derivative exists. The second proof does not need this because showing that the error term vanishes proves the existence of the limit directly.

## The chain rule in higher dimensions

The simplest generalization of the chain rule to higher dimensions uses the total derivative. The total derivative is a linear transformation that captures how the function changes in all directions. Let $f : \mathbf{R}^m \to \mathbf{R}^k$ and $g : \mathbf{R}^n \to \mathbf{R}^m$ be differentiable functions, and let *D* be the total derivative operator. If **a** is a point in $\mathbf{R}^n$, then the higher-dimensional chain rule says that:

$D_{\mathbf{a}}(f \circ g) = D_{g(\mathbf{a})}f \circ D_{\mathbf{a}}g,$

or for short,

$D(f \circ g) = Df \circ Dg.$

In terms of Jacobian matrices, the rule says

$J_{\mathbf{a}}(f \circ g) = J_{g(\mathbf{a})}(f)\,J_{\mathbf{a}}(g).$

That is, the Jacobian of the composite function is the product of the Jacobians of the composed functions. The higher-dimensional chain rule can be proved using a technique similar to the second proof given above.

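The Jacobian form of the rule can be illustrated with a small numerical sketch. The two maps below are hypothetical examples chosen only for illustration, and the helpers `matmul` and `numerical_jacobian` are written out by hand so that the block depends on nothing beyond the standard library. The product $J_{g(\mathbf{a})}(f)\,J_{\mathbf{a}}(g)$ is compared with a finite-difference Jacobian of the composite.

```python
import math

# Hypothetical maps g, f : R^2 -> R^2 with hand-computed Jacobians.
def g(x, y):
    return (x * y, x + y)

def jacobian_g(x, y):
    return [[y, x],
            [1.0, 1.0]]

def f(u, v):
    return (u ** 2 + v, math.sin(u) * v)

def jacobian_f(u, v):
    return [[2 * u, 1.0],
            [math.cos(u) * v, math.sin(u)]]

def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def jacobian_of_composite(x, y):
    """J_a(f o g) = J_{g(a)}(f) J_a(g): f's Jacobian is evaluated at g(a)."""
    u, v = g(x, y)
    return matmul(jacobian_f(u, v), jacobian_g(x, y))

def composite(x, y):
    return f(*g(x, y))

def numerical_jacobian(func, point, step=1e-6):
    """Central-difference Jacobian; rows are outputs, columns are inputs."""
    columns = []
    for i in range(len(point)):
        plus, minus = list(point), list(point)
        plus[i] += step
        minus[i] -= step
        f_plus, f_minus = func(*plus), func(*minus)
        columns.append([(p - m) / (2 * step) for p, m in zip(f_plus, f_minus)])
    return [list(row) for row in zip(*columns)]

point = (1.2, -0.7)
exact = jacobian_of_composite(*point)
approx = numerical_jacobian(composite, point)
print(exact)
print(approx)
```

The order of the factors matters: the Jacobian of the outer function comes first and is evaluated at *g*(**a**), matching the formula above.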
The higher-dimensional chain rule is a generalization of the one-dimensional chain rule. If *k*, *m*, and *n* are 1, so that $f : \mathbf{R} \to \mathbf{R}$ and $g : \mathbf{R} \to \mathbf{R}$, then the Jacobian matrices of *f* and *g* are 1 × 1. Specifically, they are:

$\begin{aligned} J_a(g) &= \begin{pmatrix} g'(a) \end{pmatrix}, \\ J_{g(a)}(f) &= \begin{pmatrix} f'(g(a)) \end{pmatrix}. \end{aligned}$

The Jacobian of *f* ∘ *g* is the product of these 1 × 1 matrices, so it is *f*′(*g*(*a*)) *g*′(*a*), as expected from the one-dimensional chain rule. In the language of linear transformations, $D_a(g)$ is the function which scales a vector by a factor of *g*′(*a*) and $D_{g(a)}(f)$ is the function which scales a vector by a factor of *f*′(*g*(*a*)). The chain rule says that the composite of these two linear transformations is the linear transformation $D_a(f \circ g)$, and therefore it is the function that scales a vector by *f*′(*g*(*a*)) *g*′(*a*).

Another way of writing the chain rule is used when *f* and *g* are expressed in terms of their components as $\mathbf{y} = f(\mathbf{u}) = (f_1(\mathbf{u}), \ldots, f_k(\mathbf{u}))$ and $\mathbf{u} = g(\mathbf{x}) = (g_1(\mathbf{x}), \ldots, g_m(\mathbf{x}))$. In this case, the above rule for Jacobian matrices is usually written as:

$\frac{\partial(f_1, \ldots, f_k)}{\partial(x_1, \ldots, x_n)} = \frac{\partial(f_1, \ldots, f_k)}{\partial(u_1, \ldots, u_m)}\frac{\partial(g_1, \ldots, g_m)}{\partial(x_1, \ldots, x_n)}.$

The chain rule for total derivatives implies a chain rule for partial derivatives. Recall that when the total derivative exists, the partial derivative in the *i*th coordinate direction is found by multiplying the Jacobian matrix by the *i*th basis vector. By doing this to the formula above, we find:

$\frac{\partial(f_1, \ldots, f_k)}{\partial x_i} = \frac{\partial(f_1, \ldots, f_k)}{\partial(u_1, \ldots, u_m)}\frac{\partial(g_1, \ldots, g_m)}{\partial x_i}.$

Since the entries of the Jacobian matrix are partial derivatives, we may simplify the above formula to get:

$\frac{\partial(f_1, \ldots, f_k)}{\partial x_i} = \sum_{\ell = 1}^m \frac{\partial(f_1, \ldots, f_k)}{\partial u_\ell}\frac{\partial g_\ell}{\partial x_i}.$

More conceptually, this rule expresses the fact that a change in the $x_i$ direction may change all of $g_1$ through $g_m$, and any of these changes may affect *f*.

In the special case where *k* = 1, so that *f* is a real-valued function, this formula simplifies even further:

$\frac{\partial f}{\partial x_i} = \sum_{\ell = 1}^m \frac{\partial f}{\partial u_\ell}\frac{\partial g_\ell}{\partial x_i}.$

### Example

Given $u = x^2 + 2y$ where $x = r\sin(t)$ and $y = \sin^2(t)$, determine the value of $\frac{\partial u}{\partial r}$ and $\frac{\partial u}{\partial t}$ using the chain rule.

$\frac{\partial u}{\partial r}=\frac{\partial u}{\partial x}\frac{\partial x}{\partial r}+\frac{\partial u}{\partial y}\frac{\partial y}{\partial r} = \left(2x\right)\left(\sin(t)\right)+\left(2\right)\left(0\right)=2r\sin^2(t)$
and

$\frac{\partial u}{\partial t}=\frac{\partial u}{\partial x}\frac{\partial x}{\partial t}+\frac{\partial u}{\partial y}\frac{\partial y}{\partial t} = \left(2x\right)\left(r\cos(t)\right)+\left(2\right)\left(2\sin(t)\cos(t)\right) = 2\left(r\sin(t)\right)r\cos(t)+4\sin(t)\cos(t) = 2\left(r^2+2\right)\sin(t)\cos(t).$
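The two results of this example can be sanity-checked numerically. The sketch below is an illustration only: the helper `central_difference`, the step size `h`, and the test point are assumptions introduced here. It compares the chain-rule expressions with central finite differences of *u* written directly as a function of *r* and *t*.

```python
import math

def u(r, t):
    """u = x^2 + 2y with x = r*sin(t) and y = sin(t)^2."""
    x = r * math.sin(t)
    y = math.sin(t) ** 2
    return x ** 2 + 2 * y

def du_dr(r, t):
    """Chain-rule result from the example: 2 r sin^2(t)."""
    return 2 * r * math.sin(t) ** 2

def du_dt(r, t):
    """Chain-rule result from the example: 2 (r^2 + 2) sin(t) cos(t)."""
    return 2 * (r ** 2 + 2) * math.sin(t) * math.cos(t)

def central_difference(func, point, index, h=1e-6):
    """Approximate the partial derivative of func in argument `index`."""
    forward = list(point)
    backward = list(point)
    forward[index] += h
    backward[index] -= h
    return (func(*forward) - func(*backward)) / (2 * h)

r0, t0 = 1.5, 0.7  # arbitrary test point
error_r = abs(central_difference(u, (r0, t0), 0) - du_dr(r0, t0))
error_t = abs(central_difference(u, (r0, t0), 1) - du_dt(r0, t0))
print(error_r < 1e-6, error_t < 1e-6)
```

A finite-difference check of this kind does not prove the formulas, but a mismatch at a generic point would indicate an algebra slip.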
### Higher derivatives of multivariable functions

{{main|Faà di Bruno's formula#Multivariate version}}
Faà di Bruno's formula for higher-order derivatives of single-variable functions generalizes to the multivariable case. If *f* is a function of {{nowrap|*u* {{=}} *g*(*x*)}} as above, then the second derivative of {{nowrap|*f* ∘ *g*}} is:

$\frac{\partial^2 (f \circ g)}{\partial x_i \partial x_j} = \sum_k \frac{\partial f}{\partial u_k}\frac{\partial^2 g_k}{\partial x_i \partial x_j} + \sum_{k, \ell} \frac{\partial^2 f}{\partial u_k \partial u_\ell}\frac{\partial g_k}{\partial x_i}\frac{\partial g_\ell}{\partial x_j}.$
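As a quick consistency check (a worked illustration added here, with the functions chosen only as an example), take a single variable *x* and a single intermediate variable *u*, so that both sums have one term. The formula then reduces to the familiar single-variable second-derivative rule:

$(f \circ g)''(x) = f'(g(x))\,g''(x) + f''(g(x))\left(g'(x)\right)^2.$

For instance, with $f(u) = u^2$ and $g(x) = \sin(x)$ this gives

$\frac{d^2}{dx^2}\sin^2(x) = \left(2\sin(x)\right)\left(-\sin(x)\right) + \left(2\right)\left(\cos^2(x)\right) = 2\cos(2x),$

which agrees with differentiating $\sin^2(x)$ twice directly.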
## Further generalizations

All extensions of calculus have a chain rule. In most of these, the formula remains the same, though the meaning of that formula may be vastly different.
One generalization is to manifolds. In this situation, the chain rule represents the fact that the derivative of {{nowrap|*f* ∘ *g*}} is the composite of the derivative of *f* and the derivative of *g*. This theorem is an immediate consequence of the higher dimensional chain rule given above, and it has exactly the same formula.
The chain rule is also valid for Fréchet derivatives in Banach spaces. The same formula holds as before. This case and the previous one admit a simultaneous generalization to Banach manifolds.
In abstract algebra, the derivative is interpreted as a morphism of modules of Kähler differentials. A ring homomorphism of commutative rings {{nowrap|*f* : *R* → *S*}} determines a morphism of Kähler differentials {{nowrap|*Df* : Ω_{R} → Ω_{S}}} which sends an element *dr* to *d*(*f*(*r*)), the exterior differential of *f*(*r*). The formula {{nowrap begin}}*D*(*f* ∘ *g*) = *Df* ∘ *Dg*{{nowrap end}} holds in this context as well.
The common feature of these examples is that they are expressions of the idea that the derivative is part of a functor. A functor is an operation on spaces and functions between them. It associates to each space a new space and to each function between two spaces a new function between the corresponding new spaces. In each of the above cases, the functor sends each space to its tangent bundle and it sends each function to its derivative. The essential requirement for such an operation to be a functor, apart from sending identity maps to identity maps, is that the derivative of a composite is the composite of the derivatives. This is exactly the formula {{nowrap begin}}*D*(*f* ∘ *g*) = *Df* ∘ *Dg*{{nowrap end}}.
There are also chain rules in stochastic calculus. One of these, Itō's lemma, expresses the composite of an Itō process (or more generally a semimartingale) *X*_{t} with a twice-differentiable function *f*. In Itō's lemma, the derivative of the composite function depends not only on *dX*_{t} and the derivative of *f* but also on the second derivative of *f*. The dependence on the second derivative is a consequence of the non-zero quadratic variation of the stochastic process, which broadly speaking means that the process can move up and down in a very rough way. This variant of the chain rule is not an example of a functor because the two functions being composed are of different types.
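For concreteness, here is a sketch of the one-dimensional statement of Itō's lemma. It assumes the process is written as $dX_t = \mu_t\,dt + \sigma_t\,dB_t$, where *B*_{t} is a Brownian motion, and that *f* is twice continuously differentiable; the symbols $\mu_t$, $\sigma_t$ and $B_t$ are notation introduced here for illustration and do not appear elsewhere in this article. Under these assumptions:

$df(X_t) = f'(X_t)\,dX_t + \frac{1}{2}f''(X_t)\,\sigma_t^2\,dt.$

The first term has the shape of the ordinary chain rule; the second term is the correction that comes from the quadratic variation of the process and vanishes when $\sigma_t = 0$.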

## See also

- Integration by substitution
- Quotient rule
- Triple product rule
- Leibniz integral rule

## External links

- Khan Academy: Lesson 1, Lesson 3
- http://calculusapplets.com/chainrule.html

{{DEFAULTSORT:Chain Rule}}