Hamilton-Jacobi-Bellman equation
The Hamilton–Jacobi–Bellman (HJB) equation is a partial differential equation which is central to optimal control theory. The solution of the HJB equation is the 'value function', which gives the optimal cost-to-go for a given dynamical system with an associated cost function.

When solved locally, the HJB is a necessary condition, but when solved over the whole of state space, the HJB equation is a necessary and sufficient condition for an optimum. The solution is open loop, but it also permits the solution of the closed loop problem. The HJB method can be generalized to stochastic systems as well.

Classical variational problems, for example the brachistochrone problem, can be solved using this method.

The equation is a result of the theory of dynamic programming, which was pioneered in the 1950s by Richard Bellman and coworkers. The corresponding discrete-time equation is usually referred to as the Bellman equation. In continuous time, the result can be seen as an extension of earlier work in classical physics on the Hamilton–Jacobi equation by William Rowan Hamilton and Carl Gustav Jacob Jacobi.

Optimal control problems

Consider the following problem in deterministic optimal control over the time period [0, T]:

    \min_u \left\{ \int_0^T C[x(t),u(t)]\,dt + D[x(T)] \right\}

where C[·] is the scalar cost rate function and D[·] is a function that gives the economic value or utility at the final state, x(t) is the system state vector, x(0) is assumed given, and u(t) for 0 ≤ t ≤ T is the control vector that we are trying to find.

The system must also be subject to

    \dot{x}(t) = F[x(t),u(t)]

where F[·] gives the vector determining physical evolution of the state vector over time.
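
For instance (a standard linear-quadratic illustration, not taken from the article), the template above is filled in by

    C[x,u] = x^\top Q x + u^\top R u, \qquad D[x] = x^\top Q_f x, \qquad F[x,u] = A x + B u,

where Q and Q_f are positive semidefinite weighting matrices, R is a positive definite weighting matrix, and A, B describe the linear dynamics.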

The partial differential equation

For this simple system, the Hamilton–Jacobi–Bellman partial differential equation is

    \frac{\partial V(x,t)}{\partial t} + \min_u \left\{ \nabla V(x,t) \cdot F[x,u] + C[x,u] \right\} = 0

subject to the terminal condition

    V(x,T) = D[x],

where a \cdot b means the dot product of the vectors a and b and \nabla is the gradient operator.

The unknown scalar V(x,t) in the above PDE is the Bellman 'value function', which represents the cost incurred from starting in state x at time t and controlling the system optimally from then until time T.
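
As a worked illustration (continuing the linear-quadratic example above; the quadratic ansatz is standard, not taken from the article), substituting V(x,t) = x^\top P(t)\,x with P(t) symmetric into the HJB equation and carrying out the minimization, which yields u^* = -R^{-1} B^\top P(t)\,x, reduces the PDE to the continuous-time Riccati differential equation

    \dot{P}(t) + P(t)A + A^\top P(t) - P(t) B R^{-1} B^\top P(t) + Q = 0, \qquad P(T) = Q_f,

so in this special case the value function is known once an ordinary differential equation has been solved backwards from t = T.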

Deriving the equation

Intuitively, the HJB equation can be "derived" as follows. If V(x(t), t) is the optimal cost-to-go function (also called the 'value function'), then by Richard Bellman's principle of optimality, going from time t to t + dt, we have

    V(x(t),t) = \min_u \left\{ C(x(t),u(t))\,dt + V(x(t+dt),\,t+dt) \right\}.

Note that the Taylor expansion of the last term is

    V(x(t+dt),\,t+dt) = V(x(t),t) + \frac{\partial V(x,t)}{\partial t}\,dt + \nabla V(x(t),t) \cdot \dot{x}(t)\,dt + o(dt),

where o(dt) denotes the terms in the Taylor expansion of higher order than one in dt. Then if we cancel V(x(t), t) on both sides, divide by dt, and take the limit as dt approaches zero, we obtain the HJB equation defined above.
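
Spelled out, substituting \dot{x}(t) = F[x(t),u(t)] from the system dynamics, this limiting step gives

    0 = \min_u \left\{ C(x,u) + \frac{\partial V(x,t)}{\partial t} + \nabla V(x,t) \cdot F[x,u] \right\}
      = \frac{\partial V(x,t)}{\partial t} + \min_u \left\{ C(x,u) + \nabla V(x,t) \cdot F[x,u] \right\},

where the time-derivative term can be pulled out of the minimization because it does not depend on u; this is exactly the HJB equation stated above.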

Solving the equation

The HJB equation is usually solved backwards in time, starting from t = T and ending at t = 0.

When solved over the whole of state space, the HJB equation is a necessary and sufficient condition for an optimum. If we can solve for V, then we can find from it a control u that achieves the minimum cost.
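
A minimal numerical sketch of this backward recursion, for an illustrative scalar problem (dynamics \dot{x} = u, cost rate x^2 + u^2, terminal cost x^2; the grid, horizon, and cost are assumptions made here, not specified in the article), could look as follows:

    import numpy as np

    # Illustrative scalar problem: dynamics x' = u, cost rate C = x^2 + u^2,
    # terminal cost D = x^2, horizon T = 1.  All of these choices are assumptions.
    T, nt = 1.0, 100                    # horizon and number of time steps
    dt = T / nt
    xs = np.linspace(-2.0, 2.0, 201)    # state grid
    us = np.linspace(-2.0, 2.0, 81)     # candidate controls

    V = xs**2                           # terminal condition V(x, T) = D(x)

    # March backwards from t = T to t = 0 (dynamic programming recursion).
    for _ in range(nt):
        # For each candidate control: one-step cost plus the interpolated
        # cost-to-go at the successor state x + u*dt (clipped to the grid).
        Q = np.empty((us.size, xs.size))
        for i, u in enumerate(us):
            x_next = np.clip(xs + u * dt, xs[0], xs[-1])
            Q[i] = (xs**2 + u**2) * dt + np.interp(x_next, xs, V)
        V = Q.min(axis=0)               # Bellman minimization over controls

    print("approximate V(0, 0):", V[np.abs(xs).argmin()])

For these particular choices the exact value function is V(x,t) = x^2 (the scalar Riccati equation has the constant solution P(t) = 1), which provides a simple check on the output.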

In the general case, the HJB equation does not have a classical (smooth) solution. Several notions of generalized solutions have been developed to cover such situations, including the viscosity solution (Pierre-Louis Lions and Michael Crandall), the minimax solution (Andrei Izmailovich Subbotin), and others.

Extension to stochastic problems

The idea of solving a control problem by applying Bellman's principle of optimality and then working out an optimizing strategy backwards in time can be generalized to stochastic control problems. Consider a problem similar to the one above:

    \min_u \; \mathbb{E}\left\{ \int_0^T C(t, X_t, u_t)\,dt + D(X_T) \right\}

now with (X_t)_{t \in [0,T]} the stochastic process to optimize and (u_t)_{t \in [0,T]} the steering. By first using Bellman's principle and then expanding V(X_t, t) with Itô's rule, one finds the deterministic HJB equation

    \min_u \left\{ \mathcal{A} V(x,t) + C(t,x,u) \right\} = 0,

where \mathcal{A} represents the stochastic differentiation operator (acting in both t and x), subject to the terminal condition

    V(x,T) = D(x).

Note that the randomness has disappeared. In this case a solution of the latter does not necessarily solve the primal problem; it is a candidate only, and a further verification argument is required. This technique is widely used in financial mathematics to determine optimal investment strategies in the market (see, for example, Merton's portfolio problem).
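
As an illustration of the Merton problem just mentioned (standard textbook form; the notation below is introduced here, not in the article), suppose wealth W_t is divided between a risk-free asset with rate r and a risky asset with drift \mu and volatility \sigma, with \pi_t the fraction of wealth held in the risky asset and c_t the consumption rate, so that

    dW_t = \bigl( r W_t + \pi_t(\mu - r) W_t - c_t \bigr)\,dt + \pi_t \sigma W_t\, dB_t.

Maximizing \mathbb{E}\int_0^T U(c_t)\,dt for a utility function U leads, by the same reasoning as above (with a maximum in place of the minimum), to the stochastic HJB equation

    \frac{\partial V}{\partial t} + \max_{\pi,\,c}\left\{ U(c) + \bigl( r w + \pi(\mu - r) w - c \bigr)\frac{\partial V}{\partial w} + \tfrac{1}{2}\pi^2 \sigma^2 w^2 \frac{\partial^2 V}{\partial w^2} \right\} = 0, \qquad V(w,T) = 0,

for the value function V(w,t); for power (CRRA) utility the value function turns out to be proportional to a power of wealth, which reduces the PDE to an ordinary differential equation in t.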

See also

  • Bellman equation, discrete-time counterpart of the Hamilton–Jacobi–Bellman equation
  • Pontryagin's minimum principle, a necessary but not sufficient condition for an optimum, obtained by minimizing a Hamiltonian; it has the advantage over the HJB equation of only needing to be satisfied over the single trajectory being considered
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.