Loss function - AbsoluteAstronomy.com

In statistics and decision theory, a loss function is a function that maps an event onto a real number intuitively representing some "cost" associated with the event. Typically it is used for parameter estimation, and the event in question is some function of the difference between estimated and true values for an instance of data. In the context of economics, for example, this is usually economic cost or regret. In machine learning, it is the penalty for an incorrect classification of an example.

Definition

Formally, we begin by considering some family of distributions for a random variable X, that is indexed by some θ.

More intuitively, we can think of X as our "data", perhaps X = (X_1, …, X_n), where the X_i ~ F_θ are i.i.d. The X is the set of things the decision rule will be making decisions on. There exists some number of possible ways F_θ to model our data X, which our decision function can use to make decisions. For a finite number of models, we can thus think of θ as the index to this family of probability models. For an infinite family of models, it is a set of parameters to the family of distributions.

On a more practical note, although it is tempting to think of loss functions as necessarily parametric (since they seem to take θ as a "parameter"), θ need not be finite-dimensional, and in that case the parametric view does not apply; for example, if the family of probability functions is uncountably infinite, θ indexes an uncountably infinite space.

From here, given a set A of possible actions, a decision rule is a function δ : 𝒳 → A, where 𝒳 is the set of possible values of X.

A loss function is a real-valued, lower-bounded function L on Θ × A. For θ ∈ Θ, the value L(θ, δ(X)) is the cost of action δ(X) under parameter θ.
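As a minimal sketch of these definitions, the following Python fragment assumes a real-valued θ, data X given as a list of numbers, and a squared-error loss. None of these choices is required by the definition, and the names `delta` and `loss` are purely illustrative.

```python
from statistics import fmean

def delta(x):
    """Hypothetical decision rule: map observed data x to an action (here, the sample mean)."""
    return fmean(x)

def loss(theta, action):
    """Hypothetical loss L(theta, a): squared error, which is bounded below by 0."""
    return (theta - action) ** 2

x = [1.0, 2.0, 3.0]            # observed data X (made-up values)
theta = 2.5                    # one candidate value of the parameter
cost = loss(theta, delta(x))   # L(theta, delta(X))
```

Any other lower-bounded real function of (θ, a) could be substituted for `loss` without changing the structure.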

Decision rules

A decision rule makes a choice using an optimality criterion. Some commonly used criteria are:

Minimax: Choose the decision rule with the lowest worst loss, that is, minimize the worst-case (maximum possible) loss: arg min_δ max_{θ ∈ Θ} R(θ, δ), where R(θ, δ) denotes the risk (expected loss) of rule δ under parameter θ.
Invariance: Choose the optimal decision rule which satisfies an invariance requirement.
Minimize the expected value of the loss function.
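The minimax and expected-value criteria can be compared on a small example. The sketch below assumes a made-up table of risks for three hypothetical rules under two parameter values, and an assumed set of weights over θ for the averaging criterion; the invariance criterion is not illustrated.

```python
# Hypothetical risk table: risk[rule][theta] is the risk of that rule under theta.
risk = {
    "rule_a": {"theta_1": 1.0, "theta_2": 6.0},
    "rule_b": {"theta_1": 3.0, "theta_2": 4.0},
    "rule_c": {"theta_1": 0.0, "theta_2": 9.0},
}
# Assumed weights over theta, used only by the expected-value criterion.
weights = {"theta_1": 0.8, "theta_2": 0.2}

def worst_case(rule):
    """Largest risk the rule can incur over all theta."""
    return max(risk[rule].values())

def average(rule):
    """Weighted average of the rule's risk over theta."""
    return sum(weights[t] * r for t, r in risk[rule].items())

minimax_rule = min(risk, key=worst_case)   # smallest worst-case risk
average_rule = min(risk, key=average)      # smallest weighted-average risk
```

With these numbers the two criteria disagree: `rule_b` has the smallest worst case, while `rule_c` has the smallest weighted average.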

Expected loss

The value of the loss function itself is a random quantity because it depends on the outcome of a random variable X. Both frequentist and Bayesian statistical theory involve making a decision based on the expected value of the loss function; however, this quantity is defined differently under the two paradigms.

Frequentist risk

The expected loss in the frequentist context is obtained by taking the expected value with respect to the probability distribution, P_θ, of the observed data, X. This is also referred to as the risk function of the decision rule δ and the parameter θ. Here the decision rule depends on the outcome of X. The risk function is given by

R(θ, δ) = E_θ[L(θ, δ(X))] = ∫ L(θ, δ(x)) dP_θ(x),

where the integral runs over all possible values x of X.
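When the data are discrete, the expectation reduces to a finite sum. The sketch below assumes n Bernoulli(θ) trials summarized by the number of successes k, the sample proportion as decision rule, and a squared-error loss; all function names are illustrative.

```python
from math import comb

def risk(theta, n, delta, loss):
    """Frequentist risk: sum over k of L(theta, delta(k)) * P_theta(K = k), K ~ Binomial(n, theta)."""
    return sum(
        loss(theta, delta(k, n)) * comb(n, k) * theta**k * (1 - theta) ** (n - k)
        for k in range(n + 1)
    )

def sample_proportion(k, n):
    """Decision rule: estimate theta by the observed fraction of successes."""
    return k / n

def squared_error(theta, action):
    return (theta - action) ** 2

r = risk(0.3, 10, sample_proportion, squared_error)
```

For this particular rule and loss the risk equals the variance of the sample proportion, θ(1 − θ)/n, which is 0.021 for θ = 0.3 and n = 10.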

Bayesian expected loss

In a Bayesian approach, the expectation is calculated using the posterior distribution π^* of the parameter θ:

ρ(π^*, a) = ∫_Θ L(θ, a) dπ^*(θ).

One then should choose the action a^* which minimises the expected loss. Although this will result in choosing the same action as would be chosen using the Bayes risk, the emphasis of the Bayesian approach is that one is only interested in choosing the optimal action under the actual observed data, whereas choosing the actual Bayes optimal decision rule, which is a function of all possible observations, is a much more difficult problem.
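A minimal sketch of this choice, assuming a made-up discrete posterior over three parameter values, a small grid of candidate actions, and a squared-error loss:

```python
# Assumed discrete posterior over theta: value -> posterior probability.
posterior = {0.0: 0.2, 1.0: 0.5, 2.0: 0.3}
# Hypothetical set of available actions.
actions = [0.0, 0.5, 1.0, 1.5, 2.0]

def squared_error(theta, action):
    return (theta - action) ** 2

def expected_loss(action):
    """Posterior expected loss of an action: sum over theta of L(theta, a) * posterior(theta)."""
    return sum(squared_error(theta, action) * p for theta, p in posterior.items())

best_action = min(actions, key=expected_loss)
```

Only the posterior for the data actually observed is needed here; no rule for other possible observations is ever constructed. Under squared error the minimizer is the candidate closest to the posterior mean (1.1 in this example).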

Selecting a loss function

Sound statistical practice requires selecting an estimator consistent with the actual loss experienced in the context of a particular applied problem. Thus, in the applied use of loss functions, selecting which statistical method to use to model an applied problem depends on knowing the losses that will be experienced from being wrong under the problem's particular circumstances, which results in the introduction of an element of teleology into problems of scientific decision-making.

A common example involves estimating "location." Under typical statistical assumptions, the mean or average is the statistic for estimating location that minimizes the expected loss experienced under the Taguchi or squared-error loss function, while the median is the estimator that minimizes expected loss experienced under the absolute-difference loss function. Still different estimators would be optimal under other, less common circumstances.
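The mean and median claims can be checked numerically on a sample. The sketch below uses the empirical (summed) loss over a made-up data set as a stand-in for expected loss, and searches a coarse grid of candidate location values.

```python
from statistics import fmean, median

data = [1.0, 2.0, 3.0, 4.0, 100.0]   # made-up sample with one extreme value

def total_squared(c):
    """Sum of squared-error losses when c is used as the location estimate."""
    return sum((x - c) ** 2 for x in data)

def total_absolute(c):
    """Sum of absolute-difference losses when c is used as the location estimate."""
    return sum(abs(x - c) for x in data)

# Candidate location estimates from 0.0 to 100.0 in steps of 0.5.
grid = [i / 2 for i in range(201)]
best_squared = min(grid, key=total_squared)
best_absolute = min(grid, key=total_absolute)
```

On this sample the squared-error minimizer coincides with the sample mean (22.0) and the absolute-difference minimizer with the sample median (3.0).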

In economics, when an agent is risk neutral, the loss function is simply expressed in monetary terms, such as profit, income, or end-of-period wealth.

But for risk-averse (or risk-loving) agents, loss is measured as the negative of a utility function, which represents satisfaction and is usually interpreted in ordinal terms rather than in cardinal (absolute) terms.

Other measures of cost are possible, for example mortality or morbidity in the field of public health or safety engineering.

For most optimization algorithms, it is desirable to have a loss function that is globally continuous and differentiable.

Two very commonly used loss functions are the squared loss, L(a) = a^2, and the absolute loss, L(a) = |a|. However, the absolute loss has the disadvantage that it is not differentiable at a = 0. The squared loss has the disadvantage that it tends to be dominated by outliers: when summing over a set of a's (as in ∑_{i=1}^n L(a_i)), the final sum tends to be the result of a few particularly large a-values, rather than an expression of the average a-value.
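To make the outlier effect concrete, the following sketch takes a made-up list of a-values containing one outlier and measures what fraction of the total loss the largest term contributes under each loss.

```python
residuals = [1.0, -2.0, 1.5, -0.5, 20.0]   # hypothetical a-values; the last is an outlier

squared = [a ** 2 for a in residuals]
absolute = [abs(a) for a in residuals]

# Fraction of the summed loss that comes from the single largest term.
share_squared = max(squared) / sum(squared)
share_absolute = max(absolute) / sum(absolute)
```

Here the outlier accounts for more than 98% of the summed squared loss, compared with 80% of the summed absolute loss.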

Loss functions in Bayesian statistics

One of the consequences of Bayesian inference is that, even together with the experimental data, the loss function does not by itself wholly determine a decision. What is important is the relationship between the loss function and the prior probability. So it is possible to have two different loss functions which lead to the same decision when the prior probability distributions associated with each compensate for the details of each loss function.

Combining the three elements of the prior probability, the data, and the loss function then allows decisions to be based on maximizing the subjective expected utility, a concept introduced by Leonard J. Savage.

Regret

Savage also argued that, when using non-Bayesian methods such as minimax, the loss function should be based on the idea of regret, i.e., the loss associated with a decision should be the difference between the consequences of the best decision that could have been taken had the underlying circumstances been known and the decision that was in fact taken before they were known.
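As an illustration, the sketch below builds a regret table from a made-up payoff table for two hypothetical actions under two possible states, then applies the minimax criterion to the regrets.

```python
# Hypothetical payoff table: payoff[action][state].
payoff = {
    "invest": {"boom": 10.0, "bust": -5.0},
    "hold":   {"boom": 4.0,  "bust": 2.0},
}
states = ["boom", "bust"]

# Best achievable payoff in each state, had the state been known in advance.
best = {s: max(payoff[a][s] for a in payoff) for s in states}

# Regret = best payoff in that state minus the payoff of the action actually taken.
regret = {a: {s: best[s] - payoff[a][s] for s in states} for a in payoff}

# Minimax regret: pick the action whose worst-case regret is smallest.
minimax_regret_action = min(regret, key=lambda a: max(regret[a].values()))
```

With these numbers "invest" has a worst-case regret of 7 and "hold" a worst-case regret of 6, so "hold" is selected.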

Quadratic loss function

The use of a quadratic loss function is common, for example when using least squares techniques or Taguchi methods. It is often more mathematically tractable than other loss functions because of the properties of variances, as well as being symmetric: an error above the target causes the same loss as the same magnitude of error below the target. If the target is t, then a quadratic loss function is

λ(x) = C (t − x)^2

for some positive constant C; the value of the constant makes no difference to a decision, and can be ignored by setting it equal to 1.
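Both properties, the symmetry about the target and the irrelevance of the constant, can be checked directly. The sketch below assumes C is positive and uses made-up candidate values; `quadratic_loss` is an illustrative name.

```python
def quadratic_loss(x, t, c=1.0):
    """Quadratic loss c * (t - x)**2 with target t and an assumed positive constant c."""
    return c * (t - x) ** 2

target = 5.0
candidates = [3.0, 4.5, 6.0, 8.0]   # hypothetical values to choose between

# The chosen value is the same whatever positive constant is used.
choice_c1 = min(candidates, key=lambda x: quadratic_loss(x, target, c=1.0))
choice_c7 = min(candidates, key=lambda x: quadratic_loss(x, target, c=7.0))

# Errors of equal size above and below the target cost the same.
symmetric = quadratic_loss(4.0, target) == quadratic_loss(6.0, target)
```

Scaling by a positive constant preserves the ordering of the losses, which is why the choice is unchanged.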

Many common statistics, including t-tests, regression models, design of experiments, and much else, use least squares linear models theory, which is based on the Taguchi loss function.

The quadratic loss function is also used in linear-quadratic optimal control problems.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.