Minimax estimator
In statistical decision theory, where we are faced with the problem of estimating a deterministic parameter (vector) \theta \in \Theta from observations x \in \mathcal{X}, an estimator (estimation rule) \delta^M is called minimax if its maximal risk is minimal among all estimators of \theta. In a sense this means that \delta^M is an estimator which performs best in the worst possible case allowed in the problem.

Problem setup

Consider the problem of estimating a deterministic (not Bayesian) parameter \theta \in \Theta from noisy or corrupted data x \in \mathcal{X} related through the conditional probability distribution P(x|\theta). Our goal is to find a "good" estimator \delta(x) for the parameter \theta, one which minimizes some given risk function R(\theta,\delta). Here the risk function is the expectation of some loss function L(\theta,\delta) with respect to P(x|\theta). A popular example of a loss function is the squared error loss L(\theta,\delta)= \|\theta-\delta\|^2, for which the risk function is the mean squared error (MSE). Unfortunately, in general the risk cannot be minimized directly, since it depends on the unknown parameter \theta itself (if we knew the actual value of \theta, we would not need to estimate it). Therefore additional criteria for finding an optimal estimator in some sense are required. One such criterion is the minimax criterion.
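As a concrete illustration of the risk function, the following sketch (added here for illustration; the model, estimator, and parameter values are not from the original text) approximates the MSE risk of the sample mean under a normal model at a fixed \theta by Monte Carlo simulation.

```python
import numpy as np

def mse_risk(theta, n_obs=20, sigma=1.0, n_trials=200_000, seed=0):
    """Monte Carlo approximation of R(theta, delta) = E ||delta(x) - theta||^2
    for the sample-mean estimator delta(x) = mean(x) under x_i ~ N(theta, sigma^2)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(theta, sigma, size=(n_trials, n_obs))
    delta = x.mean(axis=1)                 # the estimator, applied to each simulated data set
    return np.mean((delta - theta) ** 2)   # average squared-error loss

# The simulated risk should be close to the analytic value sigma^2 / n_obs,
# and for the sample mean it does not depend on theta.
print(mse_risk(theta=0.0), mse_risk(theta=3.0), 1.0 / 20)
```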

Definition

Definition: An estimator \delta^M:\mathcal{X} \rightarrow \Theta is called minimax with respect to a risk function R(\theta,\delta) if it achieves the smallest maximum risk among all estimators, meaning it satisfies

\sup_{\theta \in \Theta} R(\theta,\delta^M) = \inf_\delta \sup_{\theta \in \Theta} R(\theta,\delta).
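The worst-case comparison behind this definition can be made concrete numerically. The sketch below (an illustration added here, not taken from the original text) approximates \sup_\theta R(\theta,\delta) over a grid of parameter values for two candidate estimators of a normal mean, the sample mean and a naive shrinkage rule; the shrinkage rule is better near \theta = 0 but much worse in the worst case.

```python
import numpy as np

def risk(estimator, theta, n_obs=10, sigma=1.0, n_trials=100_000, seed=0):
    """Monte Carlo estimate of R(theta, delta) = E (delta(x) - theta)^2."""
    rng = np.random.default_rng(seed)
    x = rng.normal(theta, sigma, size=(n_trials, n_obs))
    return np.mean((estimator(x) - theta) ** 2)

sample_mean = lambda x: x.mean(axis=1)          # delta_1(x)
shrunk_mean = lambda x: 0.5 * x.mean(axis=1)    # delta_2(x): shrink toward zero

thetas = np.linspace(-5.0, 5.0, 21)             # finite grid standing in for Theta
for name, est in [("sample mean", sample_mean), ("shrunk mean", shrunk_mean)]:
    worst = max(risk(est, t) for t in thetas)   # approximate sup_theta R(theta, delta)
    print(f"{name}: worst-case risk over the grid = {worst:.3f}")
```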

Least favorable distribution

Logically, an estimator is minimax when it is the best in the worst case. Continuing this logic, a minimax estimator should be a Bayes estimator with respect to a least favorable prior distribution of \theta. To demonstrate this notion, denote the average risk of the Bayes estimator \delta_{\pi} with respect to a prior distribution \pi as

r_\pi = \int R(\theta,\delta_{\pi}) \, d\pi(\theta).

Definition: A prior distribution \pi is called least favorable if for any other distribution \pi' the average risk satisfies r_\pi \geq r_{\pi'}.

Theorem 1: If r_\pi = \sup_\theta R(\theta,\delta_\pi), then:
  1. \delta_{\pi} is minimax.
  2. If \delta_{\pi} is a unique Bayes estimator, it is also the unique minimax estimator.
  3. \pi is least favorable.
Corollary: If a Bayes estimator has constant risk, it is minimax. Note that this is not a necessary condition.

Example 1, Unfair coin: Consider the problem of estimating the "success" rate of a binomial variable, x \sim B(n,\theta). This may be viewed as estimating the rate at which an unfair coin falls on "heads" or "tails". In this case the Bayes estimator with respect to a Beta-distributed prior, \theta \sim \text{Beta}(\sqrt{n}/2,\sqrt{n}/2), is

\delta^M=\frac{x+0.5\sqrt{n}}{n+\sqrt{n}},
with constant Bayes risk r=\frac{1}{4(1+\sqrt{n})^2} and, according to the Corollary, is minimax.
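A quick numerical check of this constant-risk property (a sketch added here, not part of the original text) computes the exact risk of the estimator above by summing over the binomial distribution and compares it with 1/(4(1+\sqrt{n})^2) for several values of \theta.

```python
import numpy as np
from scipy.stats import binom

n = 25

def delta(x):
    """Bayes estimator from Example 1: (x + 0.5*sqrt(n)) / (n + sqrt(n))."""
    return (x + 0.5 * np.sqrt(n)) / (n + np.sqrt(n))

x = np.arange(n + 1)
for theta in [0.1, 0.3, 0.5, 0.9]:
    # Exact risk: sum over all outcomes of P(x|theta) * squared error.
    risk = np.sum(binom.pmf(x, n, theta) * (delta(x) - theta) ** 2)
    print(f"theta={theta}: risk={risk:.6f}")

print("predicted constant risk:", 1 / (4 * (1 + np.sqrt(n)) ** 2))
```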
Definition: A sequence of prior distributions \pi_n is called least favorable if for any other distribution \pi',

\lim_{n \rightarrow \infty} r_{\pi_n} \geq r_{\pi '}.
Theorem 2: If there is a sequence of priors \pi_n and an estimator \delta such that \sup_{\theta} R(\theta,\delta)=\lim_{n \rightarrow \infty} r_{\pi_n}, then:

  1. \delta is minimax.
  2. The sequence \pi_n is least favorable.
Notice that no uniqueness is guaranteed here. For example, the ML estimator of a Gaussian mean (Example 2 below) may be attained as the limit of Bayes estimators with respect to a uniform prior \pi_n \sim U[-n,n] with increasing support, and also with respect to a zero-mean normal prior \pi_n \sim N(0,n \sigma^2) with increasing variance. So neither is the resulting ML estimator the unique minimax estimator, nor is the least favorable prior unique.

Example 2: Consider the problem of estimating the mean of a p-dimensional Gaussian random vector, x \sim N(\theta,I_p \sigma^2). The maximum likelihood (ML) estimator for \theta in this case is simply \delta_{ML}=x, and its risk is
R(\theta,\delta_{ML})=E\left\{\|\delta_{ML}-\theta\|^2\right\}=\sum_{i=1}^p E\left\{(x_i-\theta_i)^2\right\}=p \sigma^2.
The risk is constant, but the ML estimator is actually not a Bayes estimator, so the Corollary of Theorem 1 does not apply. However, the ML estimator is the limit of the Bayes estimators with respect to the prior sequence \pi_n \sim N(0,n \sigma^2), and hence is indeed minimax according to Theorem 2. Nonetheless, minimaxity does not always imply admissibility. In fact, in this example the ML estimator is known to be inadmissible (not admissible) whenever p>2. The famous James–Stein estimator dominates the ML estimator whenever p>2. Though both estimators have the same risk p \sigma^2 when \|\theta\| \rightarrow \infty, and they are both minimax, the James–Stein estimator has smaller risk for any finite \|\theta\|.
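The domination of the ML estimator by the James–Stein estimator can be checked by simulation. The sketch below is added for illustration and assumes the standard James–Stein shrinkage formula (1 - (p-2)\sigma^2/\|x\|^2)x, which the original text does not spell out; it compares the Monte Carlo risk of the two estimators for p = 10 and \sigma = 1.

```python
import numpy as np

p, sigma, n_trials = 10, 1.0, 200_000
rng = np.random.default_rng(0)

def risks(theta):
    """Monte Carlo risk E||delta - theta||^2 for the ML and James-Stein estimators."""
    x = rng.normal(theta, sigma, size=(n_trials, p))       # x ~ N(theta, I_p sigma^2)
    ml = x                                                  # delta_ML(x) = x
    shrink = 1.0 - (p - 2) * sigma**2 / np.sum(x**2, axis=1, keepdims=True)
    js = shrink * x                                         # standard James-Stein shrinkage toward 0
    return (np.mean(np.sum((ml - theta) ** 2, axis=1)),
            np.mean(np.sum((js - theta) ** 2, axis=1)))

for norm in [0.0, 2.0, 5.0, 20.0]:
    theta = np.full(p, norm / np.sqrt(p))                   # a vector with ||theta|| = norm
    r_ml, r_js = risks(theta)
    print(f"||theta||={norm:5.1f}: ML risk = {r_ml:.2f}, James-Stein risk = {r_js:.2f}")
```

The ML risk stays at p\sigma^2 = 10 for every \theta, while the James–Stein risk is strictly smaller and approaches 10 only as \|\theta\| grows.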

Some examples

In general it is difficult, and often even impossible, to determine the minimax estimator. Nonetheless, in many cases a minimax estimator has been determined.

Example 3, Bounded normal mean: Consider the problem of estimating the mean of a normal vector x \sim N(\theta,I_n \sigma^2), where it is known that \|\theta\|^2 \leq M. The Bayes estimator with respect to a prior which is uniformly distributed on the edge of the bounding sphere is known to be minimax whenever M \leq n. The analytical expression for this estimator is

\delta^M=\frac{nJ_{n+1}(n\|x\|)}{\|x\|J_{n}(n\|x\|)},

where J_{n}(t) is the modified Bessel function of the first kind of order n.
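The expression above can be evaluated numerically with a standard Bessel-function routine. The sketch below is a direct transcription of the printed formula (added here for illustration; scipy's iv stands in for the modified Bessel function of the first kind denoted J_n in the text, and the observation vector is an arbitrary example value).

```python
import numpy as np
from scipy.special import iv  # modified Bessel function of the first kind, I_v(z)

def bounded_mean_expression(x):
    """Evaluate n * J_{n+1}(n*||x||) / (||x|| * J_n(n*||x||)) exactly as printed in Example 3."""
    n = x.size
    r = np.linalg.norm(x)
    return n * iv(n + 1, n * r) / (r * iv(n, n * r))

x = np.array([0.3, -0.1, 0.2])   # a small illustrative observation vector
print(bounded_mean_expression(x))
```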

Relationship to robust optimization

Robust optimization is an approach to solving optimization problems under uncertainty in the knowledge of the underlying parameters. For instance, the MMSE Bayesian estimation of a parameter requires knowledge of the parameter's correlation function. If this correlation function is not perfectly known, a popular minimax robust optimization approach is to define a set characterizing the uncertainty about the correlation function, and then to pursue a minimax optimization over the uncertainty set and over the estimator, respectively. Similar minimax optimizations can be pursued to make estimators robust to certain imprecisely known parameters. For instance, such techniques have recently been studied in the area of signal processing. R. Fandom Noubiap and W. Seidel (2001) developed an algorithm for calculating a Gamma-minimax decision rule when Gamma is given by a finite number of generalized moment conditions. Such a decision rule minimizes the maximum of the integrals of the risk function with respect to all distributions in Gamma. Gamma-minimax decision rules are of interest in robustness studies in Bayesian statistics.
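To make the Gamma-minimax idea concrete, the toy sketch below (constructed here purely for illustration; it is not Fandom Noubiap and Seidel's algorithm) takes a small finite class Gamma of Beta priors on a binomial proportion and a few candidate estimators, computes the Bayes risk of each estimator under each prior, and reports the worst Bayes risk over Gamma; the Gamma-minimax choice is the estimator whose worst value is smallest.

```python
import numpy as np
from scipy.stats import binom, beta

n = 20
x = np.arange(n + 1)
thetas = np.linspace(0.001, 0.999, 400)            # quadrature grid over (0, 1)

# Candidate decision rules mapping the count x to an estimate of theta.
estimators = {
    "MLE x/n":             lambda x: x / n,
    "add-one (x+1)/(n+2)":  lambda x: (x + 1) / (n + 2),
    "minimax rule":         lambda x: (x + 0.5 * np.sqrt(n)) / (n + np.sqrt(n)),
}

# A finite uncertainty class Gamma of priors on theta.
gamma = {"Beta(1,1)": (1, 1), "Beta(2,5)": (2, 5), "Beta(5,2)": (5, 2)}

def bayes_risk(delta, a, b):
    """r_pi = integral of R(theta, delta) d pi(theta), approximated on the grid."""
    risk = np.array([np.sum(binom.pmf(x, n, t) * (delta(x) - t) ** 2) for t in thetas])
    weight = beta.pdf(thetas, a, b)
    return np.trapz(risk * weight, thetas)

for name, delta in estimators.items():
    worst = max(bayes_risk(delta, a, b) for a, b in gamma.values())
    print(f"{name}: worst Bayes risk over Gamma = {worst:.5f}")
```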