Hierarchical Bayes model

The hierarchical Bayes model is a method in modern Bayesian statistical inference. It is a framework for describing statistical models that can capture dependencies more realistically than non-hierarchical models.

Given data $x$ and parameters $\theta$, a simple Bayesian analysis starts with a prior probability (prior) $p(\theta)$ and a likelihood $p(x \mid \theta)$ to compute a posterior probability $p(\theta \mid x) \propto p(x \mid \theta)\,p(\theta)$.
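
As a concrete illustration of this update, the posterior can be computed numerically on a grid by multiplying likelihood and prior pointwise and normalizing. The Python sketch below assumes a Bernoulli (coin-flip) likelihood and a flat prior; the data, grid resolution, and model choice are illustrative assumptions, not part of the article.

```python
# Minimal sketch of Bayes' rule on a grid: p(theta | x) ∝ p(x | theta) p(theta).
# The Bernoulli model, flat prior, and data below are illustrative assumptions.
import numpy as np

theta = np.linspace(0.0, 1.0, 501)      # grid over the parameter theta
prior = np.ones_like(theta)             # flat prior p(theta)
x = np.array([1, 1, 0, 1, 0, 1, 1])     # observed coin flips (assumed data)
k, n = x.sum(), len(x)

likelihood = theta**k * (1.0 - theta)**(n - k)   # p(x | theta)
posterior = likelihood * prior                    # unnormalized p(theta | x)
posterior /= np.trapz(posterior, theta)           # normalize to integrate to 1

print("posterior mean:", np.trapz(theta * posterior, theta))
```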

Often the prior on $\theta$ depends in turn on other parameters $\varphi$ that are not mentioned in the likelihood. The prior $p(\theta)$ must then be replaced by a conditional prior $p(\theta \mid \varphi)$, and a prior $p(\varphi)$ on the newly introduced parameters is required, resulting in the posterior probability

$p(\theta, \varphi \mid x) \propto p(x \mid \theta)\,p(\theta \mid \varphi)\,p(\varphi).$

This is the simplest example of a hierarchical Bayes model.

The process may be repeated; for example, the parameters $\varphi$ may depend in turn on additional parameters $\psi$, which will require their own prior. Eventually the process must terminate, with priors that do not depend on any other unmentioned parameters.
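
To see what adding a level does in practice, the two-level posterior above can be evaluated on a grid as well. The Python sketch below assumes a normal likelihood, a normal conditional prior, and a wide normal hyperprior; all of these distributional choices and values are illustrative assumptions.

```python
# Sketch of the two-level posterior
#   p(theta, phi | x) ∝ p(x | theta) p(theta | phi) p(phi)
# on a 2-D grid. The normal distributions and values are illustrative assumptions.
import numpy as np
from scipy.stats import norm

x = 2.5                                   # a single assumed observation
theta = np.linspace(-6.0, 9.0, 400)
phi = np.linspace(-6.0, 9.0, 400)
T, P = np.meshgrid(theta, phi, indexing="ij")
dt, dp = theta[1] - theta[0], phi[1] - phi[0]

joint = norm.pdf(x, T, 1.0) * norm.pdf(T, P, 1.0) * norm.pdf(P, 0.0, 10.0)
joint /= joint.sum() * dt * dp            # normalize the joint posterior

marg_theta = joint.sum(axis=1) * dp       # integrate phi out
print("posterior mean of theta:", np.trapz(theta * marg_theta, theta))
```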

Examples

Suppose we have measured quantities $x_i$, $i = 1, \dots, n$, each with normally distributed errors of known standard deviation $\sigma$,

$x_i \sim N(\theta_i, \sigma^2).$
Suppose we are interested in estimating the $\theta_i$. One approach would be to estimate the $\theta_i$ using a maximum likelihood approach; since the observations are independent, the likelihood factorizes and the maximum likelihood estimate is simply

$\hat{\theta}_i = x_i.$
However, if the quantities are related, so that for example we may think that the individual $\theta_i$ have themselves been drawn from an underlying distribution, then this relationship destroys the independence and suggests a more complex model, e.g.,

$x_i \sim N(\theta_i, \sigma^2),$
$\theta_i \sim N(\varphi, \tau^2),$

with improper priors $\varphi \sim \text{flat}$, $\tau \sim \text{flat} \in (0, \infty)$. When $n \ge 3$, this is an identified model (i.e. there exists a unique solution for the model's parameters), and the posterior distributions of the individual $\theta_i$ will tend to move, or shrink, away from the maximum likelihood estimates towards their common mean. This shrinkage is a typical behavior in hierarchical Bayes models.
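
The shrinkage can be reproduced with a short Gibbs sampler for this model. The conditional distributions used below follow from the conjugate normal structure with the flat priors above; the data values, the known $\sigma$, and the sampler settings are illustrative assumptions.

```python
# Minimal Gibbs sampler for the hierarchical model
#   x_i ~ N(theta_i, sigma^2),  theta_i ~ N(phi, tau^2),
# with flat priors on phi and on tau in (0, inf), to illustrate shrinkage.
# The data, known sigma, and iteration counts are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])  # assumed data
sigma = 10.0                                                  # known error s.d.
n = len(x)

theta, phi, tau2 = x.copy(), x.mean(), x.var()
draws = []
for _ in range(5000):
    # theta_i | rest: precision-weighted compromise between x_i and phi
    prec = 1.0 / sigma**2 + 1.0 / tau2
    mean = (x / sigma**2 + phi / tau2) / prec
    theta = rng.normal(mean, np.sqrt(1.0 / prec))
    # phi | rest ~ N(mean(theta), tau^2 / n), from the flat prior on phi
    phi = rng.normal(theta.mean(), np.sqrt(tau2 / n))
    # tau^2 | rest ~ Inv-Gamma((n-1)/2, S/2), from the flat prior on tau
    S = ((theta - phi) ** 2).sum()
    tau2 = 1.0 / rng.gamma((n - 1) / 2.0, 2.0 / S)
    draws.append(theta.copy())

post_mean = np.mean(draws[1000:], axis=0)   # discard burn-in
print("raw MLE estimates:", x)
print("posterior means:  ", np.round(post_mean, 1))
```

Running this shows the extreme observations pulled noticeably towards the common mean, while observations already near the mean barely move.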

Restrictions on priors

Some care is needed when choosing priors in a hierarchical model, particularly on scale variables at higher levels of the hierarchy, such as the variable $\tau$ in the example. The usual priors such as the Jeffreys prior often do not work, because the posterior distribution will be improper (not normalizable), and estimates made by minimizing the expected loss will be inadmissible.

Representation by directed acyclic graphs (DAGs)

A useful graphical tool for representing hierarchical Bayes models is the directed acyclic graph, or DAG. In this diagram, the likelihood function is represented as the root of the graph; each prior is represented as a separate node pointing to the node that depends on it. In a simple Bayesian model, the data $x$ are at the root of the diagram, representing the likelihood $p(x \mid \theta)$, and the variable $\theta$ is placed in a node that points to the root.

In the simplest hierarchical Bayes model, where $\theta$ in turn depends on a new variable $\varphi$, a new node labelled $\varphi$ is added, with an arrow pointing towards the node $\theta$. See also Bayesian networks.
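
The dependency structure itself is straightforward to encode in code. The sketch below stores the two-level DAG as a child-to-parents map and walks it, mirroring the $\varphi \to \theta \to x$ structure described above; the dict encoding and helper function are illustrative assumptions, not a standard API.

```python
# Sketch of the two-level DAG phi -> theta -> x as a child-to-parents map.
# Node names follow the article's example; the encoding is an assumption.
dag = {"x": ["theta"], "theta": ["phi"], "phi": []}

def ancestors(node, parents=dag):
    """Return every node reachable by following parent edges upward."""
    seen, stack = [], list(parents[node])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.append(p)
            stack.extend(parents[p])
    return seen

print(ancestors("x"))   # ['theta', 'phi']: the priors that x depends on
```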

See also

  • Mixture density
  • Mixture model
  • WinBUGS, OpenBUGS
  • Just another Gibbs sampler (JAGS)
