Ziggurat algorithm

# Ziggurat algorithm

Discussion

Encyclopedia
The ziggurat algorithm is an algorithm
Algorithm
In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...

for pseudo-random number sampling
Pseudo-random number sampling
Pseudo-random number sampling or non-uniform pseudo-random variate generation is the numerical practice of generating pseudo-random numbers that are distributed according to a given probability distribution....

. Belonging to the class of rejection sampling
Rejection sampling
In mathematics, rejection sampling is a basic pseudo-random number sampling technique used to generate observations from a distribution. It is also commonly called the acceptance-rejection method or "accept-reject algorithm"....

algorithms, it relies on an underlying source of uniformly-distributed random numbers, typically from a pseudo-random number generator, as well as precomputed tables. The algorithm is used to generate values from a monotone decreasing
Monotonic function
In mathematics, a monotonic function is a function that preserves the given order. This concept first arose in calculus, and was later generalized to the more abstract setting of order theory....

probability distribution
Probability distribution
In probability theory, a probability mass, probability density, or probability distribution is a function that describes the probability of a random variable taking certain values....

. It can also be applied to symmetric
Symmetric function
In algebra and in particular in algebraic combinatorics, the ring of symmetric functions, is a specific limit of the rings of symmetric polynomials in n indeterminates, as n goes to infinity...

unimodal distributions, such as the normal distribution, by choosing a value from one half of the distribution and then randomly choosing which half the value is considered to have been drawn from. It was developed by George Marsaglia
George Marsaglia
George Marsaglia was an American mathematician and computer scientist. He established the lattice structure of congruential random number generators in the paper "Random numbers fall mainly in the planes". This phenomenon is sometimes called the Marsaglia effect...

and others in the 1960s.

A typical value produced by the algorithm only requires the generation of one random floating-point value and one random table index, followed by one table lookup, one multiply operation and one comparison. Sometimes (2.5% of the time, in the case of a normal or exponential distribution when using typical table sizes) more computations are required. Nevertheless, the algorithm is computationally much faster than the two most commonly used methods of generating normally distributed random numbers, the Marsaglia polar method
Marsaglia polar method
The polar method is a pseudo-random number sampling method for generating a pair of independent standard normal random variables...

and the Box–Muller transform, which require at least one logarithm and one square root calculation for each pair of generated values. However, since the ziggurat algorithm is more complex to implement it is best used when large quantities of random numbers are required.

The term ziggurat algorithm dates from Marsaglia's paper with Wai Wan Tsang in 2000; it is so named because it is conceptually based on covering the probability distribution with rectangular segments stacked in decreasing order of size, resulting in a figure that resembles a ziggurat
Ziggurat
Ziggurats were massive structures built in the ancient Mesopotamian valley and western Iranian plateau, having the form of a terraced step pyramid of successively receding stories or levels.Notable ziggurats include the Great Ziggurat of Ur near Nasiriyah, Iraq; the Ziggurat of Aqar Quf near...

.

## Theory of operation

The ziggurat algorithm is a rejection sampling algorithm; it randomly generates a point in a distribution slightly larger than the desired distribution, then tests whether the generated point is inside the desired distribution. If not, it tries again. Given a random point underneath a probability distribution curve, its x coordinate is a random number with the desired distribution.

The distribution the ziggurat algorithm chooses from is made up of n equal-area regions; n − 1 rectangles that cover the bulk of the desired distribution, on top of a non-rectangular base that includes the tail of the distribution.

Given a monotone decreasing probability distribution function f(x), defined for all x≥0, the base of the ziggurat is defined as all points inside the distribution and below y1 = f(x1). This consists of a rectangular region from (0, 0) to (x1y1), and the (typically infinite) tail of the distribution, where x > x1 (and y < y1).

This layer (call it layer 0) has area A. On top of this, add a rectangular layer of width x1 and height A/x1, so it also has area A. The top of this layer is at height y2 = y1 + A/x1, and intersects the distribution function at a point (x2y2), where y2 = f(x2). This layer includes every point in the distribution function between y1 and y2, but (unlike the base layer) also includes points such as (x1y2) which are not in the desired distribution.

Further layers are then stacked on top. To use a precomputed table of size n (n = 256 is typical), one chooses x1 such that xn=0, meaning that the top box, layer n−1, reaches the distribution's peak at (0, f(0)) exactly.

Layer i extends vertically from yi to yi+1, and can be divided into two regions horizontally: the (generally larger) portion from 0 to xi+1 which is entirely contained within the desired distribution, and the (small) portion from xi+1 to xi, which is only partially contained.

Ignoring for a moment the problem of layer 0, and given uniform random variables U0 and U1 ∈ [0,1), the ziggurat algorithm can be described as:
1. Choose a random layer 0 ≤ i < n.
2. Let x = U0xi.
3. If x < xi+1, return x.
4. Let y = yi + U1(yi+1yi).
5. Compute f(x). If y < f(x), return x.
6. Otherwise, choose new random numbers and go back to step 1.

Step 1 amounts to choosing a low-resolution y coordinate. Step 3 tests if the x coordinate is clearly within the desired distribution function without knowing more about the y coordinate. If it is not, step 4 chooses a high-resolution y coordinate and step 5 does the rejection test.

With closely spaced layers, the algorithm terminates at step 3 a very large fraction of the time. Note that for the top layer n−1, however, this test always fails, because xn = 0.

Layer 0 can also be divided into a central region and an edge, but the edge is an infinite tail. To use the same algorithm to check if the point is in the central region, generate a fictitious x0 = A/y1. This will generate points with x < x1 with the correct frequency, and in the rare case that layer 0 is selected and xx1, use a special fallback algorithm to select a point at random from the tail. Because the fallback algorithm is used less than one time in a thousand, speed is not essential.

Thus, the full ziggurat algorithm for one-sided distributions is:
1. Choose a random layer 0 ≤ i < n.
2. Let x = U0xi
3. If x < xi+1, return x.
4. If i=0, generate a point from the tail using the fallback algorithm.
5. Let y = yi + U1(yi+1yi).
6. Compute f(x). If y < f(x), return x.
7. Otherwise, choose new random numbers and go back to step 1.

For a two-sided distribution, of course, the result must be negated 50% of the time. This can often be done conveniently by choosing U0 ∈ (−1,1) and, in step 3, testing if |x| < xi+1.

### Fallback algorithms for the tail

Because the ziggurat algorithm only generates most outputs very rapidly, and requires a fallback algorithm whenever x > x1, it is always more complex than a more direct implementation. The fallback algorithm, of course, depends on the distribution.

For an exponential distribution, the tail looks just like the body of the distribution. One way is to fall back to the most elementary algorithm E = −ln(U1) and let x = x1 − ln(U1). Another is to call the ziggurat algorithm recursively
Recursion
Recursion is the process of repeating items in a self-similar way. For instance, when the surfaces of two mirrors are exactly parallel with each other the nested images that occur are a form of infinite recursion. The term has a variety of meanings specific to a variety of disciplines ranging from...

and add x1 to the result.

For a normal distribution, Marsaglia suggests a compact algorithm:
1. Let x = −ln(U1)/x1
2. Let y = −ln(U2)
3. If 2y > x2, return x + x1
4. Otherwise, go back to step 1.

Since x1 ≈ 3.5 for typical table sizes, the test in step 3 is almost always successful. Note also that −ln(U) is just a simple way to generate an exponentially distributed random number; if you have a ziggurat exponential distribution generator available, you can use it instead.

### Optimizations

The algorithm can be performed efficiently with precomputed tables of xi and yi = f(xi), but there are some modifications to make it even faster:
• Nothing in the ziggurat algorithm depends on the probability distribution function being normalized (integral under the curve equal to 1); removing normalizing constant
Normalizing constant
The concept of a normalizing constant arises in probability theory and a variety of other areas of mathematics.-Definition and examples:In probability theory, a normalizing constant is a constant by which an everywhere non-negative function must be multiplied so the area under its graph is 1, e.g.,...

s can speed up the computation of f(x).
• Most uniform random number generators are based on integer random number generators which return an integer in the range [0, 232−1]. A table of 2−32xi lets you use such numbers directly for U0.
• When computing two-sided distributions using a two-sided U0 as described earlier, the random integer can be interpreted as a signed number in the range [−231, 231−1] and a scale factor of 2−31 can be used.
• Rather than comparing U0xi to xi+1 in step 3, it is possible to precompute xi+1/xi and compare U0 with that directly. If U0 is an integer random number generator, these limits may be premultiplied by 232 (or 231, as appropriate) so an integer comparison can be used.
• With the above two changes, the table of unmodified xi values is no longer needed and may be deleted.
• When generating IEEE 754 single-precision floating point values, which only have a 24-bit mantissa (including the implicit leading 1), the least-significant bits of a 32-bit integer random number are not used. These bits may be used to select the layer number. (See the references below for a detailed discussion of this.)
• The first three steps may be put into an inline function
Inline function
In various versions of the C and C++ programming languages, an inline function is a function upon which the compiler has been requested to perform inline expansion...

, which can call an out-of-line implementation of the less frequently needed steps.

### Generating the tables

It is possible to store the entire table precomputed, or just include the values n, y1, A, and an implementation of f −1(x) in the source code, and compute the remaining values when initializing the random number generator.

As previously described, you can find xi = f −1(yi) and yi+1yi + A/xi. Repeat n − 1 times for the layers of the ziggurat. At the end, you should have yn = f(0). There will, of course, be some round-off error
Round-off error
A round-off error, also called rounding error, is the difference between the calculated approximation of a number and its exact mathematical value. Numerical analysis specifically tries to estimate this error when using approximation equations and/or algorithms, especially when using finitely many...

, but it is a useful sanity test
Sanity test
A sanity test or sanity check is a basic test to quickly evaluate whether a claim or the result of a calculation can possibly be true. It is a simple check to see if the produced material is rational...

to see that it is acceptably small.

When actually filling in the table values, just assume that xn = 0 and yn = f(0), and accept the slight difference in layer n − 1's area as rounding error.

### Finding x1 and A

Given an initial (guess at) x1, you need a way to compute the area t of the tail for which x>x1. For the exponential distribution, this is just ex1, while for the normal distribution, assuming you are using the unnormalized f(x) = ex2/2, this is √erfc
Error function
In mathematics, the error function is a special function of sigmoid shape which occurs in probability, statistics and partial differential equations...

(x/√). For more awkward distributions, numerical integration
Numerical integration
In numerical analysis, numerical integration constitutes a broad family of algorithms for calculating the numerical value of a definite integral, and by extension, the term is also sometimes used to describe the numerical solution of differential equations. This article focuses on calculation of...

may be required.

With this in hand, from x1, you can find y1 = f(x1), the area t in the tail, and the area of the base layer A = x1y1 + t.

Then compute the series yi and xi as above. If yi > f(0) for any i < n, then the initial estimate x0 was too low, leading to too large an area A. If yn < f(0), then the initial estimate x0 was too high.

Given this, use a root-finding algorithm
Root-finding algorithm
A root-finding algorithm is a numerical method, or algorithm, for finding a value x such that f = 0, for a given function f. Such an x is called a root of the function f....

(such as the bisection method
Bisection method
The bisection method in mathematics is a root-finding method which repeatedly bisects an interval and then selects a subinterval in which a root must lie for further processing. It is a very simple and robust method, but it is also relatively slow...

) to find the value x1 which produces yn−1 as close to f(0) as possible. Alternatively, look for the value which makes the area of the topmost layer, xn−1(f(0) − yn−1), as close to the desired value A as possible. This saves one evaluation of f −1(x) and is actually the condition of greatest interest.