Geostatistics is a branch of statistics focusing on spatial or spatiotemporal datasets. Developed originally to predict probability distributions of ore grades for mining operations, it is currently applied in diverse disciplines including petroleum geology, hydrogeology, hydrology, meteorology, oceanography, geochemistry, geometallurgy, geography, forestry, environmental control, landscape ecology, soil science, and agriculture (especially in precision farming). Geostatistics is applied in varied branches of geography, particularly those involving the spread of diseases (epidemiology), the practice of commerce and military planning (logistics), and the development of efficient spatial networks. Geostatistical algorithms are incorporated in many places, including geographic information systems (GIS) and the R statistical environment.
Background
Geostatistics is intimately related to interpolation methods, but extends far beyond simple interpolation problems. It consists of a collection of numerical and mathematical techniques dealing with the characterization of spatial phenomena. Geostatistical techniques rely on statistical models based on random function (or random variable) theory to model the uncertainty associated with spatial estimation and simulation.
A number of simpler interpolation methods and algorithms, such as inverse distance weighting, bilinear interpolation and nearest-neighbor interpolation, were already well known before geostatistics. Geostatistics goes beyond the interpolation problem by considering the studied phenomenon at unknown locations as a set of correlated random variables.
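For contrast with the random-variable view, the following is a minimal sketch of inverse distance weighting, one of the simpler interpolators named above. The function name `idw`, the `power` parameter and the sample coordinates are illustrative assumptions, not taken from any particular library. Note that the result is a single number with no measure of uncertainty attached, which is the limitation geostatistics addresses.

```python
import math


def idw(samples, target, power=2.0):
    """Inverse distance weighting estimate at `target`.

    samples: list of ((x, y), value) pairs (hypothetical data layout).
    target:  (x, y) location where a value is wanted.
    power:   exponent applied to the distance; 2.0 is a common choice.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (sx, sy), value in samples:
        d = math.hypot(sx - target[0], sy - target[1])
        if d == 0.0:
            # The target coincides with a sample: return the measured value.
            return value
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical measurements at three locations.
samples = [((0.0, 0.0), 10.0), ((1.0, 0.0), 20.0), ((0.0, 1.0), 30.0)]
estimate = idw(samples, (0.5, 0.5))
print(estimate)
```

The weights depend only on distance, not on any model of how the variable actually varies in space.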
Let $Z(x)$ be the value of the variable of interest at a certain location $x$. This value is unknown (e.g. temperature, rainfall, piezometric level, geological facies, etc.). Although there exists a value at location $x$ that could be measured, geostatistics considers this value as random since it was not measured, or has not been measured yet. However, the randomness of $Z(x)$ is not complete, but defined by a cumulative distribution function (cdf) that depends on certain information that is known about the value $Z(x)$:

$$F(z, x) = \operatorname{Prob}\{ Z(x) \le z \mid \text{information} \}$$
Typically, if the value of $Z$ is known at locations close to $x$ (or in the neighborhood of $x$) one can constrain the cdf of $Z(x)$ by this neighborhood: if a high spatial continuity is assumed, $Z(x)$ can only have values similar to the ones found in the neighborhood. Conversely, in the absence of spatial continuity $Z(x)$ can take any value. The spatial continuity of the random variables is described by a model of spatial continuity that can be either a parametric function in the case of variogram-based geostatistics, or have a non-parametric form when using other methods such as multiple-point simulation or pseudo-genetic techniques.
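In variogram-based geostatistics, the parametric model is usually fitted to an empirical semivariogram computed from the data. The sketch below uses the classical estimator (half the mean squared difference between pairs of values separated by a given lag) on a regularly spaced one-dimensional transect. The function name and the sample values are assumptions made for illustration only.

```python
def empirical_semivariogram(values, max_lag):
    """Classical semivariogram estimator on a regularly spaced 1-D transect.

    values:  measurements ordered along the transect (equal spacing assumed).
    max_lag: largest lag to evaluate, in units of the sample spacing.
    Returns a dict {lag: gamma(lag)}.
    """
    gamma = {}
    n = len(values)
    for lag in range(1, max_lag + 1):
        # All pairs of samples separated by exactly `lag` steps.
        pairs = [(values[i], values[i + lag]) for i in range(n - lag)]
        if not pairs:
            break
        gamma[lag] = sum((a - b) ** 2 for a, b in pairs) / (2.0 * len(pairs))
    return gamma


# Hypothetical transect; real data would be irregular and much longer.
transect = [1.0, 2.0, 3.0, 4.0, 5.0]
gamma = empirical_semivariogram(transect, 3)
for lag, g in sorted(gamma.items()):
    print(lag, g)
```

Small semivariance at short lags indicates that nearby values resemble each other, which is the spatial continuity the text refers to. Irregularly spaced data would need lag bins with a distance tolerance, which this sketch leaves out.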
By applying a single spatial model on an entire domain, one makes the assumption that $Z$ is a stationary process. It means that the same statistical properties are applicable on the entire domain. Several geostatistical methods provide ways of relaxing this stationarity assumption.
In this framework, one can distinguish two modeling goals:
- 1) Estimating the value for $Z(x)$, typically by the expectation, the median or the mode of its pdf. This is usually denoted as an estimation problem.
- 2) Sampling from the entire probability density function by actually considering each possible outcome of it at each location. This is generally done by creating several alternative maps of $Z$, called realizations. Consider a domain discretized in $N$ grid nodes (or pixels). Each realization is a sample of the complete $N$-dimensional joint distribution function

$$F(\mathbf{z}, \mathbf{x}) = \operatorname{Prob}\{ Z(x_1) \le z_1, Z(x_2) \le z_2, \ldots, Z(x_N) \le z_N \}$$

In this approach, the presence of multiple solutions to the interpolation problem is acknowledged. Each realization is considered as a possible scenario of what the real variable could be. All associated workflows then consider an ensemble of realizations, and consequently an ensemble of predictions that allows for probabilistic forecasting. Therefore, geostatistics is often used to generate or update spatial models when solving inverse problems.
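The second goal can be illustrated with a small sketch that draws several realizations on a one-dimensional grid and then summarizes the ensemble. It assumes a zero-mean Gaussian random field with an exponential covariance and uses Cholesky factorization of the covariance matrix. This is an unconditional simulation (no measured data are honored), and it is only practical for small grids. All function names and parameter values are illustrative assumptions.

```python
import math
import random


def exponential_cov(h, sill=1.0, corr_length=3.0):
    """Exponential covariance model (assumed parameters, in grid units)."""
    return sill * math.exp(-abs(h) / corr_length)


def cholesky(a):
    """Lower-triangular factor L of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate(n_nodes, n_realizations, seed=0):
    """Draw realizations z = L w, with w a vector of independent N(0, 1)."""
    rng = random.Random(seed)
    cov = [[exponential_cov(i - j) for j in range(n_nodes)]
           for i in range(n_nodes)]
    low = cholesky(cov)
    realizations = []
    for _ in range(n_realizations):
        w = [rng.gauss(0.0, 1.0) for _ in range(n_nodes)]
        z = [sum(low[i][k] * w[k] for k in range(i + 1))
             for i in range(n_nodes)]
        realizations.append(z)
    return realizations


n_nodes = 20
realizations = simulate(n_nodes=n_nodes, n_realizations=200)

# Ensemble summaries per grid node: mean map and probability of exceeding 1.0.
ensemble_mean = [sum(r[i] for r in realizations) / len(realizations)
                 for i in range(n_nodes)]
prob_exceed = [sum(r[i] > 1.0 for r in realizations) / len(realizations)
               for i in range(n_nodes)]
```

Each entry of `realizations` is one possible map. Quantities such as `prob_exceed` come from the ensemble rather than from any single map, which is what makes probabilistic forecasting possible.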
A number of methods exist for both geostatistical estimation and multiple realizations approaches. Several reference books provide a comprehensive overview of the discipline.
Simulation
- Aggregation
- Disaggregation
- Turning bands
- Spectral simulation
- Sequential Gaussian simulation (SGS)
- Transition probabilities
- Markov chain geostatistics
- Markov mesh models
- Support vector machine
- Boolean simulation
- Genetic models
- Pseudo-genetic models
- Cellular automata
- Multiple-Point Geostatistics (MPS)
Definitions and tools
- Regionalized variable theory
- Covariance function
- Semi-variance
- Variogram
- Kriging
- Range (geostatistics)
- Sill (geostatistics)
- Nugget effect
- Training image
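Several of the terms listed above (variogram, nugget effect, sill, range, kriging) fit together in a short sketch of ordinary kriging with a spherical variogram model. The parameter values below are assumed rather than fitted to data, the function names are illustrative, and the small linear solver is included only to keep the example free of external libraries.

```python
import math


def spherical(h, nugget=0.1, sill=1.0, range_=10.0):
    """Spherical variogram model; `sill` is the total sill (nugget included)."""
    if h == 0.0:
        return 0.0
    if h >= range_:
        return sill
    r = h / range_
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ordinary_kriging(points, values, target, variogram=spherical):
    """Return (estimate, kriging variance, weights) at `target`."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    n = len(points)
    # Kriging system: variogram between data points, plus the
    # unbiasedness constraint (weights sum to one) as the last row/column.
    a = [[variogram(dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(dist(p, target)) for p in points] + [1.0]
    sol = solve(a, b)
    weights, mu = sol[:n], sol[n]
    estimate = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return estimate, variance, weights


# Hypothetical sample locations and measured values.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
values = [10.0, 14.0, 12.0, 18.0]
estimate, variance, weights = ordinary_kriging(points, values, (2.0, 2.0))
print(estimate, variance)
```

Unlike a purely distance-based interpolator, the weights here follow from the variogram model, and the method also returns a kriging variance that quantifies the uncertainty of the estimate.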
Main scientific journals related to geostatistics
Related software
- gslib is a set of open-source Fortran 77 routines implementing most of the classical geostatistics estimation and simulation algorithms.
- sgems is a cross-platform (Windows, Unix), open-source software that implements most of the classical geostatistics algorithms (kriging, Gaussian and indicator simulation, etc.) as well as new developments (multiple-point geostatistics). It also provides interactive 3D visualization and offers the scripting capabilities of Python.
- mgstat is a free MATLAB toolbox that allows calling sgems from MATLAB and transparent import/export of objects.
- gstat is an open-source computer code for multivariable geostatistical modelling, prediction and simulation. It is also available as an R package.
- R has around 20 other packages dedicated to geostatistics, and around 30 dedicated to other areas of spatial statistics.
See also
- Inverse distance weighting
- Multivariate interpolation
- Nearest-neighbor interpolation
- Spline interpolation
- Geology
- Geodemographic segmentation
- Geographic information system (GIS)
- Remote sensing
- Pedometrics
External links
- GeoENVia promotes the use of geostatistical methods in environmental applications, and organizes bi-annual conferences.
- European Forum for GeoStatistics is a forum that uses the word geostatistics differently from the way it is used here: it takes it as the plural of "geostatistic". In a project called "GEOSTAT", the ... goals are to develop the guidelines for datasets and methods to link 2010/11 Population and Housing Census results to a common harmonised grid. See also the difference between Statistics and Statistic.
- Kriging link, contains explanations of variance in geostats
- Arizona university geostats page
- AI-Geostats, a resource on the internet about geostatistics and spatial statistics
- On-Line Library that chronicles Matheron's journey from classical statistics to the new science of geostatistics
- http://www.geostatscam.com is the site of Jan W. Merks, who claims that geostatistics is "voodoo science" and a "scientific fraud".