Inductive inference

Around 1960, Ray Solomonoff founded the theory of universal inductive inference, the theory of prediction based on observations; for example, predicting the next symbol based upon a given series of symbols. The only assumption is that the environment follows some unknown but computable probability distribution.

Fundamental ingredients of the theory are the concepts of algorithmic probability and Kolmogorov complexity. The universal prior probability of any prefix p of a computable sequence x is the sum of the probabilities of all programs (for a universal computer) that compute something starting with p. Given some p and any computable but unknown probability distribution from which x is sampled, the universal prior and Bayes' theorem can be used to predict the yet unseen parts of x in optimal fashion.
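
In symbols, the universal prior is commonly sketched as follows (here U denotes a universal prefix machine, \ell(q) the length of program q, and U(q) = p* means that U on input q outputs a string beginning with p):

    M(p) = \sum_{q \,:\, U(q) = p*} 2^{-\ell(q)}

The prediction of the next symbol then follows from Bayes' theorem in the form of the chain rule:

    M(x_{n+1} \mid x_1 \cdots x_n) = \frac{M(x_1 \cdots x_n \, x_{n+1})}{M(x_1 \cdots x_n)}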

This is a mathematically formalized Occam's razor: shorter (in the sense of lower Kolmogorov complexity) computable theories have more weight when calculating the probability of the next observation, using all computable theories which perfectly describe previous observations.
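
To make the weighting concrete, here is a minimal, hypothetical Python sketch: each "theory" is a toy generator of a binary sequence, its complexity is crudely approximated by the length of its source code (a stand-in for Kolmogorov complexity), and only theories that reproduce the observed prefix contribute weight 2^(-complexity):

    import inspect

    def theory_zeros(n):
        # Toy "theory": the sequence is all zeros.
        return [0] * n

    def theory_alternate(n):
        # Toy "theory": the sequence alternates 0, 1, 0, 1, ...
        return [i % 2 for i in range(n)]

    def theory_ones_after_two(n):
        # Toy "theory": two zeros, then ones forever.
        return [0 if i < 2 else 1 for i in range(n)]

    THEORIES = [theory_zeros, theory_alternate, theory_ones_after_two]

    def complexity(theory):
        # Crude stand-in for Kolmogorov complexity: length of the source text.
        return len(inspect.getsource(theory))

    def predict_next(observed):
        # Occam-weighted probability that the next symbol is 1: every theory
        # that reproduces the observed prefix gets weight 2^(-complexity).
        n = len(observed)
        weights = {t: 2.0 ** -complexity(t)
                   for t in THEORIES if t(n) == observed}
        total = sum(weights.values())
        p_one = sum(w for t, w in weights.items() if t(n + 1)[n] == 1)
        return p_one / total if total else None

    print(predict_next([0, 0]))  # the shorter consistent theory dominates
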
Marcus Hutter's universal artificial intelligence builds upon this to calculate the expected value of an action.
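
One common presentation of this expected-value calculation is the following expectimax expression (a sketch following Hutter's notation, which is an assumption here: a_i are actions, x_i are percepts carrying rewards r(x_i), U is a universal monotone machine, \ell(q) is the length of program q, and m is the horizon):

    a_k := \arg\max_{a_k} \sum_{x_k} \max_{a_{k+1}} \sum_{x_{k+1}} \cdots \max_{a_m} \sum_{x_m} \big( r(x_k) + \cdots + r(x_m) \big) \sum_{q \,:\, U(q, a_1 \ldots a_m) = x_1 \ldots x_m} 2^{-\ell(q)}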

Another direction of inductive inference is based on E. Mark Gold's model of learning in the limit, published in 1967, which has since developed into a growing family of learning models. The general scenario is the following: given a class S of computable functions, is there a learner (that is, a recursive functional) which, for any input of the form (f(0), f(1), ..., f(n)), outputs a hypothesis, namely an index e with respect to a previously agreed acceptable numbering of all computable functions, such that the indexed function is consistent with the given values of f? A learner M learns a function f if almost all of its hypotheses are the same index e, and e generates the function f; M learns S if M learns every f in S. Basic results are that all recursively enumerable classes of functions are learnable, while the class REC of all computable functions is not. Many related models have been considered, and the learning of classes of recursively enumerable sets from positive data has been a topic of study ever since Gold's pioneering paper of 1967.
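
As an illustration, here is a minimal, hypothetical Python sketch of the classic "identification by enumeration" learner over a toy enumerated class (a real treatment would use an acceptable numbering of all computable functions): on each prefix (f(0), ..., f(n)) it outputs the least index consistent with the data, and for any f in the class its guesses converge to a correct index:

    # Toy enumerated hypothesis class (hypothetical; Gold's setting uses an
    # acceptable numbering of all computable functions instead).
    CLASS = [
        lambda x: 0,        # index 0: constant zero
        lambda x: x,        # index 1: identity
        lambda x: x * x,    # index 2: squaring
        lambda x: x + 1,    # index 3: successor
    ]

    def learner(values):
        # Identification by enumeration: return the least index whose
        # function agrees with values = [f(0), ..., f(n)].
        for e, g in enumerate(CLASS):
            if all(g(x) == v for x, v in enumerate(values)):
                return e
        return None  # no function in the class fits the data

    # Feed the learner growing prefixes of f(x) = x*x; from some point on
    # every guess is index 2, so the learner identifies f in the limit.
    f = lambda x: x * x
    for n in range(5):
        print(n, learner([f(x) for x in range(n + 1)]))
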
A far-reaching extension of Gold's approach is developed by Schmidhuber's theory of generalized Kolmogorov complexities, which are kinds of super-recursive algorithms.

See also

  • Mill's methods
  • Confirmation bias
  • Language identification in the limit
  • Minimum description length
  • Minimum message length
