Population-based incremental learning

Computer science

Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

and machine learning

Machine learning

Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases...

, population-based incremental learning (PBIL) is an optimization

Optimization (mathematics)

In mathematics, computational science, or management science, mathematical optimization refers to the selection of a best element from some set of available alternatives....

algorithm

Algorithm

In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...

, and an estimation of distribution algorithm. This is a type of genetic algorithm

Genetic algorithm

A genetic algorithm is a search heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems...

where the genotype

Genotype

The genotype is the genetic makeup of a cell, an organism, or an individual usually with reference to a specific character under consideration...

of an entire population (probability

Probability

Probability is ordinarily used to describe an attitude of mind towards some proposition of whose truth we arenot certain. The proposition of interest is usually of the form "Will a specific event occur?" The attitude of mind is of the form "How certain are we that the event will occur?" The...

vector) is evolved rather than individual members. The algorithm is proposed by Shumeet Baluja in 1994. The algorithm is simpler than a standard genetic algorithm, and in many cases leads to better results than a standard genetic algorithm.

Algorithm

In PBIL, genes are represented as real values in the range [0,1], indicating the probability that any particular allele

Allele

An allele is one of two or more forms of a gene or a genetic locus . "Allel" is an abbreviation of allelomorph. Sometimes, different alleles can result in different observable phenotypic traits, such as different pigmentation...

appears in that gene

Gene

A gene is a molecular unit of heredity of a living organism. It is a name given to some stretches of DNA and RNA that code for a type of protein or for an RNA chain that has a function in the organism. Living beings depend on genes, as they specify all proteins and functional RNA chains...

.

The PBIL algorithm is as follows:

A population is generated from the probability vector.
The fitness of each member is evaluated and ranked.
Update population genotype (probability vector) based on fittest individual.
Mutate.
Repeat steps 1-4

Source code

This is a part of source code implemented in Java

Java (programming language)

Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...

. In the paper, learnRate = 0.1, negLearnRate = 0.075, mutProb = 0.02, and mutShift = 0.05 is used. N = 100 and ITER_COUNT = 1000 is enough for a small problem.

public void optimize {
final int totalBits = getTotalBits(domains);
final double[] probVec = new double[totalBits];
Arrays.fill(probVec, 0.5);
bestCost = POSITIVE_INFINITY;

for (int i = 0; i < ITER_COUNT; i++) {
// Creates N genes
final boolean[][] genes = new boolean[N][totalBits];
for (boolean[] gene : genes) {
for (int k = 0; k < gene.length; k++) {
if (rand.nextDouble < probVec[k])
gene[k] = true;
}
}

// Calculate costs
final double[] costs = new double[N];
for (int j = 0; j < N; j++) {
costs[j] = costFunc.cost(toRealVec(genes[j], domains));
}

// Find min and max cost genes
boolean[] minGene = null, maxGene = null;
double minCost = POSITIVE_INFINITY, maxCost = NEGATIVE_INFINITY;
for (int j = 0; j < N; j++) {
double cost = costs[j];
if (minCost > cost) {
minCost = cost;
minGene = genes[j];
}
if (maxCost < cost) {
maxCost = cost;
maxGene = genes[j];
}
}

// Compare with the best cost gene
if (bestCost > minCost) {
bestCost = minCost;
bestGene = minGene;
}

// Update the probability vector with max and min cost genes
for (int j = 0; j < totalBits; j++) {
if (minGene[j] maxGene[j]) {
probVec[j] = probVec[j] * (1d - learnRate) +
(minGene[j] ? 1d : 0d) * learnRate;
} else {
final double learnRate2 = learnRate + negLearnRate;
probVec[j] = probVec[j] * (1d - learnRate2) +
(minGene[j] ? 1d : 0d) * learnRate2;
}
}

// Mutation
for (int j = 0; j < totalBits; j++) {
if (rand.nextDouble < mutProb) {
probVec[j] = probVec[j] * (1d - mutShift) +
(rand.nextBoolean ? 1d : 0d) * mutShift;
}
}
}
}

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.