Topological data analysis
Encyclopedia
Topological data analysis is a new area of study aimed at having applications in areas such as data mining
Data mining
Data mining , a relatively young and interdisciplinary field of computer science is the process of discovering new patterns from large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics and database systems...

 and computer vision
Computer vision
Computer vision is a field that includes methods for acquiring, processing, analysing, and understanding images and, in general, high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the forms of decisions...

.
The main problems are:
  1. how one infers high-dimensional structure from low-dimensional representations; and
  2. how one assembles discrete points into global structure.


The human brain can easily extract global structure from representations in a strictly lower dimension, i.e. we infer a 3D environment from a 2D image from each eye. The inference of global structure also occurs when converting discrete data into continuous images, e.g. dot-matrix printers and televisions communicate images via arrays of discrete points.

The main method used by topological data analysis is:
  1. Replace a set of data points with a family of simplicial complex
    Simplicial complex
    In mathematics, a simplicial complex is a topological space of a certain kind, constructed by "gluing together" points, line segments, triangles, and their n-dimensional counterparts...

    es, indexed by a proximity parameter.
  2. Analyse these topological complexes via algebraic topology
    Algebraic topology
    Algebraic topology is a branch of mathematics which uses tools from abstract algebra to study topological spaces. The basic goal is to find algebraic invariants that classify topological spaces up to homeomorphism, though usually most classify up to homotopy equivalence.Although algebraic topology...

     — specifically, via the new theory of persistent homology.
  3. Encode the persistent homology of a data set in the form of a parameterized version of a Betti number
    Betti number
    In algebraic topology, a mathematical discipline, the Betti numbers can be used to distinguish topological spaces. Intuitively, the first Betti number of a space counts the maximum number of cuts that can be made without dividing the space into two pieces....

     which will be called a barcode.

Point cloud data

Data is often represented as points in a Euclidean n-dimensional space
Euclidean space
In mathematics, Euclidean space is the Euclidean plane and three-dimensional space of Euclidean geometry, as well as the generalizations of these notions to higher dimensions...

 En. The global shape of the data may provide information about the phenomena that the data represent.

One type of data set for which global features are certainly present is the so-called point cloud data coming from physical objects in 3D. E.g. a laser can scan an object at a set of discrete points and the cloud of such points can be used in a computer representation of the object. Point cloud data refers to any collection of points in En or a (perhaps noisy) sample of points on a lower-dimensional subset.

For point clouds in low-dimensional spaces there are numerous approaches for inferring features based on planar projections in the fields of computer graphics
Computer graphics
Computer graphics are graphics created using computers and, more generally, the representation and manipulation of image data by a computer with help from specialized software and hardware....

 and statistics
Statistics
Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments....

. Topological data analysis is needed when the spaces are high-dimensional or too twisted to allow planar projections.

To convert a point cloud in a metric space
Metric space
In mathematics, a metric space is a set where a notion of distance between elements of the set is defined.The metric space which most closely corresponds to our intuitive understanding of space is the 3-dimensional Euclidean space...

 into a global object, use the point cloud as the
vertices of a graph
Graph (mathematics)
In mathematics, a graph is an abstract representation of a set of objects where some pairs of the objects are connected by links. The interconnected objects are represented by mathematical abstractions called vertices, and the links that connect some pairs of vertices are called edges...

 whose edges are determined by proximity, then turn the graph into a simplicial complex
Simplicial complex
In mathematics, a simplicial complex is a topological space of a certain kind, constructed by "gluing together" points, line segments, triangles, and their n-dimensional counterparts...

 and use algebraic topology to study it. An alternative approach is the minimum spanning tree-based method in the geometric data clustering. If a group of data points forms a cluster, then the geometry of this point cloud can be determined.

Persistent homology

See homology
Homology (mathematics)
In mathematics , homology is a certain general procedure to associate a sequence of abelian groups or modules with a given mathematical object such as a topological space or a group...

 for an introduction to the notation.


Persistent homology essentially calculates homology groups at different resolutions to see which features persist for long periods of time. It is assumed that important features and structures are the ones that persist. We define persistent homology as follows:

Let be a filtration. The p-persistent kth homology group of is .

If we let be a nonbounding -cycle created at time by simplex and let be a homologous -cycle that becomes a boundary cycle at time by simplex , then we can define the persistence of as . We call the creator of and the destroyer of . If does not have a destroyer, its persistence is .

Instead of using an index-based filtration, we can use a time-based filtration. Let be a simplicial complex
Simplicial complex
In mathematics, a simplicial complex is a topological space of a certain kind, constructed by "gluing together" points, line segments, triangles, and their n-dimensional counterparts...

 and be a filtration defined for an associated map that maps simplices in the final complex to real numbers. Then for all real numbers , the -persistent kth homology group of is . The persistence of a -cycle created at time and destroyed at is .

See also

  • Dimensionality reduction
    Dimensionality reduction
    In machine learning, dimension reduction is the process of reducing the number of random variables under consideration, and can be divided into feature selection and feature extraction.-Feature selection:...

  • Data mining
    Data mining
    Data mining , a relatively young and interdisciplinary field of computer science is the process of discovering new patterns from large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics and database systems...

  • Computer vision
    Computer vision
    Computer vision is a field that includes methods for acquiring, processing, analysing, and understanding images and, in general, high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the forms of decisions...

  • Computational topology
  • Digital topology
    Digital topology
    Digital topology deals with properties and features of two-dimensional or three-dimensional digital imagesthat correspond to topological properties or topological features of objects....

  • Digital Morse theory
    Digital Morse theory
    In mathematics, digital Morse theory is a digital adaptation of continuum Morse theory for scalar volume data.The main utility of a digital Morse theory is that it serves to provide a theoretical basis for isosurfaces, and perpendicular streamlines....

  • Shape analysis
    Shape analysis
    This article describes shape analysis to analyze and process geometric shapes.The shape analysis described here is related to the statistical analysis of geometric shapes, to shape matching and shape recognition...

  • Size theory
  • Structured data analysis (statistics)
    Structured data analysis (statistics)
    Structured data analysis is the statistical data analysis of structured data. This can arise either in the form of an a priori structure such as multiple-choice questionnaires or in situations with the need to search for structure that fits the given data, either exactly or approximately...


Further reading

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK