Spectral clustering - AbsoluteAstronomy.com

Given a set of data points A, the similarity matrix

Similarity matrix

A similarity matrix is a matrix of scores which express the similarity between two data points. Similarity matrices are strongly related to their counterparts, distance matrices and substitution matrices.-Use in sequence alignment:...

may be defined as a matrix

where

represents a measure of the similarity between points

. Spectral clustering techniques make use of the spectrum

Spectrum of a matrix

In mathematics, the spectrum of a matrix is the set of its eigenvalues. This notion can be extended to the spectrum of an operator in the infinite-dimensional case.The determinant equals the product of the eigenvalues...

of the similarity matrix of the data to perform dimensionality reduction

Dimensionality reduction

In machine learning, dimension reduction is the process of reducing the number of random variables under consideration, and can be divided into feature selection and feature extraction.-Feature selection:...

for clustering in fewer dimensions.

One such technique is the Normalized Cuts algorithm or Shi–Malik algorithm introduced by Jianbo Shi and Jitendra Malik, commonly used for image segmentation

Segmentation (image processing)

In computer vision, segmentation refers to the process of partitioning a digital image into multiple segments . The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze...

. It partitions points into two sets

based on the eigenvector

corresponding to the second-smallest eigenvalue of the Laplacian matrix

Laplacian matrix

In the mathematical field of graph theory the Laplacian matrix, sometimes called admittance matrix or Kirchhoff matrix, is a matrix representation of a graph. Together with Kirchhoff's theorem it can be used to calculate the number of spanning trees for a given graph. The Laplacian matrix can be...

, where

is the diagonal matrix

This partitioning may be done in various ways, such as by taking the median

of the components in

, and placing all points whose component in

is greater than

, and the rest in

. The algorithm can be used for hierarchical clustering by repeatedly partitioning the subsets in this fashion.

A related algorithm is the Meila–Shi algorithm, which takes the eigenvectors corresponding to the k largest eigenvalues of the matrix

for some k, and then invokes another algorithm (e.g. k-means) to cluster points by their respective k components in these eigenvectors.

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.