Maximally stable extremal regions - AbsoluteAstronomy.com

In computer vision, maximally stable extremal regions (MSER) are used as a method of blob detection in images. This technique was proposed by Matas et al. to find correspondences between image elements from two images with different viewpoints. This method of extracting a comprehensive number of corresponding image elements contributes to wide-baseline matching, and it has led to better stereo matching and object recognition algorithms.

Terms and Definitions

Image $I$ is a mapping $I: D \subset \mathbb{Z}^2 \to S$. Extremal regions are well defined on images if:

$S$ is totally ordered (a reflexive, antisymmetric, transitive and total binary relation $\le$ exists).
An adjacency relation $A \subset D \times D$ is defined.

Region $Q$ is a contiguous subset of $D$. (For each $p, q \in Q$ there is a sequence $p, a_1, a_2, \ldots, a_n, q$ with $p A a_1$, $a_i A a_{i+1}$ and $a_n A q$.)

(Outer) Region Boundary $\partial Q = \{ q \in D \setminus Q : \exists p \in Q : q A p \}$, which means the boundary $\partial Q$ of $Q$ is the set of pixels adjacent to at least one pixel of $Q$ but not belonging to $Q$.

Extremal Region $Q \subset D$ is a region such that either for all $p \in Q, q \in \partial Q : I(p) > I(q)$ (maximum intensity region) or for all $p \in Q, q \in \partial Q : I(p) < I(q)$ (minimum intensity region).

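Maximally Stable Extremal Region: let $Q_1, \ldots, Q_{i-1}, Q_i, \ldots$ be a sequence of nested extremal regions ($Q_i \subset Q_{i+1}$). Extremal region $Q_{i^*}$ is maximally stable if and only if $q(i) = |Q_{i+\Delta} \setminus Q_{i-\Delta}| \, / \, |Q_i|$ has a local minimum at $i^*$. (Here $|\cdot|$ denotes cardinality.) $\Delta \in S$ is a parameter of the method.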
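Given the areas of one nested chain of extremal regions, the stability criterion $|Q_{i+\Delta} \setminus Q_{i-\Delta}| / |Q_i|$ reduces to simple arithmetic, because nesting means the set difference has size $|Q_{i+\Delta}| - |Q_{i-\Delta}|$. The Python sketch below is a minimal illustration under stated assumptions: one chain entry per intensity level (so $\Delta$ is counted in levels), a strict local minimum with both neighbours defined, made-up areas, and hypothetical function names `stability_scores` and `maximally_stable`.

```python
def stability_scores(areas, delta):
    """Return {i: q(i)} for a nested chain of extremal regions.

    areas[i] is |Q_i|, and each region is contained in the next one.
    Because of the nesting, Q_{i-delta} lies inside Q_{i+delta}, so the size
    of their set difference is simply areas[i + delta] - areas[i - delta].
    """
    return {i: (areas[i + delta] - areas[i - delta]) / areas[i]
            for i in range(delta, len(areas) - delta)}


def maximally_stable(areas, delta):
    """Indices i* at which q(i) has a strict local minimum (both neighbours defined)."""
    q = stability_scores(areas, delta)
    return [i for i, value in q.items()
            if i - 1 in q and i + 1 in q
            and value < q[i - 1] and value < q[i + 1]]


# Toy areas of one nested chain, one entry per intensity level (made-up numbers).
areas = [1, 2, 4, 10, 11, 12, 13, 30, 60, 100]
print(maximally_stable(areas, delta=1))  # [5]
```

Around index 5 the region barely grows (areas 11, 12, 13), which is exactly the "stable over a range of thresholds" behaviour the definition rewards.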

The concept can be explained more simply by thresholding. All the pixels below a given threshold are 'black' and all those above or equal are 'white'. If we are shown a sequence of thresholded images $I_t$, with frame $t$ corresponding to threshold $t$, we would first see a white image; then 'black' spots corresponding to local intensity minima appear and grow larger. These 'black' spots eventually merge, until the whole image is black. The set of all connected components in the sequence is the set of all extremal regions. In that sense, the concept of MSER is linked to that of the component tree of the image, and the component tree indeed provides an easy way to implement MSER.
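The threshold sweep just described can be reproduced on a tiny image. This Python sketch is illustrative only: it assumes 4-adjacency, a made-up 3x4 image, and hypothetical helper names (`dark_components`, `sweep`), and it recomputes components from scratch at every threshold instead of using an efficient algorithm.

```python
from collections import deque


def dark_components(image, t):
    """Pixel counts of the 4-connected 'black' components of I_t (pixels with I < t)."""
    rows, cols = len(image), len(image[0])
    seen, sizes = set(), []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] >= t or (r, c) in seen:
                continue
            # Flood-fill one black component with breadth-first search.
            seen.add((r, c))
            queue, size = deque([(r, c)]), 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen and image[ny][nx] < t):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            sizes.append(size)
    return sorted(sizes)


def sweep(image):
    """Component sizes of I_t for every intensity level t, plus one level above the maximum."""
    levels = sorted({v for row in image for v in row})
    return {t: dark_components(image, t) for t in levels + [levels[-1] + 1]}


image = [[5, 5, 9, 9],
         [5, 1, 9, 2],
         [9, 9, 9, 9]]
print(sweep(image))
# {1: [], 2: [1], 5: [1, 1], 9: [1, 4], 10: [12]}
```

The output follows the narrative: an all-white frame, two single-pixel spots at the local minima, one spot growing to four pixels, and finally a single black component covering all twelve pixels.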

Extremal regions

Extremal regions in this context have two important properties: the set is closed under
continuous (and thus projective) transformations of image coordinates. This means it is affine invariant, and it does not matter if the image is warped or skewed.
monotonic transformations of image intensities. The approach is nonetheless sensitive to natural lighting effects such as changes of daylight or moving shadows, since these are not monotonic transformations of the whole image.
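Closure under monotonic intensity transformations can be checked on a toy example: every minimum-intensity extremal region is a connected component of $\{p : I(p) < t\}$ for some $t$, and a strictly increasing remapping of intensities only relabels the thresholds. The sketch below assumes 4-adjacency, a made-up image, the map $v \mapsto v^2 + 1$ as the monotonic transformation, and hypothetical function names; it enumerates only minimum-intensity regions.

```python
from collections import deque


def components_below(image, t):
    """4-connected components of {p : I(p) < t}, as frozensets of (row, col)."""
    rows, cols = len(image), len(image[0])
    seen, result = set(), []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] >= t or (r, c) in seen:
                continue
            seen.add((r, c))
            queue, comp = deque([(r, c)]), set()
            while queue:
                y, x = queue.popleft()
                comp.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen and image[ny][nx] < t):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            result.append(frozenset(comp))
    return result


def extremal_regions(image):
    """All minimum-intensity extremal regions of the image, as a set of pixel sets."""
    levels = sorted({v for row in image for v in row})
    regions = set()
    for t in levels + [levels[-1] + 1]:
        regions.update(components_below(image, t))
    return regions


image = [[5, 5, 9, 9],
         [5, 1, 9, 2],
         [9, 9, 9, 9]]
# v * v + 1 is strictly increasing on non-negative intensities.
warped = [[v * v + 1 for v in row] for row in image]
print(extremal_regions(image) == extremal_regions(warped))  # True
```

A decreasing map such as $v \mapsto 10 - v$ does change the set, because it swaps minimum- and maximum-intensity regions; this is why MSER is also run on the inverted image.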

Advantages of MSER

Because the regions are defined exclusively by the intensity function in the region and on its outer border, they have many key characteristics that make them useful. Over a large range of thresholds, the local binarization is stable in certain regions, and these regions have the properties listed below.

Invariance to affine transformation of image intensities
Covariance to adjacency-preserving (continuous) transformations on the image domain
Stability: only regions whose support is nearly the same over a range of thresholds are selected.
Multi-scale detection without any smoothing involved: both fine and large structures are detected.
Note, however, that detecting MSERs in a scale pyramid improves repeatability and the number of correspondences across scale changes.
The set of all extremal regions can be enumerated in worst-case $O(n)$, where $n$ is the number of pixels in the image.

Comparison to other region detectors

In Mikolajczyk et al., six region detectors are studied (Harris-affine, Hessian-affine, MSER, edge-based regions, intensity extrema, and salient regions). A summary of MSER performance in comparison to the other five follows.

Region density - in comparison to the others, MSER offers the most variety, detecting about 2600 regions for a textured blur scene and 230 for a light-changed scene; variety is generally considered to be good. MSER also had a repeatability of 92% for this test.
Region size - MSER tended to detect many small regions, as opposed to large regions, which are more likely to be occluded or not to cover a planar part of the scene, though large regions may be slightly easier to match.
Viewpoint change - MSER outperforms the five other region detectors in both the original images and those with repeated texture motifs.
Scale change - Following the Hessian-affine detector, MSER comes in second under a scale change and in-plane rotation.
Blur - MSER proved to be the most sensitive to this type of image change, which is the only area in which this type of detection is lacking.
Note, however, that this evaluation did not make use of multi-resolution detection, which has been shown to improve repeatability under blur.
Light change - MSER showed the highest repeatability score for this type of scene, with all the others having good robustness as well.

MSER consistently achieved the highest score across many tests, showing it to be a reliable region detector.

Implementation

The original algorithm of Matas et al. is $O(n \log \log n)$ in the number $n$ of pixels, which is almost linear. It proceeds by first sorting the pixels by intensity, which takes $O(n)$ time using BINSORT. After sorting, pixels are marked in the image, and the list of growing and merging connected components and their areas is maintained using the union-find algorithm, which takes $O(n \log \log n)$ time. In practice these steps are very fast. During this process, the area of each connected component as a function of intensity is stored, producing a data structure. A merge of two components is viewed as the termination of the smaller component and the insertion of all its pixels into the larger one. Among the extremal regions, the 'maximally stable' ones are those corresponding to thresholds where the relative area change, as a function of relative change of threshold, is at a local minimum; i.e. the MSERs are the parts of the image where local binarization is stable over a large range of thresholds.
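The sorting and union-find steps can be sketched compactly. The Python below is a simplified illustration, not the code of Matas et al.: it assumes integer intensities in 0..255, 4-adjacency, a standard union-find with path halving and union by size, and hypothetical names (`UnionFind`, `area_history`). For brevity it records the area history of just the one component that contains a chosen seed pixel, i.e. a single nested chain of minimum-intensity extremal regions.

```python
class UnionFind:
    """Disjoint sets over pixels, tracking the area (pixel count) of each set."""

    def __init__(self):
        self.parent = {}
        self.area = {}

    def add(self, p):
        self.parent[p] = p
        self.area[p] = 1

    def find(self, p):
        # Path halving: point every other node on the path at its grandparent.
        while self.parent[p] != p:
            self.parent[p] = self.parent[self.parent[p]]
            p = self.parent[p]
        return p

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # The smaller component ends and its pixels join the larger one.
        if self.area[ra] < self.area[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.area[ra] += self.area[rb]


def area_history(image, seed, levels=256):
    """Area of the component of {p : I(p) <= t} containing `seed`, per occupied level t."""
    # BINSORT: one bucket per intensity, filled in a single pass over the pixels.
    bins = [[] for _ in range(levels)]
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            bins[value].append((r, c))
    uf = UnionFind()
    history = {}
    for t, pixels in enumerate(bins):
        for (r, c) in pixels:
            uf.add((r, c))
            # Only pixels already marked (intensity <= t) are in the structure.
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb in uf.parent:
                    uf.union((r, c), nb)
        if pixels and seed in uf.parent:
            history[t] = uf.area[uf.find(seed)]
    return history


image = [[5, 5, 9, 9],
         [5, 1, 9, 2],
         [9, 9, 9, 9]]
print(area_history(image, seed=(1, 1)))  # {1: 1, 2: 1, 5: 4, 9: 12}
```

A full detector would keep such a history for every component and evaluate the relative area change along each chain; tracking only one seed keeps the sketch short.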

The component tree is the set of all connected components of the thresholded images, ordered by inclusion. Efficient algorithms for computing it (quasi-linear whatever the range of the weights) exist, so this structure offers an easy way to implement MSER.
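To make "ordered by inclusion" concrete, the Python sketch below builds the tree naively: it collects every distinct component of $\{p : I(p) \le t\}$ over all levels $t$ and links each one to the smallest component that strictly contains it. This is deliberately quadratic and is not one of the quasi-linear algorithms mentioned above; 4-adjacency, the toy image and the names `components_at_most` and `component_tree` are assumptions for illustration.

```python
from collections import deque


def components_at_most(image, t):
    """4-connected components of {p : I(p) <= t}, as frozensets of (row, col)."""
    rows, cols = len(image), len(image[0])
    seen, result = set(), []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] > t or (r, c) in seen:
                continue
            seen.add((r, c))
            queue, comp = deque([(r, c)]), set()
            while queue:
                y, x = queue.popleft()
                comp.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen and image[ny][nx] <= t):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            result.append(frozenset(comp))
    return result


def component_tree(image):
    """Map each distinct component to its parent under inclusion (root maps to None)."""
    nodes = set()
    for t in sorted({v for row in image for v in row}):
        nodes.update(components_at_most(image, t))
    tree = {}
    for node in nodes:
        # Any two components are nested or disjoint, so the strict supersets
        # of a node form a chain and the smallest one is its parent.
        supersets = [m for m in nodes if node < m]
        tree[node] = min(supersets, key=len) if supersets else None
    return tree


image = [[5, 5, 9, 9],
         [5, 1, 9, 2],
         [9, 9, 9, 9]]
tree = component_tree(image)
print(len(tree))  # 4
```

Each root-to-leaf path of this tree is a nested sequence of extremal regions, which is exactly what the maximal-stability test is evaluated along.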

More recently, Nister and Stewenius have proposed a truly worst-case $O(n)$ method (if the weights are small integers), which is also much faster in practice. This algorithm is similar to the one of Ph. Salembier et al.

Robust wide-baseline algorithm

The purpose of this algorithm is to match MSERs to establish correspondence points between images. First, MSER regions are computed on the intensity image (MSER+) and on the inverted image (MSER-). Measurement regions are selected at multiple scales: the size of the actual region and the 1.5x, 2x and 3x scaled convex hull of the region. Because matching is done robustly, the distinctiveness of large regions can be increased without being severely affected by clutter or by non-planarity of the region's pre-image. A measurement taken from an almost planar patch of the scene with a stable invariant description is called a 'good measurement'. Unstable ones, or those on non-planar surfaces or discontinuities, are called 'corrupted measurements'. The robust similarity is computed as follows: for each measurement $M_A^i$ on region $A$, the $k$ regions $B_1, \ldots, B_k$ from the other image whose corresponding $i$-th measurements $M_{B_1}^i, \ldots, M_{B_k}^i$ are nearest to $M_A^i$ are found, and a vote is cast suggesting correspondence of $A$ and each of $B_1, \ldots, B_k$. Votes are summed over all measurements, and using probability analysis the 'good measurements' can be picked out, since the 'corrupted measurements' will likely spread their votes randomly.

By applying RANSAC to the centers of gravity of the regions, a rough epipolar geometry can be computed. An affine transformation between pairs of potentially corresponding regions is computed, and correspondences define it up to a rotation, which is then determined by epipolar lines. The regions are then filtered, and those whose transformed images have a correlation above a threshold are chosen. RANSAC is applied again with a narrower threshold, and the final epipolar geometry is estimated by the eight-point algorithm.
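The voting step of the robust similarity can be sketched as follows. The Python is a minimal illustration under assumptions not taken from the source: each region carries the same number of measurements, each measurement is a plain feature tuple compared by Euclidean distance, the measurement values are made up, and `vote_correspondences` is a hypothetical name. The probability analysis that separates good from corrupted measurements is not shown; the sketch only sums the votes.

```python
import math
from collections import Counter


def vote_correspondences(regions_a, regions_b, k):
    """Sum nearest-neighbour votes between regions of two images.

    regions_a, regions_b: {region_id: [m_0, m_1, ...]}, where m_i is the i-th
    measurement (a tuple of floats) on that region; every region is assumed
    to carry the same number of measurements.
    Returns a Counter mapping (a_id, b_id) to the number of votes.
    """
    votes = Counter()
    n_measurements = len(next(iter(regions_a.values())))
    for i in range(n_measurements):
        for a_id, meas_a in regions_a.items():
            # The k regions of the other image whose i-th measurement is nearest.
            nearest = sorted(regions_b,
                             key=lambda b_id: math.dist(meas_a[i], regions_b[b_id][i]))[:k]
            for b_id in nearest:
                votes[(a_id, b_id)] += 1
    return votes


# Made-up measurements: B1's third measurement is "corrupted".
regions_a = {"A1": [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
             "A2": [(5.0, 5.0), (6.0, 6.0), (7.0, 7.0)]}
regions_b = {"B1": [(0.1, 0.0), (1.1, 1.0), (9.0, 9.0)],
             "B2": [(5.0, 5.1), (6.0, 6.1), (7.0, 7.1)],
             "B3": [(20.0, 20.0), (20.0, 20.0), (2.0, 2.1)]}
votes = vote_correspondences(regions_a, regions_b, k=1)
print(votes.most_common())
# [(('A2', 'B2'), 3), (('A1', 'B1'), 2), (('A1', 'B3'), 1)]
```

The corrupted measurement sends one stray vote to B3, but the true pairs A1-B1 and A2-B2 still collect the most votes.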

The wide-baseline algorithm can be tested here (epipolar or homography geometry constrained matches): WBS Image Matcher, http://cmp.felk.cvut.cz/~wbsdemo/demo/

Extensions and Adaptations

The MSER algorithm has been adapted to colour images by replacing thresholding of the intensity function with agglomerative clustering based on colour gradients.
The MSER algorithm can be used to track colour objects by performing MSER detection on the Mahalanobis distance to a colour distribution.
By detecting MSERs at multiple resolutions, robustness to blur and scale change can be improved.
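For the tracking adaptation, the image on which MSER detection runs is a per-pixel Mahalanobis distance to a colour model, so pixels resembling the tracked colour become dark, low-valued regions. The Python sketch below only builds that distance map; it assumes RGB triples, a colour distribution summarised by a sample mean and a non-singular 3x3 covariance, and hypothetical names (`mean_and_covariance`, `inverse_3x3`, `mahalanobis_map`). It is not the published tracker.

```python
import math


def mean_and_covariance(samples):
    """Sample mean and (n - 1)-normalised 3x3 covariance of RGB triples."""
    n = len(samples)
    mean = [sum(s[j] for s in samples) / n for j in range(3)]
    cov = [[sum((s[a] - mean[a]) * (s[b] - mean[b]) for s in samples) / (n - 1)
            for b in range(3)] for a in range(3)]
    return mean, cov


def inverse_3x3(m):
    """Inverse of a 3x3 matrix via its adjugate; fails if the matrix is singular."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("covariance matrix is singular")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[adj[r][k] / det for k in range(3)] for r in range(3)]


def mahalanobis_map(image_rgb, mean, cov):
    """Per-pixel distance sqrt(d^T C^-1 d), where d is the pixel minus the model mean."""
    inv = inverse_3x3(cov)
    out = []
    for row in image_rgb:
        out_row = []
        for px in row:
            d = [px[j] - mean[j] for j in range(3)]
            q = sum(d[r] * inv[r][k] * d[k] for r in range(3) for k in range(3))
            out_row.append(math.sqrt(max(q, 0.0)))  # clamp tiny negative round-off
        out.append(out_row)
    return out


# Hypothetical colour model: zero mean, per-channel variances 4, 9 and 16.
cov = [[4.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 16.0]]
dist = mahalanobis_map([[(2, 3, 4), (0, 0, 0)]], [0.0, 0.0, 0.0], cov)
print(dist)  # first entry is approximately sqrt(3) = 1.732, second is 0.0
```

In a tracker, the resulting map would typically be quantised to integer levels and then passed to an ordinary MSER detector.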

Other Applications

External links

VLFeat, an open source computer vision library in C (with a MEX interface to MATLAB), including an implementation of MSER
OpenCV, an open source computer vision library in C/C++, including an implementation of Linear Time MSER
Detector Repeatability Study, Kristian Mikolajczyk: binaries (Win/Linux) to compute MSER/HarrisAffine... Binary used in his repeatability study.

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.