Multimedia Information Retrieval
Multimedia Information Retrieval (MMIR) is a research discipline of computer science that aims at extracting semantic information from multimedia data sources. Data sources include directly perceivable media such as audio, image and video, indirectly perceivable sources such as text and biosignals, and non-perceivable sources such as bioinformation and stock prices. The methodology of MMIR can be organized in three groups:
  1. Methods for the summarization of media content (feature extraction). The result of feature extraction is a description.
  2. Methods for the filtering of media descriptions (for example, elimination of redundancy).
  3. Methods for the categorization of media descriptions into classes.
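
The three groups can be read as stages of a pipeline. The following Python sketch is only an illustration under simple assumptions: the function names (`extract`, `drop_constant_dims`, `nearest_centroid`) are hypothetical, the description is a plain normalized histogram, redundancy is taken to mean dimensions that never vary, and categorization is done by nearest class centroid. None of this is prescribed by MMIR itself.

```python
from math import dist

def extract(signal, bins=4, lo=0.0, hi=1.0):
    """Stage 1: summarize a signal as a normalized histogram (the description)."""
    counts = [0] * bins
    for x in signal:
        idx = int((x - lo) / (hi - lo) * bins)
        counts[max(0, min(idx, bins - 1))] += 1  # clamp out-of-range values
    return [c / len(signal) for c in counts]

def drop_constant_dims(descriptions, eps=1e-12):
    """Stage 2: drop dimensions that have the same value for every object.

    Returns the filtered descriptions and the indices of the kept dimensions,
    so that later queries can be projected the same way.
    """
    keep = [d for d in range(len(descriptions[0]))
            if max(v[d] for v in descriptions) - min(v[d] for v in descriptions) > eps]
    return [[v[d] for d in keep] for v in descriptions], keep

def nearest_centroid(description, centroids):
    """Stage 3: return the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(description, centroids[label]))
```

In use, descriptions of labelled objects would be extracted and filtered first, the kept dimension indices applied to each new query description, and the result passed to `nearest_centroid` together with one centroid per class.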

Feature Extraction Methods

Feature extraction is motivated by the sheer size of multimedia objects as well as their redundancy and, possibly, noisiness. Generally, two possible goals can be achieved by feature extraction:
  • Summarization of media content. In the audio domain, methods for summarization include, for example, Mel Frequency Cepstral Coefficients, Zero Crossing Rate and Short-Time Energy. In the visual domain, color histograms such as the MPEG-7 Scalable Color Descriptor can be used for summarization.
  • Detection of patterns by auto-correlation and/or cross-correlation. Patterns are recurring media chunks that can be detected either by comparing chunks over the media dimensions (time, space, etc.) or by comparing media chunks to templates (e.g. face templates, phrases). Typical methods include Linear Predictive Coding in the audio/biosignal domain, texture description in the visual domain and n-grams in text information retrieval.
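
Two of the audio summarization features named above, and pattern detection by auto-correlation, can be sketched in a few lines of Python. This is a minimal illustration, assuming a frame is a plain list of samples and using textbook definitions; the function names are chosen here for illustration, and real systems typically add windowing and normalization.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

def autocorrelation(signal, lag):
    """Unnormalized auto-correlation of the signal at a non-negative lag."""
    return sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))

def dominant_period(signal, max_lag):
    """Lag in 1..max_lag with the highest auto-correlation.

    A strong peak at some lag suggests a chunk that recurs with that period.
    """
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(signal, lag))
```

For a signal that repeats the chunk `[1, 0, -1, 0]`, the auto-correlation peaks at lag 4, which is how the recurring chunk length is recovered.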

Merging and Filtering Methods

Multimedia Information Retrieval implies that multiple channels are employed for the understanding of media content. Each of these channels is described by media-specific feature transformations. The resulting descriptions have to be merged into one description per media object. Merging can be performed by simple concatenation if the descriptions are of fixed size. Variable-sized descriptions, as they frequently occur in motion description, have to be normalized to a fixed length first.
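
As a sketch of this merging step, the Python below normalizes variable-sized descriptions to a fixed length and then concatenates everything into one vector. Linear interpolation is only one possible normalization and is an assumption made here, as are the function names `resample_to_length` and `merge_descriptions`.

```python
def resample_to_length(values, length):
    """Linearly interpolate a non-empty sequence to exactly `length` samples."""
    if length == 1 or len(values) == 1:
        # Degenerate case: repeat the first sample.
        return [float(values[0])] * length
    step = (len(values) - 1) / (length - 1)
    out = []
    for k in range(length):
        pos = k * step
        i = min(int(pos), len(values) - 2)  # left neighbor index
        frac = pos - i                      # weight of the right neighbor
        out.append(values[i] * (1 - frac) + values[i + 1] * frac)
    return out

def merge_descriptions(fixed_parts, variable_parts, length):
    """Concatenate fixed-size parts with length-normalized variable-size parts."""
    merged = []
    for part in fixed_parts:
        merged.extend(part)
    for part in variable_parts:
        merged.extend(resample_to_length(part, length))
    return merged
```

With this scheme every media object yields a vector of the same dimensionality, regardless of how long its variable-sized channel was.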

Frequently used methods for description filtering include factor analysis (e.g. by PCA), singular value decomposition (e.g. as latent semantic indexing in text retrieval) and the extraction and testing of statistical moments. Advanced concepts such as the Kalman filter are used for merging of descriptions.
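
To give a feeling for how a Kalman filter can merge descriptions, the sketch below covers only the simplest scalar case: several channels each report a noisy value for the same quantity, together with a noise variance, and the filter fuses them one by one. The constant-state model, the vague initial estimate and the function names are assumptions made for this illustration; practical uses involve vector states and matrix-valued covariances.

```python
def kalman_update(estimate, variance, measurement, measurement_variance):
    """One scalar Kalman measurement update; returns (estimate, variance)."""
    gain = variance / (variance + measurement_variance)
    new_estimate = estimate + gain * (measurement - estimate)
    new_variance = (1.0 - gain) * variance
    return new_estimate, new_variance

def fuse(measurements, initial_estimate=0.0, initial_variance=1e6,
         process_variance=0.0):
    """Fuse (value, variance) pairs from different channels into one estimate.

    The large initial variance encodes that nothing is known beforehand.
    """
    estimate, variance = initial_estimate, initial_variance
    for value, measurement_variance in measurements:
        variance += process_variance  # predict step of a constant-state model
        estimate, variance = kalman_update(
            estimate, variance, value, measurement_variance)
    return estimate, variance
```

Channels with a smaller noise variance receive a larger gain, so the fused value leans toward the more reliable description.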

Categorization Methods

Generally, all forms of machine learning can be employed for the categorization of multimedia descriptions, though some methods are more frequently used in one area than another. For example, Hidden Markov models are state-of-the-art in speech recognition, while Dynamic Time Warping, a semantically related method, is state-of-the-art in gene sequence alignment. The list of applicable classifiers includes the following:
  • Metric approaches (Cluster Analysis, Vector Space Model, Minkowski Distances, Dynamic Alignment)
  • Nearest Neighbor methods (K-Nearest Neighbor, K-Means, Self-Organizing Map)
  • Risk Minimization (Support Vector Regression, Support Vector Machine, Linear Discriminant Analysis)
  • Density-based Methods (Bayes Nets, Markov Processes, Mixture Models)
  • Neural Networks (Perceptron, Associative Memories, Spiking Nets)
  • Heuristics (Decision Trees, Random Forests, etc.)
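
Two of the listed approaches combine naturally: Dynamic Time Warping as a distance between sequences, and a nearest neighbor rule on top of it. The Python sketch below uses the classic quadratic-time DTW recurrence with absolute difference as the local cost; the function names and the 1-nearest-neighbor choice are assumptions for illustration, not a reference implementation.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two non-empty numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a[i-1] repeats a match
                                     cost[i][j - 1],      # b[j-1] repeats a match
                                     cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]

def nearest_neighbor_label(query, training_set, distance=dtw_distance):
    """Return the label of the training sequence closest to the query.

    training_set is a list of (sequence, label) pairs.
    """
    _, label = min(training_set, key=lambda item: distance(query, item[0]))
    return label
```

Because DTW allows samples to be stretched in time, a slowly rising sequence and a quickly rising one can have distance zero, so the nearest neighbor rule groups them into the same class.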


The selection of the best classifier for a given problem (a test set with descriptions and class labels, the so-called ground truth) can be performed automatically, for example, using the Weka Data Miner.
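
The idea behind automatic classifier selection can be sketched without any particular toolkit: score every candidate on the labelled data and keep the best one. The Python below is not Weka's interface. It is a standalone illustration that assumes leave-one-out evaluation, uses k-nearest-neighbor classifiers with different k as the candidates, and introduces hypothetical names throughout.

```python
from collections import Counter
from math import dist

def knn_classifier(k):
    """Build a predict(train, query) function using k-nearest-neighbor voting."""
    def predict(train, query):
        neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
        votes = Counter(label for _, label in neighbors)
        return votes.most_common(1)[0][0]
    return predict

def leave_one_out_accuracy(predict, data):
    """Fraction of samples classified correctly when each is held out in turn."""
    hits = 0
    for i, (vector, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += predict(train, vector) == label
    return hits / len(data)

def select_best(candidates, data):
    """Score each candidate on the ground truth; return (best_name, scores)."""
    scores = {name: leave_one_out_accuracy(predict, data)
              for name, predict in candidates.items()}
    return max(scores, key=scores.get), scores
```

Here `data` is a list of (description, class label) pairs, that is, the ground truth. On small data sets an overly large k lets the majority of the remaining samples outvote the true neighbors, which the selection step detects through the lower score.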

Open Problems

The quality of MMIR systems depends heavily on the quality of the training data. Discriminative descriptions can be extracted from media sources in various forms. Machine learning provides categorization methods for all types of data. However, the classifier can only be as good as the given training data. On the other hand, it requires considerable effort to provide class labels for large databases. The future success of MMIR will depend on the provision of such data. The annual TRECVID competition is currently one of the most relevant sources of high-quality ground truth.

Related Areas

MMIR provides an overview of methods employed in the areas of information retrieval. Methods of one area are adapted and employed on other types of media. Multimedia content is merged before the classification is performed. MMIR methods are, therefore, usually reused from other areas such as:
  • Bioinformation Analysis
  • Biosignal Processing
  • Content-based Image and Video Retrieval
  • Face Recognition
  • Audio and Music Classification
  • Speech Recognition
  • Technical Chart Analysis
  • Text Information Retrieval

The new Journal of Multimedia Information Retrieval should help the development of MMIR as a research discipline independent of these areas.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 