Visual descriptors
In computer vision, visual descriptors or image descriptors are descriptions of the visual features of the contents of images or videos, or the algorithms or applications that produce such descriptions. They describe elementary characteristics such as shape, color, texture or motion, among others.

Introduction

As a result of new communication technologies and the massive use of the Internet in our society, the amount of audio-visual information available in digital format is increasing considerably. It has therefore become necessary to design systems that describe the content of several types of multimedia information so that it can be searched and classified.

Audio-visual descriptors are responsible for describing the content. They capture information about the objects and events found in a video, image or audio recording, and they allow quick and efficient searches of audio-visual content.

This system can be compared to search engines for textual content. Although it is relatively easy to find text with a computer, it is much more difficult to find specific audio and video segments. For instance, imagine somebody searching for a scene of a happy person. Happiness is a feeling, and it is not evident how to describe it in terms of shape, color and texture in images.

Describing audio-visual content is not a trivial task, and it is essential for the effective use of this type of archive. The standard that deals with audio-visual descriptors is MPEG-7, developed by the Moving Picture Experts Group (MPEG).

Types of visual descriptors

Descriptors are the first step in finding the connection between the pixels contained in a digital image and what humans recall some minutes after having observed an image or a group of images.

Visual descriptors are divided into two main groups:
  1. General information descriptors: they contain low-level descriptors that describe color, shape, regions, textures and motion.
  2. Specific domain information descriptors: they give information about objects and events in the scene. A concrete example would be face recognition.

General information descriptors

General information descriptors consist of a set of descriptors that cover basic and elementary features such as color, texture, shape, motion, location and others. This description is generated automatically by means of signal processing.
  • COLOR: the most basic quality of visual content. Five tools are defined to describe color. The first three tools represent the color distribution and the last ones describe the color relation between sequences or groups of images:
    • Dominant Color Descriptor (DCD)
    • Scalable Color Descriptor (SCD)
    • Color Structure Descriptor (CSD)
    • Color Layout Descriptor (CLD)
    • Group of Frames (GoF) or Group of Pictures (GoP)
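
As a rough illustration of what a color-distribution descriptor computes, the sketch below builds a coarse RGB histogram, picks its most populated bins, and compares two histograms. It is a simplified stand-in, not the MPEG-7 Dominant Color or Scalable Color Descriptor; the function names, the input format (a list of (r, g, b) tuples) and the bin count are assumptions made for this example.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Normalized coarse RGB histogram of (r, g, b) pixels with values 0-255."""
    counts = Counter()
    for r, g, b in pixels:
        # Map each channel value to a bin index in [0, bins_per_channel).
        key = tuple(
            min(c * bins_per_channel // 256, bins_per_channel - 1)
            for c in (r, g, b)
        )
        counts[key] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


def dominant_bins(histogram, k=3):
    """Return the k most populated color bins, most frequent first."""
    return sorted(histogram, key=histogram.get, reverse=True)[:k]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means the normalized histograms are identical."""
    return sum(min(h1.get(key, 0.0), h2.get(key, 0.0)) for key in set(h1) | set(h2))


# Hypothetical image: three red pixels and one blue pixel.
image = [(255, 0, 0)] * 3 + [(0, 0, 255)]
hist = color_histogram(image)
print(dominant_bins(hist, k=1))
```

A histogram discards where colors appear in the image, which is why layout-oriented descriptors are defined separately from distribution-oriented ones.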

  • TEXTURE: also an important quality for describing an image. The texture descriptors characterize image textures or regions. They observe the homogeneity of regions and the histograms of the region borders. The set of descriptors is formed by:
    • Homogeneous Texture Descriptor (HTD)
    • Texture Browsing Descriptor (TBD)
    • Edge Histogram Descriptor (EHD)
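
The idea of a histogram over region borders can be sketched as follows: every 2x2 block of a grayscale image is classified by its strongest edge direction, and the fractions per direction form the histogram. This is a simplified, global version loosely modeled on the edge-histogram idea; the filter coefficients and the threshold are values commonly cited for this kind of descriptor and are used here as assumptions, and the function name and input format are hypothetical. It is not a conformant Edge Histogram Descriptor.

```python
import math

# 2x2 filters applied to (top-left, top-right, bottom-left, bottom-right).
# Assumed coefficients, commonly cited for edge-histogram style descriptors.
EDGE_FILTERS = {
    "vertical": (1, -1, 1, -1),
    "horizontal": (1, 1, -1, -1),
    "diagonal_45": (math.sqrt(2), 0, 0, -math.sqrt(2)),
    "diagonal_135": (0, math.sqrt(2), -math.sqrt(2), 0),
    "non_directional": (2, -2, -2, 2),
}


def edge_histogram(gray, threshold=11.0):
    """Fraction of 2x2 blocks whose strongest response is each edge type.

    gray: list of equal-length rows of intensities (0-255).
    Blocks whose strongest response is below the threshold count as edge-free.
    """
    counts = {name: 0 for name in EDGE_FILTERS}
    blocks = 0
    for y in range(0, len(gray) - 1, 2):
        for x in range(0, len(gray[0]) - 1, 2):
            block = (gray[y][x], gray[y][x + 1], gray[y + 1][x], gray[y + 1][x + 1])
            strengths = {
                name: abs(sum(c * v for c, v in zip(coeffs, block)))
                for name, coeffs in EDGE_FILTERS.items()
            }
            best = max(strengths, key=strengths.get)
            blocks += 1
            if strengths[best] >= threshold:
                counts[best] += 1
    return {name: (n / blocks if blocks else 0.0) for name, n in counts.items()}


# Hypothetical 4x4 image made of vertical stripes.
stripes = [[0, 255, 0, 255] for _ in range(4)]
print(edge_histogram(stripes))
```

Computing the same histogram separately for each cell of a grid laid over the image would additionally keep some information about where the edges occur.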

  • SHAPE: contains important semantic information, owing to the human ability to recognize objects by their shape. However, this information can only be extracted by means of a segmentation similar to the one performed by the human visual system. Such a segmentation system is not yet available, but a series of algorithms exists that are considered good approximations. These descriptors describe regions, contours and shapes for 2D images and for 3D volumes. The shape descriptors are the following:
    • Region-based Shape Descriptor (RSD)
    • Contour-based Shape Descriptor (CSD)
    • 3-D Shape Descriptor (3-D SD)

  • MOTION: defined by four different descriptors that describe motion in a video sequence. Motion is related both to the motion of objects in the sequence and to the motion of the camera. The latter is provided by the capture device, whereas the rest is obtained by means of image processing. The descriptor set is the following:
    • Motion Activity Descriptor (MAD)
    • Camera Motion Descriptor (CMD)
    • Motion Trajectory Descriptor (MTD)
    • Warping and Parametric Motion Descriptor (WMD and PMD)
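
As a rough illustration of a motion activity measure, the sketch below uses the standard deviation of motion-vector magnitudes as an intensity value. The function name and the input format (a list of (dx, dy) vectors, for example from block matching between two frames) are assumptions made for this example; the attributes and quantization that the Motion Activity Descriptor itself defines are not reproduced here.

```python
import math
from statistics import pstdev


def motion_intensity(motion_vectors):
    """Standard deviation of motion-vector magnitudes (0.0 for no vectors).

    motion_vectors: iterable of (dx, dy) displacements in pixels.
    A low value suggests uniform or no motion, a high value suggests
    irregular motion within the frame.
    """
    magnitudes = [math.hypot(dx, dy) for dx, dy in motion_vectors]
    return pstdev(magnitudes) if magnitudes else 0.0


# Hypothetical vectors: a static background block and a fast-moving block.
print(motion_intensity([(0, 0), (6, 8)]))
```

Because a uniform camera pan gives every block a similar vector, this measure stays low for pure camera motion, which is one reason camera motion is described separately.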

  • LOCATION: the location of elements in the image is used to describe them in the spatial domain. In addition, elements can also be located in the temporal domain:
    • Region Locator Descriptor (RLD)
    • Spatio-Temporal Locator Descriptor (STLD)

Specific domain information descriptors

These descriptors, which give information about objects and events in the scene, are not easy to extract, especially when the extraction must be done automatically. Nevertheless, they can be produced manually.

As mentioned before, face recognition is a concrete example of an application that tries to automatically obtain this information.

Descriptor applications

Among all applications, the most important ones are:
  • Multimedia document search engines and classifiers.
  • Digital libraries: visual descriptors allow a very detailed and specific search for any video or image by means of different search parameters. Examples include searching for films in which a known actor appears, or searching for videos that show Mount Everest.
  • Personalized electronic news service.
  • Possibility of an automatic connection to a TV channel broadcasting a soccer match, for example, whenever a player approaches the goal area.
  • Control and filtering of specific audio-visual content, such as violent or pornographic material. Also, authorization for some multimedia content.
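
The search and classification applications above generally reduce to comparing descriptor vectors. A minimal sketch of query-by-example retrieval follows, assuming each item has already been reduced to a fixed-length numeric descriptor; the function names, the L1 distance and the sample data are illustrative assumptions rather than part of any standard.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_by_similarity(query, database, top_k=5):
    """Return the names of the top_k items closest to the query descriptor.

    database: dict mapping an item name to its descriptor vector.
    """
    scored = sorted((l1_distance(query, vec), name) for name, vec in database.items())
    return [name for _, name in scored[:top_k]]


# Hypothetical three-bin color descriptors (red, green, blue proportions).
database = {
    "sunset.jpg": [0.8, 0.1, 0.1],
    "forest.jpg": [0.1, 0.8, 0.1],
    "sea.jpg": [0.1, 0.2, 0.7],
}
print(rank_by_similarity([0.7, 0.2, 0.1], database, top_k=1))
```

A filtering application can use the same comparison with a distance threshold instead of a ranking.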

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 