Cognitive neuroscience of visual object recognition

Object recognition is the ability to perceive an object’s physical properties (such as shape, colour and texture) and to apply semantic attributes to it, including an understanding of its use, previous experience with the object, and how it relates to other objects.

Basic Stages of Object Recognition

One model of object recognition, based on neuropsychological evidence, divides the process into four stages.

Stage 1 Processing of basic object components, such as colour, depth, and form.

Stage 2 These basic components are then grouped on the basis of similarity, providing information on distinct edges to the visual form. Subsequently, figure-ground segregation is able to take place.

Stage 3 The visual representation is matched with structural descriptions in memory.

Stage 4 Semantic attributes are applied to the visual representation, providing meaning, and thereby recognition.

Within these stages, more specific processes take place to complete the different processing components. In addition, other models propose integrative hierarchies (top-down and bottom-up), as well as parallel processing, as opposed to this general bottom-up hierarchy.

Hierarchical Recognition Processing

Visual recognition processing has typically been viewed as a bottom-up hierarchy in which information is processed sequentially with increasing complexity. Lower-level cortical processors, such as the primary visual cortex, sit at the bottom of the processing hierarchy, and higher-level cortical processors, such as the inferotemporal cortex (IT), sit at the top, where recognition is facilitated. One of the most widely recognized bottom-up hierarchical theories is David Marr’s theory of vision. In contrast, an increasingly popular theory of recognition processing is that of top-down processing. One model, proposed by Moshe Bar (2003), describes a “shortcut” method in which early visual inputs are sent, partially analyzed, from the early visual cortex to the prefrontal cortex (PFC). Possible interpretations of the crude visual input are generated in the PFC and then sent to the inferotemporal cortex (IT), subsequently activating relevant object representations, which are then incorporated into the slower, bottom-up process. This “shortcut” is meant to minimize the number of object representations required for matching, thereby facilitating object recognition. Lesion studies have supported this proposal with findings of slower response times for individuals with PFC lesions, suggesting that these individuals rely on bottom-up processing alone.
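The “shortcut” idea can be illustrated with a toy sketch in Python. This is not Bar's model itself, only a minimal illustration under stated assumptions: the memory contents, the `coarse_shape` label standing in for the partially analyzed input, and the feature-overlap score are all hypothetical. It shows how a crude top-down guess reduces the number of stored representations that the slower detailed matching has to consider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredObject:
    name: str
    coarse_shape: str      # crude outline (hypothetical stand-in for the partially analyzed input)
    features: frozenset    # detailed features used by the slower bottom-up matching


# Hypothetical memory contents, for illustration only.
MEMORY = [
    StoredObject("umbrella", "elongated", frozenset({"handle", "canopy", "ribs"})),
    StoredObject("lamp", "elongated", frozenset({"base", "shade", "bulb"})),
    StoredObject("ball", "round", frozenset({"sphere", "seams"})),
    StoredObject("orange", "round", frozenset({"sphere", "peel"})),
]


def detailed_match(features, candidates):
    """Slow bottom-up step: pick the candidate sharing the most features."""
    best = max(candidates, key=lambda obj: len(obj.features & features))
    return best.name, len(candidates)


def recognize(coarse_shape, features, use_shortcut=True):
    """Return (recognized name, number of stored objects compared in detail)."""
    candidates = MEMORY
    if use_shortcut:
        # Top-down guess: keep only objects consistent with the crude input.
        narrowed = [obj for obj in MEMORY if obj.coarse_shape == coarse_shape]
        candidates = narrowed or MEMORY  # fall back if the guess rules out everything
    return detailed_match(frozenset(features), candidates)
```

With the shortcut enabled, `recognize("elongated", {"shade", "bulb"})` compares the input against two stored objects instead of four; disabling it mimics recognition that relies on the bottom-up route alone.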

Object Constancy and Theories of Object Recognition

A significant aspect of object recognition is that of object constancy: the ability to recognize an object across varying viewing conditions. These varying conditions include object orientation, lighting, and object variability (size, colour, and other within-category differences). For the visual system to achieve object constancy, it must be able to extract a commonality in the object description across different viewpoints and retinal descriptions. Several theories have been proposed to explain how object constancy may be achieved for the purpose of object recognition, including viewpoint-invariant, viewpoint-dependent, and multiple views theories.

Viewpoint-Invariant Theories

Viewpoint-invariant theories suggest that object recognition is based on structural information, such as individual parts, allowing for recognition to take place regardless of the object’s viewpoint. Accordingly, recognition is possible from any viewpoint as individual parts of an object can be rotated to fit any particular view. This form of analytical recognition requires little memory as only structural parts need to be encoded, which can produce multiple object representations through the interrelations of these parts and mental rotation. Therefore, storage of multiple object viewpoints is not required in memory.

3-D Model Representation

This model, proposed by Marr and Nishihara (1978), states that object recognition is achieved by matching 3-D model representations obtained from the visual object with 3-D model representations stored in memory. The 3-D model representations obtained from the object are formed by first identifying the concavities of the object, which separate the stimulus into individual parts. Then the axes of the individual parts of the object are found. Identifying the principal axis of the object assists in the normalization process via mental rotation, which is required because only the canonical description of the object is stored in memory. Recognition is achieved when the observed object viewpoint is mentally rotated to match the stored canonical description.
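The normalization step can be sketched in two dimensions with plain Python. This is only an illustration under simplifying assumptions, not Marr and Nishihara's procedure: the object is a hypothetical set of 2-D points, the principal axis is estimated from second moments, and matching is a simple point-by-point comparison against one stored canonical description.

```python
import math


def principal_axis_angle(points):
    """Angle (radians) of the principal axis of a 2-D point set about its centroid."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    return 0.5 * math.atan2(2 * sxy, sxx - syy)


def rotate(points, angle):
    """Rotate points about the origin by the given angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def normalize(points):
    """Centre the view, then rotate it so its principal axis lies along the x-axis."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centred = [(x - cx, y - cy) for x, y in points]
    return rotate(centred, -principal_axis_angle(centred))


def same_shape(a, b, tol=1e-6):
    """True if every point of a lies within tol of some point of b."""
    return len(a) == len(b) and all(any(math.dist(p, q) < tol for q in b) for p in a)


# Hypothetical canonical description: an elongated, centred shape.
# An axis has no direction, so the toy shape is chosen to look the same
# after a 180-degree turn; other shapes would need an extra convention.
CANONICAL = [(-2, 0), (-1, 0), (0, 0), (1, 0), (2, 0), (0, 0.5), (0, -0.5)]

# The same object seen from a rotated viewpoint.
observed = rotate(CANONICAL, math.radians(40))
```

Here `same_shape(observed, CANONICAL)` is false, whereas `same_shape(normalize(observed), CANONICAL)` is true: only one description is stored, and the observed view is rotated to meet it.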

Recognition by Components

An extension of Marr and Nishihara's model, the recognition-by-components theory proposed by Biederman (1987), holds that the visual information gained from an object is divided into simple geometric components, such as blocks and cylinders, also known as “geons” (geometric ions), which are then matched with the most similar object representation stored in memory to provide the object’s identification (see Figure 1).
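The matching step can be sketched as follows. This is a toy illustration rather than Biederman's formal model: the stored descriptions, the geon and relation labels, and the Jaccard overlap used as a similarity score are all assumptions made for the example. In particular, describing each object as a set of (geon, relation, geon) triples is a choice of this sketch.

```python
# Hypothetical structural descriptions: sets of (geon, relation, geon) triples.
STORED = {
    "mug": {("cylinder", "side-attached", "curved handle")},
    "pail": {("cylinder", "top-attached", "curved handle")},
    "table lamp": {("cylinder", "below", "cone")},
}


def similarity(a, b):
    """Jaccard overlap between two structural descriptions."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recognize(description):
    """Return the name of the stored object most similar to the description."""
    return max(STORED, key=lambda name: similarity(STORED[name], description))
```

In this sketch the mug and the pail are built from the same two components and differ only in how the components are arranged, so the arrangement has to be part of the description for the two to be told apart.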

Viewpoint-Dependent Theories

Viewpoint-dependent theories suggest that object recognition is affected by the viewpoint from which an object is seen, implying that novel viewpoints reduce the accuracy and speed of object identification. These theories are based on a more holistic system rather than on parts, suggesting that objects are stored in memory with multiple viewpoints and angles. This form of recognition requires a lot of memory, as each viewpoint must be stored. Accuracy of recognition also depends on how familiar the observed viewpoint of the object is.
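A minimal sketch of the viewpoint-dependent idea, assuming each object is stored as a handful of views indexed by a single rotation angle. The stored angles are hypothetical, and the angular distance to the nearest stored view is used here only as a stand-in for the predicted loss of speed and accuracy.

```python
# Hypothetical stored views, in degrees of rotation about the vertical axis.
STORED_VIEWS = {"chair": [0, 90], "teapot": [45]}


def angular_distance(a, b):
    """Smallest angle (degrees) between two orientations, allowing wrap-around."""
    diff = abs(a - b) % 360
    return min(diff, 360 - diff)


def view_mismatch(obj, observed_angle):
    """Distance from the observed view to the nearest stored view of obj."""
    return min(angular_distance(observed_angle, v) for v in STORED_VIEWS[obj])
```

A familiar view gives a mismatch of zero, and the mismatch grows as the observed view moves away from every stored view. Storing more views per object lowers the mismatch at the cost of more memory, which is the trade-off described above.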

Multiple Views Theory

This theory proposes that object recognition lies on a viewpoint continuum where each viewpoint is recruited for different types of recognition. At one extreme of this continuum, viewpoint-dependent mechanisms are used for within-category discriminations, while at the other extreme, viewpoint-invariant mechanisms are used for the categorization of objects.

Neural Substrates

The Dorsal and Ventral Stream

The visual processing of objects in the brain can be divided into two processing pathways: the dorsal stream (how/where), which extends from the visual cortex to the parietal lobes, and the ventral stream (what), which extends from the visual cortex to the inferotemporal cortex (IT). The existence of these two separate visual processing pathways was first proposed by Ungerleider and Mishkin (1982) who, based on their lesion studies, suggested that the dorsal stream is involved in the processing of visual spatial information, such as object localization (where), and the ventral stream is involved in the processing of visual object identification information (what). Since this initial proposal, it has alternatively been suggested that the dorsal pathway should be known as the ‘How’ pathway, as the visual spatial information processed here provides us with information about how to interact with objects. For the purpose of object recognition, the neural focus is on the ventral stream.

Functional Specialization in the Ventral Stream

Within the ventral stream, various regions of proposed functional specialization have been observed in functional imaging studies. The brain regions most consistently found to display functional specialization are the Fusiform Face Area (FFA), which shows increased activation for faces when compared with objects, the Parahippocampal Place Area (PPA) for scenes vs. objects, the Extrastriate Body Area (EBA) for body parts vs. objects, MT+/V5 for moving stimuli vs. static stimuli, and the Lateral Occipital Complex (LOC) for discernible shapes vs. scrambled stimuli. (See also: Neural processing for individual categories of objects)

Structural Processing: The Lateral Occipital Complex

The Lateral Occipital Complex (LOC) has been found to be particularly important for object recognition at the perceptual structural level. In an event-related fMRI study that looked at the adaptation of neurons activated in visual processing of objects, it was discovered that the similarity of an object’s shape is necessary for subsequent adaptation in the LOC, but specific object features such as edges and contours are not. This suggests that activation in the LOC represents higher-level object shape information and not simple object features. In a related fMRI study, the activation of the LOC, which occurred regardless of the presented object’s visual cues such as motion, texture, or luminance contrasts, suggests that the different low-level visual cues used to define an object converge in “object-related areas” to assist in the perception and recognition process. None of the mentioned higher-level object shape information seems to provide any semantic information about the object as the LOC shows a neuronal response to varying forms including non-familiar, abstract objects.

Further experiments suggest that the LOC contains a hierarchical system for shape selectivity: posterior regions show greater selective activation for fragments of objects, whereas anterior regions show greater activation for full or partial objects. This is consistent with previous research that suggests a hierarchical representation in the ventral temporal cortex, where primary feature processing occurs in the posterior regions and the integration of these features into a whole and meaningful object occurs in the anterior regions.

Semantic Processing

Evidence from neuropsychological patients has identified dissociations between structural and semantic processing, as structural, colour, and associative information can be selectively impaired. In one PET study, areas found to be involved in associative semantic processing included the left anterior superior/middle temporal gyrus and the left temporal pole, relative to structural and colour information, as well as the right temporal pole, relative to colour decision tasks only. These results indicate that stored perceptual knowledge and semantic knowledge involve separate cortical regions in object recognition, and that there are hemispheric differences in the temporal regions.

Research has also provided evidence that visual semantic information converges in the fusiform gyri of the inferotemporal lobes. In a study that compared semantic knowledge of category versus attributes, it was found that the two play separate roles in how they contribute to recognition. For categorical comparisons, the lateral regions of the fusiform gyrus were activated by living objects, whereas nonliving objects activated the medial regions. For attribute comparisons, the right fusiform gyrus was activated by global form, whereas local details activated the left fusiform gyrus. These results suggest that the type of object category determines which region of the fusiform gyrus is activated for processing semantic recognition, whereas the attributes of an object determine whether the left or the right fusiform gyrus is activated, depending on whether global form or local detail is processed.

In addition, it has been proposed that activation in anterior regions of the fusiform gyri indicates successful recognition. However, levels of activation have been found to depend on the semantic relevance of the object. The term semantic relevance here refers to “a measure of the contribution of semantic features to the ‘core’ meaning of a concept.” Results showed that objects with high semantic relevance, such as artefacts, produced an increase in activation compared to objects with low semantic relevance, such as natural objects. This is attributed to the greater difficulty of distinguishing between natural objects, which have very similar structural properties and are therefore harder to identify than artefacts. Therefore, the easier the object is to identify, the more likely it will be successfully recognized.

Another condition that affects object recognition performance is contextual facilitation. It is thought that during tasks of object recognition, an object is accompanied by a “context frame”, which offers semantic information about the object’s typical context. When an object is out of context, recognition performance is hindered, with slower response times and more errors than when the object appears in an appropriate context. Based on results from an fMRI study, it has been proposed that there is a “context network” in the brain for contextually associated objects, with activity largely found in the Parahippocampal Cortex (PHC) and the Retrosplenial Complex (RSC). Within the PHC, activity in the Parahippocampal Place Area (PPA) has been found to be preferential to scenes rather than objects; however, it has been suggested that activity in the PHC for solitary objects in tasks of contextual facilitation may be due to subsequent thought of the spatial scene in which the object is contextually represented. Further experiments found activation in the PHC for both non-spatial and spatial contexts, although activation for non-spatial contexts was limited to the anterior PHC and activation for spatial contexts to the posterior PHC.

Recognition Memory

When you see an object, you know what the object is because you have seen it on a past occasion; this is recognition memory. Our ability to recognize an object is affected not only by abnormalities in the ventral (what) stream of the visual pathway but also by the way in which the object is presented to us.

Familiarity

Familiarity is a context-free mechanism, in the sense that what we recognize simply feels familiar, without time being spent on working out the context in which we know the object. The ventro-lateral region of the frontal lobe is involved in memory encoding during incidental learning and in later maintaining and retrieving semantic memories.

Familiarity can induce perceptual processes different from those for unfamiliar objects, which means that our perception of a finite number of familiar objects is unique. Deviations from typical viewpoints and contexts can affect how efficiently an object is recognized. It was found not only that familiar objects are recognized more efficiently when viewed from a familiar viewpoint as opposed to an unfamiliar one, but also that this principle applies to novel objects. This suggests that representations of objects in the brain are organized according to the familiar ways in which the objects are observed in the environment. Recognition is driven largely not only by object shape and/or views but also by dynamic information. Familiarity can benefit the perception of dynamic point-light displays, moving objects, the sex of faces, and face recognition.

Recollection

Recollection shares many similarities with familiarity; however, it is context dependent, requiring specific information about the incident in question.

Effects of Lesions in the Ventral Stream

Object recognition is a complex task that involves several different areas of the brain, not just one. If one area is damaged, object recognition can be impaired. The main area for object recognition is the temporal lobe. For example, it was found that lesions to the perirhinal cortex in rats cause impairments in object recognition, especially with an increase in feature ambiguity. Neonatal aspiration lesions of the amygdaloid complex in monkeys appear to have resulted in a greater object memory loss than early hippocampal lesions. However, in adult monkeys, the object memory impairment is better accounted for by damage to the perirhinal and entorhinal cortex than by damage to the amygdaloid nuclei. Combined amygdalohippocampal (A + H) lesions in rats impaired performance on an object recognition task when the retention intervals were increased beyond 0 s and when test stimuli were repeated within a session. Damage to the amygdala or hippocampus alone does not affect object recognition, whereas A + H damage produces clear deficits. In an object recognition task, the level of discrimination was significantly lower in rats with electrolytic lesions of the globus pallidus (part of the basal ganglia) than in rats with lesions of the substantia innominata/ventral pallidum, which in turn performed worse than the control and medial septum/vertical diagonal band of Broca groups; however, only the globus pallidus group failed to discriminate between new and familiar objects. These lesions damage the ventral (what) pathway of the visual processing of objects in the brain.

Visual Agnosias

Agnosia is a rare occurrence and can result from a stroke, dementia, head injury, or brain infection, or can be hereditary.

Apperceptive agnosia is a deficit in object perception creating an inability to understand the significance of objects.

Similarly, associative agnosia is the inability to understand the significance of objects; however, in this case the deficit is in semantic memory. Both of these agnosias can affect the pathway to object recognition, such as that described by Marr's theory of vision. More specifically, unlike patients with apperceptive agnosia, associative agnosic patients are more successful at drawing, copying, and matching tasks; these patients demonstrate that they can perceive but not recognize.

Integrative agnosia (a subtype of associative agnosia) is the inability to integrate separate parts to form a whole image. With these types of agnosia there is damage to the ventral (what) stream of the visual processing pathway.

Object orientation agnosia is the inability to extract the orientation of an object despite adequate object recognition. With this type of agnosia there is damage to the dorsal (where) stream of the visual processing pathway. This can affect object recognition in terms of familiarity, and even more so for unfamiliar objects and viewpoints.

A difficulty in recognizing faces can be explained by prosopagnosia. Someone with prosopagnosia cannot identify the face but is still able to perceive age, gender, and emotional expression. The brain region that specializes in facial recognition is the fusiform face area. Prosopagnosia can also be divided into apperceptive and associative subtypes. Recognition of individual chairs, cars, and animals can also be impaired, which suggests that these objects share perceptual features with faces that are processed in the fusiform face area.

Alzheimer's Disease

The distinction between category and attribute in semantic representation may inform our ability to assess semantic function in aging and in disease states affecting semantic memory, such as Alzheimer’s disease (AD). Because of semantic memory deficits, persons with Alzheimer's disease have difficulty recognizing objects, as semantic memory is known to be used to retrieve information for naming and categorizing objects. In fact, it is highly debated whether the semantic memory deficit in AD reflects the loss of semantic knowledge for particular categories and concepts or the loss of knowledge of perceptual features and attributes.
