Motor theory of speech perception
The motor theory of speech perception is the hypothesis that people perceive spoken words by identifying the vocal tract gestures with which they are pronounced rather than by identifying the sound patterns that speech generates. It originally claimed that speech perception is done through a specialized module that is innate and human-specific. Though the idea of a module has been qualified in more recent versions of the theory, the idea remains that the role of the speech motor system is not only to produce speech articulations but also to detect them.

The hypothesis has gained more interest outside the field of speech perception than inside. This interest has increased particularly since the discovery of mirror neurons that link the production and perception of motor movements, including those made by the vocal tract. An alternative interpretation of research linking speech perception to speech production, however, is that it links to speech imitation rather than speech perception.

The theory was initially proposed at Haskins Laboratories in the 1950s by Alvin Liberman and Franklin S. Cooper, and developed further by Donald Shankweiler, Michael Studdert-Kennedy, Ignatius Mattingly, Carol Fowler and Douglas Whalen.

Origins and development

The hypothesis has its origins in research using pattern playback to create reading machines for the blind that would substitute sounds for orthographic letters. This led to a close examination of how spoken sounds correspond to their acoustic spectrograms as a sequence of auditory sounds. The research found that successive consonants and vowels overlap in time with one another (a phenomenon known as coarticulation). This suggested that speech is not heard like an acoustic "alphabet" or "cipher," but as a "code" of overlapping speech gestures.

Associationist approach

Initially, the theory was associationist: infants mimic the speech they hear, and this leads to behavioristic associations between articulation and its sensory consequences. Later, this overt mimicry would be short-circuited and become speech perception. This aspect of the theory was dropped, however, with the discovery that prelinguistic infants could already detect most of the phonetic contrasts used to separate different speech sounds.

Cognitivist approach

The behavioristic approach was replaced by a cognitivist one in which there was a speech module. The module detected speech in terms of hidden distal objects rather than at the proximal or immediate level of their input. The evidence for this was the research finding that speech processing was special, as in phenomena such as duplex perception.

Changing distal objects

Initially, speech perception was assumed to link to speech objects that were both
  • the invariant movements of speech articulators
  • the invariant motor commands sent to muscles to move the vocal tract articulators

This was later revised to include the phonetic gestures rather than motor commands, and then the gestures intended by the speaker at a prevocal, linguistic level, rather than actual movements.

Modern revision

The "speech is special" claim has been dropped, as it was found that phenomena taken as evidence of special speech processing could also occur for nonspeech sounds (for example, duplex perception with slamming doors).

Mirror neurons

The discovery of mirror neurons has led to renewed interest in the motor theory of speech perception, and the theory still has its advocates, although there are also critics.

Nonauditory gesture information

If speech is identified in terms of how it is physically made, then nonauditory information should be incorporated into speech percepts even if it is still subjectively heard as "sounds". This is, in fact, the case.
  • The McGurk effect shows that seeing the production of a spoken syllable that differs from an auditory one synchronized with it affects the perception of the auditory one. In other words, if someone hears "ba" but sees a video of someone pronouncing "ga", what they hear is different; some people believe they hear "da".
  • People find it easier to hear speech in noise if they can see the speaker.
  • People can hear syllables better when their production can be felt haptically.

Categorical perception

Using a speech synthesizer, speech sounds can be varied in place of articulation along a continuum from /bɑ/ to /dɑ/ to /ɡɑ/, or in voice onset time on a continuum from /dɑ/ to /tɑ/ (for example). When listeners are asked to discriminate between two different sounds, they perceive the sounds as belonging to discrete categories, even though the sounds vary continuously. In other words, 10 sounds (with the sound on one extreme being /dɑ/, the sound on the other extreme being /tɑ/, and the ones in the middle varying on a scale) may all be acoustically different from one another, but the listener will hear all of them as either /dɑ/ or /tɑ/. Likewise, the English consonant /d/ may vary in its acoustic details across different phonetic contexts (the /d/ in /du/ does not technically sound the same as the one in /di/, for example), but all /d/'s as perceived by a listener fall within one category (voiced alveolar stop), and that is because "linguistic representations are abstract, canonical, phonetic segments or the gestures that underlie these segments." This suggests that humans identify speech using categorical perception, and thus that a specialized module, such as that proposed by the motor theory of speech perception, may be on the right track.
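
The labeling pattern described above can be illustrated with a toy model. The following Python sketch is not taken from the research literature: the 10-step continuum, the category boundary at 45 ms, the slope and all function names are assumptions chosen only for illustration. It treats identification as a logistic function of voice onset time, so that stimuli separated by the same acoustic step receive the same label within a category and different labels across the boundary.

```python
import math

# Illustrative assumptions, not measured values: a 10-step voice onset time
# (VOT) continuum with a hypothetical category boundary and slope.
VOT_STEPS_MS = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
BOUNDARY_MS = 45.0  # assumed /d/-/t/ boundary
SLOPE = 0.5         # assumed steepness of the labeling function


def prob_t(vot_ms: float) -> float:
    """Probability of labeling a stimulus as /tɑ/ (logistic identification)."""
    return 1.0 / (1.0 + math.exp(-SLOPE * (vot_ms - BOUNDARY_MS)))


def label(vot_ms: float) -> str:
    """Forced-choice label: whichever category is more probable."""
    return "tɑ" if prob_t(vot_ms) >= 0.5 else "dɑ"


def predicted_discrimination(vot_a: float, vot_b: float) -> float:
    """Toy prediction: two stimuli are only as discriminable as their
    labeling probabilities are different."""
    return abs(prob_t(vot_a) - prob_t(vot_b))


# Label every stimulus on the continuum.
labels = [label(v) for v in VOT_STEPS_MS]

if __name__ == "__main__":
    for vot, lab in zip(VOT_STEPS_MS, labels):
        print(f"VOT {vot:2d} ms -> /{lab}/  (P(t) = {prob_t(vot):.3f})")
    print("within-category pair (0, 10):", predicted_discrimination(0, 10))
    print("cross-boundary pair (40, 50):", predicted_discrimination(40, 50))
```

Under these assumed parameters the ten stimuli receive only two labels, and the pair that straddles the boundary (40 and 50 ms) is predicted to be far easier to tell apart than an equally spaced pair within one category (0 and 10 ms).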

Speech imitation

If people can hear the gestures in speech, then the imitation of speech should be very fast, as when words heard through headphones are repeated in speech shadowing. People can repeat heard syllables more quickly than they would be able to produce them normally.

Speech production

  • Hearing speech activates vocal tract muscles, and the motor cortex and premotor cortex. The integration of auditory and visual input in speech perception also involves such areas.
  • Disrupting the premotor cortex disrupts the perception of speech units such as stop consonants.
  • The activation of the motor areas occurs in terms of the phonemic features which link with the vocal tract articulators that create speech gestures.
  • The perception of a speech sound is aided by pre-emptively stimulating the motor representation of the articulators responsible for its pronunciation.

Perception-action meshing

Evidence exists that perception and production are generally coupled in the motor system. This is supported by the existence of mirror neurons that are activated both by seeing (or hearing) an action and when that action is carried out. Another source of evidence is that for common coding theory, which holds that perception and action share a representation.

Criticisms

The motor theory of speech perception is not widely held in the field of speech perception, though it is more popular in other fields, such as theoretical linguistics. As three of its advocates have noted, "it has few proponents within the field of speech perception, and many authors cite it primarily to offer critical commentary" (p. 361). Several critiques of it exist.

Multiple sources

Speech perception is affected by nonproduction sources of information, such as context. Individual words are hard to understand in isolation but easy when heard in sentence context. It therefore seems that speech perception uses multiple sources that are integrated together in an optimal way.

Production

The motor theory of speech perception would predict that speech motor abilities in infants predict their speech perception abilities, but in actuality it is the other way around. It would also predict that defects in speech production would impair speech perception, but they do not. However, this only affects the first and already superseded behaviorist version of the theory, where infants were supposed to learn all production-perception patterns by imitation early in childhood. This is no longer the mainstream view of motor-speech theorists.

Speech module

Several sources of evidence for a specialized speech module have not held up.
  • Duplex perception can be observed with door slams.
  • The McGurk effect can also be achieved with nonlinguistic stimuli, such as showing someone a video of a basketball bouncing but playing the sound of a ping-pong ball bouncing.
  • As for categorical perception, listeners can be sensitive to acoustic differences within single phonetic categories.

As a result, this part of the theory has been dropped by some researchers.

Sublexical tasks

The evidence provided for the motor theory of speech perception is limited to tasks such as syllable discrimination that use speech units rather than full spoken words or spoken sentences. As a result, "speech perception is sometimes interpreted as referring to the perception of speech at the sublexical level. However, the ultimate goal of these studies is presumably to understand the neural processes supporting the ability to process speech sounds under ecologically valid conditions, that is, situations in which successful speech sound processing ultimately leads to contact with the mental lexicon and auditory comprehension." This, however, creates the problem of "a tenuous connection to their implicit target of investigation, speech recognition".

Imitation

The motor theory of speech perception faces the problem that the research linking speech perception to speech production is also consistent with the brain processing speech in order to imitate spoken words. The brain must have a means to do this if language is to exist, since a child's vocabulary expansion requires a means to learn novel spoken words, as does an adult's picking up of new names. Imitation has to be initiated for all vocalizations, since a word's novelty cannot be known until after it is heard, by which time the information needed to identify its articulation gestures and motor goals has gone. As a result, vocal imitation needs to be initiated by default into short-term memory for every heard vocalization. If speech perception uses multiple sources of information, this default imitation processing would provide, as a secondary use, an extra source for word perception. Since imitation will be most needed for vocalizations that are not proper words, this could explain why sublexical tasks that do not use proper words link so strongly to the processing of motor gestures.

Birds

It has been suggested that birds also hear each other's bird song in terms of vocal gestures.

See also

  • Cohort model
  • Speech recognition
  • Auditory phonetics
  • Trace (psycholinguistics)
  • Haskins Laboratories


The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 