Linear predictive coding



 
 
Linear predictive coding (LPC) is a tool used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital speech signal in compressed form, using the information of a linear predictive model. It is one of the most powerful speech analysis techniques and one of the most useful methods for encoding good-quality speech at a low bit rate, and it provides accurate estimates of speech parameters.


Overview


LPC starts with the assumption that a speech signal is produced by a buzzer at the end of a tube (voiced sounds), with occasional added hissing and popping sounds (sibilants and plosive sounds). Although apparently crude, this model is actually a close approximation to the reality of speech production. The glottis (the space between the vocal folds) produces the buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat and mouth) forms the tube, which is characterized by its resonances; these give rise to formants, or enhanced frequency bands in the sound produced. Hisses and pops are generated by the action of the tongue, lips and throat during sibilants and plosives.

LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the remaining signal after the subtraction of the filtered modeled signal is called the residue.
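The analysis step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular coder: it uses the autocorrelation method with the Levinson-Durbin recursion (one common way to fit the predictor), the function names are hypothetical, and the "speech" is a toy signal made by passing a single pulse through a two-pole filter with made-up coefficients. The sign convention assumed here is e[n] = x[n] + a[1]x[n-1] + ... + a[p]x[n-p].

```python
def autocorrelation(frame, max_lag):
    """r[lag] = sum over n of frame[n] * frame[n + lag], for lag = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns (a, err) where a[0] == 1 and the prediction error is
    e[n] = x[n] + a[1]*x[n-1] + ... + a[order]*x[n-order].
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # i-th reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction-error energy
    return a, err


def inverse_filter(frame, a):
    """Apply the FIR inverse filter A(z) to the frame to obtain the residue."""
    return [sum(a[j] * frame[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(frame))]


# Toy "speech": one pulse through a two-pole tube (made-up coefficients).
true_a = [1.0, -1.3, 0.4]
signal = []
for n in range(200):
    excitation = 1.0 if n == 0 else 0.0
    past = sum(true_a[j] * signal[n - j] for j in (1, 2) if n - j >= 0)
    signal.append(excitation - past)

a, err = levinson_durbin(autocorrelation(signal, 2), 2)
residue = inverse_filter(signal, a)   # close to the original single pulse
```

For this toy input the estimated coefficients land very close to the ones used to build the signal, and the residue is essentially the original pulse. A practical analyzer would also need to guard against silent frames, where `r[0]` is zero.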

The numbers which describe the intensity and frequency of the buzz, the formants, and the residue signal can be stored or transmitted somewhere else. LPC synthesizes the speech signal by reversing the process: it uses the buzz parameters and the residue to create a source signal, uses the formants to create a filter (which represents the tube), and runs the source through the filter, resulting in speech.
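A minimal sketch of the synthesis direction follows, assuming the formant filter is an all-pole filter 1/A(z) with a[0] = 1 and the buzz is modelled as a plain impulse train. The helper names, the pitch period of 80 samples, and the filter coefficients are all illustrative assumptions, not values from any standard.

```python
def impulse_train(period, length, gain=1.0):
    """Idealised buzz: one pulse of height `gain` every `period` samples."""
    return [gain if n % period == 0 else 0.0 for n in range(length)]


def synthesis_filter(excitation, a):
    """Run the source through the all-pole filter 1/A(z), with a[0] == 1.

    y[n] = excitation[n] - a[1]*y[n-1] - ... - a[p]*y[n-p]
    """
    order = len(a) - 1
    out = []
    for n, e in enumerate(excitation):
        feedback = sum(a[j] * out[n - j]
                       for j in range(1, order + 1) if n - j >= 0)
        out.append(e - feedback)
    return out


# Made-up example: 100 Hz buzz at an assumed 8000 Hz sample rate,
# shaped by a two-pole "tube" with illustrative coefficients.
buzz = impulse_train(period=80, length=400)
speech = synthesis_filter(buzz, [1.0, -1.3, 0.4])
```

If the residue from the analysis stage is used as the excitation instead of the idealised buzz, this filter undoes the inverse filter and reproduces the analyzed frame, up to rounding.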

Because speech signals vary with time, this process is done on short chunks of the speech signal, which are called frames; generally 30 to 50 frames per second give intelligible speech with good compression.
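As a rough illustration of the framing step, the sketch below cuts a signal into non-overlapping frames at a chosen frame rate and applies a Hamming window to each one before analysis. The 8000 Hz sample rate, the absence of overlap, and the choice of window are assumptions made for the example; real coders differ in all three.

```python
import math


def split_frames(signal, sample_rate, frames_per_second):
    """Cut the signal into consecutive, non-overlapping frames."""
    hop = sample_rate // frames_per_second
    return [signal[start:start + hop]
            for start in range(0, len(signal) - hop + 1, hop)]


def hamming(frame):
    """Taper the frame edges to reduce spectral leakage before analysis."""
    n = len(frame)
    if n < 2:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]


# One second of placeholder samples at an assumed 8000 Hz sample rate.
samples = [float(i) for i in range(8000)]
frames = split_frames(samples, sample_rate=8000, frames_per_second=50)
windowed = [hamming(f) for f in frames]   # 50 frames of 160 samples each
```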

Early history of LPC

According to Robert M. Gray of Stanford University, the first ideas leading to LPC started in 1966 when S. Saito and F. Itakura of NTT described an approach to automatic phoneme discrimination that involved the first maximum likelihood approach to speech coding. In 1967, John Burg outlined the maximum entropy approach. In 1969 Itakura and Saito introduced partial correlation, in May Glen Culler proposed realtime speech encoding, and B. S. Atal presented an LPC speech coder at the Annual Meeting of the Acoustical Society of America. In 1971 realtime LPC using 16-bit LPC hardware was demonstrated by Philco-Ford; four units were sold.

In 1972 Bob Kahn of ARPA, with Jim Forgie (Lincoln Laboratory, LL) and Dave Walden (BBN Technologies), started the first developments in packetized speech, which would eventually lead to Voice over IP technology. In 1973, according to an informal Lincoln Laboratory history, the first realtime 2400 bit/s LPC was implemented by Ed Hofstetter. In 1974 the first realtime two-way LPC packet speech communication was accomplished over the ARPANET at 3500 bit/s between Culler-Harrison and Lincoln Laboratory. In 1976 the first LPC conference took place over the ARPANET using the Network Voice Protocol, between Culler-Harrison, ISI, SRI, and LL at 3500 bit/s. Finally, in 1978, Vishwanath et al. of BBN developed the first variable-rate LPC algorithm.

LPC coefficient representations

LPC is frequently used for transmitting spectral envelope information, and as such it has to be tolerant of transmission errors. Transmission of the filter coefficients directly (see linear prediction for definition of coefficients) is undesirable, since they are very sensitive to errors. In other words, a very small error can distort the whole spectrum, or worse, a small error might make the prediction filter unstable.
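The instability risk can be seen with a small stability test. The sketch below assumes the predictor is written as A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p and uses the step-down (backward Levinson) recursion to recover reflection coefficients; the synthesis filter 1/A(z) is stable exactly when every reflection coefficient has magnitude below 1. The function names and the example coefficients are hypothetical.

```python
def reflection_from_predictor(a):
    """Step-down recursion: predictor a (with a[0] == 1) -> reflection coeffs.

    Returns None as soon as a coefficient with magnitude >= 1 is met,
    which means 1/A(z) is not a stable filter.
    """
    current = list(a[1:])
    ks = []
    while current:
        k = current[-1]
        if abs(k) >= 1.0:
            return None
        ks.append(k)
        p = len(current)
        # Coefficients of the predictor of order p - 1.
        current = [(current[j] - k * current[p - 2 - j]) / (1.0 - k * k)
                   for j in range(p - 1)]
    ks.reverse()
    return ks


def is_stable(a):
    return reflection_from_predictor(a) is not None


# Poles close to the unit circle: a small error in one directly
# transmitted coefficient is enough to push the filter over the edge.
sent = [1.0, -1.9, 0.95]
received = [1.0, -1.9, 1.01]      # last coefficient corrupted by 0.06
stable_before = is_stable(sent)
stable_after = is_stable(received)
```

In this made-up case the transmitted filter passes the test and the slightly corrupted one does not, which is the failure mode described above.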

There are more robust representations, such as log area ratios (LAR), line spectral pairs (LSP) decomposition and reflection coefficients. Of these, LSP decomposition in particular has gained popularity, since it ensures stability of the predictor, and spectral errors are local for small coefficient deviations.
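As a small worked example of one of these representations, the sketch below converts reflection coefficients to log area ratios and back, assuming the definition LAR = log((1 + k) / (1 - k)); some texts use the reciprocal ratio, which only flips the sign. The uniform quantizer, its step size, and the coefficient values are illustrative assumptions.

```python
import math


def reflection_to_lar(ks):
    """LAR_i = log((1 + k_i) / (1 - k_i)); requires |k_i| < 1."""
    return [math.log((1.0 + k) / (1.0 - k)) for k in ks]


def lar_to_reflection(lars):
    """Inverse mapping: k_i = tanh(LAR_i / 2)."""
    return [math.tanh(g / 2.0) for g in lars]


def quantize(values, step):
    """Uniform quantizer standing in for a lossy transmission channel."""
    return [round(v / step) * step for v in values]


ks = [-1.3 / 1.4, 0.4]                       # made-up reflection coefficients
lars = reflection_to_lar(ks)
received = lar_to_reflection(quantize(lars, step=0.25))
```

Because the inverse mapping is a hyperbolic tangent, moderate errors in a transmitted LAR still decode to a reflection coefficient with magnitude below 1, so the decoded filter remains stable even though its spectrum shifts slightly.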

Applications

LPC is generally used for speech analysis and resynthesis. It is used as a form of voice compression by phone companies, for example in the GSM standard. It is also used for secure wireless, where voice must be digitized, encrypted and sent over a narrow voice channel; an early example of this is the US government's Navajo I.

LPC synthesis can be used to construct vocoders in which musical instruments serve as the excitation signal for the time-varying filter estimated from a singer's speech. This is somewhat popular in electronic music. Paul Lansky made the well-known computer music piece notjustmoreidlechatter using linear predictive coding. A 10th-order LPC was used in the popular 1980s Speak & Spell educational toy.

Waveform ROM in digital sample-based music synthesizers made by Yamaha Corporation is compressed using an LPC algorithm.

LPC predictors of order 0 to 32 are used in the FLAC audio codec.

See also

  • Warped Linear Predictive Coding
  • Akaike information criterion
  • Audio compression
  • Pitch estimation
  • FS-1015
  • FS-1016


External links

  • DSP experts.com
  • Dr. Sung-won Park Texas A&M University-Kingsville