Language model

Overview
A statistical language model assigns a probability to a sequence of m words by means of a probability distribution.

Language modeling is used in many natural language processing applications such as speech recognition, machine translation, part-of-speech tagging, parsing and information retrieval.

In speech recognition and in data compression, such a model tries to capture the properties of a language and to predict the next word in a speech sequence.

When used in information retrieval, a language model is associated with a document in a collection.
With query Q as input, retrieved documents are ranked by the probability that the document's language model M_d would generate the terms of the query, P(Q|M_d).
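
The ranking by P(Q|M_d) can be illustrated with a minimal sketch. The text does not say what form the document model takes, so this example assumes a unigram model per document with additive (add-alpha) smoothing. The function name `query_likelihood`, the `alpha` parameter and the two toy documents are hypothetical and chosen only for illustration.

```python
from collections import Counter

def query_likelihood(query_terms, doc_terms, vocabulary_size, alpha=1.0):
    """Estimate P(Q | M_d) under an assumed unigram document model.

    Additive smoothing (alpha) is an assumption of this sketch; it keeps
    query terms that are absent from the document from giving probability 0.
    """
    counts = Counter(doc_terms)
    total = len(doc_terms)
    probability = 1.0
    for term in query_terms:
        probability *= (counts[term] + alpha) / (total + alpha * vocabulary_size)
    return probability

# Toy collection (hypothetical data).
documents = {
    "d1": "language model assigns probability to word sequence".split(),
    "d2": "speech recognition converts spoken word to text".split(),
}
vocabulary = {term for terms in documents.values() for term in terms}
query = "language model".split()

# Rank documents by how likely their model is to generate the query.
ranking = sorted(
    documents,
    key=lambda name: query_likelihood(query, documents[name], len(vocabulary)),
    reverse=True,
)
```

Here `ranking` places `d1` first, because both query terms occur in that document and so receive higher smoothed probabilities under its model.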

Estimating the probability of sequences can become difficult in corpora in which phrases or sentences can be arbitrarily long, so that some sequences are never observed during training of the language model (the data sparseness problem of overfitting). For that reason, these models are often approximated using smoothed n-gram models.

N-gram models


In an n-gram model, the probability of observing the sentence w_1, ..., w_m is approximated as

$$P(w_1,\ldots,w_m) = \prod_{i=1}^{m} P(w_i \mid w_1,\ldots,w_{i-1}) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-(n-1)},\ldots,w_{i-1})$$

Here, it is assumed that the probability of observing the ith word w_i in the context history of the preceding i-1 words can be approximated by the probability of observing it in the shortened context history of the preceding n-1 words (a Markov property of order n-1).

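The conditional probability can be calculated from n-gram frequency counts:

$$P(w_i \mid w_{i-(n-1)},\ldots,w_{i-1}) = \frac{\mathrm{count}(w_{i-(n-1)},\ldots,w_{i-1},w_i)}{\mathrm{count}(w_{i-(n-1)},\ldots,w_{i-1})}$$

The terms bigram and trigram language model denote n-gram language models with n=2 and n=3, respectively.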
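
The count-based estimate can be sketched in a few lines of Python. This is an unsmoothed illustration over a toy token list. The helper names `ngram_counts` and `conditional_probability` and the corpus are hypothetical, and the context is assumed to be non-empty (n of at least 2).

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous window of n tokens in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def conditional_probability(tokens, context, word):
    """Estimate P(word | context) as count(context + word) / count(context).

    The context must be a non-empty sequence of the n-1 preceding words.
    Returns 0.0 when the context was never observed.
    """
    context = tuple(context)
    n = len(context) + 1
    numerator = ngram_counts(tokens, n)[context + (word,)]
    denominator = ngram_counts(tokens, n - 1)[context]
    return numerator / denominator if denominator else 0.0

# Toy corpus (hypothetical data).
corpus = "the red house and the red car and the blue house".split()

# Bigram estimate: "the" occurs 3 times, "the red" occurs 2 times.
p_red_given_the = conditional_probability(corpus, ["the"], "red")

# Trigram estimate: "the red" occurs 2 times, "the red house" occurs once.
p_house_given_the_red = conditional_probability(corpus, ["the", "red"], "house")
```

With this corpus the bigram estimate is 2/3 and the trigram estimate is 1/2. Any n-gram that never occurs gets probability 0, which is the sparseness problem that smoothing is meant to address.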

Example


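In a bigram (n=2) language model, the probability of the sentence "I saw the red house" is approximated as

$$P(\text{I}, \text{saw}, \text{the}, \text{red}, \text{house}) \approx P(\text{I} \mid \langle s \rangle)\, P(\text{saw} \mid \text{I})\, P(\text{the} \mid \text{saw})\, P(\text{red} \mid \text{the})\, P(\text{house} \mid \text{red})$$

whereas in a trigram (n=3) language model, the approximation is

$$P(\text{I}, \text{saw}, \text{the}, \text{red}, \text{house}) \approx P(\text{I} \mid \langle s \rangle, \langle s \rangle)\, P(\text{saw} \mid \langle s \rangle, \text{I})\, P(\text{the} \mid \text{I}, \text{saw})\, P(\text{red} \mid \text{saw}, \text{the})\, P(\text{house} \mid \text{the}, \text{red})$$

Note that the context of the first n-1 n-grams is filled with start-of-sentence markers, typically denoted <s>.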
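
The bigram case of this example can be reproduced with a small sketch that pads each sentence with one start-of-sentence marker and multiplies the unsmoothed count-based estimates. The three training sentences and the function names `train_bigram` and `sentence_probability` are hypothetical. A general n-gram version would pad with n-1 markers instead of one.

```python
from collections import Counter

START = "<s>"  # start-of-sentence marker

def train_bigram(sentences):
    """Collect bigram counts and context counts from tokenized sentences."""
    bigrams, contexts = Counter(), Counter()
    for sentence in sentences:
        padded = [START] + sentence
        for prev, word in zip(padded, padded[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    return bigrams, contexts

def sentence_probability(sentence, bigrams, contexts):
    """Multiply the unsmoothed estimates P(w_i | w_{i-1}) over the sentence."""
    probability = 1.0
    padded = [START] + sentence
    for prev, word in zip(padded, padded[1:]):
        if contexts[prev] == 0:
            return 0.0  # unseen context: no estimate without smoothing
        probability *= bigrams[(prev, word)] / contexts[prev]
    return probability

# Toy training data (hypothetical).
training = [
    "I saw the red house".split(),
    "I saw the blue house".split(),
    "you saw the red car".split(),
]
bigrams, contexts = train_bigram(training)
p = sentence_probability("I saw the red house".split(), bigrams, contexts)
```

For this training data the factors are 2/3, 1, 1, 2/3 and 1/2, so `p` is 2/9. A sentence containing a bigram that never occurs in training, such as "the green", receives probability 0 because no smoothing is applied.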