Word sense induction
In computational linguistics, word-sense induction (WSI) or discrimination is an open problem of natural language processing, which concerns the automatic identification of the senses (i.e., meanings) of a word. Given that the output of word-sense induction is a set of senses for the target word (a sense inventory), this task is closely related to that of word-sense disambiguation (WSD), which relies on a predefined sense inventory and aims to resolve the ambiguity of words in context.

Approaches and methods

The output of a word-sense induction algorithm is a clustering of contexts in which the target word occurs or a clustering of words related to the target word. Three main methods have been proposed in the literature:
  • Context clustering
  • Word clustering
  • Co-occurrence graphs

Context clustering

In context clustering, each occurrence of a target word is represented as a context vector. These vectors are then grouped into clusters, each representing a different meaning of the target word. A seminal approach of this kind is based on the idea of a word space, that is, a vector space whose dimensions are words.
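
As a rough illustration of the idea, the following Python sketch builds bag-of-words context vectors for a few occurrences of the target word "bank" and merges the most similar contexts until no pair is similar enough. The example sentences, the stopword list, the similarity threshold, and the function names are all assumptions made for this sketch; it is not the algorithm of any particular published system.

```python
from collections import Counter
from math import sqrt

# Hypothetical stopword list, just large enough for the toy sentences below.
STOPWORDS = {"the", "a", "an", "of", "at", "and", "was", "we", "she"}

def context_vector(sentence, target):
    """Bag-of-words vector of the words surrounding one occurrence of `target`."""
    tokens = [t.strip(".,").lower() for t in sentence.split()]
    return Counter(t for t in tokens if t and t != target and t not in STOPWORDS)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def cluster_contexts(vectors, threshold=0.2):
    """Agglomerative clustering: merge the two most similar clusters
    (compared by their summed vectors) while similarity exceeds `threshold`."""
    clusters = [[i] for i in range(len(vectors))]
    centroids = [Counter(v) for v in vectors]
    while len(clusters) > 1:
        best, pair = threshold, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cosine(centroids[a], centroids[b])
                if sim > best:
                    best, pair = sim, (a, b)
        if pair is None:
            break  # no remaining pair is similar enough to merge
        a, b = pair
        clusters[a] += clusters[b]
        centroids[a] = centroids[a] + centroids[b]
        del clusters[b], centroids[b]
    return [sorted(c) for c in clusters]

contexts = [
    "She deposited money at the bank and opened an account.",
    "The bank approved the loan and the account was opened.",
    "We walked along the river bank near the water.",
    "Fish swim near the bank of the river.",
]
vectors = [context_vector(s, "bank") for s in contexts]
senses = cluster_contexts(vectors)
print(senses)  # each inner list holds the indices of contexts sharing one induced sense
```

On this toy input the two financial contexts end up in one cluster and the two river contexts in another. Real systems typically use much larger context windows, weighting schemes, and dimensionality reduction.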

Word clustering

A second approach clusters words that are semantically similar to one another and can thus convey a specific meaning. Methods of this kind include Lin's algorithm and the Clustering by Committee algorithm.
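
The general idea can be sketched in Python as follows. Note that this is neither Lin's algorithm nor Clustering by Committee: it is a deliberately simplified greedy grouping, and the toy corpus, the list of related words, the threshold, and the function names are assumptions chosen for illustration. Words related to the target "bass" are described by the words they co-occur with and are grouped when those descriptions are similar.

```python
from collections import Counter
from math import sqrt

def feature_vector(word, corpus, target):
    """Count the words that share a sentence with `word`, ignoring the target itself."""
    vec = Counter()
    for sentence in corpus:
        if word in sentence:
            vec.update(t for t in sentence if t not in (word, target))
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def cluster_words(words, vectors, threshold=0.3):
    """Greedy grouping: a word joins the first cluster whose members are,
    on average, at least `threshold` similar to it; otherwise it starts a new one."""
    clusters = []
    for w in words:
        for cluster in clusters:
            avg = sum(cosine(vectors[w], vectors[m]) for m in cluster) / len(cluster)
            if avg >= threshold:
                cluster.append(w)
                break
        else:
            clusters.append([w])
    return clusters

# Toy, pre-tokenized corpus (hypothetical data).
corpus = [
    ["bass", "guitar", "drum", "music", "band"],
    ["guitar", "drum", "music", "stage"],
    ["bass", "trout", "salmon", "river", "fish"],
    ["trout", "salmon", "fish", "lake"],
]
related = ["guitar", "trout", "drum", "salmon"]  # words assumed to be related to "bass"
vectors = {w: feature_vector(w, corpus, "bass") for w in related}
word_senses = cluster_words(related, vectors)
print(word_senses)  # each inner list is a group of words standing for one induced sense
```

Here the instrument words and the fish words fall into separate groups, each of which can be read as one sense of the target word. The greedy pass depends on the order of the input words, which is one reason published methods use more careful procedures.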

Co-occurrence graphs

The third main approach to word-sense induction is based on the notion of a co-occurrence graph, that is, a graph whose vertices are words related to the target word and whose edges connect pairs of co-occurring words. Approaches include the use of the Markov clustering algorithm, HyperLex, and variants thereof.
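
A minimal Python sketch of the graph construction is given below. For simplicity it partitions the graph into connected components, which is only a stand-in for the Markov clustering algorithm or HyperLex and should not be read as either of them. The toy contexts, the `min_count` parameter, and the function names are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations

def build_graph(contexts, target, min_count=1):
    """Vertices are words seen with `target`; an edge links two words that
    co-occur in at least `min_count` contexts."""
    counts = defaultdict(int)
    for tokens in contexts:
        words = sorted(set(tokens) - {target})
        for u, v in combinations(words, 2):
            counts[(u, v)] += 1
    graph = defaultdict(set)
    for (u, v), n in counts.items():
        if n >= min_count:
            graph[u].add(v)
            graph[v].add(u)
    return graph

def components(graph):
    """Connected components of the graph, each taken as one induced sense."""
    seen, senses = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        senses.append(sorted(comp))
    return senses

# Toy, pre-tokenized contexts of the target word "bank" (hypothetical data).
contexts = [
    ["bank", "money", "loan", "account"],
    ["bank", "loan", "interest", "account"],
    ["bank", "river", "water", "shore"],
    ["bank", "water", "fish", "river"],
]
graph = build_graph(contexts, "bank")
graph_senses = components(graph)
print(graph_senses)  # each inner list is a group of words standing for one induced sense
```

On real data a co-occurrence graph is rarely this cleanly separated, so connected components would usually collapse everything into one cluster; that is why weighted methods such as Markov clustering or hub detection are used in practice.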

Applications

  • Word-sense induction has been shown to benefit Web information retrieval when highly ambiguous queries are employed.
  • Simple word-sense induction algorithms boost Web search result clustering considerably and improve the diversification of search results returned by search engines such as Yahoo!


Software

  • SenseClusters is a freely available open-source software package that performs both context clustering and word clustering.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.