eXtended WordNet
The eXtended WordNet is a project at the University of Texas at Dallas, funded by the National Science Foundation, that aims to improve WordNet by semantically parsing the glosses, thus making the information contained in these definitions available to automatic knowledge processing systems. It is freely available under a BSD-style license. Although it has not been updated since November 2004 (the most recent version is based on WordNet 2.0), it remains a useful resource.

Database format

The database is available as a set of four XML files, one each for verbs, adverbs, nouns and adjectives. The following information is extracted from the glosses:
  • Word sense disambiguation
  • Parse tree
  • Logic form



As an example, the following information is available for the synset excellent, first-class, fantabulous:

Gloss:
of the highest quality

Word sense disambiguation:
of
the
highest
quality

Parse tree:
(TOP (S (NP (JJ excellent))
        (VP (VBZ is)
            (NP (NP (NN something))
                (PP (IN of)
                    (NP (DT the) (JJS highest) (NN quality)))))
        (. .)))
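
The bracketed parse above can be read into a nested structure with a few lines of code. The following is a minimal Python sketch, not part of the eXtended WordNet distribution: the function names read_tree and tagged_leaves are hypothetical, and the input is assumed to be a single well-formed bracketed string like the one shown.

```python
# Minimal reader for a bracketed parse tree such as the example above.
# read_tree and tagged_leaves are illustrative names, not project APIs.

GLOSS_PARSE = """
(TOP (S (NP (JJ excellent) )
(VP (VBZ is)
(NP (NP (NN something) )
(PP (IN of)
(NP (DT the) (JJS highest) (NN quality) ) ) ) )
(. .) ) )
"""


def read_tree(text):
    """Return the tree as nested (label, children) tuples.

    Children are either sub-trees (tuples) or leaf words (strings).
    """
    # Pad parentheses with spaces so they split off as separate tokens.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ')'
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the end of the tree")
    return tree


def tagged_leaves(tree):
    """Collect (word, tag) pairs in left-to-right order."""
    label, children = tree
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(tagged_leaves(child))
        else:
            pairs.append((child, label))
    return pairs


if __name__ == "__main__":
    for word, tag in tagged_leaves(read_tree(GLOSS_PARSE)):
        print(word, tag)
```

Walking the leaves recovers the part-of-speech tag of every word in the parsed sentence, including the headword and the filler words ("is something") that the parse adds around the gloss.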

Logic form:
excellent:JJ(x1) -> of:IN(x1, x2) highest:JJ(x2) quality:NN(x2)
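
To show how the predicates in a logic form relate through shared arguments, here is a minimal Python sketch that splits the example into its predicates. The lemma:POS(args) pattern is inferred from this one example only, and parse_logic_form and predicates_sharing are hypothetical names, so treat it as an illustration rather than a parser for the full format.

```python
import re

# Pattern inferred from the single example above: lemma:POS(arg, arg, ...).
# Optional whitespace before the parenthesis is tolerated.
PREDICATE = re.compile(r"(\w[\w-]*):(\w+)\s*\(([^)]*)\)")


def parse_logic_form(text):
    """Split 'head -> body' into two lists of (lemma, pos, args) triples."""
    head_text, _, body_text = text.partition("->")

    def predicates(part):
        return [
            (lemma, pos, tuple(arg.strip() for arg in args.split(",")))
            for lemma, pos, args in PREDICATE.findall(part)
        ]

    return predicates(head_text), predicates(body_text)


def predicates_sharing(variable, predicates):
    """Return the lemmas of all predicates that mention the given argument."""
    return [lemma for lemma, _pos, args in predicates if variable in args]


if __name__ == "__main__":
    head, body = parse_logic_form(
        "excellent:JJ(x1) -> of:IN(x1, x2) highest:JJ(x2) quality:NN(x2)"
    )
    print(head)
    print(body)
    print(predicates_sharing("x2", body))
```

In the example, the argument x2 ties "of", "highest" and "quality" together, while x1 links the headword "excellent" to the preposition.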

Data quality

Each gloss is first tagged using Brill's tagger. The glosses are then parsed using both Charniak's parser and an in-house Collins-style parser. Each parsed gloss is then assigned a level of quality:
  • Gold: those that have been manually checked
  • Silver: those where both parsers have produced the same output
  • Normal: those where the two parsers have produced different outputs; in these situations the output of the in-house parser is used
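
The three quality levels amount to a simple decision rule, sketched below in Python. This is an illustration only: the function name assign_quality, the whitespace-insensitive comparison of parser outputs, and the idea of passing the manually checked parse as a separate argument are all assumptions, since the project's actual procedure is not described in more detail here.

```python
# Hypothetical sketch of the gold / silver / normal rule described above.

def normalise(parse):
    """Collapse whitespace so that layout differences do not count as
    disagreement between parsers (an assumption of this sketch)."""
    return " ".join(parse.replace("(", " ( ").replace(")", " ) ").split())


def assign_quality(charniak_parse, in_house_parse, checked_parse=None):
    """Return (quality_level, parse_to_keep).

    checked_parse is the manually verified parse, if one exists.
    """
    if checked_parse is not None:
        return "gold", checked_parse
    if normalise(charniak_parse) == normalise(in_house_parse):
        return "silver", in_house_parse
    # The parsers disagree: keep the in-house parser's output.
    return "normal", in_house_parse


if __name__ == "__main__":
    a = "(NP (DT the) (JJS highest) (NN quality) )"
    b = "(NP (DT the) (JJS highest) (NN quality))"
    c = "(NP (DT the) (JJ highest) (NN quality))"
    print(assign_quality(a, b))
    print(assign_quality(a, c))
    print(assign_quality(a, c, checked_parse=b))
```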

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.