Terminology extraction
Terminology mining, term extraction, term recognition, or glossary extraction is a subtask of information extraction. The goal of terminology extraction is to automatically extract relevant terms from a given corpus.
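As a rough illustration of this goal, the sketch below ranks word n-grams in a toy corpus by raw frequency. It is a naive baseline, not a method described in this article. The stopword list, the helper names (`tokenize`, `candidate_ngrams`, `extract_terms`) and the `min_freq` threshold are illustrative assumptions.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; real systems use much larger ones).
STOPWORDS = {"a", "an", "the", "of", "and", "or", "in", "on", "for", "to",
             "is", "are", "with", "by"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def candidate_ngrams(tokens, max_n=3):
    """Yield n-grams (1 <= n <= max_n) that neither start nor end with a stopword."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram)

def extract_terms(documents, max_n=3, min_freq=2):
    """Hypothetical baseline: rank candidate n-grams by raw corpus frequency,
    keeping only those seen at least `min_freq` times."""
    counts = Counter()
    for doc in documents:
        counts.update(candidate_ngrams(tokenize(doc), max_n))
    return [(term, f) for term, f in counts.most_common() if f >= min_freq]

if __name__ == "__main__":
    corpus = ["The credit card was declined.",
              "Apply for a credit card online.",
              "Credit card fraud is rising."]
    print(extract_terms(corpus))
    # [('credit', 3), ('card', 3), ('credit card', 3)]
```

Raw frequency also rewards frequent general-language words, which is why practical systems add the linguistic and statistical filtering discussed in this article.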

In the Semantic Web era, a growing number of communities and networked enterprises started to access and interoperate through the Internet. Modeling these communities and their information needs is important for several web applications, such as topic-driven web crawlers, web services, and recommender systems. The development of terminology extraction is essential to the language industry.

One of the first steps in modeling the knowledge domain of a virtual community is to collect a vocabulary of domain-relevant terms, which constitute the linguistic surface manifestation of domain concepts. Several methods to automatically extract technical terms from domain-specific document warehouses have been described in the literature.
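One simple idea for finding domain-relevant terms is to contrast a domain corpus with a general-language reference corpus: words that are much more frequent in the domain than in general text are likely domain terms. The sketch below scores single words by the ratio of their relative frequencies in the two corpora. It is an illustrative assumption, not one of the specific methods referred to above; the function names and the `smoothing` constant are hypothetical.

```python
from collections import Counter

def relative_freqs(tokens):
    """Map each token to its relative frequency within the token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def domain_relevance(domain_tokens, reference_tokens, smoothing=1e-6):
    """Hypothetical contrastive score: relative frequency in the domain corpus
    divided by relative frequency in a general reference corpus.
    `smoothing` avoids division by zero for words unseen in the reference."""
    dom = relative_freqs(domain_tokens)
    ref = relative_freqs(reference_tokens)
    scores = {w: p / (ref.get(w, 0.0) + smoothing) for w, p in dom.items()}
    # Highest score first; ties keep first-seen order because sorted() is stable.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    domain = "the ledger records each debit and each credit in the ledger".split()
    reference = "the cat sat in the hat and each dog sat on the mat".split()
    for word, score in domain_relevance(domain, reference):
        print(f"{word:8s} {score:12.2f}")
```

In this toy run "ledger" ranks first and "the" ranks last. Because the ratio strongly rewards any word missing from the reference corpus, a usable reference corpus has to be large, and such scores are usually combined with frequency thresholds or linguistic filters.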

Typically, approaches to automatic term extraction use linguistic processors (part-of-speech tagging, phrase chunking) to extract terminological candidates, i.e. syntactically plausible terminological noun phrases (NPs): compounds such as "credit card", adjective-NPs such as "local tourist information office", and prepositional-NPs such as "board of directors". In English, the first two constructs are the most frequent. Terminological entries are then filtered from the candidate list using statistical and machine learning methods. Once filtered, these terms are particularly useful, because of their low ambiguity and high specificity, for conceptualizing a knowledge domain or for supporting the creation of a domain ontology. Furthermore, terminology extraction is a useful starting point for semantic similarity, knowledge management, human translation, machine translation, and related tasks.
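The two-stage pipeline described above (linguistic candidate extraction, then statistical filtering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any tool listed here. Sentences are assumed to be already POS-tagged with Penn Treebank tags (in practice a tagger such as NLTK's `nltk.pos_tag` could supply them). Candidates are limited to adjective/noun sequences ending in a noun, so prepositional-NPs like "board of directors" are not covered. The statistical filter is the C-value measure of Frantzi, Ananiadou and Mima, one well-known choice among many. The helper names are hypothetical.

```python
import math
from collections import Counter

# Penn Treebank tags (assumed tagset): nouns, plus adjectives as modifiers.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
MODIFIER_TAGS = NOUN_TAGS | {"JJ"}

def candidates(tagged_sentence):
    """Yield every contiguous adjective/noun span of two or more tokens that
    ends in a noun (compounds and adjective-NPs). Nested spans are included
    because C-value needs them."""
    i, n = 0, len(tagged_sentence)
    while i < n:
        # Extend j over a maximal run of adjective/noun tokens starting at i.
        j = i
        while j < n and tagged_sentence[j][1] in MODIFIER_TAGS:
            j += 1
        run = tagged_sentence[i:j]
        for start in range(len(run)):
            for end in range(start + 2, len(run) + 1):
                if run[end - 1][1] in NOUN_TAGS:
                    yield tuple(word.lower() for word, _ in run[start:end])
        i = max(j, i + 1)  # skip past the run, or past one non-matching token

def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous subsequence of `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))

def c_value_ranking(tagged_corpus):
    """Rank candidates by C-value:
    log2|a| * f(a)                                     if a is not nested,
    log2|a| * (f(a) - mean frequency of longer
               candidates containing a)                otherwise."""
    freq = Counter()
    for sentence in tagged_corpus:
        freq.update(candidates(sentence))
    scores = {}
    for a, f_a in freq.items():
        containers = [b for b in freq if len(b) > len(a) and contains(b, a)]
        if containers:
            nested_mean = sum(freq[b] for b in containers) / len(containers)
            scores[a] = math.log2(len(a)) * (f_a - nested_mean)
        else:
            scores[a] = math.log2(len(a)) * f_a
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    corpus = [
        [("The", "DT"), ("credit", "NN"), ("card", "NN"), ("was", "VBD"), ("declined", "VBN")],
        [("Her", "PRP$"), ("credit", "NN"), ("card", "NN"), ("bill", "NN"), ("arrived", "VBD")],
        [("A", "DT"), ("new", "JJ"), ("credit", "NN"), ("card", "NN"), ("arrived", "VBD")],
    ]
    for term, score in c_value_ranking(corpus):
        print(" ".join(term), round(score, 3))
```

On this toy corpus "credit card" ranks first with a C-value of 2.0, whereas "card bill" and "new credit" score 0 because every occurrence of them lies inside a longer candidate. A real system would also apply a frequency threshold and a maximum candidate length, and would run on far more text.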

See also

  • Computational linguistics
  • Glossary
  • Natural language processing
  • Domain Ontology
  • Subject indexing
  • Taxonomy
  • Terminology
  • Text mining
  • Text simplification

External links

  • Lexterm, a free/open-source Lexical Extractor for Terminology and Translation (mono- and bilingual extraction).
  • Sematext Key Phrase Extractor, a package by Sematext for extracting collocations, Statistically Improbable Phrases (SIPs), etc.
  • Five Filters Term Extraction, a free software term extraction service web application
  • AlchemyAPI, a web-based multilingual keyword/terminology extraction API web application
  • Zemanta API, a web-based keyword extraction and disambiguation API by Zemanta
  • Yahoo Term Extraction API web application
  • Introduction to terminology management, by IBM
  • TerMine, a term management system web application by the UK's National Centre for Text Mining
  • TermExtractor, a free terminology extraction web application
  • TermFinder, a free online terminology extractor web application
  • TrM Extractor, an experimental terminology extractor web application that can process English and Hungarian texts
  • Statistical Bilingual Terminology Extractor, an online terminology extractor web application
  • Ngram Statistics Package, open source package for identifying collocations
  • Heartsome Araya Bilingual Terminology Extractor for TMX files, by Heartsome Europe
  • English Phrases Extractor, a web application by the Blogscope team at the University of Toronto; the extracted terms are used to search for conceptually related blogs on the web rather than for linguistic analysis
  • A demo of Document Skimming and Scanning using domain-related terms extracted from news articles. The domain terms are organised into 'term clouds' for visualising key concepts in the news.
  • An interface for extracting domain-relevant terms from documents using the OT and TH measures. A list of documents, together with their automatically extracted domain-relevant terms, is available for browsing.
  • Gabor Melli's info page on terminology extraction
  • Yahoo! Quest demo exploiting term extraction for browsing the Yahoo! Answers Q&A collection
  • ExtractKeyword, a free online tool to extract keywords and analyse keyword density from webpages and paragraphs; supports Vietnamese and English content.
  • Ultimate Research Assistant, a free online literacy tool with strong multilingual terminology extraction capabilities and visualizations, including bar charts, mind maps, and taxonomies. Includes XML web services for term extraction, text summarization, and taxonomy generation/clustering.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.