Natural language processing



 
 


Natural language processing (NLP) is a field of computer science concerned with the interactions between computers and human (natural) languages. Natural language generation systems convert information from computer databases into readable human language. Natural language understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate. Many problems within NLP apply to both generation and understanding; for example, a computer must be able to model morphology (the structure of words) in order to understand an English sentence, but a model of morphology is also needed for producing a grammatically correct English sentence.

NLP has significant overlap with the field of computational linguistics, and is often considered a sub-field of artificial intelligence. The term natural language is used to distinguish human languages (such as Spanish, Swahili or Swedish) from formal or computer languages (such as C++, Java or LISP). Although NLP may encompass both text and speech, work on speech processing has evolved into a separate field.

Tasks and limitations

In theory, natural-language processing is a very attractive method of human-computer interaction. Early systems such as SHRDLU, working in restricted "blocks worlds" with restricted vocabularies, worked extremely well, leading researchers to excessive optimism, which was soon lost when the systems were extended to more realistic situations with real-world ambiguity and complexity.

Natural-language understanding is sometimes referred to as an AI-complete problem, because natural-language recognition seems to require extensive knowledge about the outside world and the ability to manipulate it. The definition of "understanding" is one of the major problems in natural-language processing.

Subproblems

Speech segmentation: In most spoken languages, the sounds representing successive letters blend into each other, so the conversion of the analog signal to discrete characters can be a very difficult process. Also, in natural speech there are hardly any pauses between successive words; the location of those boundaries usually must take into account grammatical and semantic constraints, as well as the context.

Text segmentation: Some written languages like Chinese, Japanese and Thai do not have single-word boundaries either, so any significant text parsing usually requires the identification of word boundaries, which is often a non-trivial task.
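To illustrate why finding word boundaries is non-trivial, the following is a minimal sketch of greedy "maximum matching" segmentation, one simple dictionary-based approach among many. The lexicon and the unspaced English input are invented for illustration; real systems use large dictionaries or statistical models.

```python
def max_match(text, lexicon):
    """Greedy left-to-right segmentation: always take the longest lexicon word."""
    max_len = max(len(word) for word in lexicon)
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, shrinking until a lexicon word matches.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            # Fall back to a single character if nothing in the lexicon matches.
            if candidate in lexicon or size == 1:
                tokens.append(candidate)
                pos += size
                break
    return tokens


# Hypothetical toy lexicons; the second one adds words that mislead the greedy rule.
small = {"the", "table", "down", "there"}
large = small | {"theta", "bled", "own"}

print(max_match("thetabledownthere", small))
print(max_match("thetabledownthere", large))
```

With the smaller lexicon the greedy rule recovers "the table down there", while the larger lexicon leads it to "theta bled own there". A bigger dictionary alone therefore does not solve the problem; context is needed to choose between segmentations.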

Part-of-speech tagging: Marking up the words in a text as corresponding to a particular part of speech, based on both each word's definition and its context, i.e. its relationship with adjacent and related words in a phrase, sentence, or paragraph.

Word sense disambiguation: Many words have more than one meaning; we have to select the meaning which makes the most sense in context.
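One classical way to "select the meaning which makes the most sense in context" is to compare the surrounding words with a dictionary gloss for each sense, in the spirit of the simplified Lesk algorithm. The sketch below assumes a tiny hand-written sense inventory (`SENSES`) and stopword list; both are hypothetical and exist only for illustration.

```python
# Hypothetical sense inventory: word -> {sense label: gloss}.
SENSES = {
    "bank": {
        "financial institution": "an institution that accepts deposits and lends money",
        "river edge": "the sloping land beside a river or lake",
    }
}

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "on", "to", "that", "and", "or"}


def disambiguate(word, sentence):
    """Pick the sense whose gloss shares the most content words with the sentence."""
    context = set(sentence.lower().split()) - STOPWORDS
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(disambiguate("bank", "She sat on the bank of the river"))
print(disambiguate("bank", "He went to the bank to deposit money"))
```

Word overlap is a crude signal: "deposit" in the second sentence does not match "deposits" in the gloss, so the decision there rests on the single shared word "money". Handling such variation is one reason morphology matters for understanding.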

Syntactic ambiguity: The grammar for natural languages is ambiguous, i.e. there are often multiple possible parse trees for a given sentence. Choosing the most appropriate one usually requires semantic and contextual information. Specific problem components of syntactic ambiguity include sentence boundary disambiguation.

Imperfect or irregular input: Foreign or regional accents and vocal impediments in speech; typing or grammatical errors, OCR errors in texts.
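For typing or OCR errors in text, one common building block is edit distance: a misspelled token is mapped to the closest word in a known vocabulary. The following sketch uses the standard Levenshtein distance; the vocabulary and the `correct` helper are hypothetical, and practical systems also weigh word frequency and context.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, 1):
        cur = [i]
        for j, char_b in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                       # delete char_a
                cur[j - 1] + 1,                    # insert char_b
                prev[j - 1] + (char_a != char_b),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def correct(word, vocabulary):
    """Return the vocabulary entry closest to the (possibly misspelled) word."""
    return min(vocabulary, key=lambda entry: edit_distance(word, entry))


print(edit_distance("kitten", "sitting"))
print(correct("langauge", ["language", "luggage", "linguist"]))
```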

Speech acts and plans: A sentence can often be considered an action by the speaker. The sentence structure alone may not contain enough information to define this action. For instance, a question is actually the speaker requesting some sort of response from the listener. The desired response may be verbal, physical, or some combination. For example, "Can you pass the class?" is a request for a simple yes-or-no answer, while "Can you pass the salt?" is requesting a physical action to be performed. It is not appropriate to respond with "Yes, I can pass the salt," without the accompanying action (although "No" or "I can't reach the salt" would explain a lack of action).

Statistical NLP

Statistical natural-language processing uses stochastic, probabilistic and statistical methods to resolve some of the difficulties discussed above, especially those which arise because longer sentences are highly ambiguous when processed with realistic grammars, yielding thousands or millions of possible analyses. Methods for disambiguation often involve the use of corpora and Markov models. Statistical NLP comprises all quantitative approaches to automated language processing, including probabilistic modeling, information theory, and linear algebra. The technology for statistical NLP comes mainly from machine learning and data mining, both of which are fields of artificial intelligence that involve learning from data.
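As a rough illustration of how corpora and Markov models fit together, the sketch below estimates a first-order (bigram) Markov model from a three-sentence toy corpus and scores a sentence with it. The corpus is invented, the estimates are plain maximum-likelihood counts with no smoothing, and the `<s>` and `</s>` boundary markers are a convention assumed here.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts


def bigram_prob(counts, prev, cur):
    """Maximum-likelihood estimate of P(cur | prev); 0.0 for unseen contexts."""
    total = sum(counts[prev].values())
    return counts[prev][cur] / total if total else 0.0


def sentence_prob(counts, sentence):
    """Probability of a whole sentence under the bigram model."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob *= bigram_prob(counts, prev, cur)
    return prob


# Hypothetical toy corpus.
corpus = [
    "time flies fast",
    "fruit flies like bananas",
    "time flies like an arrow",
]
model = train_bigrams(corpus)
print(bigram_prob(model, "flies", "like"))
print(sentence_prob(model, "time flies like an arrow"))
```

Any word pair absent from the corpus receives probability zero here, which is why practical models apply smoothing and are trained on far larger corpora.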

Major tasks in NLP

  • Automatic summarization
  • Foreign language reading aid
  • Foreign language writing aid
  • Information extraction
  • Information retrieval (IR) - IR is concerned with storing, searching and retrieving information. It is a separate field within computer science (closer to databases), but IR relies on some NLP methods (for example, stemming). Some current research and applications seek to bridge the gap between IR and NLP.
  • Machine translation - Automatically translating from one human language to another.
  • Named entity recognition (NER) - Given a stream of text, determining which items in the text map to proper names, such as people or places. Although in English named entities are marked by capitalization, many other languages do not use capitalization to distinguish named entities.
  • Natural language generation
  • Natural language understanding
  • Optical character recognition
  • Anaphora resolution
  • Question answering - Given a human-language question, the task of producing a human-language answer. The question may be closed-ended (such as "What is the capital of Canada?") or open-ended (such as "What is the meaning of life?").
  • Speech recognition - Given a sound clip of a person or people speaking, the task of producing a text dictation of the speaker(s). (The opposite of text-to-speech.)
  • Spoken dialogue system
  • Text simplification
  • Text-to-speech
  • Text-proofing


Concrete problems

Some examples of the problems faced by natural-language-understanding systems:

  • The sentences "We gave the monkeys the bananas because they were hungry" and "We gave the monkeys the bananas because they were over-ripe" have the same surface grammatical structure. However, the pronoun they refers to monkeys in one sentence and bananas in the other, and it is impossible to tell which without a knowledge of the properties of monkeys and bananas.
  • A string of words may be interpreted in different ways. For example, the string "Time flies like an arrow" may be interpreted in a variety of ways:
    • the common simile: time moves quickly just like an arrow does;
    • measure the speed of flies like you would measure that of an arrow (thus interpreted as an imperative) - i.e. (You should) time flies as you would (time) an arrow;
    • measure the speed of flies like an arrow would - i.e. Time flies in the same way that an arrow would (time them);
    • measure the speed of flies that are like arrows - i.e. Time those flies that are like arrows;
    • all of a type of flying insect, "time-flies," collectively enjoys a single arrow (compare "Fruit flies like a banana");
    • each of a type of flying insect, "time-flies," individually enjoys a different arrow (similar comparison applies);
    • a concrete object, for example the magazine Time, travels through the air in an arrow-like manner.


English is particularly challenging in this regard because it has little inflectional morphology to distinguish between parts of speech.
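The ambiguity of "Time flies like an arrow" can be made concrete by counting parse trees with a chart parser. The sketch below uses the CYK algorithm over a deliberately small, hypothetical grammar in which "time" and "flies" may each be a noun or a verb and "like" may be a verb or a preposition; the lexicon, rule set and category names are assumptions made for illustration and cover only some of the readings listed above.

```python
from collections import Counter, defaultdict

# Hypothetical lexicon: each word lists every category it may take.
LEXICON = {
    "time": ["NP", "N", "V"],
    "flies": ["NP", "N", "V", "VP"],  # VP covers intransitive "flies"
    "like": ["V", "P"],
    "an": ["Det"],
    "arrow": ["N"],
}

# Hypothetical binary rules: (parent, left child, right child).
RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "N", "N"),    # compound noun such as "time flies"
    ("NP", "NP", "PP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("PP", "P", "NP"),
]


def count_parses(words):
    """CYK chart that stores, per span and category, the number of parse trees."""
    n = len(words)
    chart = defaultdict(Counter)
    for i, word in enumerate(words):
        for category in LEXICON[word]:
            chart[(i, i + 1)][category] += 1
    for length in range(2, n + 1):
        for start in range(n - length + 1):
            end = start + length
            for mid in range(start + 1, end):
                left, right = chart[(start, mid)], chart[(mid, end)]
                for parent, left_cat, right_cat in RULES:
                    ways = left[left_cat] * right[right_cat]
                    if ways:
                        chart[(start, end)][parent] += ways
    return dict(chart[(0, n)])


analyses = count_parses("time flies like an arrow".split())
print(analyses)
```

Under these assumptions the full sentence receives two analyses as a declarative sentence (S) and two as an imperative verb phrase (VP). Even a seven-rule grammar therefore cannot decide between readings without semantic or contextual information.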

  • English and several other languages don't specify which word an adjective applies to. For example, consider the string "pretty little girls' school":
    • Does the school look little?
    • Do the girls look little?
    • Do the girls look pretty?
    • Does the school look pretty?


  • We will often imply additional information in spoken language by the way we place stress on words. The sentence "I never said she stole my money" demonstrates the importance stress can play in a sentence, and thus the inherent difficulty a natural language processor can have in parsing it. Depending on which word the speaker stresses (marked here with asterisks), this sentence could have several distinct meanings:
    • "*I* never said she stole my money" - Someone else said it, but I didn't.
    • "I *never* said she stole my money" - I simply didn't ever say it.
    • "I never *said* she stole my money" - I might have implied it in some way, but I never explicitly said it.
    • "I never said *she* stole my money" - I said someone took it; I didn't say it was she.
    • "I never said she *stole* my money" - I just said she probably borrowed it.
    • "I never said she stole *my* money" - I said she stole someone else's money.
    • "I never said she stole my *money*" - I said she stole something, but not my money.


Evaluation of natural language processing


Objectives

The goal of NLP evaluation is to measure one or more qualities of an algorithm or a system, in order to determine whether (or to what extent) the system answers the goals of its designers, or meets the needs of its users. Research in NLP evaluation has received considerable attention, because defining proper evaluation criteria is one way to specify an NLP problem precisely, thus going beyond the vagueness of tasks defined only as language understanding or language generation. A precise set of evaluation criteria, consisting mainly of evaluation data and evaluation metrics, enables several teams to compare their solutions to a given NLP problem.

Short history of evaluation in NLP

The first evaluation campaign on written texts seems to be a campaign dedicated to message understanding in 1987 (Pallett 1998). Then, the Parseval/GEIG project compared phrase-structure grammars (Black 1991). A series of campaigns within the Tipster project was carried out on tasks such as summarization, translation and searching (Hirschman 1998). In 1994, in Germany, the Morpholympics compared German taggers. Then, the Senseval and Romanseval campaigns were conducted with the objective of semantic disambiguation. In 1996, the Sparkle campaign compared syntactic parsers in four different languages (English, French, German and Italian). In France, the Grace project compared a set of 21 taggers for French in 1997 (Adda 1999). In 2004, during the Technolangue/Easy project, 13 parsers for French were compared. Large-scale evaluations of dependency parsers were performed in the context of the CoNLL shared tasks in 2006 and 2007. In Italy, the EVALITA campaign was conducted in 2007 to compare various tools for Italian. In France, within the ANR-Passage project (end of 2007), 10 parsers for French were compared.

Adda G., Mariani J., Paroubek P., Rajman M. 1999. L'action GRACE d'évaluation de l'assignation des parties du discours pour le français. Langues, vol. 2.
Black E., Abney S., Flickinger D., Gdaniec C., Grishman R., Harrison P., Hindle D., Ingria R., Jelinek F., Klavans J., Liberman M., Marcus M., Roukos S., Santorini B., Strzalkowski T. 1991. A procedure for quantitatively comparing the syntactic coverage of English grammars. DARPA Speech and Natural Language Workshop.
Hirschman L. 1998. Language understanding evaluation: lessons learned from MUC and ATIS. LREC, Granada.
Pallett D.S. 1998. The NIST role in automatic speech recognition benchmark tests. LREC, Granada.

Different types of evaluation

Depending on the evaluation procedures, a number of distinctions are traditionally made in NLP evaluation.

  • Intrinsic vs. extrinsic evaluation


Intrinsic evaluation considers an isolated NLP system and characterizes its performance mainly with respect to a gold standard result, pre-defined by the evaluators. Extrinsic evaluation, also called evaluation in use, considers the NLP system in a more complex setting, either as an embedded system or serving a precise function for a human user. The extrinsic performance of the system is then characterized in terms of its utility with respect to the overall task of the complex system or the human user. For example, consider a syntactic parser that is based on the output of some new part-of-speech (POS) tagger. An intrinsic evaluation would run the POS tagger on some labelled data, and compare the system output of the POS tagger to the gold standard (correct) output. An extrinsic evaluation would run the parser with some other POS tagger, and then with the new POS tagger, and compare the parsing accuracy.
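The intrinsic evaluation of a POS tagger described above amounts to comparing two tag sequences token by token. A minimal sketch follows, assuming the tagger output and the gold standard are already aligned lists of tags; the tag names and the example data are invented.

```python
def tagging_accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag equals the gold-standard tag."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("tag sequences must be non-empty and of equal length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical tags for "time flies like an arrow".
gold = ["NOUN", "VERB", "ADP", "DET", "NOUN"]
predicted = ["NOUN", "NOUN", "ADP", "DET", "NOUN"]
print(tagging_accuracy(predicted, gold))
```

An extrinsic evaluation would use a measure of the same form, but apply it to the output of the downstream parser once for each tagger.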

  • Black-box vs. glass-box evaluation


Black-box evaluation requires one to run an NLP system on a given data set and to measure a number of parameters related to the quality of the process (speed, reliability, resource consumption) and, most importantly, to the quality of the result (e.g. the accuracy of data annotation or the fidelity of a translation). Glass-box evaluation looks at the design of the system, the algorithms that are implemented, the linguistic resources it uses (e.g. vocabulary size), etc. Given the complexity of NLP problems, it is often difficult to predict performance only on the basis of glass-box evaluation, but this type of evaluation is more informative with respect to error analysis or future developments of a system.

  • Automatic vs. manual evaluation


In many cases, automatic procedures can be defined to evaluate an NLP system by comparing its output with the gold standard (or desired) one. Although the cost of producing the gold standard can be quite high, automatic evaluation can be repeated as often as needed without much additional cost (on the same input data). However, for many NLP problems, the definition of a gold standard is a complex task, and can prove impossible when inter-annotator agreement is insufficient. Manual evaluation is performed by human judges, who are instructed to estimate the quality of a system, or most often of a sample of its output, based on a number of criteria. Although, thanks to their linguistic competence, human judges can be considered as the reference for a number of language processing tasks, there is also considerable variation across their ratings. This is why automatic evaluation is sometimes referred to as objective evaluation, while the human kind appears to be more subjective.
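Inter-annotator agreement is often quantified with a chance-corrected statistic. One widely used choice for two annotators is Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each annotator's label frequencies. The sketch below assumes two equally long lists of labels; the labels themselves are invented.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical quality judgements from two human judges.
judge_1 = ["good", "good", "bad", "bad"]
judge_2 = ["good", "bad", "bad", "bad"]
print(cohens_kappa(judge_1, judge_2))
```

Here the judges agree on three of four items (p_o = 0.75), chance agreement is 0.5, and kappa is therefore 0.5. What counts as "sufficient" agreement depends on the task and is a matter of convention.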

Shared tasks (Campaigns)

  • BioCreative
  • Message Understanding Conference
  • Technolangue/Easy
  • Text Retrieval Conference


Standardization in NLP

An ISO sub-committee is working to ease interoperability between lexical resources and NLP programs. The sub-committee is part of ISO/TC37 and is called ISO/TC37/SC4. Some ISO standards are already published but most of them are under construction, mainly on lexicon representation (see LMF, the Lexical Markup Framework), annotation and data category registry.

Journals

  • Computational Linguistics
    Computational Linguistics (journal)

    Computational Linguistics, published by The MIT Press for the Association for Computational Linguistics, is a leading journal in the field of computational linguistics....
  • Language Resources and Evaluation
  • Linguistic Issues in Language Technology
    Linguistic Issues in Language Technology

    Linguistic Issues in Language Technology is an Open Access journal that, according to its web page, "focusses on relationships between linguistic insights, which can prove valuable to language technology, and language technology, which can enrich linguistic research" [1]....


Organizations and conferences


Associations

  • Association for Computational Linguistics
    Association for Computational Linguistics

    The Association for Computational Linguistics is the international scientific and professional society for people working on problems involving natural language and computation....
  • Association for Machine Translation in the Americas
  • AFNLP - Asian Federation of Natural Language Processing Associations
    AFNLP

    AFNLP is the organization for coordinating the natural language processing related activities and events in the Asia-Pacific region....
  • Australasian Language Technology Association (ALTA)
    Australasian Language Technology Association

    The Australasian Language Technology Association promotes language technology research and development in Australia and New Zealand. ALTA organises regular events for the exchange of research results and for academic and industrial training, and co-ordinates activities with other professional societies....

Conferences

  • Language Resources and Evaluation
    LREC

    The International Conference on Language Resources and Evaluation is a biennial conference organised by the European Language Resources Association with the support of institutions and organisations involved in Natural language processing....


Software tools

  • General Architecture for Text Engineering
    General Architecture for Text Engineering

    General Architecture for Text Engineering or GATE is a Java software toolkit originally developed at the University of Sheffield since 1995 and now used worldwide by a wide community of scientists, companies, teachers and students for all sorts of Natural language processing tasks, including Information extraction in many languages....
  • Natural Language Toolkit (NLTK): a Python library suite
    Natural Language Toolkit

    Natural Language Toolkit or, more commonly, NLTK is a suite of libraries and programs for symbolic and statistical natural language processing for the Python ....
  • Expert System S.p.A.
    Expert System S.p.A.

    Expert System is a software company, founded in Italy in 1989, and a pioneer in developing and marketing semantic technologies to understand and manage unstructured information....
  • OpenNLP
  • MontyLingua
    MontyLingua

    MontyLingua is a popular natural language processing toolkit. It is a suite of libraries and programs for symbolic and statistical natural language processing for both the Python and Java programming languages....
  • - Free software packages for NLP research, including a Semantic Role Labeler, Named Entity Tagger, Coreference Resolution, and more. This is also the home of Learning-Based Java (Machine Learning Framework) and Sparse Network of Winnows (Learning Architecture).


See also

  • Biomedical text mining
    Biomedical text mining

    Biomedical text mining refers to text mining applied to texts and literature of the biomedical and molecular biology domain. It is a rather recent research field on the edge of natural language processing, bioinformatics, medical informatics and computational linguistics....
  • Chatterbot
    Chatterbot

    A chatterbot is a type of conversational agent, a computer program designed to simulate an intelligent conversation with one or more human users via auditory or textual methods....
  • Compound term processing
    Compound term processing

    Compound term processing is the name that is used for a category of techniques in Information retrieval applications that performs matching on the basis of compound terms....
  • Computational linguistics
    Computational linguistics

    Computational linguistics is an interdisciplinary field dealing with the statistical and/or rule-based modeling of natural language from a computational perspective....
  • Computer-assisted reviewing
    Computer-assisted reviewing

    Computer-assisted reviewing tools are pieces of software based on text comparison and analysis algorithms. These tools focus on the differences between two documents, taking into account each document's typeface through an intelligent analysis....
  • Controlled natural language
    Controlled natural language

    Controlled natural languages are subsets of natural languages, obtained by restricting the grammar and vocabulary in order to reduce or eliminate ambiguity and complexity....
  • Human language technology
  • Information retrieval
    Information retrieval

    Information retrieval is the science of searching for documents, for information within documents and for metadata about documents, as well as that of searching relational databases and the World Wide Web....
  • Latent semantic indexing
  • Lexical markup framework
    Lexical Markup Framework

    Lexical Markup Framework is the ISO International Organization for Standardization ISO/TC37 standard for natural language processing and machine-readable dictionary lexicons....
  • Lojban / Loglan
    Lojban

    Lojban is a constructed, syntactically unambiguous human language based on first-order logic. Its predecessor is Loglan, the original logical language by James Cooke Brown....

    Loglan

    Loglan is a constructed language originally designed for linguistic research, particularly for investigating the Sapir-Whorf Hypothesis. The language was developed beginning in 1955 by Dr....
  • Transderivational search
    Transderivational search

    Transderivational search is a psychology and cybernetics term, meaning when a search is being conducted for a fuzzy logic match across a broad field....
  • Speech Recognition
    Speech recognition

    Speech recognition converts spoken words to machine-readable input. The term "voice recognition" is sometimes incorrectly used to refer to speech recognition, when actually referring to speaker recognition, which attempts to identify the person speaking, as opposed to what is being said....


Implementations

  • Cypher, a framework for transforming natural language phrases and statements into SPARQL and RDF. Uses the Metalanguage Ontology to describe language constructs such as phrase grammars, morphology rules and lexicons.
    Cypher transcoder

    The Cypher transcoder is a natural language processing engine that converts plain language statements and phrases into RDF triples and SPARQL queries....
  • Infonic Sentiment, an NLP-based news analysis software package that reads news flows and provides news sentiment signals for the algorithmic trading systems of investment banks
    Infonic

    Infonic is a UK-based company which develops document management, enterprise content management, text analysis and SharePoint replication computer software....
  • LinguaStream, a generic platform for NLP experimentation
    LinguaStream

    LinguaStream is a generic platform for natural language processing, based on incremental enrichment of electronic documents. It allows complex processing streams to be designed and evaluated, assembling analysis components of various types and levels: part-of-speech, syntax, semantics, discourse or statistical....
  • MARF, a framework for voice and statistical NLP processing
    Modular Audio Recognition Framework

    Modular Audio Recognition Framework is an open-source research platform and a collection of voice, sound, speech, text and natural language processing algorithms written in Java and arranged into a modular and extensible software framework that attempts to facilitate addition of new algorithms....
  • Nortel Speech Server, a speech processing system primarily used for large-vocabulary speech recognition, natural-language understanding, text-to-speech, and speaker verification
    Nortel Speech Server

    The Nortel Speech Server in telecommunications is a speech processing system manufactured by Nortel. The system supports many functions but is primarily used for large-vocabulary speech recognition, natural language processing, text-to-speech, and speaker verification....


Related academic articles

  • Bates, M. (1995). Models of natural language understanding. Proceedings of the National Academy of Sciences of the United States of America, Vol. 92, No. 22 (Oct. 24, 1995), pp. 9977-9982.


External links


Resources

  • Language Technology Documentation Centre in Finland (FiLT): https://kitwiki.csc.fi/twiki/bin/view/FiLT/FiLTWikiEn


Organizations