LGTE
Lucene Geographic and Temporal (LGTE) is an information retrieval tool developed at the Technical University of Lisbon. It can be used as a search engine or as an evaluation system for information retrieval techniques in research. The first implementation powered by LGTE was the search engine of DIGMAP, a project co-funded by the eContentplus community programme between 2006 and 2008. DIGMAP aimed to provide web services over old digitized maps held by a group of European partners, including several national libraries.

LGTE is written in Java around the Lucene library for full-text search, and it introduces several extensions for dealing with geographical and temporal information. The package also includes utilities for information retrieval evaluation, such as classes for handling CLEF/TREC (Cross Language Evaluation Forum/Text Retrieval Conference) topics and document collections.

Technically, LGTE is a layer on top of Lucene. It provides an extended Lucene API that integrates several services, such as snippet generation and query expansion, among others. LGTE also makes it possible to implement new probabilistic models. The API depends on a set of modifications at the Lucene level, originally created by researchers of the Information and Language Processing Systems (ILPS) group at the University of Amsterdam in a software tool named Lucene-lm. At the time, the tool was tested successfully with the Okapi BM25 model and a multinomial language model; it also includes divergence from randomness models.
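As an illustration of the kind of probabilistic model mentioned above, the contribution of a single query term to an Okapi BM25 score can be sketched as follows. This is the textbook formula written as standalone Java, not code taken from LGTE or Lucene-lm; the class name `Bm25Sketch`, the method names, the smoothed (non-negative) IDF variant, and the commonly quoted parameter values k1 = 1.2 and b = 0.75 are all assumptions made for the example.

```java
// Minimal sketch of Okapi BM25 term scoring (textbook form, not LGTE code).
final class Bm25Sketch {

    // Smoothed inverse document frequency; the "+ 1" inside the log keeps it non-negative.
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Score contribution of one query term in one document.
    // tf: term frequency in the document; docLen and avgDocLen: lengths in tokens;
    // k1 controls term-frequency saturation, b controls length normalization.
    static double termScore(double tf, double docLen, double avgDocLen,
                            long docCount, long docFreq, double k1, double b) {
        double lengthNorm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf(docCount, docFreq) * tf * (k1 + 1.0) / (tf + lengthNorm);
    }

    public static void main(String[] args) {
        // Hypothetical statistics: a term occurring 3 times in a 120-token document.
        double score = termScore(3, 120, 100, 10_000, 50, 1.2, 0.75);
        System.out.println("BM25 term score: " + score);
    }
}
```

The score of a whole query is the sum of `termScore` over its terms. Repeated occurrences of a term give diminishing returns, and documents longer than average are penalized in proportion to `b`.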

LGTE 1.1.9 and later versions also make it possible to isolate index fields in different index folders. Another recent feature is the configuration of hierarchic indexes using foreign key fields. This makes it possible, for example, to compute a score based on the text of a sentence combined with the general score of the entire page.
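One way to picture the sentence-plus-page scoring described above is a linear interpolation between the two scores, with each sentence pointing to its parent page through a foreign key. The sketch below is only an assumption about how such a combination could look: the class `HierarchicScoreSketch`, its methods, the `lambda` weight, and the use of in-memory maps in place of real indexes are all hypothetical and do not reflect the LGTE API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: combine a sentence-level score with the score of its parent page.
final class HierarchicScoreSketch {

    // Linear interpolation; lambda = 1 uses only the sentence, lambda = 0 only the page.
    static double combine(double sentenceScore, double pageScore, double lambda) {
        return lambda * sentenceScore + (1.0 - lambda) * pageScore;
    }

    // sentenceToPage plays the role of the foreign key field linking the two index levels.
    static Map<String, Double> scoreSentences(Map<String, Double> sentenceScores,
                                              Map<String, String> sentenceToPage,
                                              Map<String, Double> pageScores,
                                              double lambda) {
        Map<String, Double> combined = new HashMap<>();
        for (Map.Entry<String, Double> entry : sentenceScores.entrySet()) {
            String pageId = sentenceToPage.get(entry.getKey());
            // A sentence without a known parent page gets a page score of zero.
            Double pageScore = (pageId == null) ? null : pageScores.get(pageId);
            double parent = (pageScore == null) ? 0.0 : pageScore;
            combined.put(entry.getKey(), combine(entry.getValue(), parent, lambda));
        }
        return combined;
    }

    public static void main(String[] args) {
        Map<String, Double> sentences = new HashMap<>();
        sentences.put("s1", 2.0);
        Map<String, String> foreignKeys = new HashMap<>();
        foreignKeys.put("s1", "p1");
        Map<String, Double> pages = new HashMap<>();
        pages.put("p1", 4.0);
        System.out.println(scoreSentences(sentences, foreignKeys, pages, 0.5));
    }
}
```

Other combinations (for example a product of scores) would fit the same structure; the interpolation is chosen here only because it is easy to read.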

Features

  • Provides isolated fields using different index folders
  • Provides hierarchic indexes through foreign key fields
  • Provides classes to parse documents using Yahoo PlaceMaker
  • Provides a simple and effective abstraction layer on top of Lucene
  • Supports integrated retrieval and ranking based on thematic, temporal and geographical aspects.
  • Supports the Lucene standard retrieval model, as well as the more advanced probabilistic retrieval approaches.
  • Supports Rocchio query expansion.
  • Provides a framework for IR evaluation experiments (e.g. handling CLEF/TREC topics).
  • Includes a Java alternative to the trec_eval tool, capable of performing significance tests over pairs of runs.
  • Includes a simple test application for searching over the Braun Corpus or the Cranfield Corpus.
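The Rocchio query expansion listed among the features can be illustrated with the classic formula, in which the new query vector is alpha times the original query, plus beta times the centroid of the relevant documents, minus gamma times the centroid of the non-relevant ones. The code below is a generic sketch over sparse term-weight maps, not the LGTE implementation; the class `RocchioSketch`, its method names, and the parameter values in the example are assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Generic sketch of Rocchio query expansion over sparse term-weight vectors (not LGTE code).
final class RocchioSketch {

    static Map<String, Double> expand(Map<String, Double> query,
                                      List<Map<String, Double>> relevant,
                                      List<Map<String, Double>> nonRelevant,
                                      double alpha, double beta, double gamma) {
        Map<String, Double> expanded = new HashMap<>();
        addScaled(expanded, query, alpha);
        // Dividing by the list size turns the sums into centroids.
        for (Map<String, Double> doc : relevant) {
            addScaled(expanded, doc, beta / relevant.size());
        }
        for (Map<String, Double> doc : nonRelevant) {
            addScaled(expanded, doc, -gamma / nonRelevant.size());
        }
        // Terms whose weight is not positive are dropped, as is common practice.
        expanded.values().removeIf(weight -> weight <= 0.0);
        return expanded;
    }

    private static void addScaled(Map<String, Double> target,
                                  Map<String, Double> source, double factor) {
        for (Map.Entry<String, Double> entry : source.entrySet()) {
            target.merge(entry.getKey(), factor * entry.getValue(), Double::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, Double> query = Collections.singletonMap("map", 1.0);
        Map<String, Double> feedbackDoc = new HashMap<>();
        feedbackDoc.put("map", 1.0);
        feedbackDoc.put("atlas", 2.0);
        List<Map<String, Double>> relevant = Collections.singletonList(feedbackDoc);
        List<Map<String, Double>> nonRelevant = Collections.emptyList();
        System.out.println(expand(query, relevant, nonRelevant, 1.0, 0.5, 0.0));
    }
}
```

In pseudo-relevance feedback the top-ranked documents of a first retrieval pass stand in for the relevant set and `gamma` is often set to zero, as in the `main` method above.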

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 