GLIMPSE - AbsoluteAstronomy.com

GLIMPSE is a text indexing and retrieval software program originally developed at the University of Arizona by Udi Manber, Sun Wu, and Burra Gopal. A web server version called WebGlimpse is now being maintained under a pay per line licence. Neither project could be considered open source, although there are some similarities.

GLIMPSE stands for GLobal IMPlicit SEarch. While many text indexing schemes create quite large indexes (usually around 50% of the size of the original text), a GLIMPSE-created index is only 2-4% of the size of the original text.

GLIMPSE uses and takes a great deal of inspiration from Agrep, which was also developed at the University of Arizona, but GLIMPSE uses a high-level index whereas Agrep parses all the text each time.

The basic algorithm is similar to other text indexing and retrieval engines, except that the text records in the index are very large, each consisting of multiple files. The index is searched with a boolean matching algorithm, as in most other text indexing and retrieval engines. Once one or more of these large records matches, Agrep scans them for the exact text desired. This is slower than a traditional fully indexed approach, but the much smaller index is an advantage for an individual user. The approach would not work particularly well across websites, but works reasonably well for a single site or a single workstation. In addition, the smaller index can be created more quickly than a full index.
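The two-stage search described above can be illustrated with a minimal sketch. It is not GLIMPSE's actual code: the names `build_index` and `search`, the `block_size` parameter, and the sample files are all hypothetical, and a plain case-insensitive substring test stands in for the Agrep scan.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")

def tokenize(text):
    """Return the set of lowercase words in a piece of text."""
    return set(WORD.findall(text.lower()))

def build_index(files, block_size=2):
    """Group files into blocks and map each word to the block ids that contain it.

    The index records only which block a word occurs in, not the file or the
    position, which is what keeps it small.
    """
    names = sorted(files)
    blocks = [names[i:i + block_size] for i in range(0, len(names), block_size)]
    index = defaultdict(set)
    for block_id, block in enumerate(blocks):
        for name in block:
            for word in tokenize(files[name]):
                index[word].add(block_id)
    return blocks, index

def search(files, blocks, index, words, phrase):
    """Boolean AND over the coarse index, then a linear scan of candidate blocks."""
    candidate_ids = None
    for word in words:
        ids = index.get(word.lower(), set())
        candidate_ids = ids if candidate_ids is None else candidate_ids & ids
    hits = []
    for block_id in sorted(candidate_ids or ()):
        for name in blocks[block_id]:
            # Stand-in for the Agrep pass: scan every line of every file in the block.
            for line_no, line in enumerate(files[name].splitlines(), start=1):
                if phrase.lower() in line.lower():
                    hits.append((name, line_no, line))
    return hits

# Hypothetical sample collection, two files per block.
files = {
    "a.txt": "the quick brown fox\njumps over the lazy dog",
    "b.txt": "index size matters",
    "c.txt": "a small index is quick to build",
    "d.txt": "nothing relevant here",
}
blocks, index = build_index(files)
results = search(files, blocks, index, ["quick", "index"], "small index")
```

In this sample, the first block passes the boolean test even though "quick" and "index" occur in different files within it, so only the scan stage can rule it out. That extra scanning is the cost of the coarser, smaller index.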

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.