Egothor
Egothor is an open source search engine implementation written entirely in Java to ensure cross-platform compatibility.

It is aimed at any application involving full-text search, and the project places particular emphasis on cross-platform compatibility. Egothor has been successfully configured as a standalone application, within libraries, or as a peer-to-peer hub.

Egothor can recognize many common file formats: HTML, PDF, PS (PostScript), DOC (Microsoft Word) and XLS (Microsoft Excel). Its architecture allows other file formats to be added easily. The engine can index about 50 pages per second and comes with a high-capacity crawler robot that is compatible with the robots.txt standard.
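What robots.txt compatibility means for a crawler can be illustrated with a minimal Java sketch. It is not Egothor's crawler code: the class name `RobotsRules` and the robot name `egothor` are hypothetical. It covers only the original 1994 exclusion rules (records made of `User-agent` and `Disallow` lines, with plain prefix matching on the URL path) and ignores extensions such as `Allow` lines, wildcards and `Crawl-delay`. A record that names the robot takes precedence over the default `*` record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch of a robots.txt checker (hypothetical; original 1994 rules only). */
public final class RobotsRules {
    private final List<String> disallowed;

    private RobotsRules(List<String> disallowed) {
        this.disallowed = disallowed;
    }

    /** Keeps the Disallow prefixes that apply to the given robot name. */
    public static RobotsRules parse(String robotsTxt, String robotName) {
        String name = robotName.toLowerCase(Locale.ROOT);
        List<String> agents = new ArrayList<>();
        List<String> rules = new ArrayList<>();
        List<String> forUs = null;   // rules from a record naming this robot
        List<String> forAll = null;  // rules from the "User-agent: *" record
        boolean sawDisallow = false;

        // The appended newline yields a trailing blank line, which flushes the last record.
        for (String raw : (robotsTxt + "\n").split("\r?\n", -1)) {
            String line = raw.replaceFirst("#.*", "").trim(); // strip comments
            int colon = line.indexOf(':');
            String field = colon < 0 ? "" : line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = colon < 0 ? "" : line.substring(colon + 1).trim();

            // A blank line, or a User-agent line after Disallow lines, starts a new record.
            boolean newRecord = line.isEmpty() || (field.equals("user-agent") && sawDisallow);
            if (newRecord) {
                if (forUs == null && agents.contains(name)) forUs = new ArrayList<>(rules);
                if (forAll == null && agents.contains("*")) forAll = new ArrayList<>(rules);
                agents.clear();
                rules.clear();
                sawDisallow = false;
            }
            if (field.equals("user-agent")) {
                agents.add(value.toLowerCase(Locale.ROOT));
            } else if (field.equals("disallow")) {
                sawDisallow = true;
                if (!value.isEmpty()) rules.add(value); // empty Disallow allows everything
            }
        }
        List<String> chosen = forUs != null ? forUs : forAll;
        return new RobotsRules(chosen != null ? chosen : List.of());
    }

    /** A path is allowed unless it starts with one of the disallowed prefixes. */
    public boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }
}
```

With the file `User-agent: *` / `Disallow: /private/`, a blank line, then `User-agent: egothor` / `Disallow: /tmp/`, a robot named `egothor` skips `/tmp/` but may fetch `/private/`, while any other robot skips `/private/`.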

Egothor supports both Boolean search and vector-space search. Its open architecture can easily be extended and supports almost any language. The search engine is currently used mostly for demonstrations or in small-scale projects.
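The difference between the two query modes can be shown with a toy inverted index in Java. This is an illustrative sketch, not Egothor's API or ranking formula: the class `TinyIndex` and its methods are hypothetical, and ranking uses the textbook vector space model with tf-idf weights (term frequency times the natural log of N/df) compared by cosine similarity. A Boolean AND query returns the unranked set of documents containing every term; a vector query returns documents ordered by similarity, even when they match only some of the terms.

```java
import java.util.*;

/** Toy inverted index: Boolean AND and tf-idf cosine ranking (hypothetical sketch). */
public final class TinyIndex {
    // term -> (document id -> term frequency)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    // per-document term frequencies, indexed by document id
    private final List<Map<String, Integer>> docTerms = new ArrayList<>();

    private static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    /** Adds a document and returns its id. */
    public int add(String text) {
        int id = docTerms.size();
        Map<String, Integer> tf = termFrequencies(text);
        docTerms.add(tf);
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            postings.computeIfAbsent(e.getKey(), k -> new HashMap<>()).put(id, e.getValue());
        }
        return id;
    }

    /** Boolean AND: ids of documents that contain every query term. */
    public SortedSet<Integer> and(String query) {
        SortedSet<Integer> result = null;
        for (String t : termFrequencies(query).keySet()) {
            Set<Integer> docs = postings.getOrDefault(t, Map.of()).keySet();
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs); // intersect posting lists
        }
        return result == null ? new TreeSet<Integer>() : result;
    }

    /** Inverse document frequency: ln(N / df), or 0 for unseen terms. */
    private double idf(String term) {
        int df = postings.getOrDefault(term, Map.of()).size();
        return df == 0 ? 0.0 : Math.log((double) docTerms.size() / df);
    }

    private Map<String, Double> weights(Map<String, Integer> tf) {
        Map<String, Double> w = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            w.put(e.getKey(), e.getValue() * idf(e.getKey()));
        }
        return w;
    }

    private static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    /** Vector-space search: ids with a positive score, best match first. */
    public List<Integer> rank(String query) {
        Map<String, Double> q = weights(termFrequencies(query));
        Map<Integer, Double> score = new HashMap<>();
        List<Integer> ids = new ArrayList<>();
        for (int id = 0; id < docTerms.size(); id++) {
            double s = cosine(q, weights(docTerms.get(id)));
            if (s > 0) {
                ids.add(id);
                score.put(id, s);
            }
        }
        ids.sort((x, y) -> Double.compare(score.get(y), score.get(x)));
        return ids;
    }
}
```

For example, after indexing "open source search engine written in Java" (id 0), "Java coffee beans" (id 1) and "vector space model for search" (id 2), `and("java search")` returns only document 0, while `rank("vector search")` returns 2 before 0, because the rarer term "vector" carries the larger idf weight.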
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.