ElasticSearch
ElasticSearch is a distributed, RESTful, free/open source search server based on Apache Lucene. It is developed by Shay Banon and released under the Apache License. ElasticSearch is written in Java.

History

Shay Banon created Compass in 2004. While thinking about the third version of Compass, he realized that large parts of it would have to be rewritten to "create a scalable search solution". So he created "a solution built from the ground up to be distributed" and used a common interface, JSON over HTTP, that is also suitable for programming languages other than Java. Shay Banon released the first version of ElasticSearch in February 2010.

A French-language interview explains some further ideas behind the project.

Features

ElasticSearch can be used to search all kinds of documents. It provides a scalable search solution, offers near real-time search, and supports multi-tenancy. "ElasticSearch is distributed, which means that indices can be divided into shards and each shard can have zero or more replicas. Each node hosts one or more shards, and acts as a coordinator to delegate operations to the correct shard(s). Rebalancing and routing are done automatically [...]".
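
The idea of shards, replicas, and routing can be illustrated with a toy model. The sketch below is not ElasticSearch's actual algorithm: the hash function, the round-robin placement, and the names shard_for and place_shards are assumptions chosen for illustration only.

```python
import hashlib
from typing import Dict, List


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard for a document ID with simple hash-modulo routing (toy model)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def place_shards(num_shards: int, num_replicas: int, nodes: List[str]) -> Dict[str, List[str]]:
    """Assign every primary shard and its replicas to distinct nodes, round-robin."""
    if num_replicas + 1 > len(nodes):
        # Toy restriction: each copy of a shard must live on its own node.
        raise ValueError("need at least one node per shard copy")
    layout: Dict[str, List[str]] = {node: [] for node in nodes}
    for shard in range(num_shards):
        for copy in range(num_replicas + 1):
            node = nodes[(shard + copy) % len(nodes)]
            role = "primary" if copy == 0 else "replica"
            layout[node].append(f"shard{shard}-{role}")
    return layout


if __name__ == "__main__":
    # Hypothetical cluster: three nodes, three shards, one replica per shard.
    for node, shards in place_shards(3, 1, ["node-a", "node-b", "node-c"]).items():
        print(node, shards)
    print("tweet-1 is routed to shard", shard_for("tweet-1", 3))
```

In this model any node can compute shard_for itself, which is one way a node could act as a coordinator and forward an operation to the node holding the right shard.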

It uses Apache Lucene and tries to make all of its features available through the JSON and Java APIs. It offers faceting as well as a feature called the percolator, which is useful for being notified when new documents match registered queries.
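
The percolator reverses the usual search direction: queries are stored first, and each incoming document is checked against them. A minimal sketch of that idea follows. The class ToyPercolator and its all-terms-must-occur matching are hypothetical simplifications, not the ElasticSearch API or its query language.

```python
from typing import Dict, List, Set


class ToyPercolator:
    """Stores queries up front and matches incoming documents against them."""

    def __init__(self) -> None:
        self._queries: Dict[str, Set[str]] = {}

    def register(self, name: str, terms: List[str]) -> None:
        """Register a query that matches when all of its terms occur in a document."""
        self._queries[name] = {term.lower() for term in terms}

    def percolate(self, text: str) -> List[str]:
        """Return the names of all registered queries that the document matches."""
        words = set(text.lower().split())
        return sorted(name for name, terms in self._queries.items() if terms <= words)


if __name__ == "__main__":
    percolator = ToyPercolator()
    percolator.register("es-release", ["elasticsearch", "release"])
    percolator.register("lucene-news", ["lucene"])
    # Only the first query has all of its terms in this document.
    print(percolator.percolate("New ElasticSearch release announced"))
```

An application could call percolate on every new document and send a notification for each query name returned.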

Another feature, called 'Gateway', handles the long-term persistence of the index; for example, an index can be recovered from the Gateway after a server crash. ElasticSearch supports real-time GET requests, which makes it more suitable as a NoSQL solution, but it lacks distributed transactions.
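
Because the interface is JSON over HTTP, storing a document and fetching it back by ID are plain HTTP requests. The sketch below only builds the requests and does not send them. It assumes a node listening on localhost at port 9200 and the /index/type/id URL layout of early ElasticSearch releases; later versions changed this layout. The index name "twitter", the type "tweet", and the helper names are illustrative.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default HTTP port


def index_request(index: str, doc_type: str, doc_id: str, doc: dict) -> Request:
    """Build a PUT request that stores a JSON document under /index/type/id."""
    body = json.dumps(doc).encode("utf-8")
    return Request(
        f"{BASE_URL}/{index}/{doc_type}/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_request(index: str, doc_type: str, doc_id: str) -> Request:
    """Build a GET request that fetches a single document by its ID."""
    return Request(f"{BASE_URL}/{index}/{doc_type}/{doc_id}", method="GET")


if __name__ == "__main__":
    put = index_request("twitter", "tweet", "1", {"user": "kimchy", "message": "hello"})
    get = get_request("twitter", "tweet", "1")
    print(put.get_method(), put.full_url)
    print(get.get_method(), get.full_url)
```

Passing either request to urllib.request.urlopen would send it, which requires a running server.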

Comparison to other software

Apache Solr is another open source search server built on top of Apache Lucene. Several documents compare the features or performance of Apache Solr and ElasticSearch. An article by Ryan Sonnek points out how the two compare regarding near real-time indexing and searching.

A slide from Peter Karich lists advantages and disadvantages of ElasticSearch, which can be summarized as follows:

Advantages:
  • ElasticSearch is distributed. No separate project required. Replicas are near real-time too, which is called "Push replication".
  • ElasticSearch fully supports the near real-time search of Apache Lucene.
  • Multi-tenancy requires no special configuration, whereas Solr needs a more advanced setup.
  • ElasticSearch introduces the concept of the Gateway.


Disadvantages:
  • Only one main developer
  • No autowarming feature


Different usage:
  • Use the parent/child feature instead of Solr's result grouping, or have a look at this issue.
  • No XML support, only JSON
  • Deployment in a common container (as a WAR file) is in development as a plugin
  • No convenient wrapper for Java beans comparable to the @Field annotation in SolrJ

Users

Both smaller and larger companies already use ElasticSearch. For example, StumbleUpon and Mozilla have reported that they use ElasticSearch and explained why.

Videos

  • What's Next 2011
  • Berlin Buzzwords 2011
  • PHP UK 2011
  • YAPC::EU 2010
  • Berlin Buzzwords 2010

See also

  • Apache Solr
  • Xapian
  • Sphinx (search engine)
  • Information extraction

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 