GiST
Encyclopedia
In computing, GiST or Generalized Search Tree, is a data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...

 and API that can be used to build a variety of disk-based search trees. GiST is a generalization of the B+ tree
B+ tree
In computer science, a B+ tree or B plus tree is a type of tree which represents sorted data in a way that allows for efficient insertion, retrieval and removal of records, each of which is identified by a key. It is a dynamic, multilevel index, with maximum and minimum bounds on the number of...

, providing a concurrent and recoverable height-balanced search tree infrastructure without making any assumptions about the type of data being stored, or the queries being serviced. GiST can be used to easily implement a range of well-known indexes, including B+ tree
B+ tree
In computer science, a B+ tree or B plus tree is a type of tree which represents sorted data in a way that allows for efficient insertion, retrieval and removal of records, each of which is identified by a key. It is a dynamic, multilevel index, with maximum and minimum bounds on the number of...

s, R-tree
R-tree
R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons. The R-tree was proposed by Antonin Guttman in 1984 and has found significant use in both research and real-world applications...

s, hB-trees, RD-trees, and many others; it also allows for easy development of specialized indexes for new data types. It cannot be used directly to implement non-height-balanced trees such as quad trees
Quadtree
A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are most often used to partition a two dimensional space by recursively subdividing it into four quadrants or regions. The regions may be square or rectangular, or may have arbitrary shapes. This...

 or prefix trees
Trie
In computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the...

 (tries), though like prefix trees it does support compression, including lossy compression
Lossy data compression
In information technology, "lossy" compression is a data encoding method that compresses data by discarding some of it. The procedure aims to minimize the amount of data that need to be held, handled, and/or transmitted by a computer...

. GiST can be used for any data type that can be naturally ordered into a hierarchy of superset
SuperSet
SuperSet Software was a group founded by friends and former Eyring Research Institute co-workers Drew Major, Dale Neibaur, Kyle Powell and later joined by Mark Hurst...

s. Not only is it extensible in terms of data type support and tree layout, it allows the extension writer to support any query predicates that they choose. The most widely-used GiST implementation is in the PostgreSQL
PostgreSQL
PostgreSQL, often simply Postgres, is an object-relational database management system available for many platforms including Linux, FreeBSD, Solaris, MS Windows and Mac OS X. It is released under the PostgreSQL License, which is an MIT-style license, and is thus free and open source software...

 relational database
Relational database
A relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...

; it was also implemented in the Informix
Informix
IBM Informix is a family of relational database management system developed by IBM. It is positioned as IBM's flagship data server for online transaction processing as well as integrated solutions...

 Universal Server, and as a standalone library, libgist.

GiST is an example of software extensibility
Extensibility
In software engineering, extensibility is a system design principle where the implementation takes into consideration future growth. It is a systemic measure of the ability to extend a system and the level of effort required to implement the extension...

 in the context of database systems: it allows the easy evolution of a database system to support new tree-based indexes. It achieves this by factoring out its core system infrastructure from a narrow API that is sufficient to capture the application-specific aspects of a wide variety of index designs. The GiST infrastructure code manages the layout of the index pages on disk, the algorithms for searching indexes and deleting from indexes, and complex transactional details such as page-level locking for high concurrency and write-ahead logging for crash recovery. This allows authors of new tree-based indexes to focus on implementing the novel features of the new index type — for example, the way in which subsets of the data should be described for search — without becoming experts in database system internals.

Although originally designed for answering Boolean selection queries, GiST can also support nearest-neighbor search
Nearest neighbor search
Nearest neighbor search , also known as proximity search, similarity search or closest point search, is an optimization problem for finding closest points in metric spaces. The problem is: given a set S of points in a metric space M and a query point q ∈ M, find the closest point in S to q...

, and various forms of statistical approximation
Approximation
An approximation is a representation of something that is not exact, but still close enough to be useful. Although approximation is most often applied to numbers, it is also frequently applied to such things as mathematical functions, shapes, and physical laws.Approximations may be used because...

 over large data sets.

The PostgreSQL GiST implementation includes support for variable length keys, composite keys, concurrency control and recovery; these features are inherited by all GiST extensions. There are several contributed modules developed using GiST and distributed with PostgreSQL. For example:
  • rtree_gist, btree_gist - GiST implementation of R-Tree and B-Tree
  • intarray - index support for one-dimensional array of int4's
  • tsearch2 - a searchable (full text) data type with indexed access
  • ltree - data types, indexed access methods and queries for data organized as a tree-like structures
  • hstore - a storage for (key,value) data
  • cube - data type, representing multidimensional cubes


The PostgreSQL GiST implementation provides the indexing support for the PostGIS
PostGIS
PostGIS is an open source software program that adds support for geographic objects to the PostgreSQL object-relational database. PostGIS follows the Simple Features for SQL specification from the Open Geospatial Consortium .-Features:...

 (geographic information system
Geographic Information System
A geographic information system, geographical information science, or geospatial information studies is a system designed to capture, store, manipulate, analyze, manage, and present all types of geographically referenced data...

) and the BioPostgres bioinformatics
Bioinformatics
Bioinformatics is the application of computer science and information technology to the field of biology and medicine. Bioinformatics deals with algorithms, databases and information systems, web technologies, artificial intelligence and soft computing, information and computation theory, software...

system.

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK