Index (database)
Encyclopedia
A database index is a data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...

 that improves the speed of data retrieval operations on a database table
Table (database)
In relational databases and flat file databases, a table is a set of data elements that is organized using a model of vertical columns and horizontal rows. A table has a specified number of columns, but can have any number of rows...

 at the cost of slower writes and increased storage space. Indexes can be created using one or more columns of a database table
Column (database)
In the context of a relational database table, a column is a set of data values of a particular simple type, one for each row of the table. The columns provide the structure according to which the rows are composed....

, providing the basis for both rapid random lookups and efficient access of ordered records.
The disk space required to store the index is typically less than that required by the table (since indices usually contain only the key-fields according to which the table is to be arranged, and exclude all the other details in the table), yielding the possibility to store indices in memory for a table whose data is too large to store in memory.

In a relational database
Relational database
A relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...

, an index is a copy of one part of a table. Some databases extend the power of indexing by allowing indices to be created on functions or expressions
Expression (programming)
An expression in a programming language is a combination of explicit values, constants, variables, operators, and functions that are interpreted according to the particular rules of precedence and of association for a particular programming language, which computes and then produces another value...

. For example, an index could be created on upper(last_name), which would only store the upper case versions of the last_name field in the index. Another option sometimes supported is the use of "filtered" indices , where index entries are created only for those records that satisfy some conditional expression. A further aspect of flexibility is to permit indexing on user-defined functions, as well as expressions formed from an assortment of built-in functions.

Indices may be defined as unique or non-unique. A unique index acts as a constraint on the table by preventing duplicate entries in the index and thus the backing table.

Non-clustered

The data is present in random order, but the logical ordering is specified by the index. The data rows may be randomly spread throughout the table. The non-clustered index tree contains the index keys in sorted order, with the leaf level of the index containing the pointer to the page and the row number in the data page. In non-clustered index:
  • The physical order of the rows is not the same as the index order.
  • Typically created on column used in JOIN, WHERE, and ORDER BY clauses.
  • Good for tables whose values may be modified frequently.


Microsoft SQL Server
Microsoft SQL Server
Microsoft SQL Server is a relational database server, developed by Microsoft: It is a software product whose primary function is to store and retrieve data as requested by other software applications, be it those on the same computer or those running on another computer across a network...

 creates non-clustered indexes by default when CREATE INDEX command is given. There can be more than one non-clustered index on a database table. There can be as many as 249 nonclustered indexes per table in SQL Server 2005 and 999 nonclustered indexes per table in SQL Server 2008. It also creates a clustered index on a primary key by default.

Clustered

Clustering alters the data block into a certain distinct order to match the index, resulting in the row data being stored in order. Therefore, only one clustered index can be created on a given database table. Clustered indices can greatly increase overall speed of retrieval, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index, or when a range of items is selected.

Since the physical records are in this sort order on disk, the next row item in the sequence is immediately before or after the last one, and so fewer data block reads are required. The primary feature of a clustered index is therefore the ordering of the physical data rows in accordance with the index blocks that point to them. Some databases separate the data and index blocks into separate files, others put two completely different data blocks within the same physical file(s).
Create an object where the physical order of rows is same as the index order of the rows and the bottom(leaf) level of clustered index contains the actual data rows.

They are known as "index organized tables" under Oracle database
Oracle database
The Oracle Database is an object-relational database management system produced and marketed by Oracle Corporation....

.

Cluster (Oracle)

In Oracle database, multiple tables can be joined into a cluster (not to be confused with clustered index described above). The records for the tables sharing the value of a cluster key shall be stored together in the same or nearby data blocks. This may improve the joins of these tables on the cluster key, since the matching records are stored together and less I/O is required to locate them. The data layout in the tables which are parts of the cluster is defined by the cluster configuration. A cluster can be keyed with a B-Tree
B-tree
In computer science, a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children...

 index or a hash table
Hash table
In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...

. The data block in which the table record will be stored is defined by the value of the cluster key.

Column order

The order in which columns are listed in the index definition is important. It is possible to retrieve a set of row identifiers using only the first indexed column. However, it is not possible or efficient (on most databases) to retrieve the set of row identifiers using only the second or greater indexed column.

For example, imagine a phone book that is organized by city first, then by last name, and then by first name. If you are given the city, you can easily extract the list of all phone numbers for that city. However, in this phone book it would be very tedious to find all the phone numbers for a given last name. You would have to look within each city's section for the entries with that last name. Some databases can do this, others just won’t use the index.

Applications and limitations

Indices are useful for many applications but come with some limitations. Consider the following SQL
SQL
SQL is a programming language designed for managing data in relational database management systems ....

 statement: SELECT first_name FROM people WHERE last_name = 'Smith';. To process this statement without an index the database software must look at the last_name column on every row in the table (this is known as a full table scan
Full table scan
Full table scan is a scan made on the database where each row of the table under scan is read in a sequential order and the columns encountered are checked for the validity of a condition...

). With an index the database simply follows the B-tree
B-tree
In computer science, a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children...

 data structure until the Smith entry has been found; this is much less computationally expensive than a full table scan.

Consider this SQL statement: SELECT email_address FROM customers WHERE email_address LIKE '%@yahoo.com';. This query would yield an email address for every customer whose email address ends with "@yahoo.com", but even if the email_address column has been indexed the database must perform a full index scan. This is because the index is built with the assumption that words go from left to right. With a wildcard
Wildcard character
-Telecommunication:In telecommunications, a wildcard character is a character that may be substituted for any of a defined subset of all possible characters....

 at the beginning of the search-term, the database software is unable to use the underlying b-tree data structure (in other words, the WHERE-clause is not sargable
Sargable
In relational databases, a condition in a query is said to be sargable if the DBMS engine can take advantage of an index to speed up the execution of the query...

). This problem can be solved through the addition of another index created on reverse(email_address) and a SQL query like this: SELECT email_address FROM customers WHERE reverse(email_address) LIKE reverse('%@yahoo.com');. This puts the wild-card at the right-most part of the query (now moc.oohay@%) which the index on reverse(email_address) can satisfy.

Bitmap index

A bitmap index is a special kind of index that stores the bulk of its data as bit arrays (bitmaps) and answers most queries by performing bitwise logical operations
Bitwise operation
A bitwise operation operates on one or more bit patterns or binary numerals at the level of their individual bits. This is used directly at the digital hardware level as well as in microcode, machine code and certain kinds of high level languages...

 on these bitmaps. The most commonly used index, such as B+trees, are most efficient if the values it indexes do not repeat or repeat a smaller number of times. In contrast, the bitmap index is designed for cases where the values of a variable repeat very frequently. For example, the gender field in a customer database usually contains two distinct values: male or female. For such variables, the bitmap index can have a significant performance advantage over the commonly used trees.

Dense index

A dense index in database
Database
A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...

s is a file
Computer file
A computer file is a block of arbitrary information, or resource for storing information, which is available to a computer program and is usually based on some kind of durable storage. A file is durable in the sense that it remains available for programs to use after the current program has finished...

 with pairs of keys and pointers for every record
Record (computer science)
In computer science, a record is an instance of a product of primitive data types called a tuple. In C it is the compound data in a struct. Records are among the simplest data structures. A record is a value that contains other values, typically in fixed number and sequence and typically indexed...

 in the data file. Every key in this file is associated with a particular pointer to a record in the sorted data file. In clustered indices with duplicate keys, the dense index points to the first record with that key.

Sparse index

A sparse index in databases is a file with pairs of keys and pointers for every block
Block (data storage)
In computing , a block is a sequence of bytes or bits, having a nominal length . Data thus structured are said to be blocked. The process of putting data into blocks is called blocking. Blocking is used to facilitate the handling of the data-stream by the computer program receiving the data...

 in the data file. Every key in this file is associated with a particular pointer to the block in the sorted data file. In clustered indices with duplicate keys, the sparse index points to the lowest search key in each block.
primary key is a sparse index.

Reverse index

A reverse key index reverses the key value before entering it in the index. E.g., the value 24538 becomes 83542 in the index. Reversing the key value is particularly useful for indexing data such as sequence numbers, where new key values monotonically increase.

Index implementations

Indices can be implemented using a variety of data structures. Popular indices include balanced trees, B+ tree
B+ tree
In computer science, a B+ tree or B plus tree is a type of tree which represents sorted data in a way that allows for efficient insertion, retrieval and removal of records, each of which is identified by a key. It is a dynamic, multilevel index, with maximum and minimum bounds on the number of...

s and hashes
Hash table
In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...

.

In Microsoft SQL Server
Microsoft SQL Server
Microsoft SQL Server is a relational database server, developed by Microsoft: It is a software product whose primary function is to store and retrieve data as requested by other software applications, be it those on the same computer or those running on another computer across a network...

, the leaf node of the clustered index corresponds to the actual data, not simply a pointer to data that resides elsewhere, as is the case with a non-clustered index. Each relation can have a single clustered index and many unclustered indices.

Index concurrency control

An index is typically being accessed concurrently by several transactions and processes, and thus needs concurrency control
Concurrency control
In information technology and computer science, especially in the fields of computer programming , operating systems , multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible.Computer...

. While in principle indexes can utilize the common database concurrency control methods, specialized concurrency control methods for indexes exist, which are applied in conjunction with the common methods for a substantial performance gain.

Covering Index

In most cases, an index is used to quickly locate the data record(s) from which the required data is read. In other words, the index is only used to locate data records in the table and not to return data.

A covering index is a special case where the index itself contains the required data field(s) and can return the data.

Consider the following table (other fields omitted):
ID Name Other Fields
12 Plug ...
13 Lamp ...
14 Fuse ...


To find the Name for ID 13, an index on (ID) will be useful, but the record must still be read to get the Name. However, an index on (ID, Name) contains the required data field and eliminates the need to look up the record.

A covering index can dramatically speed up data retrieval but may itself be large due to the additional keys, which slow down data insertion & update. To reduce such index size, some systems allow non-key fields to be included in the index. Non-key fields are not themselves part of the index ordering but only included at the leaf level, allowing for a covering index with less overall index size.

Standardization

There is no standard about creating indexes because the ISO SQL Standard does not cover physical aspects. Indexes are one of the physical parts of database conception among others like storage (tablespace or filegroups). RDBMS vendors all give a CREATE INDEX syntax with some specific options which depends on functionalities they provide to customers.

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK