49

Indexing for performance in Database Systems

Share on TwitterShare on TumblrSubmit to StumbleUponSave on DeliciousDigg This

 

Introduction
 
Indexes assist with query processing by speeding up access to the data stored in tables and views. An index is a database object that, when a table is created, can provide faster access path to data and can facilitate faster query execution. Prior to the advent of the relation database, database users have to specifically know how to get at a particular piece of data. The navigation path to the data had to be included as part of the data request. One of the big advances to relational databases was to eliminate the need for this extra information mapping data request to the underlying physical structure. When the user needs to access some data items, instead of always searching through the whole table (a full table scan), an index facilitates to find the data items within the table in a transparent way. An index is designed to let the database system find and retrieve specific rows with less overall Input/Output (I/O) cost. The more indexes there are, the better for the queries; however more indexes mean more overhead whenever the data is inserted, updated, or deleted because the index must be maintained.
 
Types of indexes
 
Now let’s discuss the different types of indexes in relational databases. 
 
The most used type of index is the BTree index. BTree, meaning binary tree, looks like an upside down tree. BTree indexes are improperly named as they are not actually a binary tree (two-branches in each level). Essentially, a BTree consists of a root node, branch nodes (intermediate level), and ultimately leaf nodes which is lowest level and contains key values and the information to access to the data item. Te leaf node differs based on whether the index type is clustered or nonclustered. If it’s a clustered index, the leaf node is the actual data pages or blocks itself. If it’s a nonclustered index, the leaf node contains the pointers to the data item. BTree construction and searching methods are highly efficient for both reading and changing the underlying data, automatically changing the structure of the BTree without any overflow. Let’s say that the number of I/O operations required to access to a data items is proportional to the number of branch levels that must be traversed to reach the leaf node. In the Figure 1, you can see an example of a BTree index.
 
Let me say that overflow is bad situation for indexes because changes are placed outside of the optimized index structure. Enough changes and the underlying overflow can reduce the efficiency of indexes instead of improving table access performance.

Figure 1.

Another index structure is a bitmap index. A bitmap index contains binary representation for each record. They are sometimes misused and they are extremely vulnerable to overflows. It’s remarkable to say new values cannot be updated into existing bitmap indexes as easily as we can do in BTree indexes. In the Figure 2, you can see an example of a bitmap index where two bitmaps are created for two values (one for the male and one for the female). When M is encountered, the M bitmap is set to 1 and the F bitmap is set to 0. It’s sensible to use bitmap indexes when there are few distinct values for the indexed column, in relation to the total number of rows in the underlying table.

Figure 2.

ISAM (Indexed Sequential Access Method) indexes are a type of index which uses a simple structure with a list of record numbers. They are best used for static data as their internal list structure does not allow easy updates, thus increasing the overflow effect.

Hash indexes are a type of index based on hash table concepts. They are highly efficient for read access, but they don’t allow easy updates and the overflow effect in hash indexes is worst than in ISAM indexes. Both ISAM and hash indexes are not good for heavily changing data because their structure overflow easily with inserted records.

It’s remarkable to say that the most commonly used index in relational databases is BTree indexes; the other types of indexes are only applicable to only specific cases.

 

Besides the index structure, there are several ways to create the index based on several options:

 

  • Ascending and descending indexes to sort the values of the indexes.
  • Unique indexes which have unique values (no duplicate values).
  • Non-unique indexes which allow duplicate values.
  • Composite indexes which are built from several columns of a table named as index key.
  • Compressed indexes which allow the compression of composite indexes where repeated values are effectively stored in the index.

 

Index keys make up the content of the index acting as a logical entity that identifies a particular index entry. An index key can consist of one or more columns of a table and this is the value that will be used to access the data on the table. Each column in key may be its sort order. The key is also used to determine uniqueness.

The index keys are used as the criteria in the WHERE clause of the data request. In a composite or multicolumn index, the leading edge of the index must be included in the WHERE clause of the SQL statement. This must be the leading edge of the index, because the data on the index is organized on the first index key value and then the subsequent key values. For example, if an index is created on the columns last_name and first_name of the employee table. The SQL statement in Listing 1 and Listing 2 can benefit from the created index.
 

 

select *
from employee
where last_name=’Olamendy’ and first_name=’John’

 

Listing 1
 

 

select *
from employee
where last_name=’Olamendy’;

 

Listing 2

However, the query in Listing 3 will not use the index at all. In this case, a table scan will be used.
 

 

select *
from employee
where first_name=’John’;

 

Listing 3

Indexes can also be classified as clustered and nonclustered. A clustered index stores the data in the table in sorted order according in the leaf page of index based on the underlying index key. Because the data is stored in the leaf node of the index, the data is accessed through the index structure. There is no other way to access the data but through the index (there is no way to access the data directly). The leaf nodes are linked together via pointers (known as linked list). Users can only define one clustered index per table.

A nonclustered index is both logical and physical independent of the physical structure of the table. Unlike the clustered index, the data in the table is not stored in the leaf nodes of the index; instead the leaf nodes contain either a cluster key pointing to clustered index or information pointing directly to the data in the table. Therefore, many nonclustered indexes can exist on single time at one time.

Indexes can also be classified as unique and nonunique. A unique index ensures that the values contained in within the unique index columns will appear only once within a table. In SQL Server, a unique index on the table does not allow insert multiple NULL values on the underlying unique columns. An Oracle index does not have an index entry if the key for the associated table is NULL; therefore, it can exist multiple values of NULL on a table with a defined unique index.

Creating indexes in SQL Server

An index in SQL server is created by the CREATE INDEX statement in Listing 4.
 

 

create [unique] [clustered | nonclustered] index {index_name}
on {table_name}({column_name}[ASC|DESC][,…n])
[include(column_name[,…n])]
[with (index_options)[,…n]]
[on {file_group_name}]

 

Listing 4

Where the parameters are specified as:

  • Unique. Specifies that index to be created is unique.
  • Clustered. Specifies that the index is a clustered index.
  • Nonclustered. Specifies that the index is a nonclustered index. This is the default type of index.
     
  • Include. Specifies included nonindexed columns.
  • With. Options related to the physical structure of the index.
  • On. Specifies in which filegroup the index will be created.

In order to drop an index, you need to use the statement in Listing 5. When you drop an index, it’s physically removed from the database.

drop index {index_name} on {table_name}

Listing 5

And finally in this section, let’s talk about rebuilding index. Sometimes you need to rebuild the index because data is inserted or updated in the index, and a page split occurs. These page splits cause the physical structure of the index to become fragment and the underlying tree structure is unbalanced. In order to rebuild a more efficient tree structure, the index must be rebuilt. With SQL Server 2005, Microsoft recommends to use the ALTER INDEX REORGANIZE statement. If the fragmentation is greater than 30 percent, Microsoft recommends using ALTER INDEX REBUILD WITH (ONLINE=ON) statement (dropping and recreating the index). You can check the level of fragmentation in an index by using the sys.dm_db_index_physical_stats function.

In SQL Server 2005, you can also disable an index using the ALTER INDEX DISABLE statement. This means to deny access to the index without removing the index definition and the underlying statistics.

Creating indexes in Oracle

An index in Oracle is created basically by the CREATE INDEX statement in Listing 6.

create [unique] index {index_name}
on {table_name}({column_name}[ASC|DESC][,…n])

Listing 6

In Oracle we can find four types of indexes: standard BTree index (already discussed), bitmap index (already discussed), reverse key index and function-based index.

In the case of BTree index, it’s remarkable to say that the implementation of clustered index in Oracle is by creating index organized tables (IOT), in which the leaf nodes stores the entire row of data rather the ROWID pointing to the associated row. But there are some drawbacks concerning index organized tables such as they cannot use a UNIQUE constraint or be stored in cluster environment; in addition, they don’t support distribution, replication, and partitioning. The creation of an index organized table is shown in Listing 7.
create table customer

( id number(6,0) not null,
fullname varchar2(50) not,
age number(4,0),
constraint pk_customer primary key(id)
) organization index
tablespace mytablespace;

Listing 7

Reverse key index is designed to address a very specific scenario concerning with an index based on an ascending value, such as a sequence. In this scenario, all new rows will be inserted into the rightmost leaf node of the index, because each new value is the highest value. In addition, any deletion from the index has a tendency to be skewed toward the left side as older rows are removed. This situation ends up with an unbalanced BTree index where the left side of the index is more sparsely populated than the leaf nodes on the right side. You can solve this problem by periodically rebuilding the index. However, Oracle has also introduced the reverse key index to solve this problem. This index takes the value in a key and reverses its order.

In Oracle, we can find bitmap indexes and a special version of this type of index named bitmap join index. Bitmap indexes are designed for data warehouse environments where the full set of queries are not known at system implementation time. This type of indexes is not designed for OLTP systems.The creation of bitmap index is very simple. Let’s create a bitmap index on the loc column of the dept table in Oracle as shown in Listing 8.

create bitmap index bmpndx_dept on dept(loc);

Listing 8

A bitmap join index can improve the performance in a data warehouse even more than a bitmap index. The bitmap join index uses the same physical structure as the bitmap index, but the bits in the index indicate how one table (generally a fact table) is joined to another table. Let’s illustrate this concepts using the table emp and dept in the Oracle database. Let’s suppose that we have the following query as shown in Listing 9.

select e.*
from emp e, dep d
where e.deptno = d.deptno and dept.name=’RESEARCH’;

Listing 9

The bitmap join index enables to index the dept.dname column, but not to have the index point to the dept, but at the emp table (see Listing 10).

create bitmap index bmpndx_dept_emp on emp(d.dname)
from emp e, dept d
where e.deptno = d.deptno;

Listing 10

And finally, Oracle supports a type of index named function-based indexes. A function-based index is just like a standard BTree index or bitmap index, except that Oracle only stores the result of the function, instead of the values of columns in a table (see Listing 11).
Let’s create a function-based index on the UPPER function value of the ename column of the emp table.

create index fb_ndx_emp_ename on emp(upper(ename));

Listing 11

The query in Listing 12 uses this type of index.

select *
from emp
where upper(ename)=’JOHN’;

 

Listing 12

Finally, I can say that in Oracle, we have the possibility to rebuild the index to shrink the index. This can be done using the ALTER INDEX REBUILD statement.

 

prajapat