This chapter covers the more advanced PostgreSQL subjects. These include optimizing table access with indices, advanced table concepts such as inheritance and constraints, the practical use of non-atomic array values, and explicit use of transactions and cursors. The sophistication of these features greatly sets PostgreSQL apart from many other relational database management systems.
This chapter also documents programmatic concepts such as triggers and sequences. Finally, extending PostgreSQL through the addition of user-defined functions and operators is documented for programmers wanting to add customized routines to the database.
Indices are database objects which can greatly increase database performance, contributing to faster execution of statements involving comparative criteria. An index tracks the data on one or more columns in a table, allowing conditional clauses (such as the WHERE clause) to find their targeted rows more efficiently.
The internal workings of indices vary, and there are several implementations to choose from. This section describes the different types of indices available, and explains when you should use one type over the other.
While indices exist to enhance performance, they also contribute to system overhead. Indices must be updated as data in the column which they are applied to fluctuates. An excess of infrequently used indices will decrease performance in, if the amount of time spent maintaining them outweighs the time saved through queries they apply to. In general, indices should be applied only to columns that you expect to use frequently in comparative expressions.
Indices are created with the CREATE INDEX SQL command. Here is the syntax for CREATE INDEX:
CREATE [ UNIQUE ] INDEX indexname ON table [ USING indextype ] ( column [ opclass ] [, ...] )
In this syntax, indexname is the name of the new index to be created, table is the name of the table to be indexed, and column is the name of a specific column to be indexed). Optionally, the indextype parameter may be set to specify what index implementation to be used, and the opclass parameter may be set to indicate what operator class should be used in sorting input values.
Operator classes are stored in PostgreSQL's pg_opclass column. Unless you are especially knowledgeable of the technical inner workings of PostgreSQL's operator classes, this option should not be used.
Regarding the column to index, multiple names may be specified, separated by commas; doing so creates an index across both specified columns. Such an index will only be used by PostgreSQL when executing SQL statements which search all indexed columns in the WHERE clause through the AND keyword. Multi-column indices are limited to a maximum of 16 columns in a default PostgreSQL installation, and may not use an index type other than B-tree.
In determining which columns to create an index upon, consider which columns will be most frequently accessed for search conditions. For example, while the books table keeps an index on its id column (the primary key), the title column is also frequently checked for in WHERE conditions. Adding a secondary index to the title column will greatly increase the performance of SQL statements making comparisons against values in that column.
Example 7-1 shows such an index being created, and uses the \d psql slash command to view the books table. In addition to showing the table's column types, this command also shows the indices that have been created on it.
Example 7-1. Creating an Index
booktown=# CREATE INDEX books_title_idx booktown-# ON books (title); CREATE booktown=# \d books Table "books" Attribute | Type | Modifier ------------+---------+---------- id | integer | not null title | text | not null author_id | integer | subject_id | integer | Indices: books_id_pkey, books_title_idx
Certain types of table constraints, notably the PRIMARY KEY and UNIQUE constraints, result in the implicit creation of an index for use in enforcing the constraint. In Example 7-2 you see the creation of the Book Town authors table with the PRIMARY KEY constraint placed on its id column. This use of PRIMARY KEY causes an index called authors_pkey to be implicitly created.
Example 7-2. Implicit Index Creation
booktown=# CREATE TABLE authors (id integer PRIMARY KEY, booktown(# last_name text, booktown(# first_name text); NOTICE: CREATE TABLE/PRIMARY KEY will create implicit index 'authors_pkey' for table 'authors' CREATE example=# \d authors Table "authors" Attribute | Type | Modifier ------------+---------+---------- id | integer | not null last_name | text | first_name | text | Index: authors_pkey
The index created in Example 7-2 enables PostgreSQL to quickly verify that the primary key value is unique for any new row inserted into the table, as well as enhances the performance of queries which use the id column as a search condition.
Specifying the UNIQUE keyword causes the index to disallow duplicate values within the column (or set of columns) it indexes. Creating a UNIQUE index on a table is functionally identical to creating a table with the UNIQUE constraint (see the section called Using Constraints later in this chapter).
Example 7-3 creates a unique index called unique_publisher_idx on the publishers table's name column. This will disallow two publishers from having the same name in this table.
Example 7-3. Creating a Unique Index
booktown=# CREATE UNIQUE INDEX unique_publisher_idx booktown-# ON publishers (name); CREATE booktown=# \d publishers Table "publishers" Attribute | Type | Modifier -----------+---------+---------- id | integer | not null name | text | address | text | Indices: publishers_pkey, unique_publisher_idx
As the NULL value does not technically match any value, duplicate instances of NULL can be inserted into a column with a unique index placed on it. This is the main practical difference between a unique index and an index implicitly created by the PRIMARY KEY constraint, which does not allow NULL values at all.
Note that the UNIQUE clause may not be used in conjunction with the USING clause for any index type other than B-tree.
The optional USING clause can be used to specify the type of index to implement. As of PostgreSQL 7.1.x, three types of indices are supported:
PostgreSQL's B-tree implementation uses Lehman-Yao high-concurrency B-tree algorithms, and is both the most capable, and most commonly used indexing method. For this reason, it is the default index type.
The R-tree implementation is primarily useful for spacial data type operations (i.e., operations on geometric types), and utilizes Guttman's quadratic split algorithm. The Hash implementation utilizes Litwin's linear hashing routines, which have been traditionally used for indices that involve frequent direct equal-to comparisons (e.g., with the = operator).
At the time of this writing, for PostgreSQL 7.1.x, the B-tree index implementation is by far the most capable and flexible of the available index types. At this time, it is recommended that you use the B-tree index type over the Hash implementation, even for direct = comparisons. The Hash index exists mostly for legacy reasons, though it may still be used if you are sure your system would benefit from a Hash index over a B-tree index.
As stated, the R-tree index implementation is recommended for indexing geometric types; be aware, however, that several limitations exist on the R-tree implementation. For example, you cannot create a unique R-tree index, nor can you create an R-tree index on more than one column. In these cases, it makes more sense to rely on the capable B-tree index type.
The USING clause can be used with the keywords BTREE, RTREE and HASH in order to explicitly choose the type of index you want to create. Specifying BTREE explicitly chooses the default index type.
Example 7-4 creates a table called polygons, which stores spacial data of the type polygon. An index named spacial_idx is then applied to its shape column with the R-tree implementation.
Example 7-4. Specifying an Index Type
booktown=# CREATE TABLE polygons (shape polygon); CREATE booktown=# CREATE INDEX spacial_idx ON polygons USING RTREE (shape); CREATE
Again, unless you have a solid conceptual understanding of why one index type might be preferable in your system over another, we advise you to use the default B-tree type.
A slightly modified form of the CREATE INDEX command can be used to index the results of a function on a column value, rather than the column value itself. This is called a functional index.
Here is the syntax to create a functional index:
CREATE [ UNIQUE ] INDEX indexname ON table [ USING indextype ] ( functionname ( column [, ...] ) [ opclass ] )
The only difference in this syntax is that the index is created on the results of the specified function applied to each column value. All other clauses have the same effect as the functional index.
Functional indices are useful on table columns that commonly have their values prepared through a function before being compared against values in a SQL statement. For example, the upper() function is commonly used to make case-insensitive comparisons. Creating an index using the upper() function enables such case-insensitive comparisons to be carried out efficiently.
Example 7-5 creates a functional index named upper_title_idx on the books table. It uses the upper() function on the title column as the basis to create the index. Then it performs an example SQL query that is more efficiently executed as a result of the newly created functional index.
The SQL command to permanently destroy an existing index from a table is DROP INDEX. Here is the syntax for DROP INDEX:
DROP INDEX indexname [, ...]
In this syntax, indexname is the name of the index that you wish to permanently remove. Multiple indices to drop may be specified, separated by commas.