Introduction to Relational Databases

PostgreSQL is a sophisticated object-relational database management system (ORDBMS). An ORDBMS is an extension of the more traditional relational database management systems (RDBMS). An RDBMS enables a user to store related pieces of data via two-dimensional data structures called tables. This data may consist of many defined types, such as integers, floating-point numbers, character strings, and timestamps. Inserted data in the table is categorized using a grid-like system of vertical columns, and horizontal rows. The relational model was built on a strong premise of conceptual simplicity, which is arguably both its most prominent strength and weakness.

The object-relational aspect of PostgreSQL adds numerous enhancements to the straight relational data model. These include support for, arrays (multiple values in a single column), inheritance (child-parent relationships between tables), and functions (programmatic methods invoked by SQL statements). For the advanced developer, PostgreSQL even has support for extensibility of its data types, and of its procedural languages.

Due to this object-relational concept, tables are sometimes called classes, while rows and columns can be referred to as object-instances and object-attributes, respectively. We will use this terminology interchangeably in this book. Other SQL data structures, such as indices and views, can be referred to as database objects.

Note

Take care to observe that while PostgreSQL is considered object-relational, this is not synonymous with object-oriented, a term pertaining to many modern programming languages. While PostgreSQL supports several objective improvements to the relational model, it is still more accurate to refer to PostgreSQL as a relational database management system (RDBMS).

Understanding Databases

While PostgreSQL is commonly considered an RDBMS, or a "database," it may not be commonly understood what is meant specifically by the word database. A database within PostgreSQL is an object-relational implementation of what is formally called a schema in SQL99.

Put simply, a database is a stored set of data which is logically interrelated. Typically, this data can be accessed in a multi-user environment. This is the case with PostgreSQL, though there are well-defined rights and restrictions enforced with that access.

What may not be commonly understood is that PostgreSQL can at one time have several databases concurrently available, each with their own owner, and each with their own unique tables, views, indices, sequences and functions.

In order to create a table, function, or any other database object, you must be connected to a specific database via a PostgreSQL client. Once connected, an object can be created, which is then owned by the connected database, and will be therefore inaccessible from any other database (though a client may have several connections open to different databases).

By keeping fundamental data objects segregated into their own databases in this fashion, you run a smaller risk of running into a naming collision by choosing a table name already chosen for another purpose (e.g., if two users each wanted to have a table called "products" for two separate applications). This is because neither database has any knowledge of the other database's components, and will not attempt to make any kind of logical relationship between them. Furthermore, as the same rule applies to object-relational data objects, users may even create functions and language definitions within their database which are inaccessible by other users connected to other databases running within PostgreSQL.

A default installation of PostgreSQL installs only one functional database, which is called template1. It is named template1 to represent the template based nature of the database. Any database which is created after it is essentially a clone, inheriting any of its database objects, including table structure, functions, languages, etc. It is not uncommon to create a default database for new PostgreSQL users with the same name as their PostgreSQL username, as PostgreSQL will attempt to connect to a database with the same name as the connecting user if a database name is not specified.

Understanding Tables

Tables are quite possibly the most important aspect of SQL to understand inside and out, as all of your data will reside within them. In order to be able to correctly plan and design your SQL data structures, and any programmatic routines toward accessing and applying that data, a thorough understanding of tables is an absolute pre-requisite.

A table is composed of columns and rows, and their intersections are fields .. If you have ever used spreadsheet software before (such as Excel), these two terms are visually represented in the same manner, and the fields within a table are equivalent to the cells within a spreadsheet. From a general perspective, columns within a table describe the name and type of data that will be found (and can be entered) by a row for that column's fields. Rows within a table represent records composed of fields that are described from left to right by their corresponding column's name and type. Each field in a row is implicitly correlated with each other field in that row. In this sense, columns can be thought of as descriptors for the discrete, sequential elements of a row, and each row can be thought of as a stored record matching that description.

Table 3-1 illustrates a simple table called books, used by our imaginary bookstore, Book Town. This table is frequently referred to in this book throughout our examples. Each of its stored records describes a book by a numeric identifier, title, author identifier, and subject identifier. These characteristics, from left to right, are described by its columns (id, title, author_id, and subject_id).

Table 3-1. An Example SQL Table

id

title

author_id

subject_id

7808

The Shining

4156

9

156

The Tell-Tale Heart

15

9

4513

Dune

1866

15

4267

2001: A Space Odyssey

2001

15

1608

The Cat in the Hat

1809

2

1590

Bartholomew and the Oobleck

1809

2

As you can see, this describes a table with four columns, in a fixed left to right order, currently populated by six rows (also known as tuples, or records). It is essential to understand that in a relational database, while a table has a fixed column order, rows themselves are inherently unordered. You will see later, as the SQL's query structure is explained in Chapter 4, that there are ways within SQL to order selected rows. However, the rows in the database itself are not automatically ordered in any consistently predictable way. The explicit ordering of records must be an important consideration in the creation of appropriate and applicable SQL queries in situations when the order used is meaningful.

Every table must have at least one column, but tables may at times contain no rows. This is because each vertical column corresponds to a relatively fixed attribute of the data represented in that table (such as the title column in the previous example's books table). Without a column, a row's contents would be ambiguous; without a row, a table is merely lacking recorded data. As of PostgreSQL 7.1, there is a maximum of 1600 columns to a table, and an unlimited number of rows (i.e., limited only by hardware limitations, such as disk space).

In Table 3-1, the column names fairly clearly indicate the significance of each column. The decision of how to name columns is fairly arbitrary, though, and care must be taken in planning table names and conventions to avoid ambiguity.

Though it may not be immediately obvious, each of the columns of a table have an associated data type. While a column's data type helps to further describe the sort of information it contains, it constrains the kind of data that may be inserted into the column. For example, the author_id column is of type integer; this signifies that any insertion attempts not consisting of pure a integer (e.g., "110a") will fail. These types are described in more detail in the section called Data Types.

This section introduced the general concepts of how data is logically arranged in a relational database, and within tables. The next section, the section called SQL Statements, addresses the topic of SQL Statements, which are the basis for all interactions with the database.