Thread: Using indexes and keys

Using indexes and keys

From

John Taylor

Date:

08 February 2002, 12:20:40

I'm a bit confused about the use of indexes with postgresql.
I've a number of questions that are probably obvious when you know
the answer, but I can't find in the documentation.

1) It seems that I can only have primary keys in postgresql,
   does this mean that keys are not the same as indexes ?

2) If I define a column as a primary key, does that mean that it will be
   indexed, so I don't nee to create an index as well ?

2a) If so - If I define two or more columns as the primary key, will they
    be indexed seperately, or do I need to index the 2nd and 3rd columns
    seperately

3) If I create an index on a column (it is not unique and so can't be a key),
   what do I need to query on - the index name or the column name ?

Thanks
JohnT

Re: Using indexes and keys

From

"Josh Berkus"

Date:

08 February 2002, 12:50:20

John,

> I'm a bit confused about the use of indexes with postgresql.
> I've a number of questions that are probably obvious when you know
> the answer, but I can't find in the documentation.

Actually, I like your questions.  All of the below are obvious to
 veteran DB programmers, but not to newbies.  I'll put together a FAQ
 based on your questions for Techdocs.  Few of the answers below are
 particular to Postgres; you will find them similar across all SQL
 RDBMSs.

> 1) It seems that I can only have primary keys in postgresql,
>    does this mean that keys are not the same as indexes ?

Correct, they are not the same.  A "key" is a unique set of data that
 allows you to uniquely identify a row.  One of these Keys, chosen by
 you, becomes a "Primary Key" by which the database system finds the
 row for which you're looking.  The other Keys, if any, are "candidate
 keys."  You may wish to put Constraints on these Keys to enforce their
 uniqueness.  Example:

Create Table staff (
    staff_id SERIAL NOT NULL PRIMARY KEY,
    staff_name VARCHAR (100) NOT NULL,
    SSN VARCHAR (9) NOT NULL,
    department VARCHAR (25) NOT NULL,
    hire_date TIMESTAMP NOT NULL,
    comments VARCHAR(200) NULL,
    CONSTRAINT cs_SSN_unq UNIQUE(SSN),
    CONSTRAINT cs_staff_combo_unq UNIQUE (staff_name, department,
 hire_date)
    );

In the table above, the integer index value staff_id is my Primary Key.
  The SSN is a Candidate Key which I have chosen not to make my Primary
 Key.  Because of the uniqueness constraint, the combination
 ofstaff_name, department, and hire_date is also a Candidate Key.

If I have two natural Candidate Keys, why did I choose to use a SERIAL
 integer ID value?  Convenience.  SERIAL ID's are the easiest things to
 work with as Primary Keys in most cases.  Because the ID has no
 purpose outside of providing a Primary Key, you don't have to worry
 about it changing for an existing record, or getting re-used.

> 2) If I define a column as a primary key, does that mean that it will
>  be
>    indexed, so I don't nee to create an index as well ?

In Postgres and most other RDBMSs, yes.  When you declare a Primary
 Key, Postgres will tell you that it is creating both an index and a
 uniqueness constraint on that column.  While neither the index nor the
 constraint are "part of" the key, Postgres (like most RDBMSs) needs
 them to enforce the primary key.

> 2a) If so - If I define two or more columns as the primary key, will
>  they
>     be indexed seperately, or do I need to index the 2nd and 3rd
>  columns
>     seperately

They will be indexed together.  So, if you need to do lookups on the
 3rd column seperate from the 1st, you will need to build a seperate
 index for that column alone.

CAUTION:  From my 7 years of RDBMS programming experience, multi-column
 primary keys are a huge headache and you will soon find yourself
 re-building the table with an independant SERIAL primary key.  If you
 want to know why, just try doing a LEFT OUTER JOIN on two tables with
 3-column primary keys.  Bleah!

> 3) If I create an index on a column (it is not unique and so can't be
>  a key),
>    what do I need to query on - the index name or the column name ?

The column name.  Your request (query) goes through a Query Planner,
 which decides with good accuracy what the fastest way to find your
 data is.  Sometimes it uses your index, sometimes not, depending on
 what you queried (indexes are not always faster).  All you have to do
 is run ANALYZE on your database periodically so that the Query Planner
 has accurate current information to base its decisions on.

You would benefit from reading a few general RDBMS books.  See the book
 review page:  http://techdocs.postgresql.org/bookreviews.php

Finally, the new version, 7.2, keeps statistics on the use your indexes
 are seeing, which over time will help you improve your database by
 dropping indexes that are not used and adding the ones you need.

-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________
                                       Josh Berkus
  Complete information technology      josh@agliodbs.com
   and data management solutions       (415) 565-7293
  for law firms, small businesses        fax 621-2533
    and non-profit organizations.      San Francisco

Re: Using indexes and keys

From

John Taylor

Date:

11 February 2002, 05:31:38

Josh,

I've taken your advise and ordered a book, but until that arrives,
one further question, if you don't mind:

On Friday 08 February 2002 17:49, Josh Berkus wrote:

>
> > 2a) If so - If I define two or more columns as the primary key, will
> >  they
> >     be indexed seperately, or do I need to index the 2nd and 3rd
> >  columns
> >     seperately
>
> They will be indexed together.  So, if you need to do lookups on the
>  3rd column seperate from the 1st, you will need to build a seperate
>  index for that column alone.
>

If I need to lookup on a combined key, how do I do that ?
The CREATE TABLE syntax doesn't seem to allow me to combine the columns into a single key name.

Additionally, if I combine two VARCHAR columns are the following keys unique ?

...
    a VARCHAR(10),
    b VARCHAR(10),
    PRIMARY KEY (a,b),
...

a="abc", b="def"

a="ab", b="cdef"

If not, would I need to make a CHAR(10) rather than VARCHAR(10), or is there some other trick ?

Thanks
JohnT

Re: Using indexes and keys

From

"Josh Berkus"

Date:

11 February 2002, 11:51:04

John,

> If I need to lookup on a combined key, how do I do that ?

To SELECT or JOIN based on a multi-column key, you need to use WHERE
 conditions for each column:

...FROM tablea JOIN tableb ON (tablea.c1 = tableb.c1 AND tablea.c2 =
 tableb.c2)

If you want to make sure and use the multi-column index on that key,
 make sure to keep the columns in the same order as they were in the
 key declaration.

> The CREATE TABLE syntax doesn't seem to allow me to combine the
>  columns into a single key name.

Why not?

CREATE TABLE tablea (
    col1 INT4 NOT NULL,
    col2 INT4 NOT NULL,
    CONSTRAINT tablea_pk PRIMARY KEY (col1, col2)
    );

> Additionally, if I combine two VARCHAR columns are the following keys
>  unique ?
>
> ...
>     a VARCHAR(10),
>     b VARCHAR(10),
>     PRIMARY KEY (a,b),
> ...
>
> a="abc", b="def"
>
> a="ab", b="cdef"

Yes, they are unique.  Postgres does not concatinate fields to make a
 multi-column key.

-Josh

P.S. For anyone just tuning into the thread, keep in mind that
 multi-column keys are considerably more trouble than they're worth 75%
 of the time.

7.1 -> 7.2 on Debian

From

ghaverla@freenet.edmonton.ab.ca

Date:

12 March 2002, 23:16:50

Hi,

  About a week ago, I was out apt-get'ing things, and
various dependencies decided postgresql needed to get
upgraded.  Fine, I don't use it for much yet, but
had some data in it.  During the install, I got a
message saying I needed to do the dump/restore manually,
and it went along on its merry way.  Tonight, I went to
do the dump.  Debian has a shell script (supposedly)
all set up to do this, but somehow I don't think it worked.
If I run this custom script, I get some output to the screen
at the end which is postgresql related, but looks like
error messages from the log.  The dump is 218 bytes long.
If I try to run the 7.1 dumpall command manually, it tells
me the postmaster isn't running.  Which is correct, Debian
has locked out the postmaster since the database dump
needed to be done manually.

Can this dump really be only 218 bytes long?  Or should
a person uninstall 7.2, reinstall 7.1, dump the data,
uninstall 7.1, reinstall 7.2?

Gord

Matter Realisations     http://www.materialisations.com/
Gordon Haverland, B.Sc. M.Eng. President
101  9504 182 St. NW    Edmonton, AB, CA  T5T 3A7
780/481-8019            ghaverla @ freenet.edmonton.ab.ca
780/993-1274 (cell)

Re: 7.1 -> 7.2 on Debian

From

Oliver Elphick

Date:

13 March 2002, 04:17:46

On Wed, 2002-03-13 at 04:16, ghaverla@freenet.edmonton.ab.ca wrote:
> Hi,
>
>   About a week ago, I was out apt-get'ing things, and
> various dependencies decided postgresql needed to get
> upgraded.  Fine, I don't use it for much yet, but
> had some data in it.  During the install, I got a
> message saying I needed to do the dump/restore manually,
> and it went along on its merry way.  Tonight, I went to
> do the dump.  Debian has a shell script (supposedly)
> all set up to do this, but somehow I don't think it worked.
> If I run this custom script, I get some output to the screen
> at the end which is postgresql related, but looks like
> error messages from the log.  The dump is 218 bytes long.
> If I try to run the 7.1 dumpall command manually, it tells
> me the postmaster isn't running.  Which is correct, Debian
> has locked out the postmaster since the database dump
> needed to be done manually.

You can start it.  As root do:

  /etc/init.d/postgresql start

and then connect with psql, but this will probably confirm that you have
an empty datbase.

> Can this dump really be only 218 bytes long?  Or should
> a person uninstall 7.2, reinstall 7.1, dump the data,
> uninstall 7.1, reinstall 7.2?

You have probably hit a bug in the automatic upgrade process of the
Debian packaging.  Your old database is stored in
$PGDATA/../data.xx.save (where xx is the old version), but needs the 7.1
binaries to read it.  These should be stored in
/usr/lib/postgresql/dumpall/7.1 but there are sometimes problems along
the way.

This procedure is quite tricky, since it needs cooperation between the
old and the new versions of the package.  The latest release (7.2-5) has
improved the procedure and may help; it is still in unstable.

Please follow this procedure.

In each of the directories in $PGDATA/.. check the contents of
PG_VERSION, so as to be sure which is which.  Make sure that the 7.1
directory is called $PGDATA.  It would be wise to have a separate backup
of this.

Reinstall 7.1.3-7, check that the data is all present and correct and
then retry the upgrade using 7.2-5.  Then let me know how it goes.  If
possible, use script to capture the entire upgrade session.

Since this is a Debian packaging issue, further discussion would be more
appropriate in private email or on the Debian mailing lists.

--
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C

     "Who is like Thee among the gods, O Lord? Who is like
      Thee, majestic in holiness, awesome in praises,
      working wonders?"
                              Exodus 15:11