Re: pgsql and large tables - Mailing list pgsql-general

From Andrew G. Hammond
Subject Re: pgsql and large tables
Date
Msg-id E164Sxb-000366-00@xyzzy.lan.internal
Whole thread Raw
In response to pgsql and large tables  ("Gurupartap Davis" <partap@yahoo.com>)
List pgsql-general
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2001 November 13 03:49 pm, Gurupartap Davis wrote:

> I initially created the table and with a primary key (5 fields: char(4),
> varchar(32), date, int, int) and a couple additional indexes, one that just
> changes the order of the primary key, and one that sorts on the date.  Then
> I imported 40 million rows and tried some queries, but discovered that none
> of my queries were using indexes, causing them to take forever.

If your primary key is spread over 5 fields, then it's index will also be
spread over those 5 fields.  This means that in order to use the index, your
querries must specify those fields in the WHERE clause.  Order is also
important.  From what it looks like above, your index is sorted on first the
char(4), then varchar(32) ... to the last int.  So if you specify everything
except the first column (char(4)) then your primary key index will be useless
to you.

> So I read somewhere in the archives thatyou need to VACUUM a table
> regularly for indexes to work properly.

As I understand it, VACUUM simply cleans up obsolete versions of the data (an
artifact from the transactions?), but VACUUM ANALYZE generates some
statistics about various useful stuff that allows the optimizer or planner or
something to do it's job more effectively.

>  Should I make an artificial primary key with a serial type to simplify
> things?  Anyone have any hints at all for me?

Integer primary keys (such as the SERIAL type) are probably going to be a
heck of a lot more efficient than a 5 field mixed key.  I ALWAYS start EVERY
table I make with (id SERIAL PRIMARY KEY, ...

I believe that this is sound practice.  You can still add extra constraints
(such as declaring a UNIQUE INDEX over several fields) to ensure data
consistency, but using an integer id as the primary key has always paid off
for me.  As an aside, I extend my nomenclature to use key_foo where foo is
the name of the table being referenced when I use foreign keys.  Makes life a
lot easier in any number of situations.

- --
Andrew G. Hammond     mailto:drew@xyzzy.dhs.org   http://xyzzy.dhs.org/~drew/
56 2A 54 EF 19 C0 3B 43 72 69 5B E3 69 5B A1 1F                  613-389-5481
5CD3 62B0 254B DEB1 86E0  8959 093E F70A B457 84B1
"To blow recursion you must first blow recur" -- me
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iEYEARECAAYFAjv0IvwACgkQCT73CrRXhLHkWwCeNiCFdaTpB7y2k20DTCVTdzcf
y9IAn3/m6tSNmxB1zI0LFx1cVn17Bfbh
=voiW
-----END PGP SIGNATURE-----

pgsql-general by date:

Previous
From: "P.J. \"Josh\" Rovero"
Date:
Subject: Delete performance
Next
From: Bruce Momjian
Date:
Subject: [PHP] Re: SQL Load Balancer for PostgreSQL