Re: [NOVICE] Do non-sequential primary keys slow performance significantly?? - Mailing list pgsql-general

From Brandon Aiken
Subject Re: [NOVICE] Do non-sequential primary keys slow performance significantly??
Date
Msg-id F8E84F0F56445B4CB39E019EF67DACBA340A78@exchsrvr.winemantech.com
Whole thread Raw
List pgsql-general
I would expect no performance difference at all.  All primary keys
automatically get an index, and the index is effectively an optimized
dictionary, hash, two-dimensional array, or list of tuples of the key
values and the address of the record for that key.  Indexes are designed
to eliminate the physical performance penalty from arbitrarily large and
variable data sets.

My only trepidation is using unpredictable values for primary keys.
Certainly they're candidate keys and should be unique in the table, but
I wouldn't be comfortable using an unpredictable value as a primary key.
A surrogate key combined with a unique constraint on your random field
seems like a better choice here, but that's entirely a subjective
opinion.

--
Brandon Aiken
CS/IT Systems Engineer
-----Original Message-----
From: pgsql-novice-owner@postgresql.org
[mailto:pgsql-novice-owner@postgresql.org] On Behalf Of Damian C
Sent: Friday, September 29, 2006 1:29 AM
To: pgsql-novice@postgresql.org
Subject: [NOVICE] Do non-sequential primary keys slow performance
significantly??

Hello,
The most difficult part of this question is justifying WHY we would
want to use random primary keys!  There is a very strong reason for
doing so, although not quite compelling.

We are Java developers developing desktop applications that persist
data in postgres. This is a pretty "low spec" database as it will only
servicing a few PCs.  We do this via Hibernate so our SQL & Postrges
skills and insights are relatively lacking.  I certainly don't really
understand the gory internal details of postgres.

We have an internal proposal to use what are virtually random 128 bit
numbers for our primary keys.  These are not truley random in any
mathematical sense, and they will be unique, but they are certainly
NOT sequential.

In my ignorant bliss I would suspect that postgres will run more
slowly using random primary keys. Can anyone provide any "rules of
thumb" for how this may effect performance??  Is it a plain dumb
idea?? Or maybe it would have only modest impact??

Any comments, insights, pointers are very much appreciated,

Thanks,
-Damian

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

pgsql-general by date:

Previous
From: snacktime
Date:
Subject: Re: using schema's for data separation
Next
From: Erik Jones
Date:
Subject: Re: Array assignment behavior (was Re: [ADMIN] Stored procedure