Thread: Do non-sequential primary keys slow performance significantly?
Hello,

The most difficult part of this question is justifying WHY we would
want to use random primary keys! We have a fairly strong reason for
doing so, although it is not quite compelling.

We are Java developers developing desktop applications that persist
data in Postgres. This is a pretty "low spec" database, as it will
only be servicing a few PCs. We do this via Hibernate, so our SQL and
Postgres skills and insights are relatively lacking. I certainly don't
understand the gory internal details of Postgres.

We have an internal proposal to use what are virtually random 128-bit
numbers for our primary keys. These are not truly random in any
mathematical sense, and they will be unique, but they are certainly
NOT sequential.

In my ignorant bliss I would suspect that Postgres will run more
slowly using random primary keys. Can anyone provide any "rules of
thumb" for how this may affect performance? Is it a plain dumb idea?
Or maybe it would have only modest impact?

Any comments, insights, and pointers are very much appreciated.

Thanks,
-Damian
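[For illustration: a minimal Java sketch of what such a key generator
might look like. The thread never says how the 128-bit values are
produced; java.util.UUID.randomUUID(), which carries 122 random bits
out of 128, is just one plausible stand-in, and the class name below
is invented.]

import java.util.UUID;

// Hypothetical sketch: java.util.UUID.randomUUID() yields a version-4
// UUID -- effectively a random 128-bit value (122 bits are random, 6
// are fixed by the version/variant fields). The original post does not
// say how its keys are generated; this is only an illustration.
public class RandomKeyExample {
    public static void main(String[] args) {
        UUID key = UUID.randomUUID();
        System.out.println("128-bit key: " + key);

        // The same value split into two longs, in case the schema
        // stores it as a pair of 64-bit columns instead of a string.
        long hi = key.getMostSignificantBits();
        long lo = key.getLeastSignificantBits();
        System.out.printf("hi=%016x lo=%016x%n", hi, lo);
    }
}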
"Damian C" <jamianb@gmail.com> writes: > In my ignorant bliss I would suspect that postgres will run more > slowly using random primary keys. More slowly compared to what? If your performance bottleneck is concurrent insertions, random keys should win over sequential keys because the insert load is distributed over the whole index, leading to less page-level lock contention. There might be other scenarios where sequential keys are better. For a database servicing "only a few PCs" I'm not sure you should even spend any time thinking about it --- do what's easiest for your application code. regards, tom lane
On 29/9/2006 14:59, "Damian C" <jamianb@gmail.com> wrote:

> Hello,
> The most difficult part of this question is justifying WHY we would
> want to use random primary keys! We have a fairly strong reason for
> doing so, although it is not quite compelling.

Either you want it that way simply because it was your first thought
of how to do it, or you have real reasoning behind why you want to do
it that way.

> We are Java developers developing desktop applications that persist
> data in Postgres. This is a pretty "low spec" database, as it will
> only be servicing a few PCs. We do this via Hibernate, so our SQL and
> Postgres skills and insights are relatively lacking. I certainly
> don't understand the gory internal details of Postgres.
>
> We have an internal proposal to use what are virtually random 128-bit
> numbers for our primary keys. These are not truly random in any
> mathematical sense, and they will be unique, but they are certainly
> NOT sequential.

Specifying the column as the primary key, or giving it a unique index,
will ensure the uniqueness of entries.

> In my ignorant bliss I would suspect that Postgres will run more
> slowly using random primary keys. Can anyone provide any "rules of
> thumb" for how this may affect performance? Is it a plain dumb idea?
> Or maybe it would have only modest impact?
>
> Any comments, insights, and pointers are very much appreciated.

As for being non-sequential, this is how a live database ends up
anyway (to some degree): as rows get deleted, the auto-assigned
sequential numbers are deleted along with them and don't get re-used,
leaving gaps.

I believe the only performance difference will stem from using a
128-bit column for the key rather than, say, a 32- or 64-bit serial:
there is more data to store, search through, and compare.

When you say you want it as your primary key, how much will you use it
to link to other tables? If it links to three other tables it won't be
a problem, but if you use it to link fifty tables with complex joins
in your searches, then you may want to reconsider or develop a
workaround. Or is it simply your unique row identifier?

The fact that you say it will only be servicing a few PCs leads me to
think you aren't expecting millions of rows to be entered. That
suggests the performance difference would not be intolerable even if
it is noticeable to your application, although you could get better
performance with a different design.

--
Shane Ambler
Postgres@007Marketing.com

Get Sheeky @ http://Sheeky.Biz
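[Since the original poster mentioned Hibernate, here is a hypothetical
sketch of how such an entity might be mapped with JPA annotations, the
key being assigned in application code rather than by a database
sequence. The class, field, and column details are invented, not taken
from the thread.]

import java.util.UUID;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;

// Hypothetical sketch of a Hibernate-mapped entity whose primary key
// is an application-assigned 128-bit value stored as a 36-char string.
// The class and column names are invented for illustration.
@Entity
public class Customer {
    @Id
    @Column(length = 36)
    private String id = UUID.randomUUID().toString();

    private String name;

    protected Customer() {}          // no-arg constructor for Hibernate

    public Customer(String name) {
        this.name = name;
    }

    public String getId() { return id; }
    public String getName() { return name; }
}

[One design consequence worth noting: because the key exists before the
first INSERT, equals/hashCode and object identity are stable even for
unsaved objects, which is one of the usual arguments for assigned
random keys under an ORM.]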
> The most difficult part of this question is justifying WHY we would
> want to use random primary keys! We have a fairly strong reason for
> doing so, although it is not quite compelling.

One problem with randomly generated primary keys that I've recently
read about deals with inserts failing due to primary key duplication.

If the size of your dataset grows to become a significant percentage
of the size of the integer type used for your random primary key, the
probability of inserting a duplicate number increases dramatically. I
imagine that this problem could contribute to poor performance for
large bulk inserts, which would have to add logic for retrying an
insert whenever a duplicate number is generated.

Regards,
Richard Broersma Jr.
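[The retry logic Richard mentions might look something like this under
plain JDBC; it is a sketch, not anything from the thread. SQLState
23505 is PostgreSQL's unique_violation code. Note the sketch assumes
auto-commit: inside a larger transaction PostgreSQL aborts on the
failed statement, so a savepoint would be needed before each attempt.
Table and column names are hypothetical.]

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.UUID;

// Sketch of retry-on-duplicate insert logic: on a unique-key violation
// (PostgreSQL SQLState 23505), generate a fresh random key and try
// again. Assumes auto-commit mode; table/column names are invented.
public class RetryingInsert {
    private static final String UNIQUE_VIOLATION = "23505";
    private static final int MAX_ATTEMPTS = 5;

    static String insertWithRetry(Connection conn, String payload)
            throws SQLException {
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            String key = UUID.randomUUID().toString();
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO items (id, payload) VALUES (?, ?)")) {
                ps.setString(1, key);
                ps.setString(2, payload);
                ps.executeUpdate();
                return key;                  // success
            } catch (SQLException e) {
                if (!UNIQUE_VIOLATION.equals(e.getSQLState())) {
                    throw e;                 // some other failure
                }
                // duplicate key: loop and retry with a new random value
            }
        }
        throw new SQLException("gave up after " + MAX_ATTEMPTS + " attempts");
    }
}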
On Fri, Sep 29, 2006 at 08:10:23 -0700,
  Richard Broersma Jr <rabroersma@yahoo.com> wrote:
> > The most difficult part of this question is justifying WHY we would
> > want to use random primary keys! We have a fairly strong reason for
> > doing so, although it is not quite compelling.
>
> One problem with randomly generated primary keys that I've recently
> read about deals with inserts failing due to primary key duplication.
>
> If the size of your dataset grows to become a significant percentage
> of the size of the integer type used for your random primary key, the
> probability of inserting a duplicate number increases dramatically. I
> imagine that this problem could contribute to poor performance for
> large bulk inserts, which would have to add logic for retrying an
> insert whenever a duplicate number is generated.

They are using 128-bit keys! If their random number generator actually
works, they shouldn't have a problem until they have generated on the
order of 2^64 keys. That isn't likely to happen any time soon.
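[The 2^64 figure is the usual birthday bound. A back-of-envelope check
in Java, using the approximation p ~ n^2 / 2^(b+1) for n random b-bit
keys; the numbers printed are illustrative, not from the thread:]

// Birthday approximation: for n random b-bit keys, the probability of
// any collision is roughly n^2 / 2^(b+1). For 128-bit keys even a
// trillion rows keep the odds astronomically small.
public class BirthdayBound {
    public static void main(String[] args) {
        double bits = 128.0;
        for (double n : new double[] {1e6, 1e9, 1e12}) {
            double p = n * n / Math.pow(2.0, bits + 1.0);
            System.out.printf("n = %.0e  ->  p(collision) ~ %.3e%n", n, p);
        }
    }
}

[For a billion keys this works out to odds of roughly 1.5e-21, which
backs up the point above: duplicate-key retries are a non-issue at 128
bits, whatever their performance cost would be at 32 bits.]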