Re: surrogate vs natural primary keys - Mailing list pgsql-sql

From Steve Midgley
Subject Re: surrogate vs natural primary keys
Date
Msg-id 20080918050121.2715D64FCBF@postgresql.org
Whole thread Raw
In response to surrogate vs natural primary keys  (Seb <spluque@gmail.com>)
List pgsql-sql
>To: pgsql-sql@postgresql.org
>From:  Seb <spluque@gmail.com>
>Subject: Re: surrogate vs natural primary keys
>Date:  Mon, 15 Sep 2008 17:56:31 -0500
>Organization:  Church of Emacs
>Lines: 20
>Message-ID:  <87hc8h2f34.fsf@patagonia.sebmags.homelinux.org>
>References:  <87y71t2hlu.fsf@patagonia.sebmags.homelinux.org>
>         <dcc563d10809151545r70d7d2a7j25931d7557dd88f3@mail.gmail.com>
>X-Archive-Number: 200809/101
>X-Sequence-Number: 31553
>
>On Mon, 15 Sep 2008 16:45:08 -0600,
>"Scott Marlowe" <scott.marlowe@gmail.com> wrote:
>
>[...]
>
> > I think this question is a lot like "how large should I set
> > shared_buffers?"  There's lots of different answers based on how 
> you
> > are using your data.
>
>Yes, this is precisely what I'm after: *criteria* to help me decide
>which approach to take for different scenarios.  Such guidance is what
>seems to be lacking from most of the discussions I've seen on the
>subject.  It's hard to distill this information when most of the
>discussion is centered on advocating one or the other approach.

I think Scott and others have laid out the main ideas in a very 
cool-headed way already, but here's my follow-on input:

I agree with Andrew Sullivan that using industry standard id's as your 
primary key can be problematic. But I do sometimes apply unique indices 
to such "industry standard" columns to ensure they are in fact unique 
and can be a surrogate for the "real" integer/serial primary key.

As a rule, I have decided to stay away from "meaningful" (natural) 
primary keys for these reasons:

1) They sometimes change b/c of business rule changes, forcing 
technical changes to the relationship model, when only internal table 
schema changes should be required to support the new business 
requirements.

2) Generating arbitrary/surrogate keys is easier b/c you can use 
sequence generators. (When creating a new record, I have to figure out 
the value of a meaningful column before saving the record which 
sometimes I don't want to do!)

3) Surrogate keys are guaranteed unique regardless of semantic content 
of the table.

4) All tables can all join to each other in the same ways: property.id 
holds the same data type as contact.id. All id fields are the same in 
type/format.

I think there's even a reasonable argument for "globally unique" 
surrogate keys: all keys for any table use the same sequence of id's. I 
implemented a system in the 90's that used globally unique id's and it 
opened up some interesting solutions that I wouldn't have thought of 
when I started the project (self joins were the same as foreign joins 
since the id's in both entities were guaranteed unique).

I've heard some people argue the use of GUID's for id's but I've been 
too scared to try that in a real system.

Sequential, arbitrary primary keys (as surrogate keys) are predictable 
though. So if you share those keys with the public (via URL's for 
example), then competitors can learn information about your business 
(how fast keys are generated for a certain table for example).

That's an argument for random, arbitrary primary keys though, not for 
compound/meaningful keys.

I think natural or compound keys make more sense to DBA's and let you 
implement some kinds of database solutions more quickly.

All in all, I don't really understand the merits of natural keys 
outside of data warehouse applications. In data warehouses, in my 
experience, compound natural keys just end up turning into fact tables! 
:)

In summary: I've never heard someone say they've been bitten by using 
an arbitrary surrogate key system, but I myself have been bitten and 
have heard lots of stories of problems when using natural keys.

I hope this helps some,

Steve



pgsql-sql by date:

Previous
From: Robert Edwards
Date:
Subject: Re: May I have an assistance on CREATE TABLE Command
Next
From: "Raphael Bauduin"
Date:
Subject: inserting boolean values in dynamic queries