Re: surrogate vs natural primary keys - Mailing list pgsql-sql
From | Steve Midgley |
---|---|
Subject | Re: surrogate vs natural primary keys |
Date | |
Msg-id | 20080918050121.2715D64FCBF@postgresql.org Whole thread Raw |
In response to | surrogate vs natural primary keys (Seb <spluque@gmail.com>) |
List | pgsql-sql |
>To: pgsql-sql@postgresql.org >From: Seb <spluque@gmail.com> >Subject: Re: surrogate vs natural primary keys >Date: Mon, 15 Sep 2008 17:56:31 -0500 >Organization: Church of Emacs >Lines: 20 >Message-ID: <87hc8h2f34.fsf@patagonia.sebmags.homelinux.org> >References: <87y71t2hlu.fsf@patagonia.sebmags.homelinux.org> > <dcc563d10809151545r70d7d2a7j25931d7557dd88f3@mail.gmail.com> >X-Archive-Number: 200809/101 >X-Sequence-Number: 31553 > >On Mon, 15 Sep 2008 16:45:08 -0600, >"Scott Marlowe" <scott.marlowe@gmail.com> wrote: > >[...] > > > I think this question is a lot like "how large should I set > > shared_buffers?" There's lots of different answers based on how > you > > are using your data. > >Yes, this is precisely what I'm after: *criteria* to help me decide >which approach to take for different scenarios. Such guidance is what >seems to be lacking from most of the discussions I've seen on the >subject. It's hard to distill this information when most of the >discussion is centered on advocating one or the other approach. I think Scott and others have laid out the main ideas in a very cool-headed way already, but here's my follow-on input: I agree with Andrew Sullivan that using industry standard id's as your primary key can be problematic. But I do sometimes apply unique indices to such "industry standard" columns to ensure they are in fact unique and can be a surrogate for the "real" integer/serial primary key. As a rule, I have decided to stay away from "meaningful" (natural) primary keys for these reasons: 1) They sometimes change b/c of business rule changes, forcing technical changes to the relationship model, when only internal table schema changes should be required to support the new business requirements. 2) Generating arbitrary/surrogate keys is easier b/c you can use sequence generators. (When creating a new record, I have to figure out the value of a meaningful column before saving the record which sometimes I don't want to do!) 3) Surrogate keys are guaranteed unique regardless of semantic content of the table. 4) All tables can all join to each other in the same ways: property.id holds the same data type as contact.id. All id fields are the same in type/format. I think there's even a reasonable argument for "globally unique" surrogate keys: all keys for any table use the same sequence of id's. I implemented a system in the 90's that used globally unique id's and it opened up some interesting solutions that I wouldn't have thought of when I started the project (self joins were the same as foreign joins since the id's in both entities were guaranteed unique). I've heard some people argue the use of GUID's for id's but I've been too scared to try that in a real system. Sequential, arbitrary primary keys (as surrogate keys) are predictable though. So if you share those keys with the public (via URL's for example), then competitors can learn information about your business (how fast keys are generated for a certain table for example). That's an argument for random, arbitrary primary keys though, not for compound/meaningful keys. I think natural or compound keys make more sense to DBA's and let you implement some kinds of database solutions more quickly. All in all, I don't really understand the merits of natural keys outside of data warehouse applications. In data warehouses, in my experience, compound natural keys just end up turning into fact tables! :) In summary: I've never heard someone say they've been bitten by using an arbitrary surrogate key system, but I myself have been bitten and have heard lots of stories of problems when using natural keys. I hope this helps some, Steve