Re: pervasiveness of surrogate (also called synthetic) keys - Mailing list pgsql-general

From Greg Smith
Subject Re: pervasiveness of surrogate (also called synthetic) keys
Date
Msg-id 4DBEC96E.8070901@2ndQuadrant.com
Whole thread Raw
In response to Re: pervasiveness of surrogate (also called synthetic) keys  (Karsten Hilbert <Karsten.Hilbert@gmx.net>)
Responses Re: pervasiveness of surrogate (also called synthetic) keys  (Jim Irrer <irrer@umich.edu>)
Re: pervasiveness of surrogate (also called synthetic) keys  (Merlin Moncure <mmoncure@gmail.com>)
Re: pervasiveness of surrogate (also called synthetic) keys  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-general
On 05/01/2011 06:12 PM, Karsten Hilbert wrote:
> Good to know since I'm only a lowly medical doctor not
> having much schooling in database matters beyond this list,
> the PostgreSQL docs, and the Celko book.
>

This debate exists at all levels of experience, and the only thing that
changes as you get more experienced people involved is an increase in
anecdotes on each side.  The sole time I ever found myself arguing with
Joe Celko is over an article he wrote recommending natural keys, using
an example from the automotive industry.  Problem was, the specific
example he gave was flat out wrong.  I was working in automotive MIS at
the time, and the thing he was saying would never change did, in fact,
change every year--in only a fraction of a percent of cases, in an
extremely subtle way that snuck up on people and wreaked much
confusion.  That's typical for an early natural key design:  you get it
working fine in V1.0, only to discover months or years down the road
there's a case you never considered you don't model correctly, and it
may take some sort of conversion to fix.

The reason why there's a strong preference for surrogate keys is that
they always work and you can avoid ever needing to come up with a better
design. if you just use them and forget about it.  The position Merlin
has advocated here, that there should always be a natural key available
if you know the data well enough, may be true.  But few people are good
enough designers to be sure they've made the decision correctly, and the
downsides of being wrong can be a long, painful conversion process.
Easier for most people to just eliminate the possibility of making a
mistake by using auto-generated surrogate keys, where the primary
problem you'll run into is merely using more space/resources than you
might otherwise need to have.  It minimizes the worst-case--mistake make
in the model, expensive re-design--by adding overhead that makes the
average case more expensive.  Software design usually has enough risks
that any time you can eliminate one just by throwing some resources at
it, that's normally the right thing to do.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


pgsql-general by date:

Previous
From: Raymond O'Donnell
Date:
Subject: Re: wnat ot edit pg_hba.conf file from command prompt
Next
From: Marek Więckowski
Date:
Subject: Re: auto-reconnect: temp schemas, sequences, transactions