Re: pervasiveness of surrogate (also called synthetic) keys - Mailing list pgsql-general

From Jim Irrer
Subject Re: pervasiveness of surrogate (also called synthetic) keys
Date
Msg-id BANLkTi=y_9mLux5059vj8arx2BCyNgTtSA@mail.gmail.com
Whole thread Raw
In response to Re: pervasiveness of surrogate (also called synthetic) keys  (Greg Smith <greg@2ndQuadrant.com>)
List pgsql-general
I've been carefully reading all of the comments with great interest.

Thanks very much for the thoughtful responses - very enlightening.

- Jim (the topic originator)

Jim Irrer     irrer@umich.edu       (734) 647-4409
University of Michigan Hospital Radiation Oncology
519 W. William St.             Ann Arbor, MI 48103


On Mon, May 2, 2011 at 11:10 AM, Greg Smith <greg@2ndquadrant.com> wrote:
On 05/01/2011 06:12 PM, Karsten Hilbert wrote:
Good to know since I'm only a lowly medical doctor not
having much schooling in database matters beyond this list,
the PostgreSQL docs, and the Celko book.
 

This debate exists at all levels of experience, and the only thing that changes as you get more experienced people involved is an increase in anecdotes on each side.  The sole time I ever found myself arguing with Joe Celko is over an article he wrote recommending natural keys, using an example from the automotive industry.  Problem was, the specific example he gave was flat out wrong.  I was working in automotive MIS at the time, and the thing he was saying would never change did, in fact, change every year--in only a fraction of a percent of cases, in an extremely subtle way that snuck up on people and wreaked much confusion.  That's typical for an early natural key design:  you get it working fine in V1.0, only to discover months or years down the road there's a case you never considered you don't model correctly, and it may take some sort of conversion to fix.

The reason why there's a strong preference for surrogate keys is that they always work and you can avoid ever needing to come up with a better design. if you just use them and forget about it.  The position Merlin has advocated here, that there should always be a natural key available if you know the data well enough, may be true.  But few people are good enough designers to be sure they've made the decision correctly, and the downsides of being wrong can be a long, painful conversion process.  Easier for most people to just eliminate the possibility of making a mistake by using auto-generated surrogate keys, where the primary problem you'll run into is merely using more space/resources than you might otherwise need to have.  It minimizes the worst-case--mistake make in the model, expensive re-design--by adding overhead that makes the average case more expensive.  Software design usually has enough risks that any time you can eliminate one just by throwing some resources at it, that's normally the right thing to do.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

pgsql-general by date:

Previous
From: jgoulet
Date:
Subject: Re: pipe line error (psql command)
Next
From: Merlin Moncure
Date:
Subject: Re: pervasiveness of surrogate (also called synthetic) keys