Re: Surrogate keys (Was: enums) - Mailing list pgsql-hackers

From Jim C. Nasby
Subject Re: Surrogate keys (Was: enums)
Date
Msg-id 20060118191150.GL17896@pervasive.com
Whole thread Raw
In response to Re: Surrogate keys (Was: enums)  (Leandro Guimarães Faria Corcete DUTRA<leandro@dutra.fastmail.fm>)
List pgsql-hackers
On Wed, Jan 18, 2006 at 01:08:53PM +0000, Leandro Guimar??es Faria Corcete DUTRA wrote:
> > b) If each parent record will have many children, the space savings from
> > using a surrogate key can be quite large
> 
> Not such a common case.
Hmmm...

Many blog entries per user... Many blog comments per entry

Many PO's per customer... many line items per PO...
Etc., etc. I would argue that one-many relationships are far more common
than one-one, and it's very common for an integer ID to be a more
compact representation than a real key.

> > c) depending on how you view things, putting actual keys all over the
> > place is denormalized
> 
> How come?  Never!
Huh?

One of the tenants of normalization is that you don't repeat data. You
don't use customer name in your PO table, because it's asking for
problems; what if a customer changes names (as just one example).

> > Generally, I just use surrogate keys for everything unless performance
> > dictates something else.
> 
> What I am proposing is the reverse: use natural keys for everything unless 
> performance dictates something else.
> 
> In support of my PoV: 
> http://blogs.ittoolbox.com/database/soup/archives/007327.asp?rss=1

Read the bottom of it:

"I am not saying that you should avoid autonumber surrogate keys like an
SCO executive. The danger is not in their use but in their abuse. The
"events_id" column in the "events" table didn't give us any trouble
until we began to rely on it as the sole key for the table. The
accounting application gave us problems because we were using the ID as
the entire handle for the records. That crossed the line from use to
misuse, and we suffered for it."

To paraphrase, the issue isn't that surrogate keys were used for RI; the
issue is that proper keys were not setup to begin with. Does it make
sense to have a customer table where customer_name isn't unique? Almost
certainly not. But that's just one possible constraint you might put on
that table. To put words in Josh's mouth, the issue isn't with using a
surrogate key, it's with not thinking about what constraints you should
be placing on your data.

Take a look at cbk's comment; he does a great job of summing the issue
up.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: debug_query_string and multiple statements
Next
From: Jaime Casanova
Date:
Subject: log_min_messages and debug levels