Re: surrogate key or not? - Mailing list pgsql-sql

From Josh Berkus
Subject Re: surrogate key or not?
Date
Msg-id 200407210001.00102.josh@agliodbs.com
Whole thread Raw
In response to Re: surrogate key or not?  ("Iain" <iain@mst.co.jp>)
Responses Re: surrogate key or not?  (sad <sad@bankir.ru>)
Re: surrogate key or not?  (Rod Taylor <pg@rbt.ca>)
Re: surrogate key or not?  (David Garamond <lists@zara.6.isreserved.com>)
List pgsql-sql
Markus, Iain,

> Thus I see it more as an issue of business logic than performance. There are
> of course many other considerations with relational theory and stuff like
> that which you could debate endlessly. I expect that googling on "surrogate
> keys" would yeild interesting results.

Frankly, I couldn't disagree more.   This is why it was so problematic for the 
SQL committee to enshrine "primary keys" and sequences in the standard; it 
mis-educates database designers into believing that surrogate keys are 
somehow part of the data model.  They are most decidely NOT.

Given:  Surrogate keys, by definition, represent no real data;
Given:  Only items which represent real data have any place ina data model
Conclusion:  Surrogate keys have no place in the data model

There are, in fact, three very good reasons to use surrogate keys, all of 
which are strictly due to limitations of technology; that is, implementation 
and performance issues, NOT business logic.  They are:

1) Convenience:  It's very annoying to have to refer to a 4-column foriegn key 
whenever you do a join in queries or want to delete a record, as well as 
tracking a 4-element composite in your client software.

2) Performance:  INT and BIGINT data types are among the most compact and 
efficient stored in most RDBMSs.   So using anything else as a key would 
likely result in a loss of performance on large-table joins.

3) Mutability:  Most RDBMSs are very inefficient about CASCADE deletes and 
updates.   Some RDBMSs do not support CASCADE, forcing the client software to 
fix all the dependant rows.  This means that DBAs are very reluctant to use 
columns which change frequently as join keys.

All three of these implementation issues are, at least in theory, 
surmountable.   For example, Sybase overcame problems (1) and (3) by creating 
an automated, system-controlled hash key based on the table's real key.  This 
was a solution endorsed by E.F. Codd in the mid-90's when he came to regret 
his promotion of the "Primary Key" idea in the SQL standard.  

Now, you're probably wondering "why does this guy regard surrogate keys as a 
problem?"   I'll tell you:  I absolutely cannot count the number of "bad 
databases" I've encountered which contained tables with a surrogate key, and 
NO REAL KEY of any kind.   This makes data normalization impossible, and 
cleanup of the database becomes a labor-intensive process requiring 
hand-examination of each row.

-- 
-Josh BerkusAglio Database SolutionsSan Francisco



pgsql-sql by date:

Previous
From: azah azah
Date:
Subject: Re: date_format in postresql
Next
From: sad
Date:
Subject: Re: locks and triggers. give me an advice please