Re: surrogate key or not? - Mailing list pgsql-sql
From | Josh Berkus |
---|---|
Subject | Re: surrogate key or not? |
Date | |
Msg-id | 200407210001.00102.josh@agliodbs.com Whole thread Raw |
In response to | Re: surrogate key or not? ("Iain" <iain@mst.co.jp>) |
Responses |
Re: surrogate key or not?
Re: surrogate key or not? Re: surrogate key or not? |
List | pgsql-sql |
Markus, Iain, > Thus I see it more as an issue of business logic than performance. There are > of course many other considerations with relational theory and stuff like > that which you could debate endlessly. I expect that googling on "surrogate > keys" would yeild interesting results. Frankly, I couldn't disagree more. This is why it was so problematic for the SQL committee to enshrine "primary keys" and sequences in the standard; it mis-educates database designers into believing that surrogate keys are somehow part of the data model. They are most decidely NOT. Given: Surrogate keys, by definition, represent no real data; Given: Only items which represent real data have any place ina data model Conclusion: Surrogate keys have no place in the data model There are, in fact, three very good reasons to use surrogate keys, all of which are strictly due to limitations of technology; that is, implementation and performance issues, NOT business logic. They are: 1) Convenience: It's very annoying to have to refer to a 4-column foriegn key whenever you do a join in queries or want to delete a record, as well as tracking a 4-element composite in your client software. 2) Performance: INT and BIGINT data types are among the most compact and efficient stored in most RDBMSs. So using anything else as a key would likely result in a loss of performance on large-table joins. 3) Mutability: Most RDBMSs are very inefficient about CASCADE deletes and updates. Some RDBMSs do not support CASCADE, forcing the client software to fix all the dependant rows. This means that DBAs are very reluctant to use columns which change frequently as join keys. All three of these implementation issues are, at least in theory, surmountable. For example, Sybase overcame problems (1) and (3) by creating an automated, system-controlled hash key based on the table's real key. This was a solution endorsed by E.F. Codd in the mid-90's when he came to regret his promotion of the "Primary Key" idea in the SQL standard. Now, you're probably wondering "why does this guy regard surrogate keys as a problem?" I'll tell you: I absolutely cannot count the number of "bad databases" I've encountered which contained tables with a surrogate key, and NO REAL KEY of any kind. This makes data normalization impossible, and cleanup of the database becomes a labor-intensive process requiring hand-examination of each row. -- -Josh BerkusAglio Database SolutionsSan Francisco