Re: surrogate key or not? - Mailing list pgsql-sql
From | Iain |
---|---|
Subject | Re: surrogate key or not? |
Date | |
Msg-id | 00ad01c46f95$94ebad20$7201a8c0@mst1x5r347kymb Whole thread Raw |
In response to | surrogate key or not? (Markus Bertheau <twanger@bluetwanger.de>) |
List | pgsql-sql |
Hi Josh, Considering that I generally agree with your comments (in this and your later posts) I'd say I didn't make myself clear in my brief comment. Of the considerations: performance, convenience and business logic, I personally rate performance as the lowest priority. The convenience part has been well covered in other posts and is second most in my opinion.. By business logic I was thinking of a situation we had on an old db where employees were idenitfied by codes. These codes were transmitted all over the database (as you can imagine) which meant that the codes couldn't be recycled as employees came and left. The answer was to introduce an integer based auto-generated key. That way the data associated with an ex-employee is associated with that employee, not with the code. It may be that this really a convenience issue as opposed to business logic, but in my thinking the code was data, not a primary key - it is not even an alternate key (if ex-employees have their code nulled it can be made "unique" though). Perhaps the term surrogate key doesn't apply in this case, I have to admit that I'm not 100% on terminalogy here. In the end, accurately representing the business logic is most importent, and use of a contrived sequential key shouldnt preclude the use of unique constraints where needed so the problem you described of duplicate events should never have happened anyway. This is a great topic though, I find it really interesting. Regards Iain ----- Original Message ----- From: "Josh Berkus" <josh@agliodbs.com> To: "Iain" <iain@mst.co.jp>; "Markus Bertheau" <twanger@bluetwanger.de>; <pgsql-sql@postgresql.org> Sent: Wednesday, July 21, 2004 4:00 PM Subject: Re: [SQL] surrogate key or not? > Markus, Iain, > > > Thus I see it more as an issue of business logic than performance. There are > > of course many other considerations with relational theory and stuff like > > that which you could debate endlessly. I expect that googling on "surrogate > > keys" would yeild interesting results. > > Frankly, I couldn't disagree more. This is why it was so problematic for the > SQL committee to enshrine "primary keys" and sequences in the standard; it > mis-educates database designers into believing that surrogate keys are > somehow part of the data model. They are most decidely NOT. > > Given: Surrogate keys, by definition, represent no real data; > Given: Only items which represent real data have any place in > a data model > Conclusion: Surrogate keys have no place in the data model > > There are, in fact, three very good reasons to use surrogate keys, all of > which are strictly due to limitations of technology; that is, implementation > and performance issues, NOT business logic. They are: > > 1) Convenience: It's very annoying to have to refer to a 4-column foriegn key > whenever you do a join in queries or want to delete a record, as well as > tracking a 4-element composite in your client software. > > 2) Performance: INT and BIGINT data types are among the most compact and > efficient stored in most RDBMSs. So using anything else as a key would > likely result in a loss of performance on large-table joins. > > 3) Mutability: Most RDBMSs are very inefficient about CASCADE deletes and > updates. Some RDBMSs do not support CASCADE, forcing the client software to > fix all the dependant rows. This means that DBAs are very reluctant to use > columns which change frequently as join keys. > > All three of these implementation issues are, at least in theory, > surmountable. For example, Sybase overcame problems (1) and (3) by creating > an automated, system-controlled hash key based on the table's real key. This > was a solution endorsed by E.F. Codd in the mid-90's when he came to regret > his promotion of the "Primary Key" idea in the SQL standard. > > Now, you're probably wondering "why does this guy regard surrogate keys as a > problem?" I'll tell you: I absolutely cannot count the number of "bad > databases" I've encountered which contained tables with a surrogate key, and > NO REAL KEY of any kind. This makes data normalization impossible, and > cleanup of the database becomes a labor-intensive process requiring > hand-examination of each row. > > -- > -Josh Berkus > Aglio Database Solutions > San Francisco