Thread: Table design question
I'm reading Joe Celko's book SQL Programming Style for the second time and although I've been an OO developer for quite a few years I'm fairly green wrt SQL. Joe is obviously something of a curmudgeon and I would fall squarely into his newbie OO developer ordinal scale and I'm trying to avoid the slide into stupid newbie OO developer. So I'm designing a table and I'm looking for an appropriate key. The natural key is a string from a few characters up to a maximum of perhaps 100. Joe gets quite fierce about avoiding the use of a serial id column as a key. The string is unique in the table and fits the criteria for a key. So should I follow Joe's advice and use my natural key as the primary key? It sounds reasonable but it will mean at least one other table will have the string as a foreign key. My postgres intro book has id columns all over the place but is it really that big an issue these days to have a 100 character primary key? Are there postgres-specific implications for either approach? Thanks Dave
> So I'm designing a table and I'm looking for an appropriate key. The > natural key is a string from a few characters up to a maximum of > perhaps 100. Joe gets quite fierce about avoiding the use of a serial > id column as a key. The string is unique in the table and fits the > criteria for a key. So should I follow Joe's advice and use my natural > key as the primary key? It sounds reasonable but it will mean at least > one other table will have the string as a foreign key. My postgres Unfortunately as a result of the implementation of most commercial databases (and PostgreSQL), using a string as the primary key will cause a performance hit. Yes, normally it is better and for things like status flags and other constant values that rarely change it is a good approach. I would tend to add a SERIAL and make it the primary key (using it in foreign tables) but also making the string column unique and not null'able. Essentially it gives the table two primary keys in the hope that some day PostgreSQL will add a layer of abstraction between primary/foreign key lookups and presentation that allows for usage of strings directly without a performance hit. --
"David Clarke" <pigwin32@gmail.com> writes: > is it really that big an issue these days to have a 100 character primary > key? Are there postgres-specific implications for either approach? It's exactly the same size issue as ever. A 20% increase in space usage is a 20% performance hit in certain types of queries regardless of how fast or cheap your hardware has become. This is an issue where reasonable people differ quite vociferously so you may get contradictory responses. But really it depends heavily on the exact circumstances and requires a judgement call based on experience. Any time someone tries to sum it up with a blanket rule it's going to be wrong some of the time. But that said I also tend to tilt towards creating serial ids. Unless the string is already a synthetic unique identifier created by your application you can't really trust its "uniqueness" for all time. Plenty of people have built databases using natural unique keys that turned out to be not so unique or turned out to just be a poor choice for external reasons (think of Social Security Numbers, for example). I've had my string unique identifiers corrupted by uppercasing, declared case insensitive, declared private information that couldn't be leaked, and declared offensive words that had to be updated. Each of which is a pain to deal with when it's your primary key. -- greg
On Thu, 01 Jun 2006, David Clarke wrote: > So I'm designing a table and I'm looking for an appropriate key. The > natural key is a string from a few characters up to a maximum of > perhaps 100. Joe gets quite fierce about avoiding the use of a serial > id column as a key. The string is unique in the table and fits the The use of surrogate keys is a mostly religious issue. Celko is an Orthodox, many others are Reform. Where you want to align yourself is a personal choice. In defense of the Reform movement, I'd note that modern DBs are more performant with ints than varchars. More importantly, I've found it much easier to modify DBs designed with surrogate keys than natural keys, especially when natural keys span columns. It allows a rather simple convention for coders to write against, and avoids some messy modification issues when the spec changes. It is not my intention to bash the purists, and there are good arguments on the Ortho side, too. I'm merely giving a bit of advice from the point of view of someone who lives in the constant evolution side of DB usage. -j -- Jamie Lawrence jal@jal.org When I was a boy I was told that anybody could become President. Now I'm beginning to believe it. - Clarence Darrow
I never use anything other than "id SERIAL NOT NULL PRIMARY KEY" for my PKEY's -- as an absolute rule -- I guess I am a purist... Everything else (the other columns) can have unique constraints, etcetera and be FOREIGN KEYS, etc... Try INSERTING your 100 character "natural" key into a table with 10M++ rows only to find out there there is already a duplicate.... talk about a performance hit.... or SELECT -- you end up using way too much RAM and bandwidth -- unecessarily... IMHO: You ought to use a numeric, auto-generated sequence (SERIAL) for you PKEY's ... ""David Clarke"" <pigwin32@gmail.com> wrote in message news:12b7ac1e0606010405u6e062f71mf4adbaeb6c46df5f@mail.gmail.com... > I'm reading Joe Celko's book SQL Programming Style for the second time > and although I've been an OO developer for quite a few years I'm > fairly green wrt SQL. Joe is obviously something of a curmudgeon and I > would fall squarely into his newbie OO developer ordinal scale and I'm > trying to avoid the slide into stupid newbie OO developer. > > So I'm designing a table and I'm looking for an appropriate key. The > natural key is a string from a few characters up to a maximum of > perhaps 100. Joe gets quite fierce about avoiding the use of a serial > id column as a key. The string is unique in the table and fits the > criteria for a key. So should I follow Joe's advice and use my natural > key as the primary key? It sounds reasonable but it will mean at least > one other table will have the string as a foreign key. My postgres > intro book has id columns all over the place but is it really that big > an issue these days to have a 100 character primary key? Are there > postgres-specific implications for either approach? > > Thanks > Dave > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings >
"codeWarrior" <gpatnude@hotmail.com> writes: > I never use anything other than "id SERIAL NOT NULL PRIMARY KEY" for my > PKEY's -- as an absolute rule -- I guess I am a purist... Everything else > (the other columns) can have unique constraints, etcetera and be FOREIGN > KEYS, etc... > > Try INSERTING your 100 character "natural" key into a table with 10M++ rows > only to find out there there is already a duplicate.... talk about a > performance hit.... or SELECT -- you end up using way too much RAM and > bandwidth -- unecessarily... Celko is decidedly *NOT* promoting the notion that you should use a 100 byte long "natural key." Jamie's comments of "Orthodox versus Reform" seem reasonably appropriate in outlining something of the difference between the positions. Just because Celko is "Orthodox" doesn't mean he's *stupid*; he makes all sorts of noises about using some ISO standard (11179) such that your primary key values are actually described in some authoritative manner. <http://metadata-standards.org/11179/> If you go about using ISO-11179, then, in principle, you could register your scheme for describing IDs for the objects in your system with some ISO people so that others might even become aware of the official status of this. I may not care for doing this; you may not either; a company that builds auto parts that they want to sell into the automotive industry may care about standardizing their part IDs quite a lot. They're not interested in generating stupidly long identifiers; that's inefficient in many ways... > IMHO: You ought to use a numeric, auto-generated sequence (SERIAL) > for you PKEY's ... That's definitely the "Reform" position ;-), and it is a position that can be rationally held. -- select 'cbbrowne' || '@' || 'cbbrowne.com'; http://cbbrowne.com/info/multiplexor.html Rules of the Evil Overlord #185. "If I capture an enemy known for escaping via ingenious and fantastic little gadgets, I will order a full cavity search and confiscate all personal items before throwing him in my dungeon." <http://www.eviloverlord.com/>
On Thu, 01 Jun 2006, Chris Browne wrote: > Celko is decidedly *NOT* promoting the notion that you should use a > 100 byte long "natural key." > > Jamie's comments of "Orthodox versus Reform" seem reasonably > appropriate in outlining something of the difference between the > positions. Just to be clear, that was all I was trying to do. I probably should have mentioned that any attempt to use such an attribute as a PK should be met with a baseball bat or other shillelagh-ish implement, but was interrupted several times during that email drafting. > I may not care for doing this; you may not either; a company that > builds auto parts that they want to sell into the automotive industry > may care about standardizing their part IDs quite a lot. This is another important point. In some situations, a rigid data model can be a godsend to coders. If you happen to sit in such an enviable position, I would encourage you to take advantage of it. (This doesn't mean picking bad keys, of course.) I liberally sprinkle surrogate keys around simply because most of the projects I work on have transient requirements, so spontaneous rejiggery and various pokery are both commonplace, and SKs provide "enough" data integrity that the cost/benefit curve seems to peak there. Were I doing projects that had longer release cycles, I'd re-evaluate that position, and likely see a marginal reduction in bugs. None of this should be taken as bashing Celko - he's a smart man and an excellent source of advice. -j -- Jamie Lawrence jal@jal.org When I talked to the president, he was loaded. - Brent Scowcroft, Kissinger's assistant, 10/11/73
On 6/2/06, postgres@jal.org <postgres@jal.org> wrote: > On Thu, 01 Jun 2006, Chris Browne wrote: > > > Celko is decidedly *NOT* promoting the notion that you should use a > > 100 byte long "natural key." > > > > Jamie's comments of "Orthodox versus Reform" seem reasonably > > appropriate in outlining something of the difference between the > > positions. > > Just to be clear, that was all I was trying to do. I probably should > have mentioned that any attempt to use such an attribute as a PK should > be met with a baseball bat or other shillelagh-ish implement, but was > interrupted several times during that email drafting. > > > I may not care for doing this; you may not either; a company that > > builds auto parts that they want to sell into the automotive industry > > may care about standardizing their part IDs quite a lot. > > This is another important point. In some situations, a rigid data model > can be a godsend to coders. If you happen to sit in such an enviable > position, I would encourage you to take advantage of it. (This doesn't > mean picking bad keys, of course.) > > None of this should be taken as bashing Celko - he's a smart man and an > excellent source of advice. > > -j > Thanks everyone who replied (and also for the insightful and measured responses, not every news group is so lucky). I had progressed down the path of the serial id column but re-reading Celko's book - he spends some pages railing against "proprietary auto-numbering features" - I wanted to feel confident I was making the right choice. Thanks again Dave