Thread: Table design question

Table design question

From
"David Clarke"
Date:
I'm reading Joe Celko's book SQL Programming Style for the second time
and although I've been an OO developer for quite a few years I'm
fairly green wrt SQL. Joe is obviously something of a curmudgeon and I
would fall squarely into his newbie OO developer ordinal scale and I'm
trying to avoid the slide into stupid newbie OO developer.

So I'm designing a table and I'm looking for an appropriate key. The
natural key is a string from a few characters up to a maximum of
perhaps 100. Joe gets quite fierce about avoiding the use of a serial
id column as a key. The string is unique in the table and fits the
criteria for a key. So should I follow Joe's advice and use my natural
key as the primary key? It sounds reasonable but it will mean at least
one other table will have the string as a foreign key. My postgres
intro book has id columns all over the place but is it really that big
an issue these days to have a 100 character primary key? Are there
postgres-specific implications for either approach?

Thanks
Dave


Re: Table design question

From
Rod Taylor
Date:
> So I'm designing a table and I'm looking for an appropriate key. The
> natural key is a string from a few characters up to a maximum of
> perhaps 100. Joe gets quite fierce about avoiding the use of a serial
> id column as a key. The string is unique in the table and fits the
> criteria for a key. So should I follow Joe's advice and use my natural
> key as the primary key? It sounds reasonable but it will mean at least
> one other table will have the string as a foreign key. My postgres

Unfortunately as a result of the implementation of most commercial
databases (and PostgreSQL), using a string as the primary key will cause
a performance hit.

Yes, normally it is better and for things like status flags and other
constant values that rarely change it is a good approach.


I would tend to add a SERIAL and make it the primary key (using it in
foreign tables) but also making the string column unique and not
null'able.

Essentially it gives the table two primary keys in the hope that some
day PostgreSQL will add a layer of abstraction between primary/foreign
key lookups and presentation that allows for usage of strings directly
without a performance hit.
-- 



Re: Table design question

From
Greg Stark
Date:
"David Clarke" <pigwin32@gmail.com> writes:

> is it really that big an issue these days to have a 100 character primary
> key? Are there postgres-specific implications for either approach?

It's exactly the same size issue as ever. A 20% increase in space usage is a
20% performance hit in certain types of queries regardless of how fast or
cheap your hardware has become.

This is an issue where reasonable people differ quite vociferously so you may
get contradictory responses. But really it depends heavily on the exact
circumstances and requires a judgement call based on experience. Any time
someone tries to sum it up with a blanket rule it's going to be wrong some of
the time.

But that said I also tend to tilt towards creating serial ids. Unless the
string is already a synthetic unique identifier created by your application
you can't really trust its "uniqueness" for all time. Plenty of people have
built databases using natural unique keys that turned out to be not so unique
or turned out to just be a poor choice for external reasons (think of Social
Security Numbers, for example).

I've had my string unique identifiers corrupted by uppercasing, declared case
insensitive, declared private information that couldn't be leaked, and
declared offensive words that had to be updated. Each of which is a pain to
deal with when it's your primary key.

-- 
greg



Re: Table design question

From
postgres@jal.org
Date:
On Thu, 01 Jun 2006, David Clarke wrote:

> So I'm designing a table and I'm looking for an appropriate key. The
> natural key is a string from a few characters up to a maximum of
> perhaps 100. Joe gets quite fierce about avoiding the use of a serial
> id column as a key. The string is unique in the table and fits the

The use of surrogate keys is a mostly religious issue. Celko is an
Orthodox, many others are Reform. Where you want to align yourself
is a personal choice. 

In defense of the Reform movement, I'd note that modern DBs are more
performant with ints than varchars. More importantly, I've found it
much easier to modify DBs designed with surrogate keys than natural
keys, especially when natural keys span columns. It allows a rather
simple convention for coders to write against, and avoids some messy
modification issues when the spec changes.

It is not my intention to bash the purists, and there are good arguments
on the Ortho side, too. I'm merely giving a bit of advice from the
point of view of someone who lives in the constant evolution side
of DB usage.

-j

-- 
Jamie Lawrence                                        jal@jal.org
When I was a boy I was told that anybody could become President. 
Now I'm beginning to believe it.  - Clarence Darrow




Re: Table design question

From
"codeWarrior"
Date:
I never use anything other than "id SERIAL NOT NULL PRIMARY KEY" for my 
PKEY's -- as an absolute rule -- I guess I am a purist...  Everything else 
(the other columns) can have unique constraints, etcetera and be FOREIGN 
KEYS, etc...

Try INSERTING your 100 character "natural" key into a table with 10M++ rows 
only to find out there there is already a duplicate.... talk about a 
performance hit.... or SELECT -- you end up using way too much RAM and 
bandwidth -- unecessarily...


IMHO: You ought to use a numeric, auto-generated sequence (SERIAL) for you 
PKEY's ...



""David Clarke"" <pigwin32@gmail.com> wrote in message 
news:12b7ac1e0606010405u6e062f71mf4adbaeb6c46df5f@mail.gmail.com...
> I'm reading Joe Celko's book SQL Programming Style for the second time
> and although I've been an OO developer for quite a few years I'm
> fairly green wrt SQL. Joe is obviously something of a curmudgeon and I
> would fall squarely into his newbie OO developer ordinal scale and I'm
> trying to avoid the slide into stupid newbie OO developer.
>
> So I'm designing a table and I'm looking for an appropriate key. The
> natural key is a string from a few characters up to a maximum of
> perhaps 100. Joe gets quite fierce about avoiding the use of a serial
> id column as a key. The string is unique in the table and fits the
> criteria for a key. So should I follow Joe's advice and use my natural
> key as the primary key? It sounds reasonable but it will mean at least
> one other table will have the string as a foreign key. My postgres
> intro book has id columns all over the place but is it really that big
> an issue these days to have a 100 character primary key? Are there
> postgres-specific implications for either approach?
>
> Thanks
> Dave
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
> 




Re: Table design question

From
Chris Browne
Date:
"codeWarrior" <gpatnude@hotmail.com> writes:
> I never use anything other than "id SERIAL NOT NULL PRIMARY KEY" for my 
> PKEY's -- as an absolute rule -- I guess I am a purist...  Everything else 
> (the other columns) can have unique constraints, etcetera and be FOREIGN 
> KEYS, etc...
>
> Try INSERTING your 100 character "natural" key into a table with 10M++ rows 
> only to find out there there is already a duplicate.... talk about a 
> performance hit.... or SELECT -- you end up using way too much RAM and 
> bandwidth -- unecessarily...

Celko is decidedly *NOT* promoting the notion that you should use a
100 byte long "natural key."

Jamie's comments of "Orthodox versus Reform" seem reasonably
appropriate in outlining something of the difference between the
positions.  

Just because Celko is "Orthodox" doesn't mean he's *stupid*; he makes
all sorts of noises about using some ISO standard (11179) such that
your primary key values are actually described in some authoritative
manner.

<http://metadata-standards.org/11179/>

If you go about using ISO-11179, then, in principle, you could
register your scheme for describing IDs for the objects in your system
with some ISO people so that others might even become aware of the
official status of this.

I may not care for doing this; you may not either; a company that
builds auto parts that they want to sell into the automotive industry
may care about standardizing their part IDs quite a lot.

They're not interested in generating stupidly long identifiers; that's
inefficient in many ways...

> IMHO: You ought to use a numeric, auto-generated sequence (SERIAL)
> for you PKEY's ...

That's definitely the "Reform" position ;-), and it is a position that
can be rationally held.
-- 
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/multiplexor.html
Rules of  the Evil  Overlord #185.  "If I capture  an enemy  known for
escaping via  ingenious and fantastic  little gadgets, I will  order a
full cavity  search and confiscate all personal  items before throwing
him in my dungeon."  <http://www.eviloverlord.com/>


Re: Table design question

From
postgres@jal.org
Date:
On Thu, 01 Jun 2006, Chris Browne wrote:

> Celko is decidedly *NOT* promoting the notion that you should use a
> 100 byte long "natural key."
> 
> Jamie's comments of "Orthodox versus Reform" seem reasonably
> appropriate in outlining something of the difference between the
> positions.  

Just to be clear, that was all I was trying to do. I probably should
have mentioned that any attempt to use such an attribute as a PK should 
be met with a baseball bat or other shillelagh-ish implement, but was 
interrupted several times during that email drafting.

> I may not care for doing this; you may not either; a company that
> builds auto parts that they want to sell into the automotive industry
> may care about standardizing their part IDs quite a lot.

This is another important point. In some situations, a rigid data model
can be a godsend to coders. If you happen to sit in such an enviable
position, I would encourage you to take advantage of it. (This doesn't
mean picking bad keys, of course.) 

I liberally sprinkle surrogate keys around simply because most of the
projects I work on have transient requirements, so spontaneous rejiggery 
and various pokery are both commonplace, and SKs provide "enough" data
integrity that the cost/benefit curve seems to peak there. Were I doing
projects that had longer release cycles, I'd re-evaluate that position,
and likely see a marginal reduction in bugs.

None of this should be taken as bashing Celko - he's a smart man and an
excellent source of advice.

-j

-- 
Jamie Lawrence                                        jal@jal.org
When I talked to the president, he was loaded. - Brent Scowcroft, Kissinger's assistant, 10/11/73




Re: Table design question

From
"David Clarke"
Date:
On 6/2/06, postgres@jal.org <postgres@jal.org> wrote:
> On Thu, 01 Jun 2006, Chris Browne wrote:
>
> > Celko is decidedly *NOT* promoting the notion that you should use a
> > 100 byte long "natural key."
> >
> > Jamie's comments of "Orthodox versus Reform" seem reasonably
> > appropriate in outlining something of the difference between the
> > positions.
>
> Just to be clear, that was all I was trying to do. I probably should
> have mentioned that any attempt to use such an attribute as a PK should
> be met with a baseball bat or other shillelagh-ish implement, but was
> interrupted several times during that email drafting.
>
> > I may not care for doing this; you may not either; a company that
> > builds auto parts that they want to sell into the automotive industry
> > may care about standardizing their part IDs quite a lot.
>
> This is another important point. In some situations, a rigid data model
> can be a godsend to coders. If you happen to sit in such an enviable
> position, I would encourage you to take advantage of it. (This doesn't
> mean picking bad keys, of course.)
>
> None of this should be taken as bashing Celko - he's a smart man and an
> excellent source of advice.
>
> -j
>

Thanks everyone who replied (and also for the insightful and measured
responses, not every news group is so lucky). I had progressed down
the path of the serial id column but re-reading Celko's book - he
spends some pages railing against "proprietary auto-numbering
features" - I wanted to feel confident I was making the right choice.

Thanks again
Dave