RE: Database Design Question - Mailing list pgsql-sql

From Joel Burton
Subject RE: Database Design Question
Date
Msg-id Pine.LNX.4.21.0107272316430.18148-100000@olympus.scw.org
Whole thread Raw
In response to RE: Database Design Question  (Jimmie Fulton <JFulton@ehso.emory.edu>)
List pgsql-sql
On Fri, 27 Jul 2001, Jimmie Fulton wrote:

> I'm by no means a crack database designer, but I do have my ideas on this
> subject.  I prefer that every table has a unique integer (auto-incrementing)
> primary key.  Why?  Consistency.  With the alternative, some tables may not
> have a clear-cut candidate for a unique id.  In your example, you had
> "customer".  How many "John Smith"s would it take before we decide that is
> not a good identifier.  On the other hand, some tables would have perfectly
> logical identifiers.  Part numbers, SSNs....  So, you would need to create
> some tables with integer primary keys, and others would have some other
> natural identifier.  That to me is inconsistent.  Every table should be,
> IMHO, predictable in it's definition of a primary key.  I don't even have to
> guess what the names of my primary keys are either because the are all named
> <tablename>_ID.  Always.  I've only come up with these thoughts on my own,
> and have not extensively tried the other way, so I'd be interested in
> hearing other's ideas for the other side.

For large scale databases, there are theories aplenty about proper naming,
etc., and these should be be investigated. For small/simple databases,
this might be overkill in complexity and learning curve.

I teach a series of classes on small-scale database design for nonprofit
organizations, and recommend some simple rules:
* for tables where there is no candidate key (ie a Person table where
there is no SSN, etc.), use the table name + "id", and use a SERIAL-type.
* for tables where there is a candidate key, and that candidate key meets
all the usual criteria (always present, always unique, changes very
rarely), use the table name + "code", and use the appropriate type
(text/int/whatever), UNLESS
* there exists a very common name for this piece of info. Rather than
calling a SSN a "personcode" (in the above example), call it the SSN
* always put the primary key first in the table

Why not always use a SERIAL integer? It's a pain when a perfectly good
value exists. For example, in a table keeping track of US states, their
capitals, and governors, does it really make sense to code Maryland as
"45", when a memorable, unique, unlikely-to-change code already exists
("md")? Using a random number when a real-world code could do only forces
your user to do that lookup themselves.

[apologies to the international readers: Maryland is a state in the USA,
and "MD" is the postal code abbreviation for it]

I think that you could make some basic rules that would give you a system
that is intuitive and easy-to-rememeber, _without_ sacrificing clarity for
rigidity.

hth,
-- 
Joel Burton   <jburton@scw.org>
Director of Information Systems, Support Center of Washington



pgsql-sql by date:

Previous
From: "John Oakes"
Date:
Subject: Why does this plpgslq always return 1?
Next
From: Stephan Szabo
Date:
Subject: Re: Why does this plpgslq always return 1?