Re: Primary keys for companies and people - Mailing list pgsql-general

From Merlin Moncure
Subject Re: Primary keys for companies and people
Date
Msg-id b42b73150602030923m62a6c2fck2c4c7a3cdc6b18e2@mail.gmail.com
Whole thread Raw
In response to Re: Primary keys for companies and people  (Michael Glaesemann <grzm@myrealbox.com>)
Responses Re: Primary keys for companies and people  (Michael Glaesemann <grzm@myrealbox.com>)
List pgsql-general
> I definitely agree with you here, Merlin. Mutability is not the issue
> at hand. May I ask what strategies you use for determining uniqueness
> for people?

Well, that depends on the particular problem at hand.  If you had two
john smiths in your system, how would you distinguish them? If you
assinged an account number in your system and gave it to the person to
refer back to you, this ok...this is not a surrogate key per se, but a
meaningful alias.

However, that technique can't always be applied, take the case of the
'contacts' table for an account.  Since you don't give each contact of
each accunt a number and you don't print a number representing them on
various reports (there would be no reason to), adding a surrogate to
the table adds nothing to your database, it's just meaningless
infromation with no semantic value.  There *must* be a semantic
difference between the two John Smiths or you should be storing one
record in the database, not two.

If you kind determine an easy natural differentiator, invent one:
create table contact
(
  account text, name text, memo text,
  primary key(account, name, memo)
);

The memo field is blank in most cases unlees it's needed.  Suppose you
were filling contact information in your databse and Taking your
second John Smith from an account...your operator says, 'we already
have a john smith for your account, can you give us something to
identify him?'  Put that in the memo field and there you go.

Now your operator is taking information which has value pertaining to
the scope of the problem domain your application exists in.  This is
just one example of how to approach the problem  Now there is no
ambiguiity about which john smith you are dealing with. This may not
be a perfect solution for every application but there is basically has
to be a method of finding semantic unquenes to your data or you have a
hole in your data model.  Glossing over that hole with artificial
information solves nothing.

There are pracitcal reasons to use surrogates but the uniqueness
argument generally holds no water.  By 'generating' uniqueness you are
breaking your data on mathematical terms.  Until you truly understand
the ramifcations of that statement you can't really understand when
the practical cases apply.  Many of the arguments for surrugates based
on mutability and uniqueness are simply illogical.

The performance issue is more complex and leads to the issue of
practicality.  I wouldn't find any fault with a modeler who
benchmarked his app with a surrogate vs. a 5 part key  and chose the
former as long as he truly understood the downside to doing that
(extra joins).

Merlin

pgsql-general by date:

Previous
From: David Brain
Date:
Subject: Re: Postgres 7.3.2 -> 8.1.2 upgrade performance issue
Next
From: Michael Fuhr
Date:
Subject: Re: C Language Stored Procedure Returning No Data