Home > mailing lists

Re: surrogate key or not? - Mailing list pgsql-sql

From	Josh Berkus
Subject	Re: surrogate key or not?
Date	July 22, 2004 17:07:18
Msg-id	200407221005.59482.josh@agliodbs.com Whole thread Raw
In response to	Re: surrogate key or not? (Karsten Hilbert <Karsten.Hilbert@gmx.net>)
Responses	Re: surrogate key or not?
List	pgsql-sql

Tree view

Kasten,

> I have until now used surrogate primary keys on all table like
> so:

<snip>

> Short question: Is this OK re your concerns for using
> surrogates, eg. using a surrogate but making sure that at any
> one time there *would* be a real primary key candidate ?

Yes, this is an example of that. You have a long text field as part of the
key, and that would kill you performance-wise if diagnosis was referred to in
other tables and joined in views.

Keep in mind, though, that if a table is an "end node"; that is, if its PK is
not used as an FK by any other table, then worries about the performance of
keys and size of indexes are unfounded. In fact, for such tables, the
surrogate key is a performance drag; it adds a column and an index which are
not needed.

Now, addressing your table, I would have concerns other than the use of
primary keys. I suggest humbly that your data model/business logic may
need some development:

create table diagnosis ( pk serial primary key, fk_patient integer not null references patient(pk)
on update cascade on delete cascade, narrative text not null, unique(fk_patient, narrative)

);

This was obviously created so that a patient could have multiple diagnoses.
However, there is no information in the table to indicate *why* there are
multiple diagnoses. And you are using a real key based on a long text
field; always hazardous, as there are many ways to phrase the same
information and duplication is likely. To do it in english, your postulates
look like:

PATIENT 67 was given a diagnosis of WATER ON THE KNEE.
PATIENT 456 was given a diagnosis of ACUTE HYPOCHONDRIA.

But this is a bit sketchy. Who made these diagnoses? When did they make
them? Why? This table could carry a *lot* more information, and should (sql
is shorthand)

create table diagnosis ( pk serial primary key, fk_patient integer references patient(pk), fk_visit integer
referencesvisits(pk), fk_complaint integer references complaints(pk) fk_staff integer references medical_staff(pk)
narrative text, unique(fk_patient, fk_visit, fk_complaint, fk_staff)

);

Then your postulates become *much* more informative:

PATIENT 67 was given a diagnosis by STAFF MEMBER 12 on his VISIT #3 in response to NOT BEING ABLE TO WALK of WATER ON
THEKNEE

PATIENT 456 was given a diagnosis by STAFF MEMBER 19 on his VISIT #192 in response to THE CREEPY-CRAWLIES of ACUTE
HYPOCHONDRIA

It also allows you to establish a much more useful key; it's reasonable to
expect that a single staff member on one visit in response to one complaint
would only give one diagnosis. Otherwise, you have more than database
problems. And it prevents you from having to rely on a flaky long text key.

--
Josh Berkus
Aglio Database Solutions
San Francisco

pgsql-sql by date:

From: Gerardo Castillo
Date: 22 July 2004, 16:46:24
Subject: Problem with transaction in functions and tempory tables

From: Tom Lane
Date: 22 July 2004, 17:10:52
Subject: Re: LIKE on index not working

Re: surrogate key or not? - Mailing list pgsql-sql

Previous

Next