Re: surrogate key or not? - Mailing list pgsql-sql

From Josh Berkus
Subject Re: surrogate key or not?
Date
Msg-id 200407221005.59482.josh@agliodbs.com
Whole thread Raw
In response to Re: surrogate key or not?  (Karsten Hilbert <Karsten.Hilbert@gmx.net>)
Responses Re: surrogate key or not?
List pgsql-sql
Kasten,

> I have until now used surrogate primary keys on all table like
> so:

<snip>

> Short question: Is this OK re your concerns for using
> surrogates, eg. using a surrogate but making sure that at any
> one time there *would* be a real primary key candidate ?

Yes, this is an example of that.  You have a long text field as part of the 
key, and that would kill you performance-wise if diagnosis was referred to in 
other tables and joined in views.   

Keep in mind, though, that if a table is an "end node"; that is, if its PK is 
not used as an FK by any other table, then worries about the performance of 
keys and size of indexes are unfounded.   In fact, for such tables, the 
surrogate key is a performance drag; it adds a column and an index which are 
not needed.

Now, addressing your table, I would have concerns other than the use of 
primary keys.    I suggest humbly that your data model/business logic may 
need some development:

create table diagnosis (   pk serial primary key,   fk_patient integer       not null       references patient(pk)
on update cascade       on delete cascade,   narrative text       not null,   unique(fk_patient, narrative)
 
);

This was obviously created so that a patient could have multiple diagnoses.   
However, there is no information in the table to indicate *why* there are 
multiple diagnoses.   And you are using a real key based on a long text 
field; always hazardous, as there are many ways to phrase the same 
information and duplication is likely.   To do it in english, your postulates 
look like:

PATIENT 67 was given a diagnosis of WATER ON THE KNEE.
PATIENT 456 was given a diagnosis of ACUTE HYPOCHONDRIA.

But this is a bit sketchy.   Who made these diagnoses?   When did they make 
them?  Why?  This table could carry a *lot* more information, and should (sql 
is shorthand)

create table diagnosis (   pk serial primary key,   fk_patient integer  references patient(pk),   fk_visit integer
referencesvisits(pk),   fk_complaint integer references complaints(pk)   fk_staff integer references medical_staff(pk)
narrative text,   unique(fk_patient, fk_visit, fk_complaint, fk_staff)
 
);

Then your postulates become *much* more informative:

PATIENT 67 was given a diagnosis by STAFF MEMBER 12 on his VISIT #3   in response to NOT BEING ABLE TO WALK of WATER ON
THEKNEE
 
PATIENT 456 was given a diagnosis by STAFF MEMBER 19 on his VISIT #192   in response to THE CREEPY-CRAWLIES of ACUTE
HYPOCHONDRIA

It also allows you to establish a much more useful key; it's reasonable to 
expect that a single staff member on one visit in response to one complaint 
would only give one diagnosis.   Otherwise, you have more than database 
problems.  And it prevents you from having to rely on a flaky long text key.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


pgsql-sql by date:

Previous
From: Gerardo Castillo
Date:
Subject: Problem with transaction in functions and tempory tables
Next
From: Tom Lane
Date:
Subject: Re: LIKE on index not working