Re: surrogate key or not? - Mailing list pgsql-sql
From | Josh Berkus |
---|---|
Subject | Re: surrogate key or not? |
Date | |
Msg-id | 200407221005.59482.josh@agliodbs.com Whole thread Raw |
In response to | Re: surrogate key or not? (Karsten Hilbert <Karsten.Hilbert@gmx.net>) |
Responses |
Re: surrogate key or not?
|
List | pgsql-sql |
Kasten, > I have until now used surrogate primary keys on all table like > so: <snip> > Short question: Is this OK re your concerns for using > surrogates, eg. using a surrogate but making sure that at any > one time there *would* be a real primary key candidate ? Yes, this is an example of that. You have a long text field as part of the key, and that would kill you performance-wise if diagnosis was referred to in other tables and joined in views. Keep in mind, though, that if a table is an "end node"; that is, if its PK is not used as an FK by any other table, then worries about the performance of keys and size of indexes are unfounded. In fact, for such tables, the surrogate key is a performance drag; it adds a column and an index which are not needed. Now, addressing your table, I would have concerns other than the use of primary keys. I suggest humbly that your data model/business logic may need some development: create table diagnosis ( pk serial primary key, fk_patient integer not null references patient(pk) on update cascade on delete cascade, narrative text not null, unique(fk_patient, narrative) ); This was obviously created so that a patient could have multiple diagnoses. However, there is no information in the table to indicate *why* there are multiple diagnoses. And you are using a real key based on a long text field; always hazardous, as there are many ways to phrase the same information and duplication is likely. To do it in english, your postulates look like: PATIENT 67 was given a diagnosis of WATER ON THE KNEE. PATIENT 456 was given a diagnosis of ACUTE HYPOCHONDRIA. But this is a bit sketchy. Who made these diagnoses? When did they make them? Why? This table could carry a *lot* more information, and should (sql is shorthand) create table diagnosis ( pk serial primary key, fk_patient integer references patient(pk), fk_visit integer referencesvisits(pk), fk_complaint integer references complaints(pk) fk_staff integer references medical_staff(pk) narrative text, unique(fk_patient, fk_visit, fk_complaint, fk_staff) ); Then your postulates become *much* more informative: PATIENT 67 was given a diagnosis by STAFF MEMBER 12 on his VISIT #3 in response to NOT BEING ABLE TO WALK of WATER ON THEKNEE PATIENT 456 was given a diagnosis by STAFF MEMBER 19 on his VISIT #192 in response to THE CREEPY-CRAWLIES of ACUTE HYPOCHONDRIA It also allows you to establish a much more useful key; it's reasonable to expect that a single staff member on one visit in response to one complaint would only give one diagnosis. Otherwise, you have more than database problems. And it prevents you from having to rely on a flaky long text key. -- Josh Berkus Aglio Database Solutions San Francisco