Re: Why not cascade? (was: Using varchar primary keys) - Mailing list pgsql-general

From Merlin Moncure
Subject Re: Why not cascade? (was: Using varchar primary keys)
Date
Msg-id CAHyXU0xKd5H4sFkOHkEKCxs-om5bEs=ULsf=mvdKCiYaNKWsBA@mail.gmail.com
Whole thread Raw
In response to Why not cascade? (was: Using varchar primary keys)  (Gavan Schneider <pg-gts@snkmail.com>)
List pgsql-general
On Wed, Apr 3, 2013 at 2:58 AM, Gavan Schneider <pg-gts@snkmail.com> wrote:
> On 3/4/13 at 1:49 PM, dix1wjifgt@sneakemail.com (Julian
> tempura-at-internode.on.net |pg-gts/Basic|) wrote:
>
>> ... having to really think it out is probably a good sign that you
>> should stick to a surrogate unless you are really sure. (again I don't
>> advocate ON UPDATE CASCADE as a solution should you change your mind)
>>
> OK this is interesting.
>
> Why not cascade?
>
> Assuming someone makes the dB design as straight forward as possible, avoids
> obfuscation of key values (since this mostly only gets the present and the
> next developer into trouble, not the mythical external hacker), and has
> constraints with cascaded updates in place to keep it all consistent.
> Something changes in the real world, the DBA makes the dB reflect this
> change and the cascade ensures everything is still consistent. Where is the
> problem with this?
>
> When there is a lot of work involved this needs to be taken into account,
> but what is the basis for such a general prohibition on a modern SQL dB? why
> not use the feature?

Well, the disadvantages are:
*) you could end up cascading to a large amount of records
*) with a consequently large amount of locks
*) a lot of database developers are oblivious to how RI works
*) a schema level change of a key is a lot easier to do (but really,
the value of this is overstated)

Basically, it boils down to, 'more expensive updates, and schema
changes are slightly easier'

At the end of the day, once you sift through the mountains of BS that
tend to characterize the natural/surrogate debate (which I am going to
very carefully define as using an automatically generated identifier
such as a UUID for purposes of relating) what you end up with is a
system that is optimized for updates at the expense of many other
things, especially large data mining queries (more indexes and more
joins) and general readability.

In many cases this is a fair, or even a good, trade-off...but not
always, so we have to be flexible.  What seals the deal for me though
is that there is a near-perfect correlation with 100% surrogate
databases and what I would describe as gross modeling errors.

merlin


pgsql-general by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Re: initdb of pg 9.0.13 fails on pg_authid
Next
From: Mads.Tandrup@schneider-electric.com
Date:
Subject: CVE-2013-1899 security issue and limited IP addresses in pg_hba.conf