Re: Integrity on large sites - Mailing list pgsql-general

From Andrew Sullivan
Subject Re: Integrity on large sites
Date
Msg-id 20070523194115.GC27056@phlogiston.dyndns.org
Whole thread Raw
In response to Integrity on large sites  (Naz Gassiep <naz@mira.net>)
List pgsql-general
On Wed, May 23, 2007 at 12:12:52PM +1000, Naz Gassiep wrote:
> give me nightmares. Is it really true that large sites turn RI off to
> improve performance,

You can't "turn it off", but you can "not use it".  And I suppose
there are shops where they don't use it; after all, you can make any
computer system arbitrarily fast if the answer doesn't have to be
right.

By the way, the idea that application-level checks will make your
data integrity cheaper is the sort of idea that application
programmers who don't know anything about databases like to flog.
There are of course occasions where this is trivially true (e.g., you
require two pieces of data in a command, so you error before talking
to the database if in your parse stage they're not both there).  But
any non-trivial integrity check is automatically going to impose
database queries, and anyone who claims the application programmer
can magically make the round trip cheaper than doing the same
operation inside the database is, bluntly, talking nonsense.

Worse, by moving the checking code out to the application, you also
move the maintenance of all that checking code out into the
application, where two different programmers can implement these
one-off checks in subtly different ways, introducing strange,
hard-to-troubleshoot data anomalies that take days to puzzle out and
fix.  Then someone in senior management asks why the database didn't
just catch this on its own, at which point you re-implement all those
foreign keys while _still_ paying the cost of the client-side
validation code (which never gets ripped out), which means that the
whole thing ends up operating _slower_ than any other possible
implementation.  I Have Been In That Meeting.

The whole reason we're using relational databases is so that the
relations can be queried _and maintained_.  Databases vendors didn't
invent foreign keys in order to slow down their systems so they could
have angry customers.  They implemented them in order to protect
their customers' data from bugs in application code.  If your data is
worth storing, it's surely worth storing correctly.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
"The year's penultimate month" is not in truth a good way of saying
November.
        --H.W. Fowler

pgsql-general by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Remove query results from cache
Next
From: Marek Lewczuk
Date:
Subject: the future of pljava development