Thread: Further thoughts on Referential Integrity

Further thoughts on Referential Integrity

From

"Wm. G. Urquhart"

Date:

20 May 2002, 08:52:08

I'm going to show my ignorance of PortgreSQL here since I've only been
using it for a week! But I've been reading the documentation and I've came
across CREATE RULE. I assume from what I understand it would be possible
to implement my DEFAULT 0 using a rule.

I'd need to remove the actual constraint from the table column and then
use an INSERT / UPDATE rule to check to see if the column was 0 then I
ignore it, if it's not; then I check to see if its in the 'referenced'
table.

Would this work or am I asking for trouble?

Your thoughts would be appreciated.

--
Regards,


Wm. G. Urquhart
Custom Software Solutions
http://www.wurquhart.co.uk

Re: Further thoughts on Referential Integrity

From

"Joel Burton"

Date:

20 May 2002, 09:22:15

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Wm. G. Urquhart
> Sent: Monday, May 20, 2002 9:07 AM
> To: PostgreSQL General Forum
> Subject: [GENERAL] Further thoughts on Referential Integrity
>
>
> I'm going to show my ignorance of PortgreSQL here since I've only been
> using it for a week! But I've been reading the documentation and I've came
> across CREATE RULE. I assume from what I understand it would be possible
> to implement my DEFAULT 0 using a rule.
>
> I'd need to remove the actual constraint from the table column and then
> use an INSERT / UPDATE rule to check to see if the column was 0 then I
> ignore it, if it's not; then I check to see if its in the 'referenced'
> table.
>
> Would this work or am I asking for trouble?

Both. :)

Why are you not storing NULL in the table, rather than 0? This would require
no additional work on your part and would let you use the built-in RI
features. Plus, with a VIEW, you could always show 0 rather than NULL
(COALESCE ...). [You referenced a disaster yesterday, but don't seem to have
posted to the list about it.]

If you *really* want to store 0 rather than NULL, you could either:

(a) add a 0-value column in the referenced table (cheating but easy and
straightforward -- I used to call these "orphanages": places to keep an
records that would otherwise be RI orphans), or

(b) re-write the PG referential integrity features using triggers. Put
BEFORE triggers for INSERT and UPDATE on the child table, and on DELETE on
the parent table. The triggers should call plpgsql functions that check for
the related values and either raise an exception or allow it. This isn't
that much work, but: a) it's non-standard (much easier to understand later
if you use the built-in RI, plus you get CASCADE features, DEFERRABLE
options, etc.), and b) it will be much slower than the built-in versions
(which are written in C).

Out of these options, I'd strongly recommend a preference of traditional
NULLs for unknown, then adding a 0-value column, and re-writing RI only if
you have a _really_ good reason. Do you?

- J.

Joel BURTON | joel@joelburton.com | joelburton.com | aim: wjoelburton
Knowledge Management & Technology Consultant

Re: Further thoughts on Referential Integrity

From

"Wm. G. Urquhart"

Date:

20 May 2002, 09:33:48

On Mon, 20 May 2002, Joel Burton wrote:

<snip/>

> Both. :)
>
> Why are you not storing NULL in the table, rather than 0? This would require
> no additional work on your part and would let you use the built-in RI
> features. Plus, with a VIEW, you could always show 0 rather than NULL
> (COALESCE ...). [You referenced a disaster yesterday, but don't seem to have
> posted to the list about it.]

This is true and the obvious answer. But since in my mental dictionary
NULL means undefined I wanted to use 0 to allow me to speed up! selects
for reports.

SELECT * From Patients WHERE Vaccine = 0;

as opposed to

SELECT * FROM Patients WHERE Vaccine IS NULL ;

Since I assume that NULL is not included in an index? I may be talking
absolute drivel here but I hope you see the logic (albeit twisted) of my
approach.

undefined then

> If you *really* want to store 0 rather than NULL, you could either:
>
> (a) add a 0-value column in the referenced table (cheating but easy and
> straightforward -- I used to call these "orphanages": places to keep an
> records that would otherwise be RI orphans), or

Can you explain what the above means / is and how to do it please.

> (b) re-write the PG referential integrity features using triggers. Put
> BEFORE triggers for INSERT and UPDATE on the child table, and on DELETE on
> the parent table. The triggers should call plpgsql functions that check for
> the related values and either raise an exception or allow it. This isn't
> that much work, but: a) it's non-standard (much easier to understand later
> if you use the built-in RI, plus you get CASCADE features, DEFERRABLE
> options, etc.), and b) it will be much slower than the built-in versions
> (which are written in C).
>
> Out of these options, I'd strongly recommend a preference of traditional
> NULLs for unknown, then adding a 0-value column, and re-writing RI only if
> you have a _really_ good reason. Do you?
>
> - J.
>
> Joel BURTON | joel@joelburton.com | joelburton.com | aim: wjoelburton
> Knowledge Management & Technology Consultant
>

As regards my reference to the 'disaster' my Server, an old Sun, went tits
up yesterday when a hard disk went down, followed by the PSU!

--
Regards,


Wm. G. Urquhart
Custom Software Solutions
http://www.wurquhart.co.uk

Re: Further thoughts on Referential Integrity

From

"Joel Burton"

Date:

20 May 2002, 09:59:39

> -----Original Message-----
> From: Wm. G. Urquhart [mailto:wgu@wurquhart.co.uk]
> Sent: Monday, May 20, 2002 9:48 AM
> To: Joel Burton
> Cc: PostgreSQL General Forum
> Subject: RE: [GENERAL] Further thoughts on Referential Integrity
>
>
> On Mon, 20 May 2002, Joel Burton wrote:
>
> > Why are you not storing NULL in the table, rather than 0? This
> would require
> > no additional work on your part and would let you use the built-in RI
> > features. Plus, with a VIEW, you could always show 0 rather than NULL
> > (COALESCE ...).
>
> This is true and the obvious answer. But since in my mental dictionary
> NULL means undefined I wanted to use 0 to allow me to speed up! selects
> for reports.
>
> SELECT * From Patients WHERE Vaccine = 0;
>
> as opposed to
>
> SELECT * FROM Patients WHERE Vaccine IS NULL ;
>
> Since I assume that NULL is not included in an index? I may be talking
> absolute drivel here but I hope you see the logic (albeit twisted) of my
> approach.

Finding NULL integers in a table does take longer than finding 0 values when
an index is used, so, yes, you're right here.

create table lotsaints(i int);

create function fillints(int) returns int as 'begin for i in 1 .. $1 loop
insert into lotsaints values ( (random()*10000)::int+1 ); end loop; return
$1; end' language plpgsql;

select fillints(10000);

create function addstuff(int) returns int as 'begin for i in 1 .. $1 loop
insert into lotsaints values (0); insert into lotsaints values (null); end
loop; return $1; end' language plpgsql;

select addstuff(160);

create index lotsaints_idx on lotsaints(i);
vacuum lotsaints;
set enable_seqscan=off;

explain analyze select * from lotsaints where i=0;       -- 3.14 msec
explain analyze select * from lotsaints where i is null; -- 15.19 msec

> > (a) add a 0-value column in the referenced table (cheating but easy and
> > straightforward -- I used to call these "orphanages": places to keep an
> > records that would otherwise be RI orphans), or
>
> Can you explain what the above means / is and how to do it please.

Let's assume that this is a customers -> orders database.

It's a simple idea: in the customers table, add a fake customer with PK=0.
It's then legal to reference this from the child table. You can either
exclude this customer in your SELECT statements or (better) write a VIEW
that does this for you, and work from the view.

--

So, I'd say you _still_ have three options:

* use NULLs (which will be slower by a bit when the index is being used),

* use an orphanage (which will be slower by a __tiny__ bit for all selects,
because you'll be adding one more exclusion to your WHERE clause)

* or re-write RI (which will be much slower for INSERTs, DELETEs, UPDATEs,
and might be a bit more of a pain to port to other systems or maintain).

Unless you very rarely make changes to the table or update speeds don't
matter, I wouldn't do the RI-rewriting.

I, personally, would pick the first option in most cases, but I have a
fairly fast server, fairly small data sets, and I like the clean approach. A
few microseconds here or there won't add up to much for me. But if you have
lots of data, the second is probably your best bet.

HTH.

Joel BURTON | joel@joelburton.com | joelburton.com | aim: wjoelburton
Knowledge Management & Technology Consultant

Re: Further thoughts on Referential Integrity

From

Tom Lane

Date:

20 May 2002, 11:14:18

"Joel Burton" <joel@joelburton.com> writes:
> So, I'd say you _still_ have three options:

> * use NULLs (which will be slower by a bit when the index is being used),

> * use an orphanage (which will be slower by a __tiny__ bit for all selects,
> because you'll be adding one more exclusion to your WHERE clause)

> * or re-write RI (which will be much slower for INSERTs, DELETEs, UPDATEs,
> and might be a bit more of a pain to port to other systems or maintain).

There's also option 4:

* use NULLs, and make a partial index using "WHERE foo IS NULL" to
catalog the rows containing nulls.

The cost of maintaining the extra index wouldn't be worth it unless
looking for the null rows is done very frequently ... but it is another
option to consider.

            regards, tom lane