Thread: Further thoughts on Referential Integrity
I'm going to show my ignorance of PortgreSQL here since I've only been using it for a week! But I've been reading the documentation and I've came across CREATE RULE. I assume from what I understand it would be possible to implement my DEFAULT 0 using a rule. I'd need to remove the actual constraint from the table column and then use an INSERT / UPDATE rule to check to see if the column was 0 then I ignore it, if it's not; then I check to see if its in the 'referenced' table. Would this work or am I asking for trouble? Your thoughts would be appreciated. -- Regards, Wm. G. Urquhart Custom Software Solutions http://www.wurquhart.co.uk
> -----Original Message----- > From: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Wm. G. Urquhart > Sent: Monday, May 20, 2002 9:07 AM > To: PostgreSQL General Forum > Subject: [GENERAL] Further thoughts on Referential Integrity > > > I'm going to show my ignorance of PortgreSQL here since I've only been > using it for a week! But I've been reading the documentation and I've came > across CREATE RULE. I assume from what I understand it would be possible > to implement my DEFAULT 0 using a rule. > > I'd need to remove the actual constraint from the table column and then > use an INSERT / UPDATE rule to check to see if the column was 0 then I > ignore it, if it's not; then I check to see if its in the 'referenced' > table. > > Would this work or am I asking for trouble? Both. :) Why are you not storing NULL in the table, rather than 0? This would require no additional work on your part and would let you use the built-in RI features. Plus, with a VIEW, you could always show 0 rather than NULL (COALESCE ...). [You referenced a disaster yesterday, but don't seem to have posted to the list about it.] If you *really* want to store 0 rather than NULL, you could either: (a) add a 0-value column in the referenced table (cheating but easy and straightforward -- I used to call these "orphanages": places to keep an records that would otherwise be RI orphans), or (b) re-write the PG referential integrity features using triggers. Put BEFORE triggers for INSERT and UPDATE on the child table, and on DELETE on the parent table. The triggers should call plpgsql functions that check for the related values and either raise an exception or allow it. This isn't that much work, but: a) it's non-standard (much easier to understand later if you use the built-in RI, plus you get CASCADE features, DEFERRABLE options, etc.), and b) it will be much slower than the built-in versions (which are written in C). Out of these options, I'd strongly recommend a preference of traditional NULLs for unknown, then adding a 0-value column, and re-writing RI only if you have a _really_ good reason. Do you? - J. Joel BURTON | joel@joelburton.com | joelburton.com | aim: wjoelburton Knowledge Management & Technology Consultant
On Mon, 20 May 2002, Joel Burton wrote: <snip/> > Both. :) > > Why are you not storing NULL in the table, rather than 0? This would require > no additional work on your part and would let you use the built-in RI > features. Plus, with a VIEW, you could always show 0 rather than NULL > (COALESCE ...). [You referenced a disaster yesterday, but don't seem to have > posted to the list about it.] This is true and the obvious answer. But since in my mental dictionary NULL means undefined I wanted to use 0 to allow me to speed up! selects for reports. SELECT * From Patients WHERE Vaccine = 0; as opposed to SELECT * FROM Patients WHERE Vaccine IS NULL ; Since I assume that NULL is not included in an index? I may be talking absolute drivel here but I hope you see the logic (albeit twisted) of my approach. undefined then > If you *really* want to store 0 rather than NULL, you could either: > > (a) add a 0-value column in the referenced table (cheating but easy and > straightforward -- I used to call these "orphanages": places to keep an > records that would otherwise be RI orphans), or Can you explain what the above means / is and how to do it please. > (b) re-write the PG referential integrity features using triggers. Put > BEFORE triggers for INSERT and UPDATE on the child table, and on DELETE on > the parent table. The triggers should call plpgsql functions that check for > the related values and either raise an exception or allow it. This isn't > that much work, but: a) it's non-standard (much easier to understand later > if you use the built-in RI, plus you get CASCADE features, DEFERRABLE > options, etc.), and b) it will be much slower than the built-in versions > (which are written in C). > > Out of these options, I'd strongly recommend a preference of traditional > NULLs for unknown, then adding a 0-value column, and re-writing RI only if > you have a _really_ good reason. Do you? > > - J. > > Joel BURTON | joel@joelburton.com | joelburton.com | aim: wjoelburton > Knowledge Management & Technology Consultant > As regards my reference to the 'disaster' my Server, an old Sun, went tits up yesterday when a hard disk went down, followed by the PSU! -- Regards, Wm. G. Urquhart Custom Software Solutions http://www.wurquhart.co.uk
> -----Original Message----- > From: Wm. G. Urquhart [mailto:wgu@wurquhart.co.uk] > Sent: Monday, May 20, 2002 9:48 AM > To: Joel Burton > Cc: PostgreSQL General Forum > Subject: RE: [GENERAL] Further thoughts on Referential Integrity > > > On Mon, 20 May 2002, Joel Burton wrote: > > > Why are you not storing NULL in the table, rather than 0? This > would require > > no additional work on your part and would let you use the built-in RI > > features. Plus, with a VIEW, you could always show 0 rather than NULL > > (COALESCE ...). > > This is true and the obvious answer. But since in my mental dictionary > NULL means undefined I wanted to use 0 to allow me to speed up! selects > for reports. > > SELECT * From Patients WHERE Vaccine = 0; > > as opposed to > > SELECT * FROM Patients WHERE Vaccine IS NULL ; > > Since I assume that NULL is not included in an index? I may be talking > absolute drivel here but I hope you see the logic (albeit twisted) of my > approach. Finding NULL integers in a table does take longer than finding 0 values when an index is used, so, yes, you're right here. create table lotsaints(i int); create function fillints(int) returns int as 'begin for i in 1 .. $1 loop insert into lotsaints values ( (random()*10000)::int+1 ); end loop; return $1; end' language plpgsql; select fillints(10000); create function addstuff(int) returns int as 'begin for i in 1 .. $1 loop insert into lotsaints values (0); insert into lotsaints values (null); end loop; return $1; end' language plpgsql; select addstuff(160); create index lotsaints_idx on lotsaints(i); vacuum lotsaints; set enable_seqscan=off; explain analyze select * from lotsaints where i=0; -- 3.14 msec explain analyze select * from lotsaints where i is null; -- 15.19 msec > > (a) add a 0-value column in the referenced table (cheating but easy and > > straightforward -- I used to call these "orphanages": places to keep an > > records that would otherwise be RI orphans), or > > Can you explain what the above means / is and how to do it please. Let's assume that this is a customers -> orders database. It's a simple idea: in the customers table, add a fake customer with PK=0. It's then legal to reference this from the child table. You can either exclude this customer in your SELECT statements or (better) write a VIEW that does this for you, and work from the view. -- So, I'd say you _still_ have three options: * use NULLs (which will be slower by a bit when the index is being used), * use an orphanage (which will be slower by a __tiny__ bit for all selects, because you'll be adding one more exclusion to your WHERE clause) * or re-write RI (which will be much slower for INSERTs, DELETEs, UPDATEs, and might be a bit more of a pain to port to other systems or maintain). Unless you very rarely make changes to the table or update speeds don't matter, I wouldn't do the RI-rewriting. I, personally, would pick the first option in most cases, but I have a fairly fast server, fairly small data sets, and I like the clean approach. A few microseconds here or there won't add up to much for me. But if you have lots of data, the second is probably your best bet. HTH. Joel BURTON | joel@joelburton.com | joelburton.com | aim: wjoelburton Knowledge Management & Technology Consultant
"Joel Burton" <joel@joelburton.com> writes: > So, I'd say you _still_ have three options: > * use NULLs (which will be slower by a bit when the index is being used), > * use an orphanage (which will be slower by a __tiny__ bit for all selects, > because you'll be adding one more exclusion to your WHERE clause) > * or re-write RI (which will be much slower for INSERTs, DELETEs, UPDATEs, > and might be a bit more of a pain to port to other systems or maintain). There's also option 4: * use NULLs, and make a partial index using "WHERE foo IS NULL" to catalog the rows containing nulls. The cost of maintaining the extra index wouldn't be worth it unless looking for the null rows is done very frequently ... but it is another option to consider. regards, tom lane