Re: Guarenteeing complex referencial integrity through custom triggers - Mailing list pgsql-hackers
From | Joris Dobbelsteen
---|---
Subject | Re: Guarenteeing complex referencial integrity through custom triggers
Date |
Msg-id | 73427AD314CC364C8DF0FFF9C4D693FF037A4C@nehemiah.joris2k.local
In response to | Re: Guarenteeing complex referencial integrity through custom triggers ("Greg Sabino Mullane" <greg@turnstep.com>)
List | pgsql-hackers
>-----Original Message-----
>From: pgsql-hackers-owner@postgresql.org
>[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Greg Sabino Mullane
>Sent: Wednesday, 28 March 2007 2:50
>To: pgsql-hackers@postgresql.org
>Subject: Re: [HACKERS] Guarenteeing complex referencial integrity through custom triggers
>
>[snip]
>
>Much too elaborate - I'm sorry, but I don't think anyone here
>is willing to wade through nearly 900 lines of code. Can you
>break it down to a simpler test case? Or try and break the
>schema I provided in my previous message perhaps?

That was the only readily available example I had at the moment, on very short notice.

-- Generic type of printer
CREATE TABLE printer (
    id    SERIAL NOT NULL PRIMARY KEY,
    brand TEXT NOT NULL,
    model TEXT NOT NULL
);
INSERT INTO printer (brand, model) VALUES ('epson', '1200');
INSERT INTO printer (brand, model) VALUES ('hp', 'laserjet99');

-- A printer can hold one or more cartridges, distinguished by 'color'
CREATE TABLE cartridge_types (
    ptype INT NOT NULL,
    CONSTRAINT cc_ptype FOREIGN KEY (ptype) REFERENCES printer(id) ON DELETE RESTRICT,
    color TEXT NOT NULL,
    PRIMARY KEY (ptype, color)
);
INSERT INTO cartridge_types VALUES (1, 'black');
INSERT INTO cartridge_types VALUES (2, 'black');
INSERT INTO cartridge_types VALUES (2, 'color');

-- Specific printers in the company
CREATE TABLE printers (
    id    SERIAL NOT NULL PRIMARY KEY,
    ptype INTEGER NOT NULL,
    CONSTRAINT ptype FOREIGN KEY (ptype) REFERENCES printer(id),
    location TEXT NOT NULL
);
INSERT INTO printers (ptype, location) VALUES (1, 'Room 234');
INSERT INTO printers (ptype, location) VALUES (2, 'Break room #2');
INSERT INTO printers (ptype, location) VALUES (2, 'NE corner of warehouse');

-- Printers require lots of pesky hardware updates
CREATE TABLE cartridge_change (
    printer_id INT NOT NULL,
    CONSTRAINT change_printer FOREIGN KEY (printer_id) REFERENCES printers(id),
    color TEXT NOT NULL,
    whenchanged TIMESTAMPTZ NOT NULL DEFAULT now()
);

--
-- !!!!!!!!!!!!!!!!!!!
-- NOTICE
--
-- This constraint is invalid: printer_id should reference printers, not printer...
-- If this constraint were valid, you could never change a cartridge on printer #3...
--ALTER TABLE cartridge_change ADD CONSTRAINT cc FOREIGN KEY (printer_id, color) REFERENCES cartridge_types (ptype, color);

INSERT INTO cartridge_change (printer_id, color) VALUES (1,'black');
-- dropped your last insert, was a constraint violation...

------------------------

>> You cannot enforce the above constraint in the database. The keyword
>> is serializable isolation level.
>>
>> Create new printer + cartridge_defs and such.
>> Now in T1 delete a cartridge_def
>> Now in T2 insert a cartridge replacement.
>
>Works as expected in my schema when I tried it: T2 blocked at
>the insert, waiting on T1. Once T1 committed, T2 threw an
>error, as the insert was no longer valid. Using serializable
>or not, same result.

As noted above, your constraint does not enforce the constraint I described (admittedly in loose grammar). It cannot be expressed with the primitives built into PostgreSQL; it needs triggers to be enforced:

FOR ALL t IN cartridge_change THERE MUST EXIST
    (SELECT 1
     FROM printers
     INNER JOIN cartridge_types ON cartridge_types.ptype = printers.ptype
     WHERE printers.id = cartridge_change.printer_id
       AND cartridge_types.color = cartridge_change.color)

In other words: if we replace a cartridge, the cartridge type must be defined. Obviously we follow good practice: before inserting a row into cartridge_change, we check the cartridge_types table and lock the matching row; after deleting a row from cartridge_types, we check the cartridge_change table for constraint violations. We lock the parent first and then the children. This gives consistency (normally) and reduces the chance of deadlocks. This means two triggers must be defined (assuming plpgsql is available).
CREATE FUNCTION "TR_cartridgeRI_cartridge_change_insupd"() RETURNS trigger AS
$BODY$
BEGIN
    -- This must be a BEFORE trigger: we lock the parent first...
    PERFORM 1
    FROM printers p
    INNER JOIN cartridge_types c ON p.ptype = c.ptype
    WHERE p.id = NEW.printer_id
      AND c.color = NEW.color
    FOR SHARE;
    IF NOT FOUND THEN
        RAISE EXCEPTION 'Cartridge_type not defined';
    END IF;
    RETURN NEW;
END;
$BODY$ LANGUAGE 'plpgsql' VOLATILE;

CREATE TRIGGER "TR_cartridgeRI"
    BEFORE INSERT OR UPDATE ON cartridge_change
    FOR EACH ROW EXECUTE PROCEDURE "TR_cartridgeRI_cartridge_change_insupd"();

CREATE FUNCTION "TR_cartridgeRI_cartridge_types_upddel"() RETURNS trigger AS
$BODY$
BEGIN
    -- This must be an AFTER trigger: we lock the parent first...
    PERFORM 1
    FROM printers p
    INNER JOIN cartridge_change c ON p.id = c.printer_id
    WHERE p.ptype = OLD.ptype
      AND c.color = OLD.color
    FOR SHARE;
    IF FOUND THEN
        RAISE EXCEPTION 'Cartridge_changes found';
    END IF;
    RETURN OLD;
END;
$BODY$ LANGUAGE 'plpgsql' VOLATILE;

CREATE TRIGGER "TR_cartridgeRI"
    AFTER UPDATE OR DELETE ON cartridge_types
    FOR EACH ROW EXECUTE PROCEDURE "TR_cartridgeRI_cartridge_types_upddel"();

--------------------------------------

Let's test it:

INSERT INTO cartridge_change (printer_id, color) VALUES (1,'purple');
ERROR:  Cartridge_type not defined
DELETE FROM cartridge_types;
ERROR:  Cartridge_changes found

So these seem to work perfectly well...

---------------------------------------

Now to violate the constraints:

T1: BEGIN ISOLATION LEVEL SERIALIZABLE;
T2: BEGIN ISOLATION LEVEL SERIALIZABLE;
-- T1 will insert into cartridge_change, but first the trigger runs:
-- it locks the parent (it should return exactly one row).
-- Now the actual insert:
T1: INSERT INTO cartridge_change (printer_id, color) VALUES (2,'color');
-- T2 will now delete the 'color' cartridge definition for the hp laserjet99.
-- It will block, but not fail with an error (as it should)
T2: DELETE FROM cartridge_types WHERE ptype=2 AND color='color';
-- T1 commits in the meanwhile
T1: COMMIT;
-- T2 is not waiting any more
T2: COMMIT;

At this point there is something in cartridge_change that shouldn't be there: a color cartridge was changed, even though no color cartridge was defined any more.

---------------------------------------------

This will not happen under read committed isolation, though. However, I have no way of enforcing that. The ugly workaround for the problem is the following (please reload the data at this point):

BEGIN;
DELETE FROM cartridge_change;
DELETE FROM printers;
DELETE FROM cartridge_types;
DELETE FROM printer;
INSERT INTO printer (id,brand,model) VALUES (1,'epson','1200');
INSERT INTO printer (id,brand,model) VALUES (2,'hp','laserjet99');
INSERT INTO cartridge_types VALUES (1,'black');
INSERT INTO cartridge_types VALUES (2,'black');
INSERT INTO cartridge_types VALUES (2,'color');
INSERT INTO printers (id,ptype,location) VALUES (1,1,'Room 234');
INSERT INTO printers (id,ptype,location) VALUES (2,2,'Break room #2');
INSERT INTO printers (id,ptype,location) VALUES (3,2,'NE corner of warehouse');
INSERT INTO cartridge_change (printer_id, color) VALUES (1,'black');
COMMIT;

Now patch the first trigger to take an update lock instead:

CREATE OR REPLACE FUNCTION "TR_cartridgeRI_cartridge_change_insupd"() RETURNS trigger AS
$BODY$
BEGIN
    -- This must be a BEFORE trigger: we lock the parent first...
    PERFORM 1
    FROM printers p
    INNER JOIN cartridge_types c ON p.ptype = c.ptype
    WHERE p.id = NEW.printer_id
      AND c.color = NEW.color
    FOR UPDATE;
    IF NOT FOUND THEN
        RAISE EXCEPTION 'Cartridge_type not defined';
    END IF;
    RETURN NEW;
END;
$BODY$ LANGUAGE 'plpgsql' VOLATILE;

If you now rerun the scenario above, the T2 DELETE will fail with "ERROR: could not serialize access due to concurrent update". This is what is desired, but not at the cost presented here.
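The anomaly above is classic write skew: each transaction's trigger check passes against its own snapshot, and since the two write sets do not overlap, first-committer-wins conflict detection sees nothing. A toy model of snapshot isolation in Python makes the mechanism explicit (sets stand in for tables, rows are keyed by (ptype, color) for brevity; all names here are illustrative, not Postgres internals):

```python
# Toy model of snapshot isolation: each transaction reads from a private
# snapshot taken at BEGIN; writes are buffered and applied at COMMIT.
# Only write-write conflicts would abort -- which is exactly why the two
# read-only trigger checks below miss each other.
import copy

db = {
    "cartridge_types": {(2, "color")},   # hp laserjet99 takes a color cartridge
    "cartridge_change": set(),           # no cartridge changes logged yet
}

class Tx:
    def __init__(self, db):
        self.db = db
        self.snap = copy.deepcopy(db)    # BEGIN: private snapshot
        self.writes = []                 # buffered (table, op, row)

    def commit(self):
        # The two transactions write different rows in different tables,
        # so neither commit conflicts and both succeed.
        for table, op, row in self.writes:
            getattr(self.db[table], op)(row)

t1 = Tx(db)   # T1: wants to log a cartridge change
t2 = Tx(db)   # T2: wants to drop the cartridge type definition

# T1's BEFORE-trigger check, against T1's snapshot: type exists -> OK
assert (2, "color") in t1.snap["cartridge_types"]
t1.writes.append(("cartridge_change", "add", (2, "color")))

# T2's AFTER-trigger check, against T2's snapshot: no changes -> OK
assert (2, "color") not in t2.snap["cartridge_change"]
t2.writes.append(("cartridge_types", "discard", (2, "color")))

t1.commit()
t2.commit()

# Both passed their checks, yet the invariant is now broken:
orphaned = {r for r in db["cartridge_change"] if r not in db["cartridge_types"]}
print(orphaned)   # {(2, 'color')} -- a change for a type that no longer exists
```

The FOR SHARE locks in the real triggers do not help because row locks taken by a reader do not make the competing DELETE fail; the FOR UPDATE workaround does, at the cost described below.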
This thing has a few undesired side effects:

* Physical access for delete/insert on the WAL and table files.
* It takes an exclusive lock, so it blocks other transactions trying to insert/modify.
* Two simultaneously active transactions doing inserts, with the second serializable, will produce a "cannot serialize" failure, though it is not needed in any way.

In this case we rely (explicitly) on locking of the parent tuple. This is the same as done in Oracle (though there are a few optimizations there, I think). In contrast, cross-checking over different snapshots has the nice effect of significantly reducing lock contention, since a shared lock is taken instead of an exclusive one. Foremost, it allows concurrent inserts into the child table (for the same parent). In fact, it truly allows transactions that are active at the same time to do inserts for the same parent tuples. This is why the feature is desirable:

* Guarantee database consistency, even for complex constraints.
* Provide the best possible concurrency.

-------------------------------------

Now for a little motivation of the subject. My belief is that the ACID properties of a database are a really good thing (hence my not-so-good opinion of MySQL for the upcoming future). In fact, several years ago I developed a database that provided decent consistency over its predecessor. Unfortunately, performance suffered. However, the project was very successful, as it finally didn't require frequent network administrator interference to 'fix' the data.

For complex projects it is good to have a solid system that you can rely on to enforce your constraints on the data. It should support you in doing things correctly, and punish you for mistakes made in the significantly more complex code above it. The absence of such a feature affects operations running in serializable isolation. Though its usage will be uncommon, postgres has committed support for it.
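The shared-versus-exclusive trade-off described above can be sketched with a toy counting lock (illustrative only; Postgres row locks are far more involved than this):

```python
# Toy counting lock: shared acquirers coexist; exclusive requires no sharers.
# Models why FOR SHARE on the parent row lets concurrent transactions insert
# children for the same parent, while a FOR UPDATE (or a DELETE) must wait.
class RowLock:
    def __init__(self):
        self.sharers = 0
        self.exclusive = False

    def try_share(self):
        if self.exclusive:
            return False
        self.sharers += 1
        return True

    def release_share(self):
        self.sharers -= 1

    def try_exclusive(self):
        if self.exclusive or self.sharers:
            return False
        self.exclusive = True
        return True

parent = RowLock()
assert parent.try_share()          # T1: insert a child, lock parent FOR SHARE
assert parent.try_share()          # T2: concurrent insert, same parent: fine
assert not parent.try_exclusive()  # T3: DELETE of the parent must wait
parent.release_share()
parent.release_share()
assert parent.try_exclusive()      # once the inserts commit, the delete proceeds
```

With the FOR UPDATE workaround, T1 and T2 would each need the exclusive slot, serializing inserts that are logically independent.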
In this respect one can ask:

* Can materialized views run out of sync?
* Could the constraint be enforced through a materialized view?

And, if we make assumptions about the correct operation of the triggers:

* DoS attacks?
* Intentional 'corruption' by users?

-------------------------------------

Quite likely I should have done this earlier and not cut corners as much as possible. I'm sorry for that and should have known better. Nevertheless, hopefully this clears things up a bit and gives you a better understanding of what I intended to say earlier.

Thanks for the effort,
- Joris