Thread: PostgreSQL 7.4.2 allows foreign key violation
Hi, On PostgreSQL 7.4.2 I can create a situation in which a foreign key is violated: bug=# SELECT b.name, exists(SELECT 1 FROM a WHERE a.name = b.name) FROM b; name | ?column? ------+---------- xxx | f (1 запись) bug=# \d+ b Таблица "public.b" Колонка | Тип | Модификаторы | Описание ----------------+--------+--------------------------+------------------ name | text | not null | Индексы: "b_pkey" ключевое поле, btree (name) Ограничения по вторичному ключу: "$1" FOREIGN KEY (name) REFERENCES a(name) ON UPDATE CASCADE Правила: b_rename AS ON UPDATE TO b DO INSTEAD UPDATE a SET name = new.name WHERE (a.name = old.name) I create the situation as follows: CREATE TABLE a (name TEXT PRIMARY KEY); INSERT INTO a VALUES ('xxx'); CREATE TABLE b (name TEXT PRIMARY KEY REFERENCES a(name) ON UPDATE CASCADE); INSERT INTO b VALUES ('xxx'); CREATE RULE b_rename AS ON UPDATE TO b DO INSTEAD UPDATE a SET name = NEW.name WHERE name = OLD.name; UPDATE b SET name = 'yyy' WHERE name = 'xxx'; SELECT b.name, exists(SELECT 1 FROM a WHERE a.name = b.name) FROM b; name | ?column? ------+---------- xxx | f (1 запись) Up to here I thought that the following was going on: The UPDATE b statement was rewritten into a UPDATE a statement by the rule system. The update on a triggers the foreign key update on b. This UPDATE gets rewritten again by the rule system to update a instead. The update to a triggers the foreign key again, which recognizes that it is already running and does nothing. The outer foreign key is done and the update to a is realized. b stays unchanged. But then I discovered that if I update the row in a prior to creating the rule, the rule works as expected: CREATE TABLE a (name TEXT PRIMARY KEY); INSERT INTO a VALUES ('xxx'); CREATE TABLE b (name TEXT PRIMARY KEY REFERENCES a(name) ON UPDATE CASCADE); INSERT INTO b VALUES ('xxx'); UPDATE a SET name = 'zzz' WHERE name = 'xxx'; CREATE RULE b_rename AS ON UPDATE TO b DO INSTEAD UPDATE a SET name = NEW.name WHERE name = OLD.name; UPDATE b SET name = 'yyy' WHERE name = 'zzz'; SELECT b.name, exists(SELECT 1 FROM a WHERE a.name = b.name) FROM b; name | ?column? ------+---------- yyy | t (1 запись) This somehow renders my theory invalid. Can someone comment? I also tried the same rule without INSTEAD. That does what I want and it is what I'm using in the application now. I wonder if that is The Right Way®. And should PostgreSQL allow foreign key violations like in the example above? Thanks -- Markus Bertheau <twanger@bluetwanger.de>
On Fri, 6 Aug 2004, Markus Bertheau wrote: > Up to here I thought that the following was going on: The UPDATE b > statement was rewritten into a UPDATE a statement by the rule system. > The update on a triggers the foreign key update on b. This UPDATE gets > rewritten again by the rule system to update a instead. The update to a > triggers the foreign key again, which recognizes that it is already > running and does nothing. The outer foreign key is done and the update > to a is realized. b stays unchanged. > > But then I discovered that if I update the row in a prior to creating > the rule, the rule works as expected: > > CREATE TABLE a (name TEXT PRIMARY KEY); > INSERT INTO a VALUES ('xxx'); > CREATE TABLE b (name TEXT PRIMARY KEY REFERENCES a(name) ON UPDATE CASCADE); > INSERT INTO b VALUES ('xxx'); > UPDATE a SET name = 'zzz' WHERE name = 'xxx'; > CREATE RULE b_rename AS ON UPDATE TO b DO INSTEAD UPDATE a SET name = NEW.name WHERE name = OLD.name; > UPDATE b SET name = 'yyy' WHERE name = 'zzz'; > SELECT b.name, exists(SELECT 1 FROM a WHERE a.name = b.name) FROM b; > name | ?column? > ------+---------- > yyy | t > (1 запись) > > This somehow renders my theory invalid. Can someone comment? Only for that session. The foreign key query is planned at the first update so it doesn't see the rule until you get to a new session. > I also tried the same rule without INSTEAD. That does what I want and it > is what I'm using in the application now. I wonder if that is The Right > Way®. Probably. > And should PostgreSQL allow foreign key violations like in the example > above? Probably not. It also looks like before triggers returning NULL can break them. I think we'd been worried about the added cost of doing the check when the average case doesn't have this problem but we should probably just eat it. In practice I think it's one line of code per action function (on update set default already does it). Any opinions out there?
Markus Bertheau <twanger@bluetwanger.de> writes: > I create the situation as follows: > CREATE TABLE a (name TEXT PRIMARY KEY); > INSERT INTO a VALUES ('xxx'); > CREATE TABLE b (name TEXT PRIMARY KEY REFERENCES a(name) ON UPDATE CASCADE); > INSERT INTO b VALUES ('xxx'); > CREATE RULE b_rename AS ON UPDATE TO b DO INSTEAD UPDATE a SET name = NEW.name WHERE name = OLD.name; > UPDATE b SET name = 'yyy' WHERE name = 'xxx'; > SELECT b.name, exists(SELECT 1 FROM a WHERE a.name = b.name) FROM b; The difficulty here is that the CASCADE is implemented by generating an "UPDATE b" command ... which is rewritten by your rule and thus fails to affect table b at all. It would probably be better if the RI implementation acted at a lower level and wasn't affected by rules, but for the foreseeable future the answer is "don't do that". > But then I discovered that if I update the row in a prior to creating > the rule, the rule works as expected: Only for the moment --- you're depending on a cached plan for the foreign-key update. Start a fresh backend and it's broken again. regards, tom lane
Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > It also looks like before triggers returning NULL can break them. Or a BEFORE trigger that overrides the attempted field update. > I think we'd been worried about the added cost of doing the check when > the average case doesn't have this problem but we should probably just > eat it. In practice I think it's one line of code per action function > (on update set default already does it). Already does what? I see nothing in there that would override either triggers or rules... > Any opinions out there? I seem to recall some discussions to the effect that having these updates subject to rules/triggers is not necessarily bad. For example, if you were using a rule or trigger to log all updates of table B someplace else, you'd probably be annoyed to find the RI updates bypassing your logging mechanism. There's no perfect solution ... regards, tom lane
On Fri, 6 Aug 2004, Tom Lane wrote: > Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > > It also looks like before triggers returning NULL can break them. > > Or a BEFORE trigger that overrides the attempted field update. > > > I think we'd been worried about the added cost of doing the check when > > the average case doesn't have this problem but we should probably just > > eat it. In practice I think it's one line of code per action function > > (on update set default already does it). > > Already does what? I see nothing in there that would override either > triggers or rules... It's not for overriding the triggers or rules, but instead checking that the post action state is valid (by running the no action code which makes sure that either another row now has the pk value or that there are no longer any matching rows).
On Fri, 6 Aug 2004, Stephan Szabo wrote: > On Fri, 6 Aug 2004, Tom Lane wrote: > > > Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > > > It also looks like before triggers returning NULL can break them. > > > > Or a BEFORE trigger that overrides the attempted field update. > > > > > I think we'd been worried about the added cost of doing the check when > > > the average case doesn't have this problem but we should probably just > > > eat it. In practice I think it's one line of code per action function > > > (on update set default already does it). > > > > Already does what? I see nothing in there that would override either > > triggers or rules... > > It's not for overriding the triggers or rules, but instead checking that > the post action state is valid (by running the no action code which > makes sure that either another row now has the pk value or that there are > no longer any matching rows). To make that clearer, that another row now has the old pk value, or that there are no longer any matching rows to the old pk value.
Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > On Fri, 6 Aug 2004, Tom Lane wrote: >> Already does what? I see nothing in there that would override either >> triggers or rules... > It's not for overriding the triggers or rules, but instead checking that > the post action state is valid (by running the no action code which > makes sure that either another row now has the pk value or that there are > no longer any matching rows). Oh, I see. Seems an awfully expensive solution though :-( regards, tom lane
On Fri, 6 Aug 2004, Tom Lane wrote: > Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > > On Fri, 6 Aug 2004, Tom Lane wrote: > >> Already does what? I see nothing in there that would override either > >> triggers or rules... > > > It's not for overriding the triggers or rules, but instead checking that > > the post action state is valid (by running the no action code which > > makes sure that either another row now has the pk value or that there are > > no longer any matching rows). > > Oh, I see. Seems an awfully expensive solution though :-( Yeah. That's the major downside. We could potentially do it only if there are any instead rules or before triggers on the action we want to do (either delete or update). I'm not sure if/how expensive that would be to determine. I think we could potentially check rules when the query is planned (since presumably, those are the only rules that matter), but I don't think that'd work for the triggers.
On 8/6/2004 1:23 PM, Tom Lane wrote: > Stephan Szabo <sszabo@megazone.bigpanda.com> writes: >> On Fri, 6 Aug 2004, Tom Lane wrote: >>> Already does what? I see nothing in there that would override either >>> triggers or rules... > >> It's not for overriding the triggers or rules, but instead checking that >> the post action state is valid (by running the no action code which >> makes sure that either another row now has the pk value or that there are >> no longer any matching rows). > > Oh, I see. Seems an awfully expensive solution though :-( IMHO it is one of the cases that are on the line of "doctor, when I ... then don't do it". As you said, there is no perfect solution. Triggers and rules can conflict in several ways, but we don't want to sacrifice one for making the other failsafe. Jan > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
On Sun, 8 Aug 2004, Jan Wieck wrote: > On 8/6/2004 1:23 PM, Tom Lane wrote: > > Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > >> On Fri, 6 Aug 2004, Tom Lane wrote: > >>> Already does what? I see nothing in there that would override either > >>> triggers or rules... > > > >> It's not for overriding the triggers or rules, but instead checking that > >> the post action state is valid (by running the no action code which > >> makes sure that either another row now has the pk value or that there are > >> no longer any matching rows). > > > > Oh, I see. Seems an awfully expensive solution though :-( > > IMHO it is one of the cases that are on the line of "doctor, when I ... > then don't do it". As you said, there is no perfect solution. Triggers > and rules can conflict in several ways, but we don't want to sacrifice > one for making the other failsafe. True, but I don't think we're sacrificing one for the other. We'd be sacrificing speed AFAICS. As I see it we've got the following: a) Do no code changes. The constraints can be fooled by some situations, make sure that it's documented and point people to the documentation. Upsides: No code changes, no further slowdown of functioning of constraint. Downside: Constraint can be violated b) Always run the no action check code. We always run the no action code after the action to check to make sure that the dependent rows are no longer there (or are supported by some other value). This is the theoretical model of the constraint in the spec, I believe. Upsides: Constraint should be theoretically difficult to break with this form of violation, barring bugs. Minor code change. Downsides: Everyone pays the (non-trivial) cost to do the check to fix this case. The extra check also potentially grabs yet more locks. c) Run the no action check code when we think there's some chance of this situation occuring. The "some chance" could be always in which case this is the same as b, if there are any instead rules or before triggers on the acted upon table for the action being run (update or delete), or something more complicated. Upsides: Compared to a, we would get a constraint that's harder to break. Compared to b, we hopefully lessen the cost to people not using the combination. Downsides: More involved code changes and testing to make sure it's right. We still add a cost to everyone to check the state. The constraint now "acts differently" for people using instead rules or before triggers which means it's an additional variable to deal with when debugging problems.