Thread: Foreign key quandaries
Going through the issues in doing dirty reads in foreign keys, I've come up with a few cases that I'm fairly uncertain about how to handle with regard to deadlocks, and figured I should ask for advice because I think I'm missing something painfully obvious, but don't get large enough blocks of time to think about it to figure out what it is.

I'd thought maybe it'd be enough to say which type of thing on which constraint, and use that to basically say that we don't need to wait on a transaction that's waiting on us due to a modification to the other table, but AFAICS that lets through a bad case:

T1: insert into fk values (2);
T2: delete from pk where key=3;
T2: delete from pk where key=2;
T1: insert into fk values (3);

If T1 doesn't wait in this case, you can get into a case where a bad row is inserted into fk if you then have:

T1: delete from fk where key=2;
T1: commit;

Now there's no row to make the second delete fail, but transaction 2 still can't commit due to the fk row with key 3.

I'd then thought of doing something based on what row/value transaction 2 was waiting on, but that has problems. Given a foreign key with no referential actions and a sequence like:

Transaction 1 inserts into the foreign key table a row with a referencing key of 2.
Transaction 1 checks the foreign key.
Transaction 2 deletes the primary key rows having keys 2 and 3.
Transaction 1 inserts another row into the foreign key table with a referencing key of 2.
Transactions 1 and 2 start checking the foreign key.

AFAICS, transaction 2 needs to wait since there's already a row it can see in the foreign key table that's not yet committed (so it doesn't know if the delete works or not). We can tell transaction 1 that it doesn't need to wait on transaction 2, because transaction 1 is inserting a value that transaction 2 will see in its check; thus we're saved from the first case.
However, this has the potential to deadlock if we had, for example, inserted a foreign key table row of 3 rather than 2 as the second insert in transaction 1, and the delete check for 2 went first. If we knew that it was also going to be checking the 3 rows, we'd be safe, but then we've got to keep that information in some way that's visible to other transactions AFAICS. And, if the checks were done in the order delete check for 3, delete check for 2 (t2 blocks), insert check for 3, we'd be back in the state of the first example. :(
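To make the first bad case concrete, here's a toy MVCC model in Python (nothing like PostgreSQL's actual tuple machinery; all names are invented for illustration). Rows carry the inserting and deleting transaction ids, and a row is visible to a committed snapshot if its inserter committed and its deleter did not. Running the interleaving with T1 never waiting leaves a committed fk row with no matching pk row:

```python
class Table:
    def __init__(self):
        self.rows = []                 # each row is [key, xmin, xmax]

    def insert(self, key, xid):
        self.rows.append([key, xid, None])

    def delete(self, key, xid):
        for r in self.rows:
            if r[0] == key and r[2] is None:
                r[2] = xid             # mark deleted by xid

def committed_keys(table, committed):
    """Keys visible after the given set of transactions committed."""
    return {r[0] for r in table.rows
            if r[1] in committed and (r[2] is None or r[2] not in committed)}

pk, fk = Table(), Table()
for k in (2, 3):
    pk.insert(k, 0)                    # pre-existing data, transaction 0
committed = {0}

# The interleaving from the first example, with T1 never waiting:
fk.insert(2, 1)                        # T1: insert into fk values (2);
pk.delete(3, 2)                        # T2: delete from pk where key=3;
pk.delete(2, 2)                        # T2: delete from pk where key=2;
fk.insert(3, 1)                        # T1: insert into fk values (3);
fk.delete(2, 1)                        # T1: delete from fk where key=2;
committed.add(1)                       # T1: commit;
committed.add(2)                       # T2's delete checks were waiting on
                                       # the fk row for 2; T1 removed it, so
                                       # they pass and T2 commits without
                                       # ever rechecking key 3

dangling = committed_keys(fk, committed) - committed_keys(pk, committed)
print(dangling)                        # {3}: an fk row referencing nothing
```

The `{3}` at the end is exactly the "bad row inserted into fk" from the example above: the fk row for key 3 survives commit even though both pk rows are gone.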
I'm not sure I understand the question. The case as described simply has to deadlock, because you're approaching the same values with conflicting tasks from opposite directions. Detect it, kill one, and continue on. Same problem as:

T1: insert into pk values (1);
T2: insert into pk values (2);
T2: insert into pk values (1);
T1: insert into pk values (2);

It's up to the application(s) to minimize risk by approaching values in a common order for a given set of work.

On Fri, 2003-02-28 at 16:58, Stephan Szabo wrote:
> Going through the issues in doing dirty reads in foreign keys I've come up
> with a few cases that I'm fairly uncertain about how to handle with
> regards to deadlocks and figured I should ask for advice because I think
> I'm missing something painfully obvious, but don't get large enough blocks
> of time to think about it to figure out what it is.

-- Rod Taylor <rbt@rbt.ca> PGP Key: http://www.rbt.ca/rbtpub.asc
On 1 Mar 2003, Rod Taylor wrote:
> I'm not sure I understand the question. The case as described simply has
> to deadlock because you're approaching the same values with conflicting
> tasks from opposite directions.

Well, the problem is that two cases (one of which I think deadlock is unnecessary in) are very similar.

As I see it:

T1: insert 2
T2: delete 2
T1: insert 2/update 2 (non-key fields)
shouldn't need to deadlock.

T1: insert 2
T2: delete 2 & 3
 * delete 2's check blocks before checking 3
T1: insert 3
should not need to deadlock, I think.

T1: insert 2
T2: delete 3
T2: delete 2
(or delete 2 & 3 where 3's check goes then 2's check blocks)
T1: insert 3
does need to deadlock.

In the second case, both deletes have happened, so the row the insert wants to check against is marked for deletion, but since it's going to be checking for the 3 row in the future, and will error if T1 commits, I think it's safe for it to go through.

I'm trying to find a way to differentiate the second and third cases given that I'm running inside a constraint check on insert 3. It'd be easy if transaction 1 could see that it's going to be checking for the 3 row later, but I think that'd involve keeping around a lot of information about the rows that are affected in some shared way, which could get really large.
On Sat, 2003-03-01 at 00:44, Stephan Szabo wrote:
> Well, the problem is that two cases (one of which I think deadlock is
> unnecessary in) are very similar.

I see. Now I see what you're asking about.

> In the second case, both deletes have happened so the row the insert wants
> to check against is marked for deletion, but since it's going to be
> checking for the 3 row in the future, and will error if T1 commits, I
> think it's safe for it to go through.
>
> I'm trying to find a way to differentiate the second and third case given
> that I'm running inside a constraint check on insert 3.

Isn't the differentiation going to happen automatically?

In case 2:

T1: create fk tuple (uncommitted) -> value 2
T2: delete pk tuple value 2
T2: scan through fk table, find uncommitted tuple value 2 ... sleep
T2: scan through fk table, find uncommitted tuple value 2 ... sleep
T2: scan through fk table, find uncommitted tuple value 2 ... sleep
T1: create fk tuple (uncommitted) -> value 3
T1: commit
T2: scan through fk table, find tuple value 2 ... it's committed
T2: <run cascade procedure on tuples found in fk table for value 2>
T2: continue scan through fk table, find tuple value 3 ... it's committed
T2: <run cascade procedure on tuples found in fk table for value 3>
T2: All is well -- return control to user.

In case 3:

T1: create fk tuple (uncommitted) -> value 2
T2: delete pk tuple value 3
T2: scan through fk table, value 3 not found
T2: delete pk tuple value 2
T2: scan through fk table, find uncommitted tuple value 2 ... sleep
T2: scan through fk table, find uncommitted tuple value 2 ... sleep
T2: scan through fk table, find uncommitted tuple value 2 ... sleep
T1: create fk value 3
T1: scan through pk table, find uncommitted tuple value 3 ... sleep
T2: scan through fk table, find uncommitted tuple value 2 ... sleep
T1: scan through pk table, find uncommitted tuple value 3 ... sleep
T2: scan through fk table, find uncommitted tuple value 2 ... sleep

-- Rod Taylor <rbt@rbt.ca> PGP Key: http://www.rbt.ca/rbtpub.asc
Gah, hit the wrong key combination and the email sent early.

Anyway, after that 'sleep' mess at the bottom is:

T1 or T2: Sleeping too long -- let's run deadlock detection code
T1 or T2: Kill off random participant of deadlock.

The other participant is then allowed to continue their work.

On Sat, 2003-03-01 at 02:03, Rod Taylor wrote:
> Isn't the differentiation going to happen automatically?

-- Rod Taylor <rbt@rbt.ca> PGP Key: http://www.rbt.ca/rbtpub.asc
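The "sleeping too long -- run deadlock detection" step is the classic wait-for-graph check. A minimal Python sketch of the general technique (not PostgreSQL's actual deadlock checker): each edge says "transaction A is blocked on transaction B", and a cycle means deadlock, so one participant gets killed.

```python
def find_deadlock(waits_for):
    """waits_for: dict mapping each xid to the xid it is blocked on
    (or None if it is running). Returns a list of xids forming a
    cycle, or None if there is no deadlock."""
    for start in waits_for:
        seen = []
        xid = start
        while xid is not None and xid not in seen:
            seen.append(xid)
            xid = waits_for.get(xid)
        if xid is not None:            # revisited a node: cycle found
            return seen[seen.index(xid):]
    return None

# Case 3 from the thread: T2 waits on T1's uncommitted fk row for 2,
# then T1's insert of 3 waits on T2's delete of pk row 3 -- a cycle.
print(find_deadlock({"T1": "T2", "T2": "T1"}))   # ['T1', 'T2']

# Case 2 never forms a cycle: only T2 waits; T1 runs on to commit.
print(find_deadlock({"T2": "T1", "T1": None}))   # None
```

This is why case 3 "does need to deadlock" while case 2 doesn't: only in case 3 do the two transactions end up waiting on each other.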
On 1 Mar 2003, Rod Taylor wrote:
> Anyway, after that 'sleep' mess at the bottom is:
> T1 or T2: Sleeping too long -- let's run deadlock detection code
> T1 or T2: Kill off random participant of deadlock.
>
> The other participant is then allowed to continue their work.
>
> > Isn't the differentiation going to happen automatically?

The problem is that in case 2, both tuples 2 and 3 are already removed before either foreign key check runs, so when T1 adds the value 3 row and checks the pk table it will find that its pk row has been modified. If the ordering went delete 2 - check 2, delete 3 - check 3, this wouldn't be a problem, but then that'd fail in a spec-non-compliant way if row 2 referred to row 3.

> In case 2:
>
> T1: create fk tuple (uncommitted) -> value 2
* T1: scan through pk table, no problems
> T2: delete pk tuple value 2
* T2: delete pk tuple value 3
> T2: scan through fk table, find uncommitted tuple value 2 ... sleep
> T2: scan through fk table, find uncommitted tuple value 2 ... sleep
> T2: scan through fk table, find uncommitted tuple value 2 ... sleep
> T1: create fk tuple (uncommitted) -> value 3
* T1: scan through pk table, find modified tuple value 3 ...
On Sat, 2003-03-01 at 02:38, Stephan Szabo wrote:
> The problem is that in case 2, both tuples 2 and 3 are already removed
> before either foreign key check runs, so when T1 adds the value 3
> row and checks the pk table it will find that its pk row has been
> modified. If the ordering went delete 2 - check 2, delete 3 - check
> 3, this wouldn't be a problem, but then that'd fail in a
> spec-non-compliant way if row 2 referred to row 3.

The foreign key cascade is explicitly deferred to the end of the statement via the trigger queue, but there is no reason that the foreign key code can't run immediately for each tuple removed. For the defer-check-to-commit case, have the code add the element to the trigger queue by making a fake trigger wrapper -- something that isn't really a trigger (simplest method, probably) -- and it should and will deadlock.

-- Rod Taylor <rbt@rbt.ca> PGP Key: http://www.rbt.ca/rbtpub.asc
On 1 Mar 2003, Rod Taylor wrote:
> The foreign key cascade is explicitly deferred to the end of the
> statement via the trigger queue, but there is no reason that the foreign
> key code can't run immediately for each tuple removed.

The problem happens when you have two rows in a table that refer to each other with a foreign key to the same table. If both are deleted, it must succeed, but it won't if you do the check in between the deletes, unless I'm missing something. It's effectively the same problem as we currently have with the unique constraint (premature checking of the constraint). AFAICS, you need to defer to end of statement to get the correct semantics out of the checks, or you need to have a state where the rows are sort of pseudo-deleted/updated, which could be better.
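The mutual-reference objection can be sketched in a few lines of Python (a toy model of a self-referencing fk, not real constraint machinery): checking the constraint after each individual delete spuriously fails, while the single end-of-statement check succeeds.

```python
def delete_all(rows, keys, check_each=True):
    """rows: dict mapping key -> referenced key in the same table
    (a self-referencing foreign key). Deletes `keys` in order and
    returns True only if no fk check ever fails."""
    rows = dict(rows)
    for k in keys:
        del rows[k]
        # immediate-mode check: no surviving row may reference a
        # key that is no longer present
        if check_each and any(ref not in rows for ref in rows.values()):
            return False               # premature failure mid-statement
    # deferred, end-of-statement check over the final state
    return all(ref in rows for ref in rows.values())

pair = {2: 3, 3: 2}   # row 2 refers to row 3 and vice versa
print(delete_all(pair, [2, 3], check_each=True))    # False
print(delete_all(pair, [2, 3], check_each=False))   # True
```

Deleting both rows must succeed per spec, but the immediate check fires after the first delete, while row 3 still (briefly) references the just-deleted row 2 -- the same shape as the premature unique-constraint check mentioned above.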
On Sat, 2003-03-01 at 11:06, Stephan Szabo wrote:
> The problem happens when you have two rows in a table that refer to each
> other with a foreign key to the same table. If both are deleted, it must
> succeed, but it won't if you do the check in between the deletes unless
> I'm missing something.

In the general case, you're not going to have interdependent tables. Store a list of tables touched (changed -- not just read) by keys. The second fkey to touch a table is deferred until the end of the statement and will have the semantics required for that (deadlock in case 2).

So consider a cascade delete from pk tab1 to fk tab2, and pk tab2 to fk tab1.

tab1 -> tab2
tab2 -> tab1

Delete from pk tab1 value in (2, 3)
- Seek tab1, find tuple value 2
- Record tab1 as dirty
- tab2 fkey doesn't find tab2 on dirty list -- starts work
- Seek tab2, find tuple value 2, cascade delete
- Record tab2 as dirty
- tab1 fkey finds tab1 on dirty list -- defers work
- tab2 fkey finishes scan on tab2 (no more tuples of value 2)
- Seek on tab1 continues, find tuple value 3
- Record tab1 as dirty
- tab2 fkey finds tab2 on dirty list, but it was done by the tab2 fkey (myself) -- so we assume this is a different value (keys are unique) and safe to deal with
- Seek tab2, find tuple value 3, cascade delete
- Record tab2 as dirty
- tab1 fkey finds tab1 on dirty list -- defers work
- tab2 fkey finishes scan on tab2 (no more tuples of value 3)
- Seek on tab1 continues -- no more tuples of value 2 or 3
- Command counter increment for finished 'delete' on tab1
- Deferred (tab2 pk -> tab1 fkey) actions begin execution.
- ...

This *will* deadlock in case 2 as described earlier. But it's no worse than what we have today -- in fact it's slightly better, as one of the fkeys is fast and clean. And this is not a common scenario, as it represents a parent <-> child relationship split between two or more tables. Tuples cross-keyed in two different tables already have to be deferred for the inserts to succeed -- as such they will deadlock by default, since the checks are deferred until commit.

What's more interesting is a parent <-> child relationship on a single table. What you really want to hint at as being dirty is a given pk value within a table -- but that becomes difficult when multiple foreign keys on different unique keys are involved. I think the tree table (parent and child in the same table, making a tree) also needs to defer its checks and be susceptible to deadlocks when involved in a 'case 2' scenario.
The accounting required to check for loops won't be worth it if there are a large number of values -- especially when multiple foreign keys on different unique keys are involved.

Anyway, the 'checking the constraint too early' problem only exists when looking at the same tuples and the problem is temporary. The solution for the value = value + 1 case in a unique constraint seems to be to recheck at the end of the statement if there is still a problem; if there wasn't an issue during the initial index insert, then everything will be clean later as well. Fkeys have a slight advantage, as they tend to change the table being scanned, so more work is safe to do immediately -- but tracking what has and hasn't been touched in a given CommandCounter will be important.

> have with the unique constraint (premature checking of the constraint).
> AFAICS, you need to defer to end of statement to get the correct semantics
> out of the checks, or you need to have a state where the rows are sort of
> pseudo-deleted/updated which could be better.

-- Rod Taylor <rbt@rbt.ca> PGP Key: http://www.rbt.ca/rbtpub.asc
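The dirty-list bookkeeping Rod walks through above can be sketched as follows. This is a hypothetical model with invented names, not anything that exists in PostgreSQL: a per-statement map records which fkey first dirtied each table; a clean table (or one dirtied only by the same fkey, for a distinct unique-key value) allows immediate work, anything else defers to end of statement.

```python
class DirtyList:
    """Per-statement record of which fkey action dirtied each table."""
    def __init__(self):
        self.dirty = {}                # table name -> first fkey to dirty it

    def may_run_now(self, table, fkey):
        """Immediate execution is allowed if the table is clean, or was
        only dirtied by this same fkey (keys are unique, so a second
        touch by the same fkey is a different value)."""
        owner = self.dirty.get(table)
        return owner is None or owner == fkey

    def record(self, table, fkey):
        self.dirty.setdefault(table, fkey)   # first toucher wins

dl = DirtyList()
deferred = []

# The outer delete dirties tab1; the tab1->tab2 cascade, first to
# touch tab2, runs immediately.
dl.record("tab1", "outer_delete")
assert dl.may_run_now("tab2", "fk_tab1_to_tab2")
dl.record("tab2", "fk_tab1_to_tab2")

# The tab2->tab1 fkey finds tab1 already dirtied by someone else: defer.
if not dl.may_run_now("tab1", "fk_tab2_to_tab1"):
    deferred.append("fk_tab2_to_tab1")

# The same fkey touching tab2 again for the next pk value: still immediate.
assert dl.may_run_now("tab2", "fk_tab1_to_tab2")
print(deferred)                        # ['fk_tab2_to_tab1']
```

As the trace above shows, only one side of the cross-keyed pair ends up deferred, which is the "slightly better than today" property claimed for the scheme.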
Seems like this sort of approach is going to introduce a huge amount of very fragile mechanism, and probably a wide variety of hard-to-reproduce bugs :-(.

ISTM the only thing we really need to do to address the complaints about FKs is to invent some kind of sharable row-level lock. Then transactions adding references to an FK table would take out shared instead of exclusive locks on PK rows; nothing else changes.

Of course, it's easy to say "sharable row-level lock" and not so easy to come up with an implementation that has decent performance. But when it was done, I think we might have some faith that it works. I'm going to have great difficulty putting any faith at all in an FK implementation that relies on dirty reads.

regards, tom lane
On Sat, 1 Mar 2003, Tom Lane wrote:
> Seems like this sort of approach is going to introduce a huge amount of
> very fragile mechanism, and probably a wide variety of hard-to-reproduce
> bugs :-(.

Possibly; that's why I brought it up here, because more minds to find problems are better.

> ISTM the only thing we really need to do to address the complaints about
> FKs is to invent some kind of sharable row-level lock. Then
> transactions adding references to an FK table would take out shared
> instead of exclusive locks on PK rows; nothing else changes.

That gets rid of most of the problems. There are problems with read locks locking more than they may need to (insert into fk followed by another transaction doing a modification of the referenced row that doesn't affect the key).

> Of course, it's easy to say "sharable row-level lock" and not so easy to
> come up with an implementation that has decent performance. But when it
> was done, I think we might have some faith that it works. I'm going to
> have great difficulty putting any faith at all in an FK implementation
> that relies on dirty reads.

As a thought exercise, what all is involved in making sharable row-level locks? No matter what is done for foreign keys, they're still useful for constraints written by users, and even using them now in foreign keys doesn't prevent a future change if the other issues are worked out.
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> That gets rid of most of the problems. There are problems with read locks
> locking more than they may need to (insert into fk followed by another
> transaction doing a modification of the referenced row that doesn't affect
> the key).

Agreed, it's not a 100% solution ... but it would be way better than what we have. (Hm, I wonder whether a read lock could specify that only certain columns are locked? Nah, probably too much trouble.)

> As a thought exercise, what all is involved in making sharable row-level
> locks?

ISTM the problem is where to keep the state. You can't keep track of an indefinite number of shared-lock holders in the on-row header ... but if you try to keep the state in shared memory, you can't keep track of a very large number of locked rows, either. Perhaps some scheme could be implemented to keep lock state in memory but spill to disk when there get to be too many locked rows. I don't see how to make that work efficiently, though.

We talked about this a week or two back, and someone (was it Rod?) asked essentially "do we *need* any state --- would a marker on the row that it's share-locked be enough?". I suppose we could use an infomask bit to indicate share-locking and overload xmax as a count of the number of lockers. Then each transaction with read locks would have to have local state remembering every row it's read-locked (this is much less bad than shared state, since local RAM is more easily expansible), and at either transaction commit or abort you'd have to run around and decrement those counts. (But when you decrement a count to zero, what next?
You still need to figure out who's blocked on that row and release them.)

This could perhaps be made to work with reasonable efficiency, but it makes me nervous. If someone crashes while holding read locks, how do you recover? Seems like you need to scan the *entire database* during restart to zero out shared-lock counts. In general we have stayed away from the notion of requiring transactions to do end-of-transaction cleanup on disk, and I think that is a good design choice not to be tossed away lightly.

So I don't see how to do that efficiently. But still, it seems a more tractable problem than trying to prove a dirty-read implementation correct.

regards, tom lane
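The counted-share-lock scheme sketched in the last two messages looks like this as a toy in-memory model (class and field names are invented; the real proposal stores the flag in the tuple infomask and the count in the xmax slot on disk):

```python
class Row:
    def __init__(self, key):
        self.key = key
        self.share_locked = False      # stand-in for an infomask bit
        self.lock_count = 0            # overloads the xmax slot as a count

class Transaction:
    def __init__(self):
        self.locked = []               # local state: rows we share-locked

    def share_lock(self, row):
        row.share_locked = True
        row.lock_count += 1
        self.locked.append(row)

    def finish(self):
        """At commit or abort, walk the local list and decrement."""
        for row in self.locked:
            row.lock_count -= 1
            if row.lock_count == 0:
                row.share_locked = False
                # ...here we'd also have to wake whoever is blocked on
                # this row, which is one of the open problems noted above
        self.locked.clear()

pk_row = Row(2)
t1, t2 = Transaction(), Transaction()
t1.share_lock(pk_row)                  # two fk inserters lock pk row 2
t2.share_lock(pk_row)
print(pk_row.lock_count)               # 2
t1.finish()
print(pk_row.share_locked)             # True: t2 still holds the lock
t2.finish()
print(pk_row.share_locked)             # False: fully released
```

The model also makes the crash objection visible: the per-transaction `locked` list lives only in local memory, so a crashed backend never runs `finish()`, and the on-disk counts go stale -- hence the worry about scanning the entire database at restart.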