Thread: Fix FK deadlock, but no magic please
I have been using PostgreSQL for a year now and have seen a lot of people run into the same problem that I have: deadlocks caused by foreign keys. For me, this problem has gotten to the point that if my customer grows much more, they won't be able to use PostgreSQL any more.

That notwithstanding, I am concerned about the fix that has been proposed by at least one developer. The terms "dirty read" and "magic" came up during the description of the fix. "Dirty read" is a dirty phrase when you're using proper transactions, and "magic" is not what you want to hear when the database is supposed to be the rock that everything else depends on.

Other databases have tackled this issue without the above terms. From what I can tell, there is a standard and a non-standard way this can be fixed in PostgreSQL. The standard way would be to implement FKs as part of the schema and create the hooks to allow read locks on records by FK only. The non-standard way would be to expand the SQL interface with a non-standard FOR READ statement (or something similar) and then continue to use triggers. Only the developers can decide which method will be easier, work better, or is more in line with the overall goals of PostgreSQL.

I've even tried to get an estimate from pgsql.com for this issue, but they just ignored me. I figure that the alternative is to get Oracle, which has a price tag equivalent to at least 20 man-weeks of effort (minimum). I'd much rather see this kind of money go toward making PostgreSQL better. I don't know if I can actually get the money, but I would at least like to know what to shoot for. Maybe I can get multiple customers to split the fee.
On Thu, 16 Jan 2003, Jon Swinth wrote:

> The terms "dirty read" and "magic" came up during the description of the
> fix. "Dirty read" is a dirty phrase when you're using proper transactions,
> and "magic" is not what you want to hear when the database is supposed to
> be the rock that everything else depends on.

Well, I used "magic" flippantly as shorthand for "I haven't worked out the details yet because I'm in the sit-down-and-design-and-try-things-out stage." I'll apologize for using the term, then.

As for dirty reads: the part that was referred to as magic involves doing waits on transactions, so it's very much like the current row locks except with foreign-key knowledge embedded so as to help avoid deadlocks. Yes, it's seeing rows that haven't been committed, but it's mostly seeing them to find out what transactions it needs to wait on.

> Other databases have tackled this issue without the above terms. From what
> I can tell, there is a standard and a non-standard way this can be fixed
> in PostgreSQL. The standard way would be to implement FKs as part of the
> schema and create the hooks to allow read locks on records by FK only. The
> non-standard way would be to expand the SQL interface with a non-standard
> FOR READ statement (or something similar) and then continue to use
> triggers. Only the developers can decide which method will be easier, work
> better, or is more in line with the overall goals of PostgreSQL.

Record read locks are not quite as good a solution as dirty reads from a performance standpoint, which is why we've been aiming in that direction first. You'd pretty much need column locks to get equivalent behavior, AFAICT. The issue is that with record read locks, you prevent updates to rows that do not affect the key values.

> I've even tried to get an estimate from pgsql.com for this issue, but they
> just ignored me.
> I figure that the alternative is to get Oracle, which has a price tag
> equivalent to at least 20 man-weeks of effort (minimum). I'd much rather
> see this kind of money go toward making PostgreSQL better. I don't know if
> I can actually get the money, but I would at least like to know what to
> shoot for. Maybe I can get multiple customers to split the fee.

I personally can't (I have a full-time job that has nothing to do with PostgreSQL, which is part of why this hasn't gotten done yet), but I certainly wouldn't want that to stop someone who does have the time and could take the money from doing it instead. :)
Thanks, Stephan, for your quick response.

On Thursday 16 January 2003 10:55 am, Stephan Szabo wrote:

> Well, I used "magic" flippantly as shorthand for "I haven't worked out the
> details yet because I'm in the sit-down-and-design-and-try-things-out
> stage." I'll apologize for using the term, then.
>
> As for dirty reads: the part that was referred to as magic involves doing
> waits on transactions, so it's very much like the current row locks except
> with foreign-key knowledge embedded so as to help avoid deadlocks. Yes,
> it's seeing rows that haven't been committed, but it's mostly seeing them
> to find out what transactions it needs to wait on.

Fair enough.

> Record read locks are not quite as good a solution as dirty reads from a
> performance standpoint, which is why we've been aiming in that direction
> first. You'd pretty much need column locks to get equivalent behavior,
> AFAICT. The issue is that with record read locks, you prevent updates to
> rows that do not affect the key values.
From the standpoint of expected behaviour, I don't think you have any choice but to use record read locks. When someone takes a write lock on an FK table record, they have the expectation that they can do anything they want with the record, including changing the PK or deleting the record, as long as there were no referencing records before the write lock was obtained. This means that someone else shouldn't be able to insert a referencing record while the FK table record has a write lock.

Not being able to get a read lock when someone else has a write lock is expected behaviour. A single record should be able to have one write lock or multiple read locks, but not both. If I have a program that checks for referencing records, deletes them if found, and obtains a write lock on the FK record, then I should reasonably be able to assume that I can change anything about that record, including deleting it. If you don't prevent the write lock when a read lock is there, then the person obtaining the write lock may very well get errors that they wouldn't normally expect.

> I personally can't (I have a full-time job that has nothing to do with
> PostgreSQL, which is part of why this hasn't gotten done yet), but I
> certainly wouldn't want that to stop someone who does have the time and
> could take the money from doing it instead. :)
On Thu, 16 Jan 2003, Jon Swinth wrote:

> From the standpoint of expected behaviour, I don't think you have any
> choice but to use record read locks. When someone takes a write lock on an
> FK table record, they have the expectation that they can do anything they
> want with the record, including changing the PK or deleting the record.
>
> Not being able to get a read lock when someone else has a write lock is
> expected behaviour. A single record should be able to have one write lock
> or multiple read locks, but not both. If you don't prevent the write lock
> when a read lock is there, then the person obtaining the write lock may
> very well get errors that they wouldn't normally expect.

Well, for example (assuming two pk rows with 1 and 2 as keys):

T1: begin;
T1: insert into fk values (1);
T2: begin;
T2: insert into fk values (2);
T1: update pk set nonkey='a' where key=2;
T2: update pk set nonkey='b' where key=1;

Should this deadlock? I'd say no, barring triggers/rules, because those two updates shouldn't affect the constraint, and so whenever possible the constraint's behavior shouldn't interfere. But the obvious record-read-lock solution would, AFAICS.
The inserts would grab a read lock on the two pk rows, neither transaction would be able to get the write lock it wants, and it'd deadlock. Or, what about:

T1: begin;
T1: insert into fk values (1);
T2: begin;
T2: insert into fk values (1);
T1: update pk set nonkey='a' where key=1;
T2: update pk set nonkey='a' where key=1;

Without those inserts this would be okay: one would wait for the other and everything would be fine. But with them, I think this deadlocks as well. Maybe I'm misunderstanding the scheme you'd expect the foreign keys to use with the read locks.
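Stephan's first scenario can be condensed into a toy lock-table simulation. This is a minimal sketch in Python with made-up names (`read_locks`, `waits_for`, and friends are illustrative, not PostgreSQL internals); it only shows where the wait cycle forms under a record-read-lock scheme:

```python
# Toy shared/exclusive lock table; names are illustrative, not real code.
read_locks = {}  # pk key -> set of txns holding a shared (read) lock
waits_for = {}   # txn -> set of txns it is blocked behind

def acquire_read(txn, key):
    """Take a shared (read) lock on a pk row, as the FK insert would."""
    read_locks.setdefault(key, set()).add(txn)

def request_write(txn, key):
    """Request an exclusive (write) lock; record who blocks us."""
    blockers = read_locks.get(key, set()) - {txn}
    if blockers:
        waits_for[txn] = blockers
    return not blockers  # True if granted immediately

def deadlocked(a, b):
    """Two-transaction cycle: a waits on b AND b waits on a."""
    return b in waits_for.get(a, set()) and a in waits_for.get(b, set())

acquire_read("T1", 1)   # T1: insert into fk values (1)
acquire_read("T2", 2)   # T2: insert into fk values (2)
request_write("T1", 2)  # T1: update pk ... where key=2 -> blocked by T2
request_write("T2", 1)  # T2: update pk ... where key=1 -> blocked by T1
print(deadlocked("T1", "T2"))  # -> True
```

Each transaction ends up waiting on the other's shared lock, which is exactly the deadlock Stephan describes, even though neither update touches a key value.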
I am a little confused by your examples.

On Thursday 16 January 2003 12:00 pm, Stephan Szabo wrote:

> Well, for example (assuming two pk rows with 1 and 2 as keys):
>
> T1: begin;
> T1: insert into fk values (1);
> T2: begin;
> T2: insert into fk values (2);
> T1: update pk set nonkey='a' where key=2;
> T2: update pk set nonkey='b' where key=1;

Maybe I don't understand this example. If T2 inserted fk 2, how did T1 manage to update a record that references it before T2 committed? For T1, fk 2 doesn't exist yet, so there couldn't be any records referencing it.

> Should this deadlock? I'd say no, barring triggers/rules, because those
> two updates shouldn't affect the constraint, and so whenever possible the
> constraint's behavior shouldn't interfere. But the obvious
> record-read-lock solution would, AFAICS. The inserts would grab a read
> lock on the two pk rows, neither transaction would be able to get the
> write lock it wants, and it'd deadlock. Or, what about:
>
> T1: begin;
> T1: insert into fk values (1);
> T2: begin;
> T2: insert into fk values (1);
> T1: update pk set nonkey='a' where key=1;
> T2: update pk set nonkey='a' where key=1;
>
> Without those inserts this would be okay: one would wait for the other and
> everything would be fine. But with them, I think this deadlocks as well.

Actually, T2 should block until T1 commits or rolls back. The system cannot determine whether the T2 fk insert should succeed or fail until T1 is committed or rolled back. Either that, or I am misunderstanding your example again. Maybe you should use table names.

> Maybe I'm misunderstanding the scheme you'd expect the foreign keys to use
> with the read locks.

The concept of a read lock is to make sure the record doesn't change until the end of the transaction. Selects on the record are allowed. Other read locks are allowed. A write lock should fail or wait until all read locks are released, and a read lock should fail or wait for a write lock.
What I was trying to point out is that once a write lock is granted, the grantee should be able to do anything they want with the record that is legal in that transaction's view of the DB. Otherwise, updates and deletes may fail without any way for the code to determine why (it can't select to find the offending record). Here is an example:

TEST_TYPE table has a record with PK of 1
TEST_REC table has record (5) that references TEST_TYPE (1)

T1: wants to change all references to (1), so it locks TEST_TYPE (1)
T1: inserts TEST_TYPE (2)
T1: updates TEST_REC (5) to reference TEST_TYPE (2)
T2: inserts TEST_REC (6) referencing TEST_TYPE (1)
    (I am saying that T2 should error or wait, but let's say you allow it)
T1: attempts to delete TEST_TYPE (1)
    At this point a SQL error occurs because of TEST_REC (6)

T1 doesn't know what to do, because it still can't see or change TEST_REC (6). No matter how good the code is at catching SQL exceptions and handling them, T1 can't complete. By the way, you can move the T2 entry to the very top of the list with the same effect.

One of the great things about transactions and locking is that you can write code that, short of a DB failure, can handle every situation. You're proposing doing away with that.
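The read/write lock compatibility Jon describes (shared read locks coexist; a write lock excludes everything) can be written out as a small table. This is an illustrative sketch only, not PostgreSQL code:

```python
# Shared/exclusive lock compatibility as described in the thread.
# True means the requested lock can be granted while the held one exists.
COMPATIBLE = {
    ("read", "read"): True,    # multiple read locks may coexist
    ("read", "write"): False,  # a write lock waits for read locks
    ("write", "read"): False,  # a read lock waits for a write lock
    ("write", "write"): False, # only one write lock at a time
}

def can_grant(held, requested):
    """Can `requested` be granted while `held` is in place?"""
    return COMPATIBLE[(held, requested)]

print(can_grant("read", "read"))   # -> True
print(can_grant("read", "write"))  # -> False
```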
On Thu, 16 Jan 2003, Jon Swinth wrote:

> Maybe I don't understand this example. If T2 inserted fk 2, how did T1
> manage to update a record that references it before T2 committed? For T1,
> fk 2 doesn't exist yet, so there couldn't be any records referencing it.

No one has committed anything in the above. They're two concurrent transactions that may deadlock. AFAICT, what you want is a sequence like the one below (including lock annotations) for the above:

Transaction 1: begin;
Transaction 2: begin;
Transaction 1: insert into fk values (1);
 - Checks pk table for the value, finds it, gets a read lock
Transaction 2: insert into fk values (2);
 - Checks pk table for the value, finds it, gets a read lock
Transaction 1: update pk set nonkey='a' where key=2;
 - Wants a write lock on the row with pk.key=2; can't get it because
   Transaction 2 has a read lock. It has to wait.
Transaction 2: update pk set nonkey='a' where key=1;
 - Wants a write lock on the row with pk.key=1; can't get it because
   Transaction 1 has a read lock. It has to wait.
Now I understand what you are trying to say, but what you are describing is normal (it happens in most DBs) and rather uncommon (in my experience). General DB design is done so that reference tables end up with a lot of read locks and rarely have a write lock. It would be cool if you could remove that contention, but not at the expense of expected write lock behavior.

On Thursday 16 January 2003 02:43 pm, Stephan Szabo wrote:

> AFAICT, what you want is a sequence like the one below (including lock
> annotations) for the above.
> [lock-annotated example snipped]
On Thu, 16 Jan 2003, Jon Swinth wrote:

> Now I understand what you are trying to say, but what you are describing
> is normal (it happens in most DBs) and rather uncommon (in my experience).
> General DB design is done so that reference tables end up with a lot of
> read locks and rarely have a write lock. It would be cool if you could
> remove that contention, but not at the expense of expected write lock
> behavior.

The other example worries me more, though: two transactions working with the same pk row throughout.

Transaction 1: begin;
Transaction 2: begin;
Transaction 1: insert into fk values (1);
 - Checks pk table for the value, finds it, gets a read lock
Transaction 2: insert into fk values (1);
 - Checks pk table for the value, finds it, gets a read lock
Transaction 1: update pk set nonkey='a' where key=1;
 - Wants a write lock on the row with pk.key=1; we can't upgrade our lock,
   since T2 also has a read lock.
Transaction 2: update pk set nonkey='a' where key=1;
 - Same as above, except waiting on T1.

For comparison, the dirty-read (plus the stuff that we aren't calling magic ;)) version of the above basically goes:

Transaction 1: begin;
Transaction 2: begin;
Transaction 1: insert into fk values (1);
 - Checks pk table for the value, finds it
Transaction 2: insert into fk values (1);
 - Checks pk table for the value, finds it
Transaction 1: update pk set nonkey='a' where key=1;
 - Notices that the key is not changed, doesn't check the fk table at all
Transaction 2: update pk set nonkey='a' where key=1;
 - Waits on Transaction 1, since it has a lock on the row.

Basically, the real external difference is that in one case the blocking occurs before the action happens to the row, and in the other the action happens and the foreign key code is what does the blocking. That allows things like not blocking when the key doesn't change. I haven't determined whether the "stuff" necessary to get all the cases working is practical yet, so I can't say for certain that it's better, just that it has the potential.
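The externally visible difference between the two schemes can be condensed into a tiny decision function. This is purely an illustrative sketch of the thread's comparison (the function name and parameters are invented), not how the real FK code is structured:

```python
# Sketch: does the FK machinery itself force a pk-row UPDATE to wait?
def fk_makes_update_wait(scheme, key_changed, concurrent_fk_insert):
    if scheme == "read-lock":
        # The concurrent insert already holds a shared lock on the pk row,
        # so any write to that row waits, whether or not the key changes.
        return concurrent_fk_insert
    if scheme == "dirty-read":
        # The FK check is skipped entirely when the key is unchanged; only
        # a key change forces a wait on the uncommitted referencing row.
        return key_changed and concurrent_fk_insert
    raise ValueError(scheme)

# Stephan's example: a nonkey-only update with a concurrent fk insert.
print(fk_makes_update_wait("read-lock", False, True))   # -> True
print(fk_makes_update_wait("dirty-read", False, True))  # -> False
```

Note that in the dirty-read trace Transaction 2 still waits, but on the ordinary row lock taken by Transaction 1's update, not on anything the FK machinery did.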
On Thu, 16 Jan 2003, Jon Swinth wrote:

> It would be cool if you could remove that contention, but not at the
> expense of expected write lock behavior.

I think I may have also misunderstood which lock behavior you were worried about. In either scheme, if someone does something like:

Transaction 1: begin;
Transaction 2: begin;
Transaction 1: select * from pk where key=1 for update;
 - Gets a write lock on the row with pk.key=1
   [Or does an update or a delete or whatever]
Transaction 2: insert into fk values (1);
 - Needs to wait on the write lock above

That will stay true even in the dirty-read scheme.
I think I see what you are trying to achieve. I agree with you and like the idea of forgoing the read lock when the FK field is not modified. This should be a nice performance gain over blindly creating a read lock on a record. I guess I just got so worried about the term "dirty read" that I didn't understand the rest.

Now, if we could only have a feature like Oracle's SELECT ... FOR UPDATE NOWAIT, so I could control how long we wait for a lock. Wait... we can't do that until SQL exceptions stop voiding the transaction (I want to be able to retry the lock several times before giving up). Hey, forget that: if you can fix the FK deadlock, I'll deal with the rest.

On Thursday 16 January 2003 03:47 pm, Stephan Szabo wrote:

> I think I may have also misunderstood which lock behavior you were worried
> about. In either scheme, if someone does something like:
> [lock example snipped]
> That will stay true even in the dirty-read scheme.
On Fri, 17 Jan 2003, Jon Swinth wrote:

> Now, if we could only have a feature like Oracle's SELECT ... FOR UPDATE
> NOWAIT, so I could control how long we wait for a lock. Wait... we can't
> do that until SQL exceptions stop voiding the transaction (I want to be
> able to retry the lock several times before giving up).

What's Oracle's behavior when some of the rows can be locked and some can't?
Oracle's behavior AFAIK depends on which lock you are talking about. If you are talking about not being able to get a read lock, then Oracle just blocks until it can, indefinitely, unless a deadlock is detected.

If you are talking about a write lock, that depends on how the write lock is requested. If you call SELECT ... FOR UPDATE, or call UPDATE/DELETE without locking first, then again Oracle will block until it can get the lock. If you call SELECT ... FOR UPDATE NOWAIT, then Oracle will throw a specific SQL exception if the lock cannot be granted immediately. This allows you to sleep and retry in your code so that you only wait for a lock for so long:

    boolean locked = false;
    int retryCount = 3;
    while (!locked) {
        try {
            SELECT 1 FROM some_table WHERE some_condition FOR UPDATE NOWAIT;
            locked = true;           // got the lock; leave the loop
        } catch (SQLException e) {
            if (retryCount > 0) {
                retryCount -= 1;     // use up one retry
                sleep(1);
            } else {
                throw e;             // out of retries; give up
            } // end if
        } // end try
    } // end while

With this kind of logic you can control how long you wait for a lock. If some idiot did something that locked a whole bunch of records, you don't want the entire DB to come to a stop waiting to get a lock. That kind of thing can't be caught by deadlock detection, since the idiot isn't trying to lock anything else.

Unfortunately, this is not possible in PostgreSQL even if you added the NOWAIT functionality (yet?). That's because the first SQL exception thrown automatically voids the transaction, and you are forced to roll back. It doesn't matter that your code knows how to get around the exception; PostgreSQL requires that you call rollback and start over. It is actually kind of funny: when I first came across this, it was the first time in my life that I had ever seen a SQL exception on a sequence select (caused by a SQL exception just before the select).

On Friday 17 January 2003 10:09 am, Stephan Szabo wrote:

> What's Oracle's behavior when some of the rows can be locked and some
> can't?
On Fri, 17 Jan 2003, Jon Swinth wrote:

> If you call SELECT ... FOR UPDATE NOWAIT, then Oracle will throw a
> specific SQL exception if the lock cannot be granted immediately. This
> allows you to sleep and retry in your code so that you only wait for a
> lock for so long.

I was wondering about the NOWAIT-specific behavior particularly. You could send a notice (completion condition), I guess, but an actual exception wouldn't work.
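The bounded-retry pattern Jon describes can be sketched in Python. Everything here is hypothetical: `try_lock_nowait` is a stub standing in for a NOWAIT-style lock attempt (not a real driver call), and the sketch presumes, as Jon notes, that a failed attempt does not void the whole transaction:

```python
import time

class LockNotAvailable(Exception):
    """Stand-in for the immediate error a NOWAIT-style lock would raise."""

_attempts = {"n": 0}

def try_lock_nowait():
    # Stub for `SELECT ... FOR UPDATE NOWAIT`: here the row happens to
    # free up on the third attempt.
    _attempts["n"] += 1
    if _attempts["n"] < 3:
        raise LockNotAvailable()

def lock_with_retries(retries=3, delay=0.0):
    """Retry a NOWAIT-style lock a bounded number of times, then give up."""
    for attempt in range(retries + 1):
        try:
            try_lock_nowait()
            return True              # lock obtained
        except LockNotAvailable:
            if attempt == retries:
                raise                # out of retries: propagate the error
            time.sleep(delay)

print(lock_with_retries())  # -> True (succeeds on the third attempt)
```

The point of bounding the retries is the one Jon makes: a session holding locks without requesting new ones never shows up in deadlock detection, so the only defense is to stop waiting after a while.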