Thread: Locking & concurrency - best practices
I have a "parent_tbl" and dozens of data tables, with foreign keys referencing the PK of "parent_tbl" (one-to-many). There are 100+ users accessing the application, usually (but not always) each user is working on a different record in parent_tbl. (this would seem like a pretty standard scenario for a lot of apps) Each user performs multiple queries in a transaction, reading and modifying the data in parent_tbl and multipe data tables before commiting. I need the data to be consistent during and after the transaction. (I basically need a way to lock a row in parent_tbl, and all rows in the data tables referencing that row, and prevent new rows from being inserted that reference that row). To guard against this, I added "FOR UPDATE" to queries against the parent_tbl and "LOCK TABLE IN EXCLUSIVE MODE" before queries against all of the data tables. This works, except it slows down the entire application because all transactions are serialized. Even users who are working on seperate records in parent_tbl are not allowed to proceed simultaneously. This is not ideal, the vast majority of access to this database is users working on separate records. Should I drop the "LOCK TABLE" statements completely? As long as *every* part of the application that modifies data obtains a "FOR UPDATE" lock on the parent table's record first, there shouldn't be any concurrency issues. But, I realize I'm really only implementing advisory locking, and there's nothing preventing data corruption from any application that forgets or leaves out the "FOR UPDATE". Is this the best practice for dealing with this situation? Should I be using real advisory locks instead of "FOR UPDATE" ? What are the pros & cons of each?
On Jan 14, 2008 2:43 PM, Adam Rich <adam.r@indigodynamic.com> wrote: > > I have a "parent_tbl" and dozens of data tables, with foreign keys > referencing the PK of "parent_tbl" (one-to-many). There are 100+ > users accessing the application, usually (but not always) each user > is working on a different record in parent_tbl. (this would seem like > a pretty standard scenario for a lot of apps) You should be able to do "select for update" on both parent and child records and get the effect you desire. Think up your own worst case scenario for concurrent updates, then sit down at two or more psql terminals, and try to simulate such a thing and see what happens. Experimentation is a great tool.
> You should be able to do "select for update" on both parent and child > records and get the effect you desire. > I don't think that will work. Let me demonstrate: (this is simplified, but sufficient to make my point) -- Connection 1 -- begin trans; select * from parent_tbl where id=1 for update; select count(*) into myvar from data_tbl where fk=1; -- connection 2 runs here (see below) -- if (myvar < 3) then update parent_tbl set status=1 where id=1; else update parent_tbl set status=2 where id=1; end if; commit; -- Connection 2 -- begin trans; insert into data_tbl (fk, data) values (1, 'foo'); insert into data_tbl (fk, data) values (1, 'bar'); insert into data_tbl (fk, data) values (1, 'baz'); commit; -- End example -- In what way would you use "FOR UPDATE" on data_tbl to ensure parent_tbl doesn't end up with the wrong status ? AFAIK, "FOR UPDATE" locks only the rows returned, and does nothing to prevent new inserts. using a "serialized" isolation doesn't seem appropriate either. As far as I can tell, the only options are locking the entire data_tbl at the start of both connections (which unfortunately also blocks all other transactions with id/fk != 1), or using advisory locks.
On Jan 14, 2008 3:31 PM, Adam Rich <adam.r@indigodynamic.com> wrote: > > You should be able to do "select for update" on both parent and child > > records and get the effect you desire. > > > > I don't think that will work. Let me demonstrate: > (this is simplified, but sufficient to make my point) > > -- Connection 1 -- > begin trans; > > select * from parent_tbl > where id=1 for update; > > select count(*) into myvar > from data_tbl where fk=1; You're right. with count(*) involved, you won't be able to get an accurate view. Generally speaking, when you've got to count rows like that, locking the table is the only thing that works. That or changing how you process the data.
Adam Rich wrote: > I have a "parent_tbl" and dozens of data tables, with foreign keys > referencing the PK of "parent_tbl" (one-to-many). There are 100+ > users accessing the application, usually (but not always) each user > is working on a different record in parent_tbl. (this would seem like a pretty standard scenario for a lot of apps) > > Each user performs multiple queries in a transaction, reading and > modifying the data in parent_tbl and multipe data tables before > commiting. I need the data to be consistent during and after the > transaction. (I basically need a way to lock a row in parent_tbl, > and all rows in the data tables referencing that row, and prevent > new rows from being inserted that reference that row). > > To guard against this, I added "FOR UPDATE" to queries against the > parent_tbl and "LOCK TABLE IN EXCLUSIVE MODE" before queries against > all of the data tables. This works, except it slows down the entire > application because all transactions are serialized. Even users who > are working on seperate records in parent_tbl are not allowed to > proceed simultaneously. This is not ideal, the vast majority of > access to this database is users working on separate records. > > Should I drop the "LOCK TABLE" statements completely? As long as > *every* part of the application that modifies data obtains a > "FOR UPDATE" lock on the parent table's record first, there shouldn't > be any concurrency issues. But, I realize I'm really only implementing > advisory locking, and there's nothing preventing data corruption from > any application that forgets or leaves out the "FOR UPDATE". > > Is this the best practice for dealing with this situation? Should I > be using real advisory locks instead of "FOR UPDATE" ? What are the > pros & cons of each? > In our program we wrote the locking into the program, and created a modulelock table like: create table moduelock( userid int, module int, primary key (userid, module) ) The program then locks things before it uses them... but we also have pretty low contention for modules. A lock is: begin insert into modulelock... commit; if commit ok, then go ahead. When we are done, delete from modulelock where ... -Andy
On Jan 14, 2008, at 3:54 PM, andy wrote: > In our program we wrote the locking into the program, and created a > modulelock table like: > > create table moduelock( > userid int, > module int, > primary key (userid, module) > ) > > The program then locks things before it uses them... but we also > have pretty low contention for modules. > > A lock is: > begin > insert into modulelock... > commit; > > if commit ok, then go ahead. When we are done, delete from > modulelock where ... From what I can tell, this kind of roll-your-own application level locking system is exactly what advisory locks are for. Search the archives for the last couple of weeks as I remember someone posting some really helpful functions to assist in using advisory locks. Erik Jones DBA | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
> > From what I can tell, this kind of roll-your-own application level > locking system is exactly what advisory locks are for. Search the > archives for the last couple of weeks as I remember someone posting > some really helpful functions to assist in using advisory locks. > > Erik Jones Yes & No... it depends on the lifetime of the locks you need. The new advisory locks in postgres only live for the duration of your session. The ones Andy describes will live past session end, connection end, even through database restarts. And if you're using replication or log shipping, the locks will be propagated to partner databases as well. If you need your locks to live past session end, the advisory locks won't help you.
On Jan 14, 2008, at 4:57 PM, Adam Rich wrote: >> >> From what I can tell, this kind of roll-your-own application level >> locking system is exactly what advisory locks are for. Search the >> archives for the last couple of weeks as I remember someone posting >> some really helpful functions to assist in using advisory locks. >> >> Erik Jones > > Yes & No... it depends on the lifetime of the locks you need. The new > advisory locks in postgres only live for the duration of your session. > The ones Andy describes will live past session end, connection end, > even through database restarts. And if you're using replication or > log shipping, the locks will be propagated to partner databases > as well. > > If you need your locks to live past session end, the advisory locks > won't help you. Good point. Erik Jones DBA | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
On Jan 14, 2008 4:31 PM, Adam Rich <adam.r@indigodynamic.com> wrote: > > You should be able to do "select for update" on both parent and child > > records and get the effect you desire. > > > > I don't think that will work. Let me demonstrate: > (this is simplified, but sufficient to make my point) > > -- Connection 1 -- > begin trans; > > select * from parent_tbl > where id=1 for update; > > select count(*) into myvar > from data_tbl where fk=1; > > -- connection 2 runs here (see below) -- > > if (myvar < 3) then > update parent_tbl > set status=1 where id=1; > else > update parent_tbl > set status=2 where id=1; > end if; > > commit; > > -- Connection 2 -- > > begin trans; > insert into data_tbl (fk, data) values (1, 'foo'); > insert into data_tbl (fk, data) values (1, 'bar'); > insert into data_tbl (fk, data) values (1, 'baz'); > commit; > > -- End example -- > > In what way would you use "FOR UPDATE" on data_tbl > to ensure parent_tbl doesn't end up with the wrong > status ? AFAIK, "FOR UPDATE" locks only the rows > returned, and does nothing to prevent new inserts. > using a "serialized" isolation doesn't seem appropriate > either. As far as I can tell, the only options are > locking the entire data_tbl at the start of both > connections (which unfortunately also blocks all > other transactions with id/fk != 1), or using > advisory locks. Advisory locks would work here (better that than table lock), but I don't think that's the right approach. Transaction 2 should simply do a select * from parent_tbl where id=1 for update; at the start of the transaction. The idea here is that a property of 'parent_tbl' is the count of _all_ it's data elements. Therefore, locking should be consistently applied at the parent level, so you serialize access to a particular parent. merlin
On Jan 14, 2008 5:57 PM, Adam Rich <adam.r@indigodynamic.com> wrote: > > > > From what I can tell, this kind of roll-your-own application level > > locking system is exactly what advisory locks are for. Search the > > archives for the last couple of weeks as I remember someone posting > > some really helpful functions to assist in using advisory locks. > > > > Erik Jones > > Yes & No... it depends on the lifetime of the locks you need. The new > advisory locks in postgres only live for the duration of your session. > The ones Andy describes will live past session end, connection end, > even through database restarts. And if you're using replication or > log shipping, the locks will be propagated to partner databases > as well. > > If you need your locks to live past session end, the advisory locks > won't help you. That's not really a lock (although it behaves like one). That's simply a field in a table that says 'If i'm this do that otherwise do that'. I don't know if there's a formal definition of locks, so I'm loosely going to define them as things that protect access to the data that are not in the data. merlin
> Advisory locks would work here (better that than table lock), but I > don't think that's the right approach. Transaction 2 should simply do > a > select * from parent_tbl > where id=1 for update; > > at the start of the transaction. That's actually what I'm doing (just forgot to include it in the simplified example). What I'm struggling with is that since these locks aren't enforced in one central place, so I have to run the "for update" query in every far corner of my code that touches data, whether or not it reads or writes to parent_tbl. If any of the developers forget to add it, the data can become corrupted. And since I'm essentially using row-level locks as advisory locks, I wondered if just using advisory locks directly would benefit us somehow, in quicker transactions, CPU/memory overhead, WAL, etc. In my real application, there are lots of "parent_tbl" and when I try to "for update" the appropriate ones, I get deadlocks. I know in theory, I only need to lock things in the same order, everywhere. But in practice, it seems hard to achieve.
On Jan 15, 2008 12:03 AM, Adam Rich <adam.r@indigodynamic.com> wrote: > > Advisory locks would work here (better that than table lock), but I > > don't think that's the right approach. Transaction 2 should simply do > > a > > select * from parent_tbl > > where id=1 for update; > > > > at the start of the transaction. > > That's actually what I'm doing (just forgot to include it in the > simplified example). What I'm struggling with is that since these locks > aren't enforced in one central place, so I have to run the "for update" > query in every far corner of my code that touches data, whether or not > it reads or writes to parent_tbl. If any of the developers forget > to add it, the data can become corrupted. And since I'm essentially > using row-level locks as advisory locks, I wondered if just using > advisory locks directly would benefit us somehow, in quicker > transactions, CPU/memory overhead, WAL, etc. I think you have it backwards...you are considering using advisory locks as row level locks. Advisory locks do not get released at the end of the transaction so you have to be little careful with them, particularly in light of your neglectful developers comment. Advisory locks also stack, which is something to be careful of. > In my real application, there are lots of "parent_tbl" and when I try > to "for update" the appropriate ones, I get deadlocks. I know in > theory, I only need to lock things in the same order, everywhere. > But in practice, it seems hard to achieve. You are simply having to play the hand you dealt yourself with that design. I don't think having to lock a record before writing to it is all that terrible, but I understand your perspective. You have a few general strategies to look at to prevent having to do this: *) push insert into data table to a function (this is still a cooperative method) *) write a trigger on data table that acquires the lock on parent for insert (or possibly delete), or cache parent status in parent table via trigger *) rethink your table design so that parent status is run through the parent table, forcing a lock *) write a rule, although I don't advise this *) kick your developers until they lock records properly merlin