Thread: Locking & concurrency - best practices

Locking & concurrency - best practices

From
Adam Rich
Date:
I have a "parent_tbl" and dozens of data tables, with foreign keys
referencing the PK of "parent_tbl" (one-to-many).  There are 100+
users accessing the application, usually (but not always) each user
is working on a different record in parent_tbl.  (this would seem like a pretty standard scenario for a lot of apps)

Each user performs multiple queries in a transaction, reading and
modifying the data in parent_tbl and multipe data tables before
commiting.  I need the data to be consistent during and after the
transaction.  (I basically need a way to lock a row in parent_tbl,
and all rows in the data tables referencing that row, and prevent
new rows from being inserted that reference that row).

To guard against this, I added "FOR UPDATE" to queries against the
parent_tbl and "LOCK TABLE IN EXCLUSIVE MODE" before queries against
all of the data tables.  This works, except it slows down the entire
application because all transactions are serialized.  Even users who
are working on seperate records in parent_tbl are not allowed to
proceed simultaneously.  This is not ideal, the vast majority of
access to this database is users working on separate records.

Should I drop the "LOCK TABLE" statements completely?  As long as
*every* part of the application that modifies data obtains a
"FOR UPDATE" lock on the parent table's record first, there shouldn't
be any concurrency issues.  But, I realize I'm really only implementing
advisory locking, and there's nothing preventing data corruption from
any application that forgets or leaves out the "FOR UPDATE".

Is this the best practice for dealing with this situation?  Should I
be using real advisory locks instead of "FOR UPDATE" ?  What are the
pros & cons of each?










Re: Locking & concurrency - best practices

From
"Scott Marlowe"
Date:
On Jan 14, 2008 2:43 PM, Adam Rich <adam.r@indigodynamic.com> wrote:
>
> I have a "parent_tbl" and dozens of data tables, with foreign keys
> referencing the PK of "parent_tbl" (one-to-many).  There are 100+
> users accessing the application, usually (but not always) each user
> is working on a different record in parent_tbl.  (this would seem like
> a pretty standard scenario for a lot of apps)

You should be able to do "select for update" on both parent and child
records and get the effect you desire.

Think up your own worst case scenario for concurrent updates, then sit
down at two or more psql terminals, and try to simulate such a thing
and see what happens.  Experimentation is a great tool.

Re: Locking & concurrency - best practices

From
"Adam Rich"
Date:
> You should be able to do "select for update" on both parent and child
> records and get the effect you desire.
>

I don't think that will work.  Let me demonstrate:
(this is simplified, but sufficient to make my point)

-- Connection 1 --
begin trans;

select * from parent_tbl
where id=1 for update;

select count(*) into myvar
from data_tbl where fk=1;

-- connection 2 runs here (see below) --

if (myvar < 3) then
   update parent_tbl
   set status=1 where id=1;
else
   update parent_tbl
   set status=2 where id=1;
end if;

commit;

-- Connection 2 --

begin trans;
insert into data_tbl (fk, data) values (1, 'foo');
insert into data_tbl (fk, data) values (1, 'bar');
insert into data_tbl (fk, data) values (1, 'baz');
commit;

-- End example --

In what way would you use "FOR UPDATE" on data_tbl
to ensure parent_tbl doesn't end up with the wrong
status ?  AFAIK, "FOR UPDATE" locks only the rows
returned, and does nothing to prevent new inserts.
using a "serialized" isolation doesn't seem appropriate
either.  As far as I can tell, the only options are
locking the entire data_tbl at the start of both
connections (which unfortunately also blocks all
other transactions with id/fk != 1), or using
advisory locks.







Re: Locking & concurrency - best practices

From
"Scott Marlowe"
Date:
On Jan 14, 2008 3:31 PM, Adam Rich <adam.r@indigodynamic.com> wrote:
> > You should be able to do "select for update" on both parent and child
> > records and get the effect you desire.
> >
>
> I don't think that will work.  Let me demonstrate:
> (this is simplified, but sufficient to make my point)
>
> -- Connection 1 --
> begin trans;
>
> select * from parent_tbl
> where id=1 for update;
>
> select count(*) into myvar
> from data_tbl where fk=1;

You're right. with count(*) involved, you won't be able to get an accurate view.

Generally speaking, when you've got to count rows like that, locking
the table is the only thing that works.

That or changing how you process the data.

Re: Locking & concurrency - best practices

From
andy
Date:
Adam Rich wrote:
> I have a "parent_tbl" and dozens of data tables, with foreign keys
> referencing the PK of "parent_tbl" (one-to-many).  There are 100+
> users accessing the application, usually (but not always) each user
> is working on a different record in parent_tbl.  (this would seem like a pretty standard scenario for a lot of apps)
>
> Each user performs multiple queries in a transaction, reading and
> modifying the data in parent_tbl and multipe data tables before
> commiting.  I need the data to be consistent during and after the
> transaction.  (I basically need a way to lock a row in parent_tbl,
> and all rows in the data tables referencing that row, and prevent
> new rows from being inserted that reference that row).
>
> To guard against this, I added "FOR UPDATE" to queries against the
> parent_tbl and "LOCK TABLE IN EXCLUSIVE MODE" before queries against
> all of the data tables.  This works, except it slows down the entire
> application because all transactions are serialized.  Even users who
> are working on seperate records in parent_tbl are not allowed to
> proceed simultaneously.  This is not ideal, the vast majority of
> access to this database is users working on separate records.
>
> Should I drop the "LOCK TABLE" statements completely?  As long as
> *every* part of the application that modifies data obtains a
> "FOR UPDATE" lock on the parent table's record first, there shouldn't
> be any concurrency issues.  But, I realize I'm really only implementing
> advisory locking, and there's nothing preventing data corruption from
> any application that forgets or leaves out the "FOR UPDATE".
>
> Is this the best practice for dealing with this situation?  Should I
> be using real advisory locks instead of "FOR UPDATE" ?  What are the
> pros & cons of each?
>

In our program we wrote the locking into the program, and created a
modulelock table like:

create table moduelock(
   userid int,
   module int,
   primary key (userid, module)
)

The program then locks things before it uses them... but we also have
pretty low contention for modules.

A lock is:
begin
insert into modulelock...
commit;

if commit ok, then go ahead.  When we are done, delete from modulelock
where ...

-Andy

Re: Locking & concurrency - best practices

From
Erik Jones
Date:
On Jan 14, 2008, at 3:54 PM, andy wrote:

> In our program we wrote the locking into the program, and created a
> modulelock table like:
>
> create table moduelock(
>   userid int,
>   module int,
>   primary key (userid, module)
> )
>
> The program then locks things before it uses them... but we also
> have pretty low contention for modules.
>
> A lock is:
> begin
> insert into modulelock...
> commit;
>
> if commit ok, then go ahead.  When we are done, delete from
> modulelock where ...

 From what I can tell, this kind of roll-your-own application level
locking system is exactly what advisory locks are for.  Search the
archives for the last couple of weeks as I remember someone posting
some really helpful functions to assist in using advisory locks.

Erik Jones

DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com




Re: Locking & concurrency - best practices

From
"Adam Rich"
Date:
>
>  From what I can tell, this kind of roll-your-own application level
> locking system is exactly what advisory locks are for.  Search the
> archives for the last couple of weeks as I remember someone posting
> some really helpful functions to assist in using advisory locks.
>
> Erik Jones

Yes & No... it depends on the lifetime of the locks you need.  The new
advisory locks in postgres only live for the duration of your session.
The ones Andy describes will live past session end, connection end,
even through database restarts.  And if you're using replication or
log shipping, the locks will be propagated to partner databases
as well.

If you need your locks to live past session end, the advisory locks
won't help you.






Re: Locking & concurrency - best practices

From
Erik Jones
Date:
On Jan 14, 2008, at 4:57 PM, Adam Rich wrote:

>>
>>  From what I can tell, this kind of roll-your-own application level
>> locking system is exactly what advisory locks are for.  Search the
>> archives for the last couple of weeks as I remember someone posting
>> some really helpful functions to assist in using advisory locks.
>>
>> Erik Jones
>
> Yes & No... it depends on the lifetime of the locks you need.  The new
> advisory locks in postgres only live for the duration of your session.
> The ones Andy describes will live past session end, connection end,
> even through database restarts.  And if you're using replication or
> log shipping, the locks will be propagated to partner databases
> as well.
>
> If you need your locks to live past session end, the advisory locks
> won't help you.

Good point.

Erik Jones

DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com




Re: Locking & concurrency - best practices

From
"Merlin Moncure"
Date:
On Jan 14, 2008 4:31 PM, Adam Rich <adam.r@indigodynamic.com> wrote:
> > You should be able to do "select for update" on both parent and child
> > records and get the effect you desire.
> >
>
> I don't think that will work.  Let me demonstrate:
> (this is simplified, but sufficient to make my point)
>
> -- Connection 1 --
> begin trans;
>
> select * from parent_tbl
> where id=1 for update;
>
> select count(*) into myvar
> from data_tbl where fk=1;
>
> -- connection 2 runs here (see below) --
>
> if (myvar < 3) then
>    update parent_tbl
>    set status=1 where id=1;
> else
>    update parent_tbl
>    set status=2 where id=1;
> end if;
>
> commit;
>
> -- Connection 2 --
>
> begin trans;
> insert into data_tbl (fk, data) values (1, 'foo');
> insert into data_tbl (fk, data) values (1, 'bar');
> insert into data_tbl (fk, data) values (1, 'baz');
> commit;
>
> -- End example --
>
> In what way would you use "FOR UPDATE" on data_tbl
> to ensure parent_tbl doesn't end up with the wrong
> status ?  AFAIK, "FOR UPDATE" locks only the rows
> returned, and does nothing to prevent new inserts.
> using a "serialized" isolation doesn't seem appropriate
> either.  As far as I can tell, the only options are
> locking the entire data_tbl at the start of both
> connections (which unfortunately also blocks all
> other transactions with id/fk != 1), or using
> advisory locks.

Advisory locks would work here (better that than table lock), but I
don't think that's the right approach.  Transaction 2 should simply do
a
select * from parent_tbl
where id=1 for update;

at the start of the transaction.  The idea here is that a property of
'parent_tbl' is the count of _all_ it's data elements.  Therefore,
locking should be consistently applied at the parent level, so you
serialize access to a particular parent.

merlin

Re: Locking & concurrency - best practices

From
"Merlin Moncure"
Date:
On Jan 14, 2008 5:57 PM, Adam Rich <adam.r@indigodynamic.com> wrote:
> >
> >  From what I can tell, this kind of roll-your-own application level
> > locking system is exactly what advisory locks are for.  Search the
> > archives for the last couple of weeks as I remember someone posting
> > some really helpful functions to assist in using advisory locks.
> >
> > Erik Jones
>
> Yes & No... it depends on the lifetime of the locks you need.  The new
> advisory locks in postgres only live for the duration of your session.
> The ones Andy describes will live past session end, connection end,
> even through database restarts.  And if you're using replication or
> log shipping, the locks will be propagated to partner databases
> as well.
>
> If you need your locks to live past session end, the advisory locks
> won't help you.

That's not really a lock (although it behaves like one).   That's
simply a field in a table that says 'If i'm this do that otherwise do
that'.  I don't know if there's a formal definition of locks, so I'm
loosely going to define them as things that protect access to the data
that are not in the data.

merlin

Re: Locking & concurrency - best practices

From
"Adam Rich"
Date:
> Advisory locks would work here (better that than table lock), but I
> don't think that's the right approach.  Transaction 2 should simply do
> a
> select * from parent_tbl
> where id=1 for update;
>
> at the start of the transaction.

That's actually what I'm doing (just forgot to include it in the
simplified example).  What I'm struggling with is that since these locks
aren't enforced in one central place, so I have to run the "for update"
query in every far corner of my code that touches data, whether or not
it reads or writes to parent_tbl.  If any of the developers forget
to add it, the data can become corrupted.  And since I'm essentially
using row-level locks as advisory locks, I wondered if just using
advisory locks directly would benefit us somehow, in quicker
transactions, CPU/memory overhead, WAL, etc.

In my real application, there are lots of "parent_tbl" and when I try
to "for update" the appropriate ones, I get deadlocks.  I know in
theory, I only need to lock things in the same order, everywhere.
But in practice, it seems hard to achieve.







Re: Locking & concurrency - best practices

From
"Merlin Moncure"
Date:
On Jan 15, 2008 12:03 AM, Adam Rich <adam.r@indigodynamic.com> wrote:
> > Advisory locks would work here (better that than table lock), but I
> > don't think that's the right approach.  Transaction 2 should simply do
> > a
> > select * from parent_tbl
> > where id=1 for update;
> >
> > at the start of the transaction.
>
> That's actually what I'm doing (just forgot to include it in the
> simplified example).  What I'm struggling with is that since these locks
> aren't enforced in one central place, so I have to run the "for update"
> query in every far corner of my code that touches data, whether or not
> it reads or writes to parent_tbl.  If any of the developers forget
> to add it, the data can become corrupted.  And since I'm essentially
> using row-level locks as advisory locks, I wondered if just using
> advisory locks directly would benefit us somehow, in quicker
> transactions, CPU/memory overhead, WAL, etc.

I think you have it backwards...you are considering using advisory
locks as row level locks.  Advisory locks do not get released at the
end of the transaction so you have to be little careful with them,
particularly in light of your neglectful developers comment.  Advisory
locks also stack, which is something to be careful of.

> In my real application, there are lots of "parent_tbl" and when I try
> to "for update" the appropriate ones, I get deadlocks.  I know in
> theory, I only need to lock things in the same order, everywhere.
> But in practice, it seems hard to achieve.

You are simply having to play the hand you dealt yourself with that
design.  I don't think having to lock a record before writing to it is
all that terrible, but I understand your perspective.  You have a few
general strategies to look at to prevent having to do this:

*) push insert into data table to a function (this is still a
cooperative method)
*) write a trigger on data table that acquires the lock on parent for
insert (or possibly delete), or cache parent status in parent table
via trigger
*) rethink your table design so that parent status is run through the
parent table, forcing a lock
*) write a rule, although I don't advise this
*) kick your developers until they lock records properly

merlin