Thread: bug in fast-path locking

bug in fast-path locking

From: Robert Haas
On Sun, Apr 8, 2012 at 12:43 PM, Boszormenyi Zoltan <zb@cybertec.at> wrote:
>> Indeed, the unpatched GIT version crashes if you enter
>>  =#lock TABLE pgbench_accounts ;
>> the second time in session 2 after the first one failed. Also,
>> manually spelling it out:
>>
>> Session 1:
>>
>> $ psql
>> psql (9.2devel)
>> Type "help" for help.
>>
>> zozo=# begin;
>> BEGIN
>> zozo=# lock table pgbench_accounts;
>> LOCK TABLE
>> zozo=#
>>
>> Session 2:
>>
>> zozo=# begin;
>> BEGIN
>> zozo=# savepoint a;
>> SAVEPOINT
>> zozo=# lock table pgbench_accounts;
>> ERROR:  canceling statement due to statement timeout
>> zozo=# rollback to a;
>> ROLLBACK
>> zozo=# savepoint b;
>> SAVEPOINT
>> zozo=# lock table pgbench_accounts;
>> The connection to the server was lost. Attempting reset: Failed.
>> !>
>>
>> Server log after the second lock table:
>>
>> TRAP: FailedAssertion("!(locallock->holdsStrongLockCount == 0)", File:
>> "lock.c", Line: 749)
>> LOG:  server process (PID 12978) was terminated by signal 6: Aborted
>
>
> Robert, the Assert triggering with the above procedure
> is in your "fast path" locking code with current GIT.

Yes, that sure looks like a bug.  It seems that if the top-level
transaction is aborting, then LockReleaseAll() is called and
everything gets cleaned up properly; or if a subtransaction is
aborting after the lock is fully granted, then the locks held by the
subtransaction are released one at a time using LockRelease(), but if
the subtransaction is aborted *during the lock wait* then we only do
LockWaitCancel(), which doesn't clean up the LOCALLOCK.  Before the
fast-lock patch, that didn't really matter, but now it does, because
that LOCALLOCK is tracking the fact that we're holding onto a shared
resource - the strong lock count.  So I think that LockWaitCancel()
needs some kind of adjustment, but I haven't figured out exactly what
it is yet.
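
To make the leak concrete, here is a deliberately tiny toy model in plain C (invented names, not the real lock.c code): the count is bumped before we block, and the cancelled-wait exit forgets to un-bump it.

    #include <stdbool.h>
    #include <stdio.h>

    static int strong_lock_count = 0;   /* stands in for the shared strong lock count */

    static bool
    wait_for_lock(bool wait_is_cancelled)
    {
        strong_lock_count++;            /* bumped before we block on the lock */

        if (wait_is_cancelled)
        {
            /* LockWaitCancel()-style exit: stop waiting, but the bump leaks */
            return false;
        }

        /* lock granted: a later LockRelease()-style call is what un-bumps it */
        return true;
    }

    int
    main(void)
    {
        (void) wait_for_lock(true);     /* subtransaction aborted during the wait */
        printf("count after cancelled wait: %d\n", strong_lock_count);   /* 1, not 0 */
        return 0;
    }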

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: bug in fast-path locking

From: Robert Haas
On Sun, Apr 8, 2012 at 9:37 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Robert, the Assert triggering with the above procedure
>> is in your "fast path" locking code with current GIT.
>
> Yes, that sure looks like a bug.  It seems that if the top-level
> transaction is aborting, then LockReleaseAll() is called and
> everything gets cleaned up properly; or if a subtransaction is
> aborting after the lock is fully granted, then the locks held by the
> subtransaction are released one at a time using LockRelease(), but if
> the subtransaction is aborted *during the lock wait* then we only do
> LockWaitCancel(), which doesn't clean up the LOCALLOCK.  Before the
> fast-lock patch, that didn't really matter, but now it does, because
> that LOCALLOCK is tracking the fact that we're holding onto a shared
> resource - the strong lock count.  So I think that LockWaitCancel()
> needs some kind of adjustment, but I haven't figured out exactly what
> it is yet.

I looked at this more.  The above analysis is basically correct, but
the problem goes a bit beyond an error in LockWaitCancel().  We could
also crap out with an error before getting as far as LockWaitCancel()
and have the same problem.  I think that a correct statement of the
problem is this: from the time we bump the strong lock count, up until
the time we're done acquiring the lock (or give up on acquiring it),
we need to have an error-cleanup hook in place that will unbump the
strong lock count if we error out.   Once we're done updating the
shared and local lock tables, the special handling ceases to be
needed, because any subsequent lock release will go through
LockRelease() or LockReleaseAll(), which will do the appropriate
cleanup.
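
In outline, the fix is a begin/finish/abort bracket around that window.  Roughly (a sketch only, not the patch text; the Begin/Finish names are assumed companions to the AbortStrongLockAcquire() cleanup entry point):

    BeginStrongLockAcquire(locallock, fasthashcode);  /* bump the strong lock count
                                                       * and remember the unfinished
                                                       * bump */

    /* ...transfer existing fast-path locks, then acquire the lock proper;
     * anything in here may block, be cancelled, or elog(ERROR)... */

    FinishStrongLockAcquire();    /* lock tables are now updated; from here on,
                                   * LockRelease()/LockReleaseAll() own the un-bump */

    /* Any error raised between Begin and Finish must end up calling
     * AbortStrongLockAcquire(), which un-bumps the count. */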

The attached patch is an attempt at implementing that; any reviews appreciated.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: bug in fast-path locking

From: Tom Lane
Robert Haas <robertmhaas@gmail.com> writes:
> I looked at this more.  The above analysis is basically correct, but
> the problem goes a bit beyond an error in LockWaitCancel().  We could
> also crap out with an error before getting as far as LockWaitCancel()
> and have the same problem.  I think that a correct statement of the
> problem is this: from the time we bump the strong lock count, up until
> the time we're done acquiring the lock (or give up on acquiring it),
> we need to have an error-cleanup hook in place that will unbump the
> strong lock count if we error out.   Once we're done updating the
> shared and local lock tables, the special handling ceases to be
> needed, because any subsequent lock release will go through
> LockRelease() or LockReleaseAll(), which will do the appropriate
> cleanup.

Haven't looked at the code, but maybe it'd be better to not bump the
strong lock count in the first place until the final step of updating
the lock tables?
        regards, tom lane


Re: bug in fast-path locking

From: Robert Haas
On Mon, Apr 9, 2012 at 1:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I looked at this more.  The above analysis is basically correct, but
>> the problem goes a bit beyond an error in LockWaitCancel().  We could
>> also crap out with an error before getting as far as LockWaitCancel()
>> and have the same problem.  I think that a correct statement of the
>> problem is this: from the time we bump the strong lock count, up until
>> the time we're done acquiring the lock (or give up on acquiring it),
>> we need to have an error-cleanup hook in place that will unbump the
>> strong lock count if we error out.   Once we're done updating the
>> shared and local lock tables, the special handling ceases to be
>> needed, because any subsequent lock release will go through
>> LockRelease() or LockReleaseAll(), which will do the appropriate
>> cleanup.
>
> Haven't looked at the code, but maybe it'd be better to not bump the
> strong lock count in the first place until the final step of updating
> the lock tables?

Well, unfortunately, that would break the entire mechanism.  The idea
is that we bump the strong lock count first.  That prevents anyone
from taking any more fast-path locks on the target relation.  Then, we
go through and find any existing fast-path locks that have already
been taken, and turn them into regular locks.  Finally, we resolve the
actual lock request and either grant the lock or block, depending on
whether conflicts exist.  So there's some necessary separation between
the action of bumping the strong lock count and updating the lock
tables; the entire mechanism relies on being able to do non-trivial
processing in between.  I thought that I had nailed down the error
exit cases in the original patch, but this test case, and some code
reading with fresh eyes, shows that I didn't do half so good a job as
I had thought.
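
For reference, the sequence is roughly this (a sketch with invented helper names, not the real lock.c code):

    strong_count[partition]++;               /* 1. no new conflicting fast-path
                                              *    locks from this point on */
    transfer_existing_fast_path_locks();     /* 2. move already-granted fast-path
                                              *    locks into the main lock table */
    check_conflicts_then_grant_or_sleep();   /* 3. resolve the request normally */

    /* Steps 2 and 3 are the non-trivial processing mentioned above, and they
     * are exactly the window in which an error has to undo the bump in step 1. */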

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: bug in fast-path locking

From: Tom Lane
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Apr 9, 2012 at 1:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Haven't looked at the code, but maybe it'd be better to not bump the
>> strong lock count in the first place until the final step of updating
>> the lock tables?

> Well, unfortunately, that would break the entire mechanism.  The idea
> is that we bump the strong lock count first.  That prevents anyone
> from taking any more fast-path locks on the target relation.  Then, we
> go through and find any existing fast-path locks that have already
> been taken, and turn them into regular locks.  Finally, we resolve the
> actual lock request and either grant the lock or block, depending on
> whether conflicts exist.

OK.  (Is that explained somewhere in the comments?  I confess I've not
paid any attention to this patch up to now.)  I wonder though whether
you actually need a *count*.  What if it were just a flag saying "do not
take any fast path locks on this object", and once set it didn't get
unset until there were no locks left at all on that object?  In
particular, it's not clear from what you're saying here why it's okay
to let the value revert once you've changed some of the FP locks to
regular locks.
        regards, tom lane


Re: bug in fast-path locking

From: Robert Haas
On Mon, Apr 9, 2012 at 2:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Apr 9, 2012 at 1:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Haven't looked at the code, but maybe it'd be better to not bump the
>>> strong lock count in the first place until the final step of updating
>>> the lock tables?
>
>> Well, unfortunately, that would break the entire mechanism.  The idea
>> is that we bump the strong lock count first.  That prevents anyone
>> from taking any more fast-path locks on the target relation.  Then, we
>> go through and find any existing fast-path locks that have already
>> been taken, and turn them into regular locks.  Finally, we resolve the
>> actual lock request and either grant the lock or block, depending on
>> whether conflicts exist.
>
> OK.  (Is that explained somewhere in the comments?  I confess I've not
> paid any attention to this patch up to now.)

There's a new section in src/backend/storage/lmgr/README on Fast Path
Locking, plus comments at various places in the code.  It's certainly
possible I've missed something that should be updated, but I did my
best.

> I wonder though whether
> you actually need a *count*.  What if it were just a flag saying "do not
> take any fast path locks on this object", and once set it didn't get
> unset until there were no locks left at all on that object?

I think if you read the above-referenced section of the README you'll
be deconfused.  The short version is that we divide up the space of
lockable objects into 1024 partitions and the strong lock counts are
actually a count of all strong locks in the partition.  It is therefore
theoretically possible for locking to get slower on table A because
somebody's got an AccessExclusiveLock on table B, if the low-order 10
bits of the locktag hashcodes happen to collide.  In such a case, all
locks on both relations would be forced out of the fast path until the
AccessExclusiveLock was released. If it so happens that table A is
getting pounded with something that looks a lot like pgbench -S -c 32
-j 32 on a system with more than a couple of cores, the user will be
sad.  I judge that real-world occurrences of this problem will be
quite rare, since few people have adequate reasons for long-lived
strong table locks anyway, and 1024 partitions seemed like enough to
keep most people from suffering too badly.  I don't see any way to
eliminate the theoretical possibility of this while still having the
basic mechanism work, either, though we could certainly crank up the
partition count.
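
For the record, the partitioning is just the locktag hashcode taken modulo 1024; from memory it looks roughly like this (treat names and exact layout as approximate):

    #define FAST_PATH_STRONG_LOCK_HASH_BITS        10
    #define FAST_PATH_STRONG_LOCK_HASH_PARTITIONS  (1 << FAST_PATH_STRONG_LOCK_HASH_BITS)
    #define FastPathStrongLockHashPartition(hashcode) \
        ((hashcode) % FAST_PATH_STRONG_LOCK_HASH_PARTITIONS)

    /* Shared memory: one counter per partition, protected by a spinlock. */
    typedef struct
    {
        slock_t     mutex;
        uint32      count[FAST_PATH_STRONG_LOCK_HASH_PARTITIONS];
    } FastPathStrongRelationLockData;

    /* Locks on tables A and B interfere only when the low-order 10 bits of
     * their locktag hashcodes, and hence their partitions, collide. */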

> In
> particular, it's not clear from what you're saying here why it's okay
> to let the value revert once you've changed some of the FP locks to
> regular locks.

It's always safe to convert a fast-path lock to a regular lock; it
just costs you some performance.  The idea is that everything that
exists as a fast-path lock is something that's certain not to have any
lock conflicts.  As soon as we discover that a particular lock might
be involved in a lock conflict, we have to turn it into a "real" lock.
So if backends 1, 2, and 3 take fast-path locks on A (to SELECT from
it, for example) and then backend 4 wants an AccessExclusiveLock, it
will pull the locks from those backends out of the fast-path mechanism
and make regular lock entries for them before checking for lock
conflicts.  Then, it will discover that there are in fact conflicts
and go to sleep.  When those backends go to release their locks, they
will notice that their locks have been moved to the main lock table
and will release them there, eventually waking up backend 4 to go do
his thing.
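
Schematically, the transfer that backend 4 performs before its conflict check looks like this (pseudocode sketch, not the actual lock.c loop):

    /* Backend 4, after bumping the strong lock count for A's partition: */
    for (each other backend)
    {
        for (each of that backend's fast-path slots holding a lock on A)
        {
            /* insert an ordinary entry into the shared lock table on that
             * backend's behalf, then clear its fast-path slot */
        }
    }
    /* The shared table now shows the locks held by backends 1-3, the
     * conflict check fails, and backend 4 sleeps until they are released. */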

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: bug in fast-path locking

From: Jeff Davis
On Mon, 2012-04-09 at 16:11 -0400, Robert Haas wrote:
> > I wonder though whether
> > you actually need a *count*.  What if it were just a flag saying "do not
> > take any fast path locks on this object", and once set it didn't get
> > unset until there were no locks left at all on that object?
> 
> I think if you read the above-referenced section of the README you'll
> be deconfused.

My understanding:

The basic reason for the count is that we need to preserve the
information that a strong lock is being acquired between the time it's
put in FastPathStrongRelationLocks and the time it actually acquires the
lock in the lock manager.

By definition, the lock manager doesn't know about it yet (so we can't
use a simple rule like "there are no locks on the object so we can unset
the flag"). Therefore, the backend must indicate whether it's in this
code path or not; meaning that it needs to do something on the error
path (in this case, decrement the count). That was the source of this
bug.
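
To spell out the window (a schematic timeline, not code from the patch):

    /*
     *  t0: count[partition]++          -- the main lock table has no entry for
     *                                     our strong lock yet
     *  t1: transfer fast-path locks,   -- still no entry of our own, so the lock
     *      check for conflicts            manager's state alone cannot show that
     *                                     a strong acquisition is in flight
     *  t2: ERROR or cancelled wait     -- the bumped count is the only shared
     *                                     trace of our intent, so only the
     *                                     acquiring backend can undo it, on its
     *                                     error path
     */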

There may be a way around this problem, but nothing occurs to me right
now.

Regards,
Jeff Davis

PS: Oops, I missed this bug in the review, too.

PPS: In the README, FastPathStrongRelationLocks is referred to as
FastPathStrongLocks. Worth a quick update for easier symbol searching.



Re: bug in fast-path locking

From: Jim Nasby
On 4/9/12 12:32 PM, Robert Haas wrote:
> I looked at this more.  The above analysis is basically correct, but
> the problem goes a bit beyond an error in LockWaitCancel().  We could
> also crap out with an error before getting as far as LockWaitCancel()
> and have the same problem.  I think that a correct statement of the
> problem is this: from the time we bump the strong lock count, up until
> the time we're done acquiring the lock (or give up on acquiring it),
> we need to have an error-cleanup hook in place that will unbump the
> strong lock count if we error out.   Once we're done updating the
> shared and local lock tables, the special handling ceases to be
> needed, because any subsequent lock release will go through
> LockRelease() or LockReleaseAll(), which will do the appropriate
> cleanup.
>
> The attached patch is an attempt at implementing that; any reviews appreciated.

Dumb question... should operations in the various StrongLock functions take place in a critical section? Or is that
already ensured outside of these functions?
-- 
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net


Re: bug in fast-path locking

From: Jeff Davis
On Mon, 2012-04-09 at 17:42 -0500, Jim Nasby wrote:
> Dumb question... should operations in the various StrongLock functions
> take place in a critical section? Or is that already ensured outside of
> these functions?

Do you mean CRITICAL_SECTION() in the postgres sense (that is, avoid
error paths by making all ERRORs into PANICs and preventing interrupts);
or the sense described here:
http://en.wikipedia.org/wiki/Critical_section ?

If you mean in the postgres sense, you'd have to hold the critical
section open from the time you incremented the strong lock count all the
way until you decremented it (which is normally at the time the lock is
released), which is a cure worse than the disease.
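
Spelled out, it would mean something like this (schematic; START_CRIT_SECTION/END_CRIT_SECTION are the real macros, the rest is hand-waving), and any cancelled or failed lock wait inside that window would escalate into a PANIC:

    START_CRIT_SECTION();     /* from here, any elog(ERROR) is promoted to PANIC
                               * and interrupts are held off */
    /* ...bump the strong lock count... */
    /* ...possibly wait a long time for the lock... */
    /* ...release the lock and un-bump the count... */
    END_CRIT_SECTION();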

Regards,
Jeff Davis




Re: bug in fast-path locking

From: Jeff Davis
On Mon, 2012-04-09 at 13:32 -0400, Robert Haas wrote:
> I looked at this more.  The above analysis is basically correct, but
> the problem goes a bit beyond an error in LockWaitCancel().  We could
> also crap out with an error before getting as far as LockWaitCancel()
> and have the same problem.  I think that a correct statement of the
> problem is this: from the time we bump the strong lock count, up until
> the time we're done acquiring the lock (or give up on acquiring it),
> we need to have an error-cleanup hook in place that will unbump the
> strong lock count if we error out.   Once we're done updating the
> shared and local lock tables, the special handling ceases to be
> needed, because any subsequent lock release will go through
> LockRelease() or LockReleaseAll(), which will do the appropriate
> cleanup.
> 
> The attached patch is an attempt at implementing that; any reviews appreciated.
> 

This path doesn't have an AbortStrongLockAcquire():

  if (!(proclock->holdMask & LOCKBIT_ON(lockmode)))
  {
    ...
    elog(ERROR, ...);

but other similar paths do:

  if (!proclock)
  {
    AbortStrongLockAcquire();

I don't think it's necessary outside of LockErrorCleanup(), right?

I think there could be some more asserts. There are three sites that
decrement FastPathStrongRelationLocks, but in two of them
StrongLockInProgress should always be NULL.
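
Concretely, something like this at the two decrement sites that are reached only after a successful acquire (a sketch, assuming the shared structure and spinlock look roughly as they do in lock.c):

    Assert(StrongLockInProgress == NULL);   /* only the error-cleanup path should
                                             * ever see an unfinished acquisition */
    SpinLockAcquire(&FastPathStrongRelationLocks->mutex);
    Assert(FastPathStrongRelationLocks->count[fasthashcode] > 0);
    FastPathStrongRelationLocks->count[fasthashcode]--;
    SpinLockRelease(&FastPathStrongRelationLocks->mutex);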

Other than that, it looks good to me.

Regards,
Jeff Davis







Re: bug in fast-path locking

From: Jeff Davis
On Mon, 2012-04-09 at 22:47 -0700, Jeff Davis wrote:
> but other similar paths do:
> 
>   if (!proclock)
>   {
>     AbortStrongLockAcquire();
> 
> I don't think it's necessary outside of LockErrorCleanup(), right?

I take that back, it's necessary for the dontwait case, too.
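
That is, in the dontWait branch the function returns without raising an error, so nothing else would un-bump the count; roughly (a sketch, not the patch text):

    if (dontWait)
    {
        AbortStrongLockAcquire();       /* give back the bump before bailing out */
        /* ...existing cleanup of the LOCALLOCK / PROCLOCK... */
        return LOCKACQUIRE_NOT_AVAIL;
    }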

Regards,
Jeff Davis



Re: bug in fast-path locking

From: Boszormenyi Zoltan
On 2012-04-09 19:32, Robert Haas wrote:
> On Sun, Apr 8, 2012 at 9:37 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Robert, the Assert triggering with the above procedure
>>> is in your "fast path" locking code with current GIT.
>> Yes, that sure looks like a bug.  It seems that if the top-level
>> transaction is aborting, then LockReleaseAll() is called and
>> everything gets cleaned up properly; or if a subtransaction is
>> aborting after the lock is fully granted, then the locks held by the
>> subtransaction are released one at a time using LockRelease(), but if
>> the subtransaction is aborted *during the lock wait* then we only do
>> LockWaitCancel(), which doesn't clean up the LOCALLOCK.  Before the
>> fast-lock patch, that didn't really matter, but now it does, because
>> that LOCALLOCK is tracking the fact that we're holding onto a shared
>> resource - the strong lock count.  So I think that LockWaitCancel()
>> needs some kind of adjustment, but I haven't figured out exactly what
>> it is yet.
> I looked at this more.  The above analysis is basically correct, but
> the problem goes a bit beyond an error in LockWaitCancel().  We could
> also crap out with an error before getting as far as LockWaitCancel()
> and have the same problem.  I think that a correct statement of the
> problem is this: from the time we bump the strong lock count, up until
> the time we're done acquiring the lock (or give up on acquiring it),
> we need to have an error-cleanup hook in place that will unbump the
> strong lock count if we error out.   Once we're done updating the
> shared and local lock tables, the special handling ceases to be
> needed, because any subsequent lock release will go through
> LockRelease() or LockReleaseAll(), which will do the appropriate
> cleanup.
>
> The attached patch is an attempt at implementing that; any reviews appreciated.

This patch indeed fixes the scenario discovered by Marc Cousin.

Reading this patch also made me realize that my lock_timeout
patch needs adjusting, i.e. it needs an AbortStrongLockAcquire()
call if waiting for a lock times out.
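
For example, the timeout branch there would need to look something like this (a hypothetical sketch based on the description above, not the actual lock_timeout patch):

    if (lock_wait_timed_out)            /* hypothetical flag set on timeout */
    {
        AbortStrongLockAcquire();       /* un-bump the strong lock count first */
        ereport(ERROR,
                (errcode(ERRCODE_LOCK_NOT_AVAILABLE),
                 errmsg("canceling statement due to lock timeout")));
    }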

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de     http://www.postgresql.at/



Re: bug in fast-path locking

From: Jim Nasby
On 4/9/12 6:12 PM, Jeff Davis wrote:
> On Mon, 2012-04-09 at 17:42 -0500, Jim Nasby wrote:
>> Dumb question... should operations in the various StrongLock functions
>> take place in a critical section? Or is that already ensured outside of
>> these functions?
>
> Do you mean CRITICAL_SECTION() in the postgres sense (that is, avoid
> error paths by making all ERRORs into PANICs and preventing interrupts);
> or the sense described here:

Postgres sense. I thought there was concern about multiple people trying to increment or decrement the count at the
same time, and if that was the case perhaps there was an issue with it not being in a CRITICAL_SECTION as well. But I
could certainly be wrong about this. :)

And yes, we'd definitely not want to be in a CRITICAL_SECTION for the duration of the operation...
-- 
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net