Re: bug in fast-path locking - Mailing list pgsql-hackers

From Boszormenyi Zoltan
Subject Re: bug in fast-path locking
Msg-id 4F83D93E.10606@cybertec.at
In response to Re: bug in fast-path locking  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2012-04-09 19:32, Robert Haas wrote:
> On Sun, Apr 8, 2012 at 9:37 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Robert, the Assert that triggers with the above procedure
>>> is in your "fast path" locking code in current Git.
>> Yes, that sure looks like a bug.  It seems that if the top-level
>> transaction is aborting, then LockReleaseAll() is called and
>> everything gets cleaned up properly; or if a subtransaction is
>> aborting after the lock is fully granted, then the locks held by the
>> subtransaction are released one at a time using LockRelease(), but if
>> the subtransaction is aborted *during the lock wait* then we only do
>> LockWaitCancel(), which doesn't clean up the LOCALLOCK.  Before the
>> fast-lock patch, that didn't really matter, but now it does, because
>> that LOCALLOCK is tracking the fact that we're holding onto a shared
>> resource - the strong lock count.  So I think that LockWaitCancel()
>> needs some kind of adjustment, but I haven't figured out exactly what
>> it is yet.
> I looked at this more.  The above analysis is basically correct, but
> the problem goes a bit beyond an error in LockWaitCancel().  We could
> also crap out with an error before getting as far as LockWaitCancel()
> and have the same problem.  I think that a correct statement of the
> problem is this: from the time we bump the strong lock count, up until
> the time we're done acquiring the lock (or give up on acquiring it),
> we need to have an error-cleanup hook in place that will unbump the
> strong lock count if we error out.  Once we're done updating the
> shared and local lock tables, the special handling ceases to be
> needed, because any subsequent lock release will go through
> LockRelease() or LockReleaseAll(), which will do the appropriate
> cleanup.
>
> The attached patch is an attempt at implementing that; any reviews appreciated.

This patch indeed fixes the scenario discovered by Cousin Marc.

Reading this patch also made me realize that my lock_timeout
patch needs adjusting: it must call AbortStrongLockAcquire()
when waiting for a lock times out.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de     http://www.postgresql.at/


