Re: bug in fast-path locking - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: bug in fast-path locking
Date:
Msg-id: CA+TgmoaqPhECf19D82KDix_bn6S=ef0wTe6nBjju-92md9kPAw@mail.gmail.com
In response to: bug in fast-path locking (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: bug in fast-path locking (Tom Lane <tgl@sss.pgh.pa.us>)
           Re: bug in fast-path locking (Jim Nasby <jim@nasby.net>)
           Re: bug in fast-path locking (Jeff Davis <pgsql@j-davis.com>)
           Re: bug in fast-path locking (Boszormenyi Zoltan <zb@cybertec.at>)
List: pgsql-hackers
On Sun, Apr 8, 2012 at 9:37 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Robert, the Assert triggering with the above procedure
>> is in your "fast path" locking code with current GIT.
>
> Yes, that sure looks like a bug.  It seems that if the top-level
> transaction is aborting, then LockReleaseAll() is called and
> everything gets cleaned up properly; or if a subtransaction is
> aborting after the lock is fully granted, then the locks held by the
> subtransaction are released one at a time using LockRelease(), but if
> the subtransaction is aborted *during the lock wait* then we only do
> LockWaitCancel(), which doesn't clean up the LOCALLOCK.  Before the
> fast-lock patch, that didn't really matter, but now it does, because
> that LOCALLOCK is tracking the fact that we're holding onto a shared
> resource - the strong lock count.  So I think that LockWaitCancel()
> needs some kind of adjustment, but I haven't figured out exactly what
> it is yet.
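
To make the leak concrete, here is a minimal standalone sketch of the
three cleanup paths described above (hypothetical names, deliberately
simplified; this is not the actual lock-manager code).  Only the
LockWaitCancel() analogue fails to unbump the count:

#include <stdio.h>

static int strong_lock_count = 0;  /* stand-in for the shared strong-lock counter */

static void lock_acquire_begin(void) { strong_lock_count++; }

/* Top-level transaction abort: the LockReleaseAll() analogue cleans up
 * everything, including the count. */
static void lock_release_all(void)   { strong_lock_count = 0; }

/* Subtransaction abort after the lock is granted: the LockRelease()
 * analogue unbumps the count one lock at a time. */
static void lock_release(void)       { strong_lock_count--; }

/* Subtransaction abort *during the lock wait*: the LockWaitCancel()
 * analogue cancels the wait but never touches the LOCALLOCK, so the
 * bumped count leaks. */
static void lock_wait_cancel(void)   { }

int main(void)
{
    /* The first two paths balance the count correctly... */
    lock_acquire_begin(); lock_release_all();
    lock_acquire_begin(); lock_release();

    /* ...but the third does not. */
    lock_acquire_begin();
    lock_wait_cancel();
    printf("leaked strong lock count: %d\n", strong_lock_count);  /* prints 1 */
    return 0;
}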

I looked at this more.  The analysis quoted above is basically
correct, but the problem goes a bit beyond an error in
LockWaitCancel().  We could also crap out with an error before getting
as far as LockWaitCancel() and have the same problem.  I think a
correct statement of the problem is this: from the time we bump the
strong lock count, up until the time we're done acquiring the lock (or
give up on acquiring it), we need to have an error-cleanup hook in
place that will unbump the strong lock count if we error out.  Once
we're done updating the shared and local lock tables, the special
handling ceases to be needed, because any subsequent lock release will
go through LockRelease() or LockReleaseAll(), which will do the
appropriate cleanup.
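
A minimal standalone sketch of that shape (again with hypothetical
names, not the patch itself): arm a cleanup hook before bumping,
disarm it once the lock tables are fully updated, and have the error
path unbump only while the hook is still armed.

#include <setjmp.h>
#include <stdio.h>

static int strong_lock_count = 0;
static int acquire_in_progress = 0;  /* nonzero while the cleanup hook is armed */
static jmp_buf error_env;            /* stand-in for elog(ERROR) recovery */

static void begin_strong_lock_acquire(void)
{
    strong_lock_count++;
    acquire_in_progress = 1;   /* from here on, an error must unbump */
}

static void finish_strong_lock_acquire(void)
{
    /* Shared and local lock tables are fully updated; ordinary release
     * paths (LockRelease()/LockReleaseAll() analogues) take over. */
    acquire_in_progress = 0;
}

static void abort_strong_lock_acquire(void)
{
    /* Error-cleanup hook: unbump only if an acquisition was in flight. */
    if (acquire_in_progress)
    {
        strong_lock_count--;
        acquire_in_progress = 0;
    }
}

int main(void)
{
    if (setjmp(error_env) != 0)
    {
        abort_strong_lock_acquire();  /* the error path */
        printf("count after error: %d\n", strong_lock_count);  /* prints 0 */
        return 0;
    }

    begin_strong_lock_acquire();
    longjmp(error_env, 1);            /* simulate erroring out mid-acquisition */
    finish_strong_lock_acquire();     /* not reached on the error path */
    return 0;
}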

The attached patch is an attempt at implementing that; any reviews appreciated.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

