bug in fast-path locking - Mailing list pgsql-hackers

From Robert Haas
Subject bug in fast-path locking
Msg-id CA+TgmobyD_4_NR5wVs7N6W5be9k6F0yQLTGNg4_jV5OUvesm8A@mail.gmail.com
List pgsql-hackers
On Sun, Apr 8, 2012 at 12:43 PM, Boszormenyi Zoltan <zb@cybertec.at> wrote:
>> Indeed, the unpatched GIT version crashes if you enter
>>  =#lock TABLE pgbench_accounts ;
>> the second time in session 2 after the first one failed. Also,
>> manually spelling it out:
>>
>> Session 1:
>>
>> $ psql
>> psql (9.2devel)
>> Type "help" for help.
>>
>> zozo=# begin;
>> BEGIN
>> zozo=# lock table pgbench_accounts;
>> LOCK TABLE
>> zozo=#
>>
>> Session 2:
>>
>> zozo=# begin;
>> BEGIN
>> zozo=# savepoint a;
>> SAVEPOINT
>> zozo=# lock table pgbench_accounts;
>> ERROR:  canceling statement due to statement timeout
>> zozo=# rollback to a;
>> ROLLBACK
>> zozo=# savepoint b;
>> SAVEPOINT
>> zozo=# lock table pgbench_accounts;
>> The connection to the server was lost. Attempting reset: Failed.
>> !>
>>
>> Server log after the second lock table:
>>
>> TRAP: FailedAssertion("!(locallock->holdsStrongLockCount == 0)", File:
>> "lock.c", Line: 749)
>> LOG:  server process (PID 12978) was terminated by signal 6: Aborted
>
>
> Robert, the Assert triggering with the above procedure
> is in your "fast path" locking code with current GIT.

Yes, that sure looks like a bug.  There are three cases.  If the
top-level transaction is aborting, LockReleaseAll() is called and
everything gets cleaned up properly.  If a subtransaction aborts
after the lock is fully granted, the locks it holds are released one
at a time using LockRelease().  But if the subtransaction is aborted
*during the lock wait*, we only do LockWaitCancel(), which doesn't
clean up the LOCALLOCK.  Before the fast-lock patch that didn't
really matter, but now it does, because that LOCALLOCK is tracking
the fact that we're holding onto a shared resource: the strong lock
count.  So I think LockWaitCancel() needs some kind of adjustment,
but I haven't figured out exactly what it is yet.
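
To make the failure mode concrete, here is a toy model in C.  It is
not PostgreSQL source: every name in it (ToyLocalLock,
toy_begin_strong_lock, and so on) is hypothetical, and the "fixed"
cancel routine is only one conceivable adjustment, not the change that
was eventually made.  It shows a local entry that remembers it bumped
a shared counter, a cancel path that forgets to give the count back,
and the second acquisition attempt tripping over the stale flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in PostgreSQL. */
#define TOY_PARTITIONS 4

/* Stand-in for the shared strong lock counters. */
static int toy_strong_lock_count[TOY_PARTITIONS];

/* Stand-in for a backend-local lock table entry. */
typedef struct ToyLocalLock
{
    int  partition;               /* which shared counter we touch */
    bool holds_strong_lock_count; /* did we bump that counter? */
    bool granted;                 /* is the lock actually held? */
} ToyLocalLock;

/*
 * Start acquiring a strong lock: bump the shared counter and record
 * that fact locally.  Returns false if the local entry already claims
 * a count, which is the condition the failed assertion checks for.
 */
static bool
toy_begin_strong_lock(ToyLocalLock *l)
{
    assert(l->partition >= 0 && l->partition < TOY_PARTITIONS);
    if (l->holds_strong_lock_count)
        return false;
    toy_strong_lock_count[l->partition]++;
    l->holds_strong_lock_count = true;
    return true;
}

/* Cancel a lock wait without touching the local bookkeeping. */
static void
toy_wait_cancel_leaky(ToyLocalLock *l)
{
    l->granted = false;
}

/* One possible adjustment (an assumption): hand the count back. */
static void
toy_wait_cancel_fixed(ToyLocalLock *l)
{
    l->granted = false;
    if (l->holds_strong_lock_count)
    {
        toy_strong_lock_count[l->partition]--;
        l->holds_strong_lock_count = false;
    }
}
```

With the leaky cancel, a second toy_begin_strong_lock() on the same
entry returns false and the shared counter stays elevated; with the
fixed variant the counter drops back and the retry succeeds.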

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

