Another bug introduced by fastpath patch - Mailing list pgsql-hackers

From Tom Lane
Subject Another bug introduced by fastpath patch
Date
Msg-id 4040.1385591144@sss.pgh.pa.us
Responses Re: Another bug introduced by fastpath patch  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
In LockReleaseAll, we have this coding:
    for (partition = 0; partition < NUM_LOCK_PARTITIONS; partition++)
    {
        LWLockId    partitionLock = FirstLockMgrLock + partition;
        SHM_QUEUE  *procLocks = &(MyProc->myProcLocks[partition]);

        proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
                                             offsetof(PROCLOCK, procLink));

        if (!proclock)
            continue;            /* needn't examine this partition */

        LWLockAcquire(partitionLock, LW_EXCLUSIVE);

        ... process the proclock list ...

        LWLockRelease(partitionLock);
    }                            /* loop over partitions */


That is, we check the proclock list head before acquiring the matching
partitionLock, and assume we can skip scanning this partition if the list
is empty.  Once upon a time that was safe, but since the fast-path patch
it no longer is, because some other backend could be in the process of
promoting one of our fast-path locks (and hence changing the proclock
list).  If we could be sure that we had removed *all* our fast-path locks
in the earlier loop then it'd be okay, but because LockReleaseAll's name
is a bit of a misnomer, I don't think we can assume that.

The simplest fix is just to move the fetch of the list header down
past the LWLockAcquire.  This is pretty annoying, though, because in
typical small transactions that will require a lot of additional
partition lock acquire/releases.
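
Concretely, that would mean restructuring the loop along these lines
(just a sketch of the same loop shown above, not a tested patch):

    for (partition = 0; partition < NUM_LOCK_PARTITIONS; partition++)
    {
        LWLockId    partitionLock = FirstLockMgrLock + partition;
        SHM_QUEUE  *procLocks = &(MyProc->myProcLocks[partition]);

        /* take the partition lock before looking at the list at all */
        LWLockAcquire(partitionLock, LW_EXCLUSIVE);

        proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
                                             offsetof(PROCLOCK, procLink));

        ... process the proclock list, which may well be empty ...

        LWLockRelease(partitionLock);
    }                            /* loop over partitions */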

I think it might be safe to check the list header, skip the partition
if it's null, but if it isn't then acquire the lock *and re-fetch the
list header*.  The idea here is that if there is someone else busy
promoting one of our fastpath locks, it must be a lock we'd decided not
to delete in the previous loop, and so we would again decide not to
delete it in the main loop.  Therefore, it doesn't matter if we decide
to skip a partition microseconds before somebody changes the list header
from null to not-null.  However, once we've decided to process a
partition, we'd better get an up-to-date view of the list state.
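
In code form the idea would be roughly this (same elisions as above; only
the unlocked peek and the re-fetch are new):

    for (partition = 0; partition < NUM_LOCK_PARTITIONS; partition++)
    {
        LWLockId    partitionLock = FirstLockMgrLock + partition;
        SHM_QUEUE  *procLocks = &(MyProc->myProcLocks[partition]);

        /*
         * Unlocked peek: if the list looks empty now, any proclock that
         * another backend might concurrently add must be for a lock we
         * decided to keep, so we can skip the partition entirely.
         */
        if (SHMQueueNext(procLocks, procLocks,
                         offsetof(PROCLOCK, procLink)) == NULL)
            continue;            /* needn't examine this partition */

        LWLockAcquire(partitionLock, LW_EXCLUSIVE);

        /* re-fetch the list header now that we hold the partition lock */
        proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
                                             offsetof(PROCLOCK, procLink));

        ... process the proclock list ...

        LWLockRelease(partitionLock);
    }                            /* loop over partitions */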

This does do some damage to the original concept that allLocks mode
would guarantee to clean up all locks of the target lockmethod even
if the locallock table was damaged.  I wonder whether we shouldn't
get rid of the existing logic about fastpath in the loop over locallocks
entirely, and replace it with code that just zeroes the fastpath array
when lockmethodid == DEFAULT_LOCKMETHOD and allLocks.  That would likely
be faster than what you've got here as well as more robust.  Or we
could add a restriction to EligibleForRelationFastPath that restricts
the fast-path mechanism to non-session locks, in which case we'd not
need to make the zeroing contingent on allLocks either.  I don't think
we take any fast-path-eligible locks in session mode anyway, so this
wouldn't be giving up any functionality on that end.
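
If we went the zeroing route, the replacement for the per-locallock
fast-path logic might be as simple as this (a sketch only, assuming the
existing PGPROC fast-path fields backendLock and fpLockBits; whatever
locallock bookkeeping also needs resetting is not shown):

    if (lockmethodid == DEFAULT_LOCKMETHOD && allLocks)
    {
        /*
         * Forget all of our fast-path locks in one shot.  Holding our
         * backendLock here excludes any other backend that is busy
         * promoting one of those locks to a regular proclock.
         */
        LWLockAcquire(MyProc->backendLock, LW_EXCLUSIVE);
        MyProc->fpLockBits = 0;
        LWLockRelease(MyProc->backendLock);
    }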

Comments?
        regards, tom lane


