Ancient lock bug figured out - Mailing list pgsql-hackers

From Tom Lane
Subject Ancient lock bug figured out
Date
Msg-id 12872.975367809@sss.pgh.pa.us
Whole thread Raw
List pgsql-hackers
proc.c has the following code --- unchanged since Postgres95 ---
in HandleDeadlock():
   /* ---------------------    * Check to see if we've been awoken by anyone in the interim.    *    * If we have we
canreturn and resume our transaction -- happy day.    * Before we are awoken the process releasing the lock grants it
to   * us so we know that we don't have to wait anymore.    *    * Damn these names are LONG! -mer    *
---------------------   */   if (IpcSemaphoreGetCount(MyProc->sem.semId, MyProc->sem.semNum) ==
IpcSemaphoreDefaultStartValue)  {       UnlockLockTable();       return;   }
 
   /*    * you would think this would be unnecessary, but...    *    * this also means we've been removed already.  in
someports (e.g.,    * sparc and aix) the semop(2) implementation is such that we can    * actually end up in this
handlerafter someone has removed us from    * the queue and bopped the semaphore *but the test above fails to    *
detectthe semaphore update* (presumably something weird having to    * do with the order in which the semaphore wakeup
signaland SIGALRM    * get handled).    */   if (MyProc->links.prev == INVALID_OFFSET ||       MyProc->links.next ==
INVALID_OFFSET)  {       UnlockLockTable();       return;   }
 

Well, the reason control can get to the "apparently unnecessary" part is
not some weird portability glitch; it is that the first test is WRONG.
IpcSemaphoreGetCount() calls semctl(GETNCNT), which returns not the
current value of the semaphore as this code expects, but the number of
waiters on the semaphore.  Since the process doing this test is the
only one that'll ever wait on that semaphore, it is impossible for
IpcSemaphoreGetCount() to return anything but zero, and so the first
if() has never ever succeeded in the entire history of Postgres.

IpcSemaphoreGetCount is used nowhere else.  I see no particularly good
reason to have it at all, and certainly not to have it with a name so
easily mistaken for IpcSemaphoreGetValue.  It's about to be toast.

Next question is whether to recode the first test "correctly" with
IpcSemaphoreGetValue(), or just remove it.  Since we clearly have done
just fine with testing our internal link pointers to detect removal
from the queue, I'm inclined to remove the first test and thus save an
unnecessary kernel call.  Comments?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Constraint names using 'user namespace'?
Next
From: "Mitch Vincent"
Date:
Subject: Re: 8192 BLCKSZ ?