Re: [BUGS] BUG #3242: FATAL: could not unlock semaphore: error code 298 - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: [BUGS] BUG #3242: FATAL: could not unlock semaphore: error code 298
Msg-id 4628F1D2.6050302@hagander.net
In response to Re: [BUGS] BUG #3242: FATAL: could not unlock semaphore: error code 298  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Tom Lane wrote:
>>> How is it possible for a semaphore to be unlocked "too many times"?
>>> It's supposed to be a running counter of the net V's minus P's, and
>>> yes it had better be able to count higher than one.  Have we chosen
>>> the wrong Windows primitive to implement this?
> 
>> No, it's definitely the right primitive. But we're creating it with a max
>> count of 1.
> 
> That's definitely wrong.  There are at least three reasons for a PG
> process's semaphore to be signaled (heavyweight lock release, LWLock
> release, pin count waiter), and at least two of them can occur
> concurrently (eg, if deadlock checker fires, it will need to take
> LWLocks, but there's nothing saying that the original lock won't be
> released while it waits for an LWLock).
> 
> The effective max count on Unixen is typically in the thousands,
> and I'd suggest the same on Windows unless there's some efficiency
> reason to keep it small (in which case, maybe ten would do).

AFAIK there's no problem with huge numbers (it takes an int32, and the
documentation says nothing about a limit - I'm sure it's just a 32-bit
counter in the kernel). I'll give that a shot.
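
To make the failure mode concrete, here is a minimal sketch in Python, not
the Win32 code itself. It assumes that threading.BoundedSemaphore is a fair
stand-in for a Windows semaphore created with a maximum count: both refuse a
release that would push the counter past the maximum. The helper name
signal_twice and the mapping of Python's ValueError onto error code 298
(ERROR_TOO_MANY_POSTS on Windows) are illustrative assumptions only.

```python
import threading

# Stand-in for the Windows error code from the bug report (assumption:
# we map Python's ValueError onto it purely for illustration).
ERROR_TOO_MANY_POSTS = 298


def signal_twice(max_count):
    """Signal a semaphore twice with no wait in between.

    Returns 0 on success, or ERROR_TOO_MANY_POSTS if the second signal
    would exceed the semaphore's maximum count.
    """
    sema = threading.BoundedSemaphore(max_count)
    # BoundedSemaphore starts at its maximum, so drain it to model a
    # semaphore whose initial count is 0.
    for _ in range(max_count):
        sema.acquire()
    try:
        sema.release()  # first signal, e.g. a heavyweight lock release
        sema.release()  # second signal arrives before the owner has waited
    except ValueError:
        return ERROR_TOO_MANY_POSTS
    return 0


if __name__ == "__main__":
    print(signal_twice(1))      # max count of 1: the second signal is rejected
    print(signal_twice(32767))  # large max count: both signals are counted
```

With a maximum of 1 the second signal fails, which matches the symptom in
the report; with a large maximum both signals are simply counted.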

Marcin - can you test a source patch? Or should I try to build you a
binary for testing? It'd be good if you can confirm that it works before
we commit anything, I think.


> I'm astonished that we've not seen this reported before.  Has the
> Windows sema code always been like that?

It could be an 8.2 problem, actually, since we had new semaphore code
there.  Looking at

http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/port/win32/Attic/sema.c?rev=1.13;content-type=text%2Fx-cvsweb-markup,
it looks like we may have used a *semaphore* with a maximum count of just
one, but then kept a counter in userspace as well... (I haven't looked
through the details of the code, but it looks that way from a casual view)

//Magnus

