Re: [HACKERS] backend freezeing on win32 fixed (I hope ;-) ) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] backend freezeing on win32 fixed (I hope ;-) )
Date
Msg-id 21783.934911055@sss.pgh.pa.us
In response to Re: [HACKERS] backend freezeing on win32 fixed (I hope ;-) )  (Bruce Momjian <maillist@candle.pha.pa.us>)
Responses Re: [HACKERS] backend freezeing on win32 fixed (I hope ;-) )
List pgsql-hackers
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> storage/ipc/ipc.c ). Why it is, I don't know, but it seems that my solution
>> uses the ipc library in the right way. There are no longer any error
>> messages from the ipc library when running the server. And I can't say that
>> the ipc library is a 100% correct implementation of SysV IPC, it is probably
>> (sure ;-) )caused by the Windows internals.

> Seems we may have to use the patch, or make some other patch for NT-only
> that works around this NT bug.

I don't have a problem with installing an NT patch (lord knows there
are plenty of #ifdef __CYGWIN32__'s in the code already).  But I have
a problem with *this* patch because I don't believe we understand what
it is doing, and therefore I have no confidence in it.  The extent of
our understanding so far is that one backend can create a semaphore that
can be used by a later backend, but the postmaster cannot create a
semaphore that can be used by a later backend.  I don't really believe
that; I think there is something else going on.  Until we understand
what the something else is, I don't think we have a trustworthy
solution.

The real reason I feel itchy about this is that I know that interprocess
synchronization is a very tricky area, so I'm not confident that the
limited amount of testing Dan can do by himself proves that things are
solid.  As the old saw goes, "testing cannot prove the absence of bugs".
I want to have both clean test results *and* an understanding of what
we are doing before I will feel comfortable.

Looking again at the code, it occurs to me that a backend exiting
normally will probably leave its semaphore set nonzero, which could
(given a buggy IPC library) have something to do with whether another
process can attach to the sema or not.  The postmaster code is *trying*
to create the semas with nonzero starting values, but I see that the
backend code takes the additional step of doing

    semun.val = IpcSemaphoreDefaultStartValue;
    semctl(semId, semNum, SETVAL, semun);

whereas the postmaster code doesn't.  Maybe the create call isn't
initializing the semaphores the way it's told to?  It'd be worth
trying adding a step like this to the postmaster preallocation.
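
A minimal sketch of what such a step could look like: create the semaphore set, then set every member explicitly with SETVAL instead of trusting the create call to initialize it. This is not the actual ipc.c code. The function names, the `semun_arg` union name, and the start value passed by the caller are hypothetical and chosen only for illustration.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/*
 * Several platforms require the caller to define the semctl() argument
 * union. It gets its own (hypothetical) name here so it cannot clash with
 * a system-provided "union semun".
 */
union semun_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/*
 * Create a set of nsems semaphores and explicitly set each one to
 * start_value. Returns the semaphore set id, or -1 on failure.
 */
int create_and_init_semaphores(key_t key, int nsems, int start_value)
{
    union semun_arg arg;
    int sem_id;
    int i;

    sem_id = semget(key, nsems, IPC_CREAT | IPC_EXCL | 0600);
    if (sem_id < 0) {
        perror("semget");
        return -1;
    }

    arg.val = start_value;
    for (i = 0; i < nsems; i++) {
        /* The explicit initialization step discussed above. */
        if (semctl(sem_id, i, SETVAL, arg) < 0) {
            perror("semctl(SETVAL)");
            semctl(sem_id, 0, IPC_RMID);   /* do not leak the set */
            return -1;
        }
    }
    return sem_id;
}

/*
 * Read every semaphore back with GETVAL and count how many do not hold
 * the expected value. Returns 0 when all match, -1 if GETVAL itself fails.
 */
int count_mismatched_semaphores(int sem_id, int nsems, int expected)
{
    int mismatches = 0;
    int i;

    for (i = 0; i < nsems; i++) {
        int value = semctl(sem_id, i, GETVAL);

        if (value < 0) {
            perror("semctl(GETVAL)");
            return -1;
        }
        if (value != expected)
            mismatches++;
    }
    return mismatches;
}
```

Running the read-back check from a second process, before and after the SETVAL loop, would show whether the create call alone leaves the semaphores in the expected state.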

In any case, I'd really like us to get some feedback from the author of
cygipc about this issue.  I don't mind working around a bug once we
understand exactly what the bug is --- but in this particular area,
I think guessing our way to a workaround isn't good enough.
        regards, tom lane

