Re: "pgstat wait timeout" just got a lot more common on Windows - Mailing list pgsql-hackers

From Tom Lane
Subject Re: "pgstat wait timeout" just got a lot more common on Windows
Date
Msg-id 1735.1336664256@sss.pgh.pa.us
In response to Re: "pgstat wait timeout" just got a lot more common on Windows  (Magnus Hagander <magnus@hagander.net>)
List pgsql-hackers
Magnus Hagander <magnus@hagander.net> writes:
> On May 10, 2012 4:59 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> I spent some time staring at the Windows WaitLatchOrSocket code myself.
>> The only thing I could find that seemed wrong is that in the event
>> array, we list the latch's event before pgwin32_signal_event.  The
>> Microsoft documentation I looked at says that if more than one event
>> is ready, WaitForMultipleObjects reports the first such array member.
>> This means that if the latch is already set when control gets here,
>> signal handlers will not be serviced.

> Yeah, that does seem wrong.
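
To make the ordering concern concrete, here is a small portable model (not the actual Windows code) of a wait-any call that reports the lowest-indexed signaled array member, which is the behavior the Microsoft documentation describes for WaitForMultipleObjects. ModelEvent, model_wait_any and signal_is_reported are hypothetical names invented for this sketch; nothing here calls the Win32 API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Windows event handle: just a "signaled" flag. */
typedef struct
{
    bool signaled;
} ModelEvent;

/*
 * Models wait-any semantics: when several objects are signaled, the lowest
 * array index is reported.  Returns -1 if nothing is signaled (a real wait
 * would block or time out instead).
 */
int
model_wait_any(ModelEvent **events, size_t n)
{
    assert(events != NULL);
    for (size_t i = 0; i < n; i++)
    {
        if (events[i]->signaled)
            return (int) i;
    }
    return -1;
}

/*
 * Both the latch event and the signal event are already signaled on entry.
 * Returns true if the wait reports the signal event, i.e. if signal
 * handlers would get serviced on this call.
 */
bool
signal_is_reported(bool signal_first)
{
    ModelEvent latch = {true};
    ModelEvent sig = {true};
    ModelEvent *order[2];
    int idx;

    order[0] = signal_first ? &sig : &latch;
    order[1] = signal_first ? &latch : &sig;

    idx = model_wait_any(order, 2);
    return idx >= 0 && order[idx] == &sig;
}
```

In this model, signal_is_reported(false) (latch listed first) never reports the signal event while the latch stays set, whereas signal_is_reported(true) does, which is the argument for swapping the two array elements.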

>> That doesn't match what would
>> happen on a Unix machine, so it seems like at least a violation of the
>> POLA.  Hence I think we oughta swap the order of those two array
>> elements.  (Same issue in PGSemaphoreLock, btw, and I'm suspicious of
>> pgwin32_select.)  I do not however

> Maybe we need a loop that checks for all events?

I don't think so.  It's already the case that WaitLatch doesn't
guarantee that all possible flags are set in its result.  In connection
with Peter G's observation that we could simplify the API by rechecking
PostmasterIsAlive for WL_POSTMASTER_DEATH, I was planning to clarify
the API spec as "result bits that are set are guaranteed to reflect
reality, but it's not guaranteed that we set every bit that could
possibly be set".  This should not break any caller since the same
result could occur given a slight change in timing anyway; the caller
has to be prepared to come back and check for more conditions after it
services whatever WaitLatch does report.  However, signal service is
not a condition the caller is supposed to deal with, so I think we
want a guarantee that that happens inside WaitLatch.
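
A rough sketch of the caller-side contract described above, under stated assumptions: MODEL_LATCH_SET, MODEL_SOCKET_READABLE, model_wait and service_all are hypothetical stand-ins, not the real WaitLatch API. The model wait deliberately reports only one of the ready conditions per call, and the caller loops until nothing is left to service.

```c
#include <assert.h>

/* Hypothetical result bits, standing in for the real WL_* flags. */
#define MODEL_LATCH_SET        (1u << 0)
#define MODEL_SOCKET_READABLE  (1u << 1)

/*
 * Model wait: "ready" is the set of conditions that are truly ready.
 * Only the lowest set bit is reported, so the result may omit conditions
 * that are also ready, but it never reports one that is not.
 */
unsigned
model_wait(unsigned ready)
{
    return ready & (~ready + 1u);
}

/*
 * Caller loop: service whatever the wait reports, then come back and wait
 * again.  Returns the number of wait calls needed to drain everything.
 */
unsigned
service_all(unsigned ready)
{
    unsigned rounds = 0;

    while (ready != 0)
    {
        unsigned got = model_wait(ready);

        /* Every reported bit reflects reality. */
        assert((got & ~ready) == 0);

        ready &= ~got;          /* "service" the reported condition */
        rounds++;
    }
    return rounds;
}
```

With both conditions ready, the loop needs two passes instead of one, but nothing is lost; that is the same outcome a caller would see if the second condition had become ready slightly later.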

>> see a way that that would explain the
>> pgstat failures, because the stats collector's latch really shouldn't
>> ever get set during normal regression test runs.

> So could there be something wrong in the other end, meaning the latch
> *does* get set?

Even if it did, it'd get cleared at the top of the loop, so that the
next call ought to handle things.  'Tis a puzzlement.  AFAICS the only
condition WaitForMultipleObjects is going to see in these tests is
read-ready on the socket; surely it wouldn't fail to notice that?
        regards, tom lane

