<p><br /> On May 10, 2012 4:59 PM, "Tom Lane" <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>>
wrote:<br/> ><br /> > I wrote:<br /> > > Last night I changed the stats collector process to use<br /> >
> WaitLatchOrSocket instead of a periodic forced wakeup to see whether<br /> > > the postmaster has died. This
morning I observe that several Windows<br /> > > buildfarm members are showing regression test failures caused
by<br/> > > unexpected "pgstat wait timeout" warnings. Everybody else is fine.<br /> ><br /> > > This
suggests that there is something broken in the Windows<br /> > > implementation of WaitLatchOrSocket. I wonder
whether it also<br /> > > tells us something we did not know about the underlying cause of<br /> > > those
messages. Not sure what though. Ideas? Can anyone who<br /> > > knows Windows take another look at
WaitLatchOrSocket?<br/> ><br /> > Anybody have any clues about that? If not, I think I'll have to revert<br />
> the pgstat changes for beta1, which isn't really forward progress.<p>Haven't had time to look at the code itself,
and won't before wrap time. Sorry. <br /><p>> I spent some time staring at the Windows WaitLatchOrSocket code
myself.<br/> > The only thing I could find that seemed wrong is that in the event<br /> > array, we list the
latch's event before pgwin32_signal_event. The<br /> > Microsoft documentation I looked at says that if more than
one event<br /> > is ready, WaitForMultipleObjects reports the first such array member.<br /> > This means that if
the latch is already set when control gets here,<br /> > signal handlers will not be serviced.<p>Yeah, that does seem
wrong.<p>> That doesn't match what would<br /> > happen on a Unix machine, so it seems like at least a violation
of the<br /> > POLA. Hence I think we oughta swap the order of those two array<br /> > elements. (Same issue in
PGSemaphoreLock, btw, and I'm suspicious of<br /> > pgwin32_select.) I do not however<p>Maybe we need a loop that
checks for all events? <p>> see a way that that would explain the<br /> > pgstat failures, because the stats
collector's latch really shouldn't<br /> > ever get set during normal regression test runs.<br /><p>So could there be
something wrong in the other end, meaning the latch *does* get set? <p>/Magnus <br />
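<p>For illustration only, here is a minimal portable sketch of the ordering issue discussed above. It is not the PostgreSQL code and does not call the Windows API: first_signaled() is a hypothetical stand-in for the documented "lowest signaled index wins" behaviour of WaitForMultipleObjects with bWaitAll = FALSE, and all_signaled() is one assumed shape for the "loop that checks for all events" idea. The function names and the bool-array representation of events are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a wait-any call: report the lowest array
 * index whose event is signaled, or -1 if nothing is signaled.
 * Events at higher indexes are not reported, even if they are set.
 */
static int
first_signaled(const bool *events, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (events[i])
            return (int) i;
    }
    return -1;
}

/*
 * Assumed alternative: examine every event and return a bitmask of
 * all that are signaled, so the array order no longer decides which
 * one the caller gets to see.
 */
static unsigned
all_signaled(const bool *events, size_t n)
{
    unsigned    mask = 0;

    /* The mask only has room for one bit per event. */
    assert(n <= 8 * sizeof(unsigned));

    for (size_t i = 0; i < n; i++)
    {
        if (events[i])
            mask |= 1u << i;
    }
    return mask;
}
```

With the array ordered { latch, signal } and both set, first_signaled() returns index 0, so the signal event is never the one reported; with the order swapped to { signal, latch }, index 0 is the signal event instead. all_signaled() reports both in either order.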