Re: "pgstat wait timeout" just got a lot more common on Windows - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: "pgstat wait timeout" just got a lot more common on Windows
Date
Msg-id CABUevEwoivuFOkyWPBP=rWKywd1OxC7aROG1xfyULv5hqQnXkg@mail.gmail.com
In response to Re: "pgstat wait timeout" just got a lot more common on Windows  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: "pgstat wait timeout" just got a lot more common on Windows
List pgsql-hackers
On May 10, 2012 4:59 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
> > Last night I changed the stats collector process to use
> > WaitLatchOrSocket instead of a periodic forced wakeup to see whether
> > the postmaster has died.  This morning I observe that several Windows
> > buildfarm members are showing regression test failures caused by
> > unexpected "pgstat wait timeout" warnings.  Everybody else is fine.
>
> > This suggests that there is something broken in the Windows
> > implementation of WaitLatchOrSocket.  I wonder whether it also
> > tells us something we did not know about the underlying cause of
> > those messages.  Not sure what though.  Ideas?  Can anyone who
> > knows Windows take another look at WaitLatchOrSocket?
>
> Anybody have any clues about that?  If not, I think I'll have to revert
> the pgstat changes for beta1, which isn't really forward progress.

Haven't had time to look at the code itself, and won't before wrap time. Sorry.

> I spent some time staring at the Windows WaitLatchOrSocket code myself.
> The only thing I could find that seemed wrong is that in the event
> array, we list the latch's event before pgwin32_signal_event.  The
> Microsoft documentation I looked at says that if more than one event
> is ready, WaitforMultipleObjects reports the first such array member.
> This means that if the latch is already set when control gets here,
> signal handlers will not be serviced.

Yeah, that does seem wrong.

> That doesn't match what would
> happen on a Unix machine, so it seems like at least a violation of the
> POLA.  Hence I think we oughta swap the order of those two array
> elements.  (Same issue in PGSemaphoreLock, btw, and I'm suspicious of
> pgwin32_select.)  I do not however

Maybe we need a loop that checks for all events?

> see a way that that would explain the
> pgstat failures, because the stats collector's latch really shouldn't
> ever get set during normal regression test runs.

So could there be something wrong in the other end, meaning the latch *does* get set?

/Magnus
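
The array-ordering point in the quoted text can be illustrated with a small, portable model. This is only a sketch under stated assumptions: it does not call the Windows API, and the names `wait_first_ready`, `collect_all_ready`, and `NO_EVENT` are hypothetical, not taken from the PostgreSQL source. It models the documented rule that a wait on several events reports the lowest-index ready member, and it shows one possible reading of the "loop that checks for all events" idea.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel meaning "nothing was ready". */
enum { NO_EVENT = -1 };

/*
 * Model of the rule described in the thread: when more than one array
 * member is ready, the wait reports only the first (lowest-index) one.
 * ready[i] stands in for "event i is signaled".
 */
static int
wait_first_ready(const bool *ready, int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++)
    {
        if (ready[i])
            return i;
    }
    return NO_EVENT;
}

/*
 * Sketch of "check all events": instead of acting only on the reported
 * index, look at every slot and return a bitmask of all ready members
 * (bit i set means slot i is ready).
 */
static unsigned
collect_all_ready(const bool *ready, int n)
{
    unsigned mask = 0;

    assert(n > 0 && n <= 32);
    for (int i = 0; i < n; i++)
    {
        if (ready[i])
            mask |= 1u << i;
    }
    return mask;
}
```

With slot 0 standing for the latch and slot 1 for the signal event, `wait_first_ready` returns 0 whenever both are ready, so the signal slot is never reported while the latch stays set. Putting the signal event in slot 0 reverses that, which is the proposed swap. `collect_all_ready` sidesteps the ordering question by reporting both.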
