On 10.08.2011 21:45, Tom Lane wrote:
> We occasionally see $SUBJECT in the buildfarm, and I've also recently
> had reports of them from Red Hat customers. The obvious theory is that
> these reflect high load preventing the stats collector from responding,
> but it would really take pretty crushing load to make that happen if
> there were not anything funny going on.
>
> It struck me just now while reviewing the latch code that pg_usleep
> could sleep for less than the expected time if a signal happened, and
> if that happened repeatedly for some reason, perhaps the loop could
> complete in much less than the nominal time. I'm not sure I believe
> that idea either, but anyway I'm feeling motivated to try to gather more
> data.
I've also seen this on my laptop occasionally. The most recent case I
remember was when I COPYed a lot of data, so the hard disk was
really busy. The system was a bit unresponsive anyway, because of all
the I/O happening.

So my theory is that if the disk is really busy, write() on the stats
file blocks for more than 5 seconds, and you get the timeout.
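
One way to check that theory would be to time the write() call itself.
This is only a minimal standalone sketch, not PostgreSQL code:
timed_write(), elapsed_ms() and SLOW_WRITE_THRESHOLD_MS are made-up
names, and the 5000 ms threshold simply mirrors the 5 seconds mentioned
above.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical threshold, mirroring the 5 second timeout discussed above. */
#define SLOW_WRITE_THRESHOLD_MS 5000.0

/* Milliseconds between two CLOCK_MONOTONIC readings. */
double
elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    return (end->tv_sec - start->tv_sec) * 1000.0 +
           (end->tv_nsec - start->tv_nsec) / 1.0e6;
}

/*
 * Write len bytes to fd and measure how long the write() call took.
 * Returns whatever write() returned; *took_ms receives the elapsed time.
 * A complaint goes to stderr if the call exceeded the threshold.
 */
ssize_t
timed_write(int fd, const void *buf, size_t len, double *took_ms)
{
    struct timespec start;
    struct timespec end;
    ssize_t     written;

    assert(took_ms != NULL);

    clock_gettime(CLOCK_MONOTONIC, &start);
    written = write(fd, buf, len);
    clock_gettime(CLOCK_MONOTONIC, &end);

    *took_ms = elapsed_ms(&start, &end);
    if (*took_ms > SLOW_WRITE_THRESHOLD_MS)
        fprintf(stderr, "slow write: %zu bytes took %.1f ms\n",
                len, *took_ms);

    return written;
}
```

If the slow-write message showed up around the same time as the timeout,
that would support the theory. It uses a monotonic clock so that
wall-clock adjustments don't distort the measurement.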
> Does anyone have a problem with sticking a lot of debugging printout
> into backend_read_statsfile() in HEAD only? I'm envisioning it starting
> to dump assorted information including elapsed time, errno values, etc
> once the loop counter is more than halfway to expiration, which is
> already a situation that we shouldn't see under normal conditions.
No objections here.
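
Something along these lines, perhaps. This is only a standalone sketch
of the idea, not the actual backend_read_statsfile() loop:
poll_with_diagnostics(), ready_fn, never_ready() and all the parameters
are hypothetical names. It starts printing once the loop counter is more
than halfway to expiration, and reports the elapsed time plus the errno
of the previous sleep, so a loop that finishes early because of
interrupted sleeps would show up as elapsed time well below the nominal
value.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical callback: returns true once the awaited condition holds. */
typedef bool (*ready_fn) (void *arg);

/* Milliseconds elapsed since *start, measured on the monotonic clock. */
double
ms_since(const struct timespec *start)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - start->tv_sec) * 1000.0 +
           (now.tv_nsec - start->tv_nsec) / 1.0e6;
}

/*
 * Poll is_ready() up to max_count times, sleeping delay_ms between tries.
 * Once count is more than halfway to max_count, print one diagnostic
 * line per iteration to diag. Returns true if the condition became true;
 * *ndiag receives the number of diagnostic lines printed.
 */
bool
poll_with_diagnostics(ready_fn is_ready, void *arg, int max_count,
                      long delay_ms, FILE *diag, int *ndiag)
{
    struct timespec start;
    int         sleep_errno = 0;

    assert(max_count > 0 && delay_ms >= 0 && ndiag != NULL);
    *ndiag = 0;
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (int count = 0; count < max_count; count++)
    {
        struct timespec delay = {delay_ms / 1000,
                                 (delay_ms % 1000) * 1000000L};

        if (is_ready(arg))
            return true;

        if (count > max_count / 2)
        {
            fprintf(diag,
                    "still waiting: count=%d/%d elapsed=%.1f ms "
                    "nominal=%ld ms last sleep errno=%d (%s)\n",
                    count, max_count, ms_since(&start),
                    count * delay_ms, sleep_errno, strerror(sleep_errno));
            (*ndiag)++;
        }

        /* Remember whether the sleep was cut short, e.g. EINTR on a signal. */
        sleep_errno = (nanosleep(&delay, NULL) == -1) ? errno : 0;
    }

    return false;
}

/* Example predicate that never succeeds, to exercise the timeout path. */
bool
never_ready(void *arg)
{
    (void) arg;
    return false;
}
```

Calling poll_with_diagnostics(never_ready, NULL, 10, 10, stderr, &n)
runs the whole loop and prints diagnostics only for the second half of
the iterations, so nothing extra is logged under normal conditions.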
--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com