Tom Lane wrote:
> I wrote:
> > Anyway it's only a guess. It could well be that that machine was simply
> > so heavily loaded that the stats collector couldn't respond fast enough.
> > I'm just wondering whether there's an unrecognized bug lurking here.
>
> Still meditating on this ... and it strikes me that the pgstat.c code
> is really uncommunicative about problems. In particular,
> pgstat_read_statsfile_timestamp and pgstat_read_statsfile don't complain
> at all about being unable to read a stats file.
Yeah, I had the same thought.
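For illustration only, here is a minimal C sketch of the kind of complaint meant here. It is not the actual pgstat.c code. The function name read_stats_timestamp is hypothetical, and so is the file layout it assumes: a raw 64-bit timestamp at the very start of the file. Inside the backend the messages would presumably go through ereport rather than fprintf:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, NOT the real pgstat.c code: read a leading
 * 64-bit timestamp from a stats file (assumed layout) and say why
 * when that fails, instead of returning silently.
 * Returns true on success and stores the value in *ts.
 */
static bool
read_stats_timestamp(const char *path, int64_t *ts)
{
    FILE   *fp = fopen(path, "rb");

    if (fp == NULL)
    {
        /* A missing or unreadable file deserves a message, not silence. */
        fprintf(stderr, "WARNING: could not open stats file \"%s\": %s\n",
                path, strerror(errno));
        return false;
    }

    if (fread(ts, sizeof(*ts), 1, fp) != 1)
    {
        /* Short read: the file is empty, truncated, or unreadable. */
        fprintf(stderr, "WARNING: corrupted or truncated stats file \"%s\"\n",
                path);
        fclose(fp);
        return false;
    }

    fclose(fp);
    return true;
}
```

Even a LOG-level message like this would have told us whether that machine was failing to read the file at all or merely reading a stale one.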
> Lastly, backend_read_statsfile is designed to send an inquiry message
> every time through the loop, i.e., every 10 msec. This is said to be in
> case the stats collector drops one. But is this enough to flood the
> collector and make things worse? I wonder if there should be some
> backoff there.
I also suspect the autovacuum worker minimum timestamp may be playing
games with the retry logic. Perhaps a worker keeps requesting a new file
because pgstat cannot provide one before the deadline has passed, and
thereby overloads the collector. I still think 500ms is too long for a
worker, but going all the way down to 10ms seems excessive. Maybe it
should just be, say, 100ms.
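A rough C sketch of both ideas follows. The first part replaces the fixed 10ms inquiry resend with a capped exponential backoff. The second reads the worker's "minimum timestamp" as "now minus an allowed age" and uses 100ms for that age. All names and constants are hypothetical. The 10ms start mirrors the current per-iteration resend Tom describes, 100ms is the worker value suggested above, and the 80ms backoff cap is arbitrary:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants, for illustration only. */
#define INQUIRY_INITIAL_MS   10   /* first resend interval, as today */
#define INQUIRY_MAX_MS       80   /* arbitrary cap on the resend interval */
#define WORKER_MAX_AGE_MS   100   /* staleness a worker would tolerate */

/*
 * Delay before re-sending an inquiry after `attempt` unanswered ones
 * (attempt starts at 0): 10, 20, 40, 80, 80, ... ms.
 */
static int
inquiry_delay_ms(int attempt)
{
    int     delay = INQUIRY_INITIAL_MS;

    /* Double per unanswered inquiry, stopping once the cap is reached. */
    while (attempt-- > 0 && delay < INQUIRY_MAX_MS)
        delay *= 2;
    return delay > INQUIRY_MAX_MS ? INQUIRY_MAX_MS : delay;
}

/*
 * Is a stats file stamped file_ts_ms recent enough for a reader that
 * started waiting at now_ms and accepts up to max_age_ms of staleness?
 * The "minimum timestamp" is simply now_ms - max_age_ms.
 */
static bool
stats_file_is_fresh(int64_t file_ts_ms, int64_t now_ms, int64_t max_age_ms)
{
    return file_ts_ms >= now_ms - max_age_ms;
}
```

After the first few retries, a collector that is slow to answer would see at most one inquiry every 80ms from each waiting process rather than one every 10ms. A worker would accept a file once stats_file_is_fresh(file_ts, start, WORKER_MAX_AGE_MS) holds, which gives the collector ten times longer to respond than a 10ms window would.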
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support