Re: Better way of dealing with pgstat wait timeout during buildfarm runs? - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Better way of dealing with pgstat wait timeout during buildfarm runs?
Date
Msg-id 20141225201436.GK31801@alap3.anarazel.de
In response to Re: Better way of dealing with pgstat wait timeout during buildfarm runs?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Better way of dealing with pgstat wait timeout during buildfarm runs?  (Tomas Vondra <tv@fuzzy.cz>)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On 2014-12-25 14:36:42 -0500, Tom Lane wrote:
> I wonder whether when multiple processes are demanding statsfile updates,
> there's some misbehavior that causes them to suck CPU away from the stats
> collector and/or convince it that it doesn't need to write anything.
> There are odd things in the logs in some of these events.  For example in
> today's failure on hamster,
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2014-12-25%2016%3A00%3A07
> there are two client-visible wait-timeout warnings, one each in the
> gist and spgist tests.  But in the postmaster log we find these in
> fairly close succession:
> 
> [549c38ba.724d:2] WARNING:  pgstat wait timeout
> [549c39b1.73e7:10] WARNING:  pgstat wait timeout
> [549c38ba.724d:3] WARNING:  pgstat wait timeout
> 
> Correlating these with other log entries shows that the first and third
> are from the autovacuum launcher while the second is from the gist test
> session.  So the spgist failure failed to get logged, and in any case the
> big picture is that we had four timeout warnings occurring in a pretty
> short span of time, in a parallel test set that's not all that demanding
> (12 parallel tests, well below our max).  Not sure what to make of that.
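
The correlation described above can be sketched roughly as follows. This is a minimal sketch, assuming the log prefix is the session ID plus a per-session line number (as with a log_line_prefix of '[%c:%l] ', which matches the lines shown); group_by_session is a hypothetical helper name, not something from the buildfarm tooling.

```python
import re
from collections import defaultdict

# Prefix of the form "[<session id>:<line number>] ", e.g. "[549c38ba.724d:2] ".
# The session id is two hex numbers: process start time and process ID.
PREFIX_RE = re.compile(r"^\[([0-9a-f]+)\.([0-9a-f]+):(\d+)\]\s+(.*)$")


def group_by_session(lines):
    """Map each session id to its (pid, line number, message) entries."""
    sessions = defaultdict(list)
    for line in lines:
        m = PREFIX_RE.match(line.strip())
        if not m:
            continue  # not a prefixed log line; skip it
        start_hex, pid_hex, lineno, message = m.groups()
        session_id = f"{start_hex}.{pid_hex}"
        sessions[session_id].append((int(pid_hex, 16), int(lineno), message))
    return dict(sessions)


# The three warnings quoted from the postmaster log.
log = [
    "[549c38ba.724d:2] WARNING:  pgstat wait timeout",
    "[549c39b1.73e7:10] WARNING:  pgstat wait timeout",
    "[549c38ba.724d:3] WARNING:  pgstat wait timeout",
]

for session_id, entries in group_by_session(log).items():
    print(session_id, len(entries))
```

Grouping this way puts the first and third warnings under one session and the second under another; identifying which process each session belongs to still requires looking at the other entries logged with the same session ID.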

My guess is that a checkpoint happened at that time. Maybe it'd be a
good idea to make pg_regress start postgres with log_checkpoints
enabled? My guess is that we'd find horrendous 'sync' times.

Michael: Could you perhaps turn log_checkpoints on in the config?
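
With log_checkpoints enabled, each completed checkpoint is logged with its write, sync and total durations. A minimal sketch for pulling the sync times out of a postmaster log, assuming the usual "checkpoint complete: ... sync=N.NNN s ..." message format; checkpoint_sync_times is a hypothetical helper name:

```python
import re

# Matches the "sync=<seconds> s" field of a "checkpoint complete" message.
# The later "sync files=" field is not matched because "=" must follow "sync".
SYNC_RE = re.compile(r"checkpoint complete:.*?\bsync=(\d+\.\d+) s")


def checkpoint_sync_times(lines):
    """Return the sync duration in seconds of every completed checkpoint."""
    times = []
    for line in lines:
        m = SYNC_RE.search(line)
        if m:
            times.append(float(m.group(1)))
    return times
```

Comparing the timestamps of checkpoints with large sync values against those of the timeout warnings would show whether the two coincide.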

> BTW, I notice that in the current state of pgstat.c, all the logic for
> keeping track of request arrival times is dead code, because nothing is
> actually looking at DBWriteRequest.request_time.  This makes me think that
> somebody simplified away some logic we maybe should have kept.  But if
> we're going to leave it like this, we could replace the DBWriteRequest
> data structure with a simple OID list and save a fair amount of code.

That's indeed odd. Seems to have been lost when the statsfile was split
into multiple files. Alvaro, Tomas?

I wondered for a second whether the split could be responsible somehow,
but there are reports of the same timeout in older back branches as well:
http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=mereswine&dt=2014-12-23%2019%3A17%3A41

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


