Re: Better way of dealing with pgstat wait timeout during buildfarm runs? - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Better way of dealing with pgstat wait timeout during buildfarm runs?
Date
Msg-id 549C817A.6010804@fuzzy.cz
In response to Re: Better way of dealing with pgstat wait timeout during buildfarm runs?  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Better way of dealing with pgstat wait timeout during buildfarm runs?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?  (Michael Paquier <michael.paquier@gmail.com>)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On 25.12.2014 21:14, Andres Freund wrote:
> On 2014-12-25 14:36:42 -0500, Tom Lane wrote:
>
> My guess is that a checkpoint happened at that time. Maybe it'd be a 
> good idea to make pg_regress start postgres with log_checkpoints 
> enabled? My guess is that we'd find horrendous 'sync' times.
> 
> Michael: Could you perhaps turn log_checkpoints on in the config?

Logging timestamps (using log_line_prefix) would also be helpful.
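
Something along these lines in postgresql.conf should be enough (a minimal
sketch; the particular prefix format is only an assumption on my part, any
prefix that includes %m or %t would do):

```
# report each checkpoint, including write and sync times
log_checkpoints = on
# %m = timestamp with milliseconds, %p = process ID
log_line_prefix = '%m [%p] '
```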

> 
>> BTW, I notice that in the current state of pgstat.c, all the logic
>> for keeping track of request arrival times is dead code, because
>> nothing is actually looking at DBWriteRequest.request_time. This
>> makes me think that somebody simplified away some logic we maybe
>> should have kept. But if we're going to leave it like this, we
>> could replace the DBWriteRequest data structure with a simple OID
>> list and save a fair amount of code.
> 
> That's indeed odd. Seems to have been lost when the statsfile was
> split into multiple files. Alvaro, Tomas?

The goal was to keep the logic as close to the original as possible.
IIRC there were "pgstat wait timeout" issues before, and in most cases
the conclusion was that it's probably because of overloaded I/O.

But maybe there actually was another bug, and it's entirely possible
that the split introduced a new one, and that's what we're seeing now.
The strange thing is that the split happened ~2 years ago, which is
inconsistent with the sudden increase in this kind of issue. So maybe
something changed on that particular animal (a failing SD card causing
I/O stalls, perhaps)?

Anyway, I happen to have a spare Raspberry Pi, so I'll try to reproduce
and analyze the issue locally. But that won't happen until January.

> I wondered for a second whether the split could be responsible
> somehow, but there's reports of that in older backbranches as well:
> http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=mereswine&dt=2014-12-23%2019%3A17%3A41


Tomas


