Re: Better way of dealing with pgstat wait timeout during buildfarm runs? - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: Better way of dealing with pgstat wait timeout during buildfarm runs?
Date	December 25, 2014 21:28:39
Msg-id	549C817A.6010804@fuzzy.cz
In response to	Re: Better way of dealing with pgstat wait timeout during buildfarm runs? (Andres Freund <andres@2ndquadrant.com>)
Responses	Re: Better way of dealing with pgstat wait timeout during buildfarm runs? Re: Better way of dealing with pgstat wait timeout during buildfarm runs? Re: Better way of dealing with pgstat wait timeout during buildfarm runs?
List	pgsql-hackers

On 25.12.2014 21:14, Andres Freund wrote:
> On 2014-12-25 14:36:42 -0500, Tom Lane wrote:
>
> My guess is that a checkpoint happened at that time. Maybe it'd be a 
> good idea to make pg_regress start postgres with log_checkpoints 
> enabled? My guess is that we'd find horrendous 'sync' times.
> 
> Michael: Could you perhaps turn log_checkpoints on in the config?

Logging timestamps (using log_line_prefix) would also be helpful.
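
For reference, a minimal sketch of how the two settings discussed here
might look in postgresql.conf. The exact prefix string is only an
assumption; any format containing %m (timestamp with milliseconds)
would serve the purpose.

```
# Log each checkpoint, including its write and sync timings.
log_checkpoints = on

# Prefix every log line with a millisecond timestamp and the process ID
# (illustrative format, not taken from the buildfarm animal's config).
log_line_prefix = '%m [%p] '
```

With both enabled, a "pgstat wait timeout" message can be lined up
against nearby checkpoint entries by timestamp.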

> 
>> BTW, I notice that in the current state of pgstat.c, all the logic
>> for keeping track of request arrival times is dead code, because
>> nothing is actually looking at DBWriteRequest.request_time. This
>> makes me think that somebody simplified away some logic we maybe
>> should have kept. But if we're going to leave it like this, we
>> could replace the DBWriteRequest data structure with a simple OID
>> list and save a fair amount of code.
> 
> That's indeed odd. Seems to have been lost when the statsfile was
> split into multiple files. Alvaro, Tomas?

The goal was to keep the logic as close to the original as possible.
IIRC there were "pgstat wait timeout" issues before, and in most cases
the conclusion was that it's probably because of overloaded I/O.

But maybe there actually was another bug, and it's entirely possible
that the split introduced a new one, and that's what we're seeing now.
The strange thing is that the split happened ~2 years ago, which is
inconsistent with the sudden increase in this kind of issue. So maybe
something changed on that particular animal (a failing SD card causing
I/O stalls, perhaps)?

Anyway, I happen to have a spare Raspberry Pi, so I'll try to reproduce
and analyze the issue locally. But that won't happen until January.

> I wondered for a second whether the split could be responsible
> somehow, but there's reports of that in older backbranches as well:
> http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=mereswine&dt=2014-12-23%2019%3A17%3A41

Tomas
