Re: [HACKERS] Continuous buildfarm failures on hamster with bin-check - Mailing list pgsql-hackers

From Noah Misch
Subject Re: [HACKERS] Continuous buildfarm failures on hamster with bin-check
Date
Msg-id 20170604211229.GA1528911@rfd.leadboat.com
In response to Re: [HACKERS] Continuous buildfarm failures on hamster with bin-check  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Tue, Apr 18, 2017 at 09:59:26PM +0900, Michael Paquier wrote:
> On Tue, Apr 18, 2017 at 9:35 PM, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
> > On 04/18/2017 08:23 AM, Michael Paquier wrote:
> >> Increasing wal_sender_timeout and wal_receiver_timeout can help in
> >> reducing the failures seen.
> >
> > OK, but you're only talking about a handful of these, right?
> 
> Yup, that would be one solution but that's not attacking the problem
> at its root.
> 
> > Let's say we have a bunch of possible environment settings with names
> > that all begin with "PG_TAP_".  PostgresNode.pm could check for the
> > existence of these and take action accordingly, and you could set them
> > on a buildfarm animal in the config file, or for interactive use in your
> > .profile.
> 
> That's the point I am trying to make upthread: slow buildfarm animals
> should have minimal impact on core code modifications. We could for
> example have one environment variable that lists all the parameters to
> modify in a single string and appends them at the end of
> postgresql.conf. But honestly I don't think that this is necessary if
> there is only one variable able to define a base directory for
> temporary statistics as the real bottleneck comes from there at least
> in the case of hamster. When initializing a node via PostgresNode.pm,
> we would just check for this variable, and the init() routine just
> creates a temporary folder in it, setting up stats_temp_directory in
> postgresql.conf.

Each of the above approaches has fairly low impact on the code, so we should
use other criteria to choose.  I'd welcome a feature for augmenting every
postgresql.conf of every test suite (a generalization of "pg_regress
--temp-config", which has proven its value).  I can envision using it with
force_parallel_mode, default_transaction_isolation, log_*, wal_*_timeout,
autovacuum_naptime, and others.
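
To make the idea concrete, here is a minimal sketch of the append step, written
in Python for brevity even though a real version would live in PostgresNode.pm.
The environment variable name PG_TAP_TEMP_CONFIG and the helper name
append_temp_config are hypothetical, chosen only for illustration; nothing in
the tree defines them.

```python
import os


def append_temp_config(conf_path, env_var="PG_TAP_TEMP_CONFIG"):
    """Append the file named by env_var (if set) to conf_path.

    env_var is a hypothetical name used only for this sketch.
    Returns True if extra settings were appended, False otherwise.
    """
    extra_path = os.environ.get(env_var)
    if not extra_path:
        return False  # nothing requested; leave postgresql.conf untouched

    with open(extra_path, "r", encoding="utf-8") as src:
        extra = src.read()

    # Appending at the end means these settings win: when a parameter
    # appears more than once in postgresql.conf, the last entry is used.
    with open(conf_path, "a", encoding="utf-8") as conf:
        conf.write("\n# Added from " + env_var + "\n")
        conf.write(extra)
        if not extra.endswith("\n"):
            conf.write("\n")
    return True
```

A slow animal could then point the variable at a file holding, say,
wal_sender_timeout and wal_receiver_timeout overrides, and every node
initialized by the test suites would pick them up without further changes to
the individual tests.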

Even for hamster, I'm skeptical that changing stats_temp_directory would
suffice.  Every hamster BinInstallCheck failure since 2017-02-13 had a "LOG:
terminating walsender process due to replication timeout".  Most, but not all,
of those replication timeouts followed a "LOG:  using stale statistics instead
of current ones because stats collector is not responding".  For the remaining
minority, I expect to eventually need wal_sender_timeout.  Example:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2017-02-24%2016%3A00%3A06
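
For anyone who wants to repeat that classification on a saved log, a rough
sketch follows.  It assumes "followed" means only that a stale-statistics
message appears somewhere earlier in the same log, which is a looser test than
reading each log by hand; the function name classify_timeouts is made up for
this example.

```python
# Substrings of the two log messages quoted above.
TIMEOUT = "terminating walsender process due to replication timeout"
STALE = "using stale statistics instead of current ones"


def classify_timeouts(log_lines):
    """Count replication timeouts by whether a stale-stats message came first.

    Returns (preceded, not_preceded).  "Preceded" is an assumption of this
    sketch: any stale-statistics line earlier in the same log counts.
    """
    seen_stale = False
    preceded = 0
    not_preceded = 0
    for line in log_lines:
        if STALE in line:
            seen_stale = True
        elif TIMEOUT in line:
            if seen_stale:
                preceded += 1
            else:
                not_preceded += 1
    return preceded, not_preceded
```

Timeouts counted in the second number are the ones a stats_temp_directory
change alone would not be expected to address.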