Re: Adding a TAP test checking data consistency on standby with minRecoveryPoint - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Adding a TAP test checking data consistency on standby with minRecoveryPoint
Msg-id 20190324124758.GC2558@paquier.xyz
In response to Re: Adding a TAP test checking data consistency on standby with minRecoveryPoint  (Peter Geoghegan <pg@bowt.ie>)
On Sat, Mar 23, 2019 at 04:08:42PM -0700, Peter Geoghegan wrote:
> Seems like there might be a problem either caused by or detected by
> 016_min_consistency.pl on piculet:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2019-03-23%2022%3A28%3A59

Interesting.  According to regress_log_016_min_consistency, the test
tries to stop the standby in fast mode, but pg_ctl gives up once its
wait timeout expires:
### Stopping node "standby" using mode fast
[...]
pg_ctl: server does not shut down
Bail out!  system pg_ctl failed
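
To make the failure mode concrete: when asked to wait, pg_ctl polls until the server is gone or its timeout elapses, and it reports "server does not shut down" in the latter case. The sketch below is a simplified Python model of that wait loop, not pg_ctl's actual C code. The name wait_for_shutdown, its parameters, and the is_running callback are hypothetical and exist only for illustration.

```python
import time
from typing import Callable


def wait_for_shutdown(is_running: Callable[[], bool],
                      timeout_s: int = 60,
                      polls_per_second: int = 10,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True if the server stopped within the timeout, else False.

    Hypothetical model: is_running stands in for pg_ctl's check that the
    postmaster is still alive; the defaults are assumptions for this sketch.
    """
    attempts = timeout_s * polls_per_second
    for _ in range(attempts):
        if not is_running():
            return True               # server went away in time
        sleep(1.0 / polls_per_second)
    # Timeout reached: the caller would print "server does not shut down".
    return not is_running()


if __name__ == "__main__":
    # A server that never stops, checked with a short 1-second timeout.
    stopped = wait_for_shutdown(lambda: True, timeout_s=1)
    print("stopped" if stopped else "server does not shut down")
```

In this model, a standby that takes longer than the timeout to finish its shutdown work looks the same as one that is stuck forever, so the log message alone does not tell the two cases apart.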

There is only one place in the test where the standby is stopped in
fast mode.  Before that, the test issues a checkpoint on the standby
and then stops its primary in immediate mode:
# Issue a restart point on the standby now, which makes the checkpointer
# update minRecoveryPoint.
$standby->safe_psql('postgres', 'CHECKPOINT;');
[...]
$primary->stop('immediate');
$standby->stop('fast');

The failure is a bit weird, as I would expect those three actions to
run sequentially.  piculet is the only buildfarm member failing, and
it uses --disable-atomics, so I am wondering whether that is related
and whether 0dfe3d0 plays a part.  With a primary/standby setup, it
could be possible to test that scenario pretty easily.  I'll give it
a shot.
--
Michael
