Thread: pgsql: Fix race in test of pg_switch_wal().

pgsql: Fix race in test of pg_switch_wal().

From: Noah Misch
Date: Mon, Sep 14, 2020 at 06:19:38AM +0000
Fix race in test of pg_switch_wal().

The test failed when something added WAL between pg_switch_wal() and
pg_current_wal_lsn(), seen on buildfarm members hornet and sungazer.
Fix v10, v9.6 and v9.5 by making this code mirror its v13+ counterpart.
v12 and v11 lack a counterpart.

Branch
------
REL9_5_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/6a7a5fce9a7421cc0e07341921787f55a814249b

Modified Files
--------------
src/test/recovery/t/020_archive_status.pl | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
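To make the race in the commit message concrete, here is a simplified Python model. It is not PostgreSQL code. It assumes the old test derived the archived segment's name with pg_walfile_name(pg_current_wal_lsn()) after calling pg_switch_wal(). The pre-fix test code is not shown in this thread, so treat that as an assumption. The names FakeWal, walfile_name, racy_name and robust_name are hypothetical. The model also assumes the default 16 MB segment size and timeline 1. It relies on the documented behaviour that pg_walfile_name() maps an LSN lying exactly on a segment boundary to the preceding segment.

```python
# Simplified, hypothetical model of the pg_switch_wal() test race.
# Not PostgreSQL code. Assumes the default 16 MB wal_segment_size and timeline 1.
SEG_SIZE = 16 * 1024 * 1024
SEGS_PER_XLOGID = 0x100000000 // SEG_SIZE  # 256 segments per 4 GB "xlogid"


def walfile_name(tli: int, lsn: int) -> str:
    """Mimic pg_walfile_name(): an LSN exactly on a segment boundary
    maps to the preceding segment ((lsn - 1) / segment size)."""
    segno = (lsn - 1) // SEG_SIZE
    return "%08X%08X%08X" % (tli, segno // SEGS_PER_XLOGID, segno % SEGS_PER_XLOGID)


class FakeWal:
    """Toy WAL position; real WAL also has page headers, flush vs. insert
    positions, etc., which this model ignores."""

    def __init__(self, lsn: int) -> None:
        self.lsn = lsn

    def insert(self, nbytes: int) -> None:
        # Any other process writing WAL advances the position.
        self.lsn += nbytes

    def switch(self) -> int:
        # Roughly like pg_switch_wal(): jump to the next segment boundary
        # (if not already on one) and return the end of the completed segment.
        if self.lsn % SEG_SIZE:
            self.lsn = (self.lsn // SEG_SIZE + 1) * SEG_SIZE
        return self.lsn

    def current(self) -> int:
        # Roughly like pg_current_wal_lsn().
        return self.lsn


def racy_name(concurrent_bytes: int) -> str:
    wal = FakeWal(3 * SEG_SIZE + 100)  # somewhere inside segment 3
    wal.switch()
    wal.insert(concurrent_bytes)       # WAL added inside the race window
    return walfile_name(1, wal.current())


def robust_name(concurrent_bytes: int) -> str:
    wal = FakeWal(3 * SEG_SIZE + 100)
    switch_lsn = wal.switch()          # keep the LSN the switch itself returned
    wal.insert(concurrent_bytes)       # later WAL cannot change switch_lsn
    return walfile_name(1, switch_lsn)


if __name__ == "__main__":
    print(racy_name(0))      # 000000010000000000000003
    print(racy_name(200))    # 000000010000000000000004 (the wrong segment)
    print(robust_name(200))  # 000000010000000000000003
```

With no concurrent WAL, both approaches name segment 3. Once another process writes even a few bytes between the two calls, the racy version names segment 4, which has not been archived. Using the LSN returned by the switch is one race-free option. It is not necessarily what the v13+ test does.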


Re: pgsql: Fix race in test of pg_switch_wal().

From: Michael Paquier
Date: Mon, Sep 14, 2020 at 05:27:07PM +0900
On Mon, Sep 14, 2020 at 06:19:38AM +0000, Noah Misch wrote:
> Fix race in test of pg_switch_wal().
>
> The test failed when something added WAL between pg_switch_wal() and
> pg_current_wal_lsn(), seen on buildfarm members hornet and sungazer.
> Fix v10, v9.6 and v9.5 by making this code mirror its v13+ counterpart.
> v12 and v11 lack a counterpart.

Thanks, Noah, for that.  I had not noticed those buildfarm failures:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2020-07-24%2010%3A24%3A56

It is fine to ping me or just start a thread if there is a problem;
I would have taken care of it.

Thanks,
--
Michael


Re: pgsql: Fix race in test of pg_switch_wal().

From: Noah Misch
Date: Tue, Oct 06, 2020 at 07:03:27PM -0700
On Mon, Sep 14, 2020 at 05:27:07PM +0900, Michael Paquier wrote:
> On Mon, Sep 14, 2020 at 06:19:38AM +0000, Noah Misch wrote:
> > Fix race in test of pg_switch_wal().
> > 
> > The test failed when something added WAL between pg_switch_wal() and
> > pg_current_wal_lsn(), seen on buildfarm members hornet and sungazer.
> > Fix v10, v9.6 and v9.5 by making this code mirror its v13+ counterpart.
> > v12 and v11 lack a counterpart.
> 
> Thanks, Noah, for that.  I had not noticed those buildfarm failures:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2020-07-24%2010%3A24%3A56
> 
> It is fine to ping me or just start a thread if there is a problem;
> I would have taken care of it.

There's a new 020_archive_status.pl failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2020-10-05%2023%3A02%3A17

Would you like to diagnose/fix that one?



Re: pgsql: Fix race in test of pg_switch_wal().

From: Michael Paquier
On Tue, Oct 06, 2020 at 07:03:27PM -0700, Noah Misch wrote:
> There's a new 020_archive_status.pl failure:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2020-10-05%2023%3A02%3A17
>
> Would you like to diagnose/fix that one?

Wow, thanks.  This does not look like an issue coming directly from
the test, though:
2020-10-06 00:20:46.786 UTC [20906622:8] LOG:  restored log file "000000010000000000000003" from archive
2020-10-06 00:20:46.803 UTC [10748670:1] ERROR:  could not open file "pg_xlog/000000010000000000000003": No such file or directory
2020-10-06 00:20:46.880 UTC [21496712:4] psql ERROR:  checkpoint request failed
2020-10-06 00:20:46.880 UTC [21496712:5] psql HINT:  Consult recent messages in the server log for details.
[...]
error running SQL: 'psql:<stdin>:1: ERROR:  checkpoint request failed
HINT:  Consult recent messages in the server log for details.'

It looks like a race condition between the checkpointer and the
startup process.  This failure involves the first checkpoint triggered
in $standby2 after its creation, with this standby reaching a
consistent point before a manual restartpoint is triggered.  That's a
bit strange, though: the startup process considers this segment
restored, but the checkpointer complains that the file does not
exist, which contradicts what the startup process tells us.  :/
--
Michael
