Re: standby recovery fails (tablespace related) (tentative patch and discussion) - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Msg-id CA+hUKGJUyk_u43RcFWdS-txyoSY8tYLExyW+7=y7G9tEBO_MFg@mail.gmail.com
In response to Re: standby recovery fails (tablespace related) (tentative patch and discussion)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, Jul 31, 2022 at 2:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > WFM, pushed that way.
>
> Looks like conchuela is still intermittently unhappy.
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2022-07-30%2004%3A57%3A51

And here's one from CI that failed on Linux (this was a cfbot run with
an unrelated patch, parent commit b998196 so a few commits after "Fix
test instability"):

https://cirrus-ci.com/task/5282155000496128


https://api.cirrus-ci.com/v1/artifact/task/5282155000496128/log/src/test/recovery/tmp_check/log/033_replay_tsp_drops_primary1_WAL_LOG.log

It looks like this sequence is racy and we need to wait for more than
just "connection is made" before dropping the slot?

    $node_standby->start;

    # Make sure connection is made
    $node_primary->poll_query_until('postgres',
        'SELECT count(*) = 1 FROM pg_stat_replication');
    $node_primary->safe_psql('postgres',
        "SELECT pg_drop_replication_slot('slot')");

Why not set the replication slot name so that the standby uses it
"properly", like in other tests?
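
Roughly what that could look like, as a sketch only and not a tested
patch. It assumes the slot is still called 'slot', that it either already
exists on the primary or is created as shown, and that the standby's
configuration is changed before it is started. append_conf and the
primary_slot_name setting are what other recovery tests use for this;
where exactly it would go in 033_replay_tsp_drops.pl is an assumption.

```perl
# Assumption: runs after the primary is up and before the standby starts.
# Skip the CREATE if the test has already created a slot named 'slot'.
$node_primary->safe_psql('postgres',
    "SELECT pg_create_physical_replication_slot('slot')");

# Have the standby's walreceiver stream through the slot, instead of
# the test dropping the slot once a connection shows up.
$node_standby->append_conf('postgresql.conf',
    "primary_slot_name = 'slot'");
$node_standby->start;
```

With the standby holding the slot, the test would presumably no longer
need the poll on pg_stat_replication followed by the drop at all.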


