Re: standby recovery fails (tablespace related) (tentative patch and discussion) - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Date
Msg-id CA+hUKGL8BaUbdXT1OFO1rFWxtbN-RrJOztJpgpO1u74FQe5R-w@mail.gmail.com
In response to Re: standby recovery fails (tablespace related) (tentative patch and discussion)  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On Sun, Jul 31, 2022 at 3:46 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Sun, Jul 31, 2022 at 2:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > > WFM, pushed that way.
> >
> > Looks like conchuela is still intermittently unhappy.
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2022-07-30%2004%3A57%3A51
>
> And here's one from CI that failed on Linux (this was a cfbot run with
> an unrelated patch, parent commit b998196 so a few commits after "Fix
> test instability"):
>
> https://cirrus-ci.com/task/5282155000496128
>
> https://api.cirrus-ci.com/v1/artifact/task/5282155000496128/log/src/test/recovery/tmp_check/log/033_replay_tsp_drops_primary1_WAL_LOG.log
>
> It looks like this sequence is racy and we need to wait for more than
> just "connection is made" before dropping the slot?
>
>     $node_standby->start;
>
>     # Make sure connection is made
>     $node_primary->poll_query_until('postgres',
>         'SELECT count(*) = 1 FROM pg_stat_replication');
>     $node_primary->safe_psql('postgres',
>         "SELECT pg_drop_replication_slot('slot')");
>
> Why not set the replication slot name so that the standby uses it
> "properly", like in other tests?

Or to keep doing it this way, does that pg_stat_replication query need
a WHERE clause looking at the state?
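
Untested sketch of that second option, assuming 'streaming' is the
state to wait for (whether that actually closes the race is
unverified):

    # Wait until the walsender reports streaming, not merely connected
    $node_primary->poll_query_until('postgres',
        "SELECT count(*) = 1 FROM pg_stat_replication WHERE state = 'streaming'");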


