Re: Loss of replication after simple misconfiguration - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: Loss of replication after simple misconfiguration
Msg-id 20200410041434.GU1606@paquier.xyz
In response to Re: Loss of replication after simple misconfiguration  (Victor Yegorov <vyegorov@gmail.com>)
List pgsql-bugs
On Thu, Apr 09, 2020 at 07:48:17PM +0300, Victor Yegorov wrote:
> We've hit similar issue last week, but on 11.5 — we
> had track_commit_timestamp enabled on master after switchover,
> replica failed to start.
>
> It might be, that fix was here:
> https://git.postgresql.org/pg/commitdiff/180feb8c7e
> (For 9.5 branch it is: https://git.postgresql.org/pg/commitdiff/69a5686368)
>
> We're not in the position to test it again, though…

Hmm.  We have a gap in tests here, as nothing stresses switchovers
with track_commit_timestamp.  Anyway, could you confirm that I got the
problem right?  Here is the flow I gather from the information
upthread, roughly:
1) Primary/standby cluster, both using max_worker_processes = 8, and
track_commit_timestamp = off.
2) In order to begin the switchover, first cleanly stop the primary.
3) Update configuration of the standby as follows, promote it and
restart it:
track_commit_timestamp = on
max_worker_processes = 50
4) Enable streaming on the old primary to make it a standby.  Starting
it fails because of the mismatched setting for max_worker_processes.
5) Re-adjust max_worker_processes correctly on the new standby, start
it.  Then this startup should fail at the lookup of pg_commit_ts/.
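
To make the expected failure in step 4 concrete, here is a toy model in
Python of the check a hot standby applies at startup: certain settings
must be at least as high as on the primary.  This is only a sketch, not
PostgreSQL source code.  The function name and the dictionaries are
hypothetical, the list of settings follows the hot standby
documentation for recent releases and may differ by version, and the
value 50 used for step 5 is an assumption about what "correctly" means.

```python
# Toy model of the standby-side parameter check; not PostgreSQL source code.
# Settings a hot standby is expected to have at least as high as the primary.
REQUIRED_AT_LEAST_PRIMARY = (
    "max_connections",
    "max_worker_processes",
    "max_wal_senders",
    "max_prepared_transactions",
    "max_locks_per_transaction",
)


def insufficient_settings(primary, standby):
    """Return the names of settings where the standby is below the primary."""
    return [
        name
        for name in REQUIRED_AT_LEAST_PRIMARY
        if name in primary and standby.get(name, 0) < primary[name]
    ]


# Step 3: the promoted node (new primary) after reconfiguration.
new_primary = {"max_worker_processes": 50, "track_commit_timestamp": True}
# Step 4: the old primary still carries the original settings.
old_primary = {"max_worker_processes": 8, "track_commit_timestamp": False}

step4 = insufficient_settings(new_primary, old_primary)

# Step 5: raise max_worker_processes on the new standby (assumed value).
old_primary["max_worker_processes"] = 50
step5 = insufficient_settings(new_primary, old_primary)

print(step4, step5)
```

Note that track_commit_timestamp is deliberately absent from the list:
a mismatch there is not rejected by this kind of check, which is why
the failure reported in step 5 would have to come from somewhere else,
namely the lookup of pg_commit_ts/.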

I have been able to write a TAP test to reproduce this exact scenario,
though it succeeds for me (it could be a good idea to add some
coverage for that, actually).  Perhaps I am missing a step though?
For example, perhaps the original setting was track_commit_timestamp =
on, and it got disabled at some point?
--
Michael

