Re: Issue with logical replication slot during switchover - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: Issue with logical replication slot during switchover
Date	November 14, 2025 13:39:38
Msg-id	CAA4eK1KmLwYN_EkyW_W6qFcb2BD0qzGwEKM6cyCrDwSm9dHj_g@mail.gmail.com
In response to	Re: Issue with logical replication slot during switchover (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses	Re: Issue with logical replication slot during switchover
List	pgsql-hackers

On Fri, Nov 14, 2025 at 11:40 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Nov 13, 2025 at 7:16 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Nov 13, 2025 at 6:39 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
> > >
> > >
> > >
> > >> But the system can die/crash before shutdown.
> > >
> > >
> > > You mean it will not write WAL?
> > > When a logical replication slot is created we build a snapshot and also write to WAL:
> > > postgres=# select pg_current_wal_insert_lsn(); select pg_create_logical_replication_slot('foo', 'pgoutput'); select pg_current_wal_insert_lsn();
> > >  pg_current_wal_insert_lsn
> > > ---------------------------
> > >  0/37F96F8
> > > (1 row)
> > >
> > >  pg_create_logical_replication_slot
> > > ------------------------------------
> > >  (foo,0/37F9730)
> > > (1 row)
> > >
> > >  pg_current_wal_insert_lsn
> > > ---------------------------
> > >  0/37F9730
> > > (1 row)
> > >
> > > Only after that is the slot marked as persistent.
> > >
> >
> > There can be a scenario where a replication slot is dropped and
> > recreated, and its WAL is also replicated to the standby. However,
> > before the new slot state can be synchronized via slotsync, the
> > primary crashes and the standby is promoted. Later, the user manually
> > reconfigures the old primary to follow the newly promoted standby (no
> > pg_rewind in play). I was wondering whether, in such a case, it would
> > be a good idea to overwrite the newly created slot on the old primary
> > with the promoted standby's synced slot (the old one) by default. Thoughts?
>
> I think it's extremely rare, or mostly a wrong operation, for users to
> have the old primary rejoin replication as the new standby without
> pg_rewind after a failover (i.e., when the old primary didn't shut down
> gracefully). I guess that pg_rewind should practically be used unless
> the primary server shuts down gracefully (i.e., in the switchover case). In
> failover cases, pg_rewind launches the server in single-user mode to
> run the crash recovery, advancing its LSN and cleaning all existing
> replication slots after rewinding the server. So I think that the
> reported issue doesn't happen in failover cases and we can focus on
> switchover cases.
>

The point is quite fundamental: do you think we can sync to a
pre-existing slot with the same name and failover marked as true after
the first time the node joins a new primary? We don't provide any
switchover tools/utilities, so it doesn't appear straightforward that
we can perform a re-sync. If we had a switchover tool, I think one
would remove all existing slots before the old primary joins the new
primary, because otherwise there is always a chance that redundant
slots remain and prevent resource removal. Consider a case where,
after switchover, the old primary decides to join a different standby
(new primary) than the one where slot-sync was earlier happening. Now,
it is possible that the old primary has some slots that should be
removed.
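
As a rough illustration of the cleanup described above, here is a minimal sketch of what a manual step on the old primary might look like before it rejoins as a standby. It assumes PostgreSQL 17 or later, where the pg_replication_slots view has a failover column, and it assumes that none of the listed slots is still needed on this node. That is a judgment the operator has to make, not something the server decides.

```sql
-- Run on the old primary before it rejoins the new primary as a standby.
-- Step 1: list the slots that are marked for failover; these are the
-- candidates that could be left behind as redundant slots.
SELECT slot_name, slot_type, active, restart_lsn
FROM pg_replication_slots
WHERE failover;

-- Step 2 (assumption: the slots listed above are no longer needed here):
-- drop the inactive ones so they stop holding back WAL and catalog rows.
-- pg_drop_replication_slot() raises an error for a slot that is in use,
-- hence the "NOT active" filter.
SELECT pg_drop_replication_slot(slot_name)
FROM pg_replication_slots
WHERE failover AND NOT active;
```

This only shows the manual variant. Whether slot synchronization should instead reuse or overwrite such pre-existing slots automatically is the open question in this thread.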

--
With Regards,
Amit Kapila.