Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary
Date
Msg-id CAD21AoAABk=EhwMsyQLhbWTOxX1ereM6KpN3ey=1rx05kZsJYA@mail.gmail.com
Whole thread Raw
In response to Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary
List pgsql-hackers
On Sun, Feb 2, 2025 at 8:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jan 24, 2025 at 4:05 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > When a standby replays a XLOG_PARAMETER_CHANGE record that lowers
> > wal_level from logical, we invalidate all logical slots only when the
> > standby is in hot standby mode:
> >
> > if (InRecovery && InHotStandby &&
> >     xlrec.wal_level < WAL_LEVEL_LOGICAL &&
> >     wal_level >= WAL_LEVEL_LOGICAL)
> >     InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL,
> >                                        0, InvalidOid,
> >                                        InvalidTransactionId);
> >
> > However, it's possible that this record is replayed when not in hot
> > standby mode and the slot is used after the promotion. In this case,
> > the following Assert in xlog_decode() fails:
> >
> > /*
> >  * This can occur only on a standby, as a primary would
> >  * not allow to restart after changing wal_level < logical
> >  * if there is pre-existing logical slot.
> >  */
>
> Shouldn't we do similar to what this comment indicates on standby? We
> can disallow to start the server as standby, if the hot_standby is off
> and there is a pre-existing logical slot.

It seems like a better idea. I thought we could pass StandbyMode to
StartupReplicationSlots() and check if there is a pre-existing logical
slot, but it would break the ABI compatibility. It might not be a
problem in practice as StartupReplicationSlots() is normally used only
by the startup process. But if we want to avoid that we can introduce
a new function for that.

>
> > Assert(RecoveryInProgress());
> > ereport(ERROR,
> >         (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> >          errmsg("logical decoding on standby requires \"wal_level\" >=
> > \"logical\" on the primary")));
> >
> > Here is brief steps to reproduce this issue:
> >
> > 1. setup the primary and the standby servers (with hot_standby=on).
> > 2. create a logical slot on the standby.
> > 3. on standby, set hot_standby=off and restart.
> > 4. on primary, lower wal_level to replica and restart.
> > 5. promote the standby and execute the logical decoding.
> >
> > I've attached a small patch to fix this issue. With the patch, we
> > invalidate all logical slots even when not in hot standby, that is,
> > the patch just removes InHotStandby from the condition. Thoughts?
> >
>
> This can fix the scenario you described but I am not sure it is a good
> idea. We can consider the fix on these lines if you don't like the
> above suggestion.

Agreed. If your suggestion doesn't work, let's go back to this idea.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Bertrand Drouvot
Date:
Subject: Re: per backend WAL statistics
Next
From: Shlok Kyal
Date:
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation