Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary
Date
Msg-id CAA4eK1Lpm=2Eh73Z6_niN3KN1Js661rrGCisEmqRAZFncV1g3Q@mail.gmail.com
Whole thread Raw
In response to Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary
List pgsql-hackers
On Fri, Feb 7, 2025 at 11:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Feb 6, 2025 at 9:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > True but it sounds like there is more harm than benefit. It seems
> > reasonable to do this on HEAD. Shall we think of doing it differently
> > in HEAD and back-branches or let's restrict as your v2 patch is doing
> > and if by any chance users complain about it we can try to fix it in
> > another way?
>
> While the latter would be good for maintainability it would not be
> advisable to change the behavior back and forth in back branches. I'd
> like to make it clear what point of v2 patch (restring server startup
> for pre-existing logical slots) is preferable over the first patch
> (removing InHotStandby condition)?
>

Sorry, I didn't understand your question.

> If the approach of the first patch
> is reasonably good, we could take it both in HEAD and backbranches.
>
> >
> > >
> > > But, similarly, is
> > > there any concerns of my proposed fix?
> > >
> >
> > - if (InRecovery && InHotStandby &&
> > + if (InRecovery &&
> >   xlrec.wal_level < WAL_LEVEL_LOGICAL &&
> >
> > Won't the InRecovery be true even for crash-recovery as well? I think
> > we need to check standby mode here.
>
> I think if we check standby mode here (i.e., adding StandbyMode), we
> would not be able to invalidate logical slots during archive recovery.
> That is, in the following scenario, we would hit the same assertion
> failure:
>
> 1. setup the primary and the standby servers (with hot_standby=on).
> 2. create a logical slot on the standby.
> 3. on standby, set archive recovery settings (setting restore_command,
> removing standby.signal, and creating recovery.signal etc.).
> 4. on primary, lower wal_level to replica and restart.
> 5. start standby (in archive recovery mode).
> 6. execute the logical decoding after the archive recovery completes.
>
> And, you're right that InRecovery is true even for crash-recovery. But
> is there any case where we invalidate logical slots due to
> XLOG_PARAMETER_CHANGE during a crash recovery?
>

I am not sure but what about the cases where the server crashes
immediately after logging this WAL record and the user reduced the
wal_lvel before the next restart? I think in such a case we may ERROR
out while restoring slots from the disk which will be before we will
try to replay the WAL. However, I am not sure if it is a good idea to
rely on that assumption.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: "Zhijie Hou (Fujitsu)"
Date:
Subject: RE: Introduce XID age and inactive timeout based replication slot invalidation
Next
From: Dilip Kumar
Date:
Subject: Re: Conflict detection for update_deleted in logical replication