Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary
Date
Msg-id CAD21AoCKuUHWPCLEE_y92Gunr1Z1hjFdutgJdQTSHVej4iPiBA@mail.gmail.com
Whole thread Raw
In response to Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Thu, Feb 6, 2025 at 9:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Feb 7, 2025 at 12:11 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Feb 5, 2025 at 10:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Feb 6, 2025 at 12:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > I've updated the patch accordingly.
> > > >
> > >
> > > Today, again thinking about the proposed fix, I was wondering about
> > > the following case. Say, on hot_standby, the user created a logical
> > > slot, then shut down hot_standby, turn off the hot_standby flag, and
> > > restart standby. This is allowed today but not after the patch. It is
> > > possible that the user can promote a non-hot_standby server after its
> > > restart in the previous step and can reuse the logical slot. So, is it
> > > okay to change this behavior? If not, then we may need to go back to
> > > the first fix proposed by you in this thread.
> >
> > While non hot standby, no one can advance logical slots, meaning the
> > standby server ends up accumulating WAL records and also causing the
> > bloat on the primary if hot_standby_feedback is enabled. Given this
> > outcome, I though that the current patch approach would be reasonable.
> >
>
> Agreed as I also can't think of any case where the user would require
> a logical slot on a non-hotstandby server.
>
> > On the other hand, as you pointed out, it would not be good in terms
> > of compatibility as we end up limiting potentially existing use cases.
> > And it would be prefectly fine if users promote the standby server
> > shortly after starting up with hot_standby = off.
>
> True but it sounds like there is more harm than benefit. It seems
> reasonable to do this on HEAD. Shall we think of doing it differently
> in HEAD and back-branches or let's restrict as your v2 patch is doing
> and if by any chance users complain about it we can try to fix it in
> another way?

While the latter would be good for maintainability it would not be
advisable to change the behavior back and forth in back branches. I'd
like to make it clear what point of v2 patch (restring server startup
for pre-existing logical slots) is preferable over the first patch
(removing InHotStandby condition)? If the approach of the first patch
is reasonably good, we could take it both in HEAD and backbranches.

>
> >
> > But, similarly, is
> > there any concerns of my proposed fix?
> >
>
> - if (InRecovery && InHotStandby &&
> + if (InRecovery &&
>   xlrec.wal_level < WAL_LEVEL_LOGICAL &&
>
> Won't the InRecovery be true even for crash-recovery as well? I think
> we need to check standby mode here.

I think if we check standby mode here (i.e., adding StandbyMode), we
would not be able to invalidate logical slots during archive recovery.
That is, in the following scenario, we would hit the same assertion
failure:

1. setup the primary and the standby servers (with hot_standby=on).
2. create a logical slot on the standby.
3. on standby, set archive recovery settings (setting restore_command,
removing standby.signal, and creating recovery.signal etc.).
4. on primary, lower wal_level to replica and restart.
5. start standby (in archive recovery mode).
6. execute the logical decoding after the archive recovery completes.

And, you're right that InRecovery is true even for crash-recovery. But
is there any case where we invalidate logical slots due to
XLOG_PARAMETER_CHANGE during a crash recovery? Also, I guess
InRecovery is always true when we call xlog_redo(). Do we need to
check it in the first place?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Fix for Extra Parenthesis in pgbench progress message
Next
From: Nathan Bossart
Date:
Subject: Re: Fix for Extra Parenthesis in pgbench progress message