Home > mailing lists

Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary - Mailing list pgsql-hackers

From	Masahiko Sawada
Subject	Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary
Date	February 11 22:46:07
Msg-id	CAD21AoA1iVkgjv8OsdNDQ01wsWKgRqvfKmb5rBQqDUjeo7kQvw@mail.gmail.com Whole thread Raw
In response to	Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary
List	pgsql-hackers

Tree view

On Mon, Feb 10, 2025 at 4:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Feb 7, 2025 at 11:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Feb 6, 2025 at 9:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > True but it sounds like there is more harm than benefit. It seems
> > > reasonable to do this on HEAD. Shall we think of doing it differently
> > > in HEAD and back-branches or let's restrict as your v2 patch is doing
> > > and if by any chance users complain about it we can try to fix it in
> > > another way?
> >
> > While the latter would be good for maintainability it would not be
> > advisable to change the behavior back and forth in back branches. I'd
> > like to make it clear what point of v2 patch (restring server startup
> > for pre-existing logical slots) is preferable over the first patch
> > (removing InHotStandby condition)?
> >
>
> Sorry, I didn't understand your question.

I wanted to mean that there is another direction to fix this issue by
applying the v1 patch to both HEAD and back-branches. It might be
worth considering this option too if the v1 patch's approach is
reasonable.

>
> > If the approach of the first patch
> > is reasonably good, we could take it both in HEAD and backbranches.
> >
> > >
> > > >
> > > > But, similarly, is
> > > > there any concerns of my proposed fix?
> > > >
> > >
> > > - if (InRecovery && InHotStandby &&
> > > + if (InRecovery &&
> > >   xlrec.wal_level < WAL_LEVEL_LOGICAL &&
> > >
> > > Won't the InRecovery be true even for crash-recovery as well? I think
> > > we need to check standby mode here.
> >
> > I think if we check standby mode here (i.e., adding StandbyMode), we
> > would not be able to invalidate logical slots during archive recovery.
> > That is, in the following scenario, we would hit the same assertion
> > failure:
> >
> > 1. setup the primary and the standby servers (with hot_standby=on).
> > 2. create a logical slot on the standby.
> > 3. on standby, set archive recovery settings (setting restore_command,
> > removing standby.signal, and creating recovery.signal etc.).
> > 4. on primary, lower wal_level to replica and restart.
> > 5. start standby (in archive recovery mode).
> > 6. execute the logical decoding after the archive recovery completes.
> >
> > And, you're right that InRecovery is true even for crash-recovery. But
> > is there any case where we invalidate logical slots due to
> > XLOG_PARAMETER_CHANGE during a crash recovery?
> >
>
> I am not sure but what about the cases where the server crashes
> immediately after logging this WAL record and the user reduced the
> wal_lvel before the next restart? I think in such a case we may ERROR
> out while restoring slots from the disk which will be before we will
> try to replay the WAL.

I think so too.

> However, I am not sure if it is a good idea to
> rely on that assumption.

Agreed, I'm fine with leaving InRecovery in this condition. I think
the point is whether we should add StandbyMode to the condition or
not. I think if we do that, we would end up with the same error in the
above scenario I described. So does the following condition make
sense?

        if (InRecovery &&
            xlrec.wal_level < WAL_LEVEL_LOGICAL &&
            wal_level >= WAL_LEVEL_LOGICAL)
            InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL,
                                               0, InvalidOid,
                                               InvalidTransactionId);

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

pgsql-hackers by date:

From: Alena Rybakina
Date: 11 February, 22:43:30
Subject: Re: explain analyze rows=%.0f

From: Jeff Davis
Date: 11 February, 22:48:03
Subject: Re: injection points for hash aggregation

Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary - Mailing list pgsql-hackers

Previous

Next