Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From shveta malik
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id CAJpy0uDNdnZHh0WXQC4xH=3zaV3ePxAu3-1OqW14jgqeuE=Okg@mail.gmail.com
Whole thread Raw
In response to Re: Synchronizing slots from primary to standby  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
List pgsql-hackers
On Fri, Apr 5, 2024 at 9:22 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Thu, Apr 04, 2024 at 05:31:45PM +0530, shveta malik wrote:
> > On Thu, Apr 4, 2024 at 2:59 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > >
> > > Prior to commit 2ec005b, this check was okay, as we did not expect
> > > restart_lsn of the synced slot to be ahead of remote since we were
> > > directly copying the lsns. But now when we use 'advance' to do logical
> > > decoding on standby, there is a possibility that restart lsn of the
> > > synced slot is ahead of remote slot, if there are running txns records
> > > found after reaching consistent-point while consuming WALs from
> > > restart_lsn till confirmed_lsn. In such a case, slot-sync's advance
> > > may end up serializing snapshots and setting restart_lsn to the
> > > serialized snapshot point, ahead of remote one.
> > >
> > > Fix:
> > > The sanity check needs to be corrected. Attached a patch to address the issue.
> >
>
> Thanks for reporting, explaining the issue and providing a patch.
>
> Regarding the patch:
>
> 1 ===
>
> +        * Attempt to sync lsns and xmins only if remote slot is ahead of local
>
> s/lsns/LSNs/?
>
> 2 ===
>
> +                       if (slot->data.confirmed_flush != remote_slot->confirmed_lsn)
> +                               elog(LOG,
> +                                        "could not synchronize local slot \"%s\" LSN(%X/%X)"
> +                                        " to remote slot's LSN(%X/%X) ",
> +                                        remote_slot->name,
> +                                        LSN_FORMAT_ARGS(slot->data.confirmed_flush),
> +                                        LSN_FORMAT_ARGS(remote_slot->confirmed_lsn));
>
> I don't think that the message is correct here. Unless I am missing something
> there is nothing in the following code path that would prevent the slot to be
> sync during this cycle.

This is a sanity check,  I will put a comment to indicate the same. We
want to ensure if anything changes in future, we get correct logs to
indicate that.
If required, the LOG msg can be changed. Kindly suggest if you have
anything better in mind.

thanks
Shveta



pgsql-hackers by date:

Previous
From: Erik Wienhold
Date:
Subject: Re: IPC::Run::time[r|out] vs our TAP tests
Next
From: Thomas Munro
Date:
Subject: Re: Streaming read-ready sequential scan code