Re: long-standing data loss bug in initial sync of logical replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: long-standing data loss bug in initial sync of logical replication
Date
Msg-id CAA4eK1+ubqorurkbWOtpbEd2Vc8QZ_vmUHY3mFgL1fp_1+wK=g@mail.gmail.com
In response to Re: long-standing data loss bug in initial sync of logical replication  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: long-standing data loss bug in initial sync of logical replication
List pgsql-hackers
On Fri, Feb 28, 2025 at 6:15 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Feb 27, 2025 at 12:14 AM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Monday, February 24, 2025 5:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Dec 11, 2024 at 12:37 PM Masahiko Sawada
> > > <sawada.mshk@gmail.com> wrote:
> > > >
> > > > I confirmed that the proposed patch fixes these issues. I have one
> > > > question about the patch:
> > > >
> > > > In the main loop in SnapBuildDistributeSnapshotAndInval(), we have the
> > > > following code:
> > > >
> > > >        /*
> > > >         * If we don't have a base snapshot yet, there are no changes in this
> > > >         * transaction which in turn implies we don't yet need a snapshot at
> > > >         * all. We'll add a snapshot when the first change gets queued.
> > > >         *
> > > >         * NB: This works correctly even for subtransactions because
> > > >         * ReorderBufferAssignChild() takes care to transfer the base snapshot
> > > >         * to the top-level transaction, and while iterating the changequeue
> > > >         * we'll get the change from the subtxn.
> > > >         */
> > > >        if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, txn->xid))
> > > >            continue;
> > > >
> > > > Is there any case where we need to distribute inval messages to
> > > > transactions that don't have the base snapshot yet but eventually need
> > > > the inval messages?
> > > >
> > >
> > > Good point. The comment addresses this for snapshots: "We'll add a
> > > snapshot when the first change gets queued." I think we achieve this
> > > via the builder->committed.xip array, such that when we set a base
> > > snapshot for a transaction, we use that array to form the snapshot.
> > > However, I don't see any such consideration for invalidations. Now, we
> > > could either always add invalidations to xacts that don't have a base
> > > snapshot yet or have a mechanism similar to the committed.xip array.
> > > But it is better to first reproduce the problem.
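> > >
> > > To illustrate the snapshot side (a simplified call-chain sketch of
> > > snapbuild.c, not the literal code):
> > >
> > >     /* When a catalog-modifying transaction commits, remember its
> > >      * XID so that snapshots built later treat it as committed: */
> > >     SnapBuildCommitTxn()
> > >       -> SnapBuildAddCommittedTxn(builder, xid); /* -> committed.xip */
> > >
> > >     /* When a transaction queues its first change, its base snapshot
> > >      * is built from builder->committed.xip, so it already reflects
> > >      * the catalog change: */
> > >     SnapBuildProcessChange()
> > >       -> ReorderBufferSetBaseSnapshot(builder->reorder, xid, lsn,
> > >                                       SnapBuildBuildSnapshot(builder));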
> >
> > I think distributing invalidations to a transaction that has not yet built a
> > base snapshot is unnecessary. This is because, during the process of building
> > its base snapshot, such a transaction will have already recorded the XID of the
> > transaction that altered the publication information into its array of
> > committed XIDs. Consequently, it will reflect the latest changes in the catalog
> > from the beginning. In the context of logical decoding, this scenario is
> > analogous to decoding a new transaction initiated after the catalog-change
> > transaction has been committed.
> >
> > The original issue arises because the catalog cache was constructed using an
> > outdated snapshot that could not reflect the latest catalog changes. However,
> > this is not a problem in cases without a base snapshot. Since the existing
> > catalog cache should have been invalidated upon decoding the committed
> > catalog-change transaction, the subsequent transactions will construct a new
> > cache with the latest snapshot.
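> >
> > As a rough sketch of why that is safe (simplified; names from
> > reorderbuffer.c, details elided):
> >
> >     /* At commit of the catalog-change transaction, its invalidation
> >      * messages are executed in the decoding backend, discarding the
> >      * stale catalog cache entries: */
> >     ReorderBufferExecuteInvalidations(nmsgs, msgs);
> >
> >     /* Any transaction that builds its base snapshot afterwards
> >      * repopulates the caches with a snapshot that already treats the
> >      * catalog change as committed, so it needs no extra copy of the
> >      * invalidations. */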
>
> I've also concluded it's not necessary, but my reasoning and analysis
> might be somewhat different. IIUC, in the original issue (looking at
> Andres's reproducer[1]), it is fine that, when replaying a
> non-catalog-change transaction, the walsender constructed a snapshot
> that doesn't reflect the catalog change, because the first change of
> that transaction was made before the catalog change. The problem is
> that the walsender process absorbed the invalidation message while
> replaying the change that happened before the catalog change, and
> then kept replaying the subsequent changes with that snapshot. That
> is why we concluded that we need to distribute the invalidation
> messages to concurrently decoded transactions, so that we can
> invalidate the cache again at that point. As the comment mentions,
> the base snapshot is set before queuing any changes, so if a
> transaction doesn't have a base snapshot yet, there must be no queued
> change that happened before the catalog change. Transactions
> initiated after the catalog change don't have this issue.
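>
> To spell out the interleaving (a schematic sketch of the problematic
> case, not actual code):
>
>     /*
>      * txn1: BEGIN; ... change A ...          <- base snapshot set here
>      * txn2: ALTER PUBLICATION ...; COMMIT;   <- catalog change
>      * txn1: ... change B ...; COMMIT;
>      *
>      * Replaying change A with a pre-txn2 snapshot is correct, but
>      * without distributing txn2's invalidations, change B would also
>      * be replayed with the stale catalog cache.  A transaction that
>      * has no base snapshot at txn2's commit has no change like A
>      * queued, so it cannot be affected.
>      */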
>

I think both of you are saying the same thing in slightly different
words. Hou-San's explanation goes into more detail at the code level,
and you have said the same thing from a slightly higher-level view.
Additionally, for streaming transactions where we have already sent
one or more streams, we don't need anything special, since they behave
similarly to a transaction having a base snapshot: we save the
snapshot after sending each stream.
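
To make the streamed case concrete, a rough sketch (simplified from
reorderbuffer.c, assuming its current structure; the exact code
differs): at the end of each streamed run we remember the snapshot
that was in use, so for the purpose of distributing snapshots and
invalidations a streamed transaction behaves like one with a base
snapshot.

    /* End of a streamed run in ReorderBufferProcessTXN(): remember
     * the snapshot and command ID for the next run: */
    if (streaming)
        ReorderBufferSaveTXNSnapshot(rb, txn, snapshot_now, command_id);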

--
With Regards,
Amit Kapila.


