Re: long-standing data loss bug in initial sync of logical replication - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: long-standing data loss bug in initial sync of logical replication
Date
Msg-id CAD21AoBcOkeBgx_c9XXWgjw1YW8KUc9dNtg8XaPY2Q2Vwq7WgQ@mail.gmail.com
In response to RE: long-standing data loss bug in initial sync of logical replication  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses Re: long-standing data loss bug in initial sync of logical replication
List pgsql-hackers
On Thu, Feb 27, 2025 at 12:14 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Monday, February 24, 2025 5:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 11, 2024 at 12:37 PM Masahiko Sawada
> > <sawada.mshk@gmail.com> wrote:
> > >
> > > I confirmed that the proposed patch fixes these issues. I have one
> > > question about the patch:
> > >
> > > In the main loop in SnapBuildDistributeSnapshotAndInval(), we have the
> > > following code:
> > >
> > >        /*
> > >         * If we don't have a base snapshot yet, there are no changes in this
> > >         * transaction which in turn implies we don't yet need a snapshot at
> > >         * all. We'll add a snapshot when the first change gets queued.
> > >         *
> > >         * NB: This works correctly even for subtransactions because
> > >         * ReorderBufferAssignChild() takes care to transfer the base snapshot
> > >         * to the top-level transaction, and while iterating the changequeue
> > >         * we'll get the change from the subtxn.
> > >         */
> > >        if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, txn->xid))
> > >            continue;
> > >
> > > Is there any case where we need to distribute inval messages to
> > > transactions that don't have the base snapshot yet but eventually need
> > > the inval messages?
> > >
> >
> > Good point. For snapshots, the comment says: "We'll add a snapshot
> > when the first change gets queued." I think we achieve this via the
> > builder->committed.xip array, such that when we set a base snapshot
> > for a transaction, we use that array to form the snapshot. However, I
> > don't see any such consideration for invalidations. Now, we could
> > either always add invalidations to xacts that don't have a
> > base_snapshot yet or have a mechanism similar to the committed.xip
> > array. But it is better to first reproduce the problem.
>
> I think distributing invalidations to a transaction that has not yet built a
> base snapshot is unnecessary. This is because, during the process of building
> its base snapshot, such a transaction will have already recorded the XID of the
> transaction that altered the publication information into its array of
> committed XIDs. Consequently, it will reflect the latest changes in the catalog
> from the beginning. In the context of logical decoding, this scenario is
> analogous to decoding a new transaction initiated after the catalog-change
> transaction has been committed.
>
> The original issue arises because the catalog cache was constructed using an
> outdated snapshot that could not reflect the latest catalog changes. However,
> this is not a problem in cases without a base snapshot. Since the existing
> catalog cache should have been invalidated upon decoding the committed
> catalog-change transaction, the subsequent transactions will construct a new
> cache with the latest snapshot.
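
To make that concrete, below is a minimal standalone sketch of the idea
(a toy model with made-up names such as SnapshotModel and
build_base_snapshot; it is not the actual snapbuild.c/reorderbuffer.c
code): once the catalog-change XID has been recorded as committed, any
base snapshot built afterwards already treats that change as visible.

/*
 * Toy model (made-up names; not PostgreSQL code): the builder records
 * committed catalog-modifying XIDs, and a snapshot built afterwards
 * copies that array, so the catalog change is visible from the start.
 */
#include <stdbool.h>
#include <stdio.h>

typedef unsigned int Xid;

#define MAX_COMMITTED 16

typedef struct SnapshotModel
{
    Xid     committed[MAX_COMMITTED];   /* committed catalog-modifying XIDs */
    int     ncommitted;
} SnapshotModel;

/* record a committed catalog-change XID, as the builder does at commit */
static void
note_catalog_change_committed(SnapshotModel *builder, Xid xid)
{
    if (builder->ncommitted < MAX_COMMITTED)
        builder->committed[builder->ncommitted++] = xid;
}

/* build a snapshot when the transaction queues its first change */
static SnapshotModel
build_base_snapshot(const SnapshotModel *builder)
{
    return *builder;            /* copies the committed-XIDs array */
}

static bool
catalog_change_visible(const SnapshotModel *snap, Xid xid)
{
    for (int i = 0; i < snap->ncommitted; i++)
    {
        if (snap->committed[i] == xid)
            return true;
    }
    return false;
}

int
main(void)
{
    SnapshotModel builder = {0};
    Xid     alter_pub_xid = 735;    /* hypothetical catalog-change xact */

    /* the catalog-change transaction commits first ... */
    note_catalog_change_committed(&builder, alter_pub_xid);

    /* ... then the other transaction queues its first change */
    SnapshotModel snap = build_base_snapshot(&builder);

    printf("catalog change visible: %s\n",
           catalog_change_visible(&snap, alter_pub_xid) ? "yes" : "no");
    return 0;
}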

I've also concluded it's not necessary, but my reasoning and analysis
might be somewhat different. IIUC, in the original issue (looking at
Andres's reproducer[1]), the fact that the walsender, while replaying a
non-catalog-change transaction, constructed a snapshot that doesn't
reflect the catalog change is fine, because the first change of that
transaction was made before the catalog change. The problem is that
the walsender process absorbed the invalidation message while
replaying a change that happened before the catalog change, and ended
up replaying the subsequent changes with that snapshot. That is why we
concluded that we need to distribute the invalidation messages to
concurrently decoded transactions, so that we can invalidate the cache
again at that point. As the comment mentions, the base snapshot is set
before any change is queued, so if a transaction doesn't have a base
snapshot yet, it cannot have any queued change that happened before
the catalog change. Transactions initiated after the catalog change
don't have this issue.
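
To illustrate that rule, here is a similar toy model of the decision
(again with made-up structures; not the actual
SnapBuildDistributeSnapshotAndInval() code): only transactions that
already have a base snapshot, and hence may have queued changes from
before the catalog change, receive the distributed invalidations.

/*
 * Toy model of the distribution rule (made-up names; not the actual
 * SnapBuildDistributeSnapshotAndInval() code): invalidations are queued
 * only into transactions that already have a base snapshot, i.e. that
 * may already hold changes made before the catalog change.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct TxnModel
{
    unsigned int xid;
    bool    has_base_snapshot;  /* set when the first change is queued */
    int     ninvalidations;     /* invalidations queued into this txn */
} TxnModel;

static void
distribute_invalidations(TxnModel *txns, int ntxns)
{
    for (int i = 0; i < ntxns; i++)
    {
        /*
         * No base snapshot means no queued changes yet; the snapshot this
         * transaction eventually gets will already reflect the committed
         * catalog change, so it is skipped here.
         */
        if (!txns[i].has_base_snapshot)
            continue;

        txns[i].ninvalidations++;
    }
}

int
main(void)
{
    TxnModel txns[] = {
        {.xid = 101, .has_base_snapshot = true},    /* has queued changes */
        {.xid = 102, .has_base_snapshot = false},   /* nothing queued yet */
    };

    distribute_invalidations(txns, 2);

    for (int i = 0; i < 2; i++)
        printf("xid %u: %d invalidation(s) queued\n",
               txns[i].xid, txns[i].ninvalidations);
    return 0;
}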

Regards,

[1] https://www.postgresql.org/message-id/20231118025445.crhaeeuvoe2g5dv6%40awork3.anarazel.de

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


