Re: long-standing data loss bug in initial sync of logical replication - Mailing list pgsql-hackers

From: Masahiko Sawada
Subject: Re: long-standing data loss bug in initial sync of logical replication
Msg-id: CAD21AoBcOkeBgx_c9XXWgjw1YW8KUc9dNtg8XaPY2Q2Vwq7WgQ@mail.gmail.com
In response to: RE: long-standing data loss bug in initial sync of logical replication ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses: Re: long-standing data loss bug in initial sync of logical replication
List: pgsql-hackers
On Thu, Feb 27, 2025 at 12:14 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
>
> On Monday, February 24, 2025 5:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 11, 2024 at 12:37 PM Masahiko Sawada
> > <sawada.mshk@gmail.com> wrote:
> > >
> > > I confirmed that the proposed patch fixes these issues. I have one
> > > question about the patch:
> > >
> > > In the main loop in SnapBuildDistributeSnapshotAndInval(), we have the
> > > following code:
> > >
> > >         /*
> > >          * If we don't have a base snapshot yet, there are no changes in this
> > >          * transaction which in turn implies we don't yet need a snapshot at
> > >          * all. We'll add a snapshot when the first change gets queued.
> > >          *
> > >          * NB: This works correctly even for subtransactions because
> > >          * ReorderBufferAssignChild() takes care to transfer the base snapshot
> > >          * to the top-level transaction, and while iterating the changequeue
> > >          * we'll get the change from the subtxn.
> > >          */
> > >         if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, txn->xid))
> > >             continue;
> > >
> > > Is there any case where we need to distribute inval messages to
> > > transactions that don't have the base snapshot yet but eventually need
> > > the inval messages?
> > >
> >
> > Good point. It is mentioned that for snapshots: "We'll add a snapshot
> > when the first change gets queued." I think we achieve this via the
> > builder->committed.xip array, such that when we set a base snapshot for
> > a transaction, we use that array to form the snapshot. However, I don't
> > see any such consideration for invalidations. Now, we could either
> > always add invalidations to xacts that don't have a base snapshot yet, or
> > have a mechanism similar to the committed.xip array. But it is better to
> > first reproduce the problem.
>
> I think distributing invalidations to a transaction that has not yet built a
> base snapshot is unnecessary.
> This is because, during the process of building
> its base snapshot, such a transaction will have already recorded the XID of the
> transaction that altered the publication information into its array of
> committed XIDs. Consequently, it will reflect the latest changes in the catalog
> from the beginning. In the context of logical decoding, this scenario is
> analogous to decoding a new transaction initiated after the catalog-change
> transaction has been committed.
>
> The original issue arises because the catalog cache was constructed using an
> outdated snapshot that could not reflect the latest catalog changes. However,
> this is not a problem in cases without a base snapshot. Since the existing
> catalog cache should have been invalidated upon decoding the committed
> catalog-change transaction, the subsequent transactions will construct a new
> cache with the latest snapshot.

I've also concluded it's not necessary, although my reasoning and analysis are somewhat different. IIUC, in the original issue (looking at Andres's reproducer[1]), the fact that the walsender constructed a snapshot that doesn't reflect the catalog change when replaying a non-catalog-change transaction is itself fine, because the first change of that transaction was made before the catalog change. The problem is that the walsender process absorbed the invalidation message while replaying a change that happened before the catalog change, and then kept replaying the subsequent changes with that stale snapshot. That is why we concluded that we need to distribute the invalidation messages to concurrently decoded transactions, so that we can invalidate the cache again at that point.

As the comment mentions, the base snapshot is set before queuing any changes, so if a transaction doesn't have a base snapshot yet, it cannot have any queued change that happened before the catalog change. Transactions initiated after the catalog change don't have this issue.
Regards,

[1] https://www.postgresql.org/message-id/20231118025445.crhaeeuvoe2g5dv6%40awork3.anarazel.de

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com