Re: long-standing data loss bug in initial sync of logical replication - Mailing list pgsql-hackers

From: Amit Kapila
Subject: Re: long-standing data loss bug in initial sync of logical replication
Msg-id: CAA4eK1+ubqorurkbWOtpbEd2Vc8QZ_vmUHY3mFgL1fp_1+wK=g@mail.gmail.com
In response to: Re: long-standing data loss bug in initial sync of logical replication (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses: Re: long-standing data loss bug in initial sync of logical replication
List: pgsql-hackers
On Fri, Feb 28, 2025 at 6:15 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Feb 27, 2025 at 12:14 AM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Monday, February 24, 2025 5:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Dec 11, 2024 at 12:37 PM Masahiko Sawada
> > > <sawada.mshk@gmail.com> wrote:
> > > >
> > > > I confirmed that the proposed patch fixes these issues. I have one
> > > > question about the patch:
> > > >
> > > > In the main loop in SnapBuildDistributeSnapshotAndInval(), we have
> > > > the following code:
> > > >
> > > > /*
> > > >  * If we don't have a base snapshot yet, there are no changes in this
> > > >  * transaction which in turn implies we don't yet need a snapshot at
> > > >  * all. We'll add a snapshot when the first change gets queued.
> > > >  *
> > > >  * NB: This works correctly even for subtransactions because
> > > >  * ReorderBufferAssignChild() takes care to transfer the base snapshot
> > > >  * to the top-level transaction, and while iterating the changequeue
> > > >  * we'll get the change from the subtxn.
> > > >  */
> > > > if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, txn->xid))
> > > >     continue;
> > > >
> > > > Is there any case where we need to distribute inval messages to
> > > > transactions that don't have the base snapshot yet but eventually
> > > > need the inval messages?
> > > >
> > >
> > > Good point. It is mentioned for snapshots: "We'll add a snapshot
> > > when the first change gets queued." I think we achieve this via the
> > > builder->committed.xip array, such that when we set a base snapshot
> > > for a transaction, we use that array to form a snapshot. However, I
> > > don't see any such consideration for invalidations. Now, we could
> > > either always add invalidations to xacts that don't have a base
> > > snapshot yet, or have a mechanism similar to the committed.xip
> > > array. But it is better to first reproduce the problem.
> > I think distributing invalidations to a transaction that has not yet
> > built a base snapshot is unnecessary. This is because, during the
> > process of building its base snapshot, such a transaction will have
> > already recorded the XID of the transaction that altered the
> > publication information into its array of committed XIDs.
> > Consequently, it will reflect the latest changes in the catalog from
> > the beginning. In the context of logical decoding, this scenario is
> > analogous to decoding a new transaction initiated after the
> > catalog-change transaction has been committed.
> >
> > The original issue arises because the catalog cache was constructed
> > using an outdated snapshot that could not reflect the latest catalog
> > changes. However, this is not a problem in cases without a base
> > snapshot. Since the existing catalog cache should have been
> > invalidated upon decoding the committed catalog-change transaction,
> > the subsequent transactions will construct a new cache with the
> > latest snapshot.
>
> I've also concluded it's not necessary, but the reason and analysis
> might be somewhat different. IIUC, in the original issue (looking at
> Andres's reproducer[1]), the fact that, when replaying a
> non-catalog-change transaction, the walsender constructed a snapshot
> that doesn't reflect the catalog change is fine, because the first
> change of that transaction was made before the catalog change. The
> problem is that the walsender process absorbed the invalidation
> message when replaying the change that happened before the catalog
> change, and ended up replaying the subsequent changes with that
> snapshot. That is why we concluded that we need to distribute the
> invalidation messages to concurrently decoded transactions, so that we
> can invalidate the cache again at that point.
> As the comment mentioned, the base snapshot is set before queuing any
> changes, so if the transaction doesn't have the base snapshot yet,
> there must be no queued change that happened before the catalog
> change. The transactions initiated after the catalog change don't
> have this issue.
>

I think both of you are saying the same thing in slightly different
words. Hou-San's explanation goes into more detail at the code level,
and you have described the same point from a slightly higher-level
view. Additionally, for streaming transactions where we have already
sent one or more streams, we don't need anything special, since they
behave like a transaction having a base snapshot: we save the snapshot
after sending each stream.

--
With Regards,
Amit Kapila.