Re: Fix premature xmin advancement during fast forward decoding - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Fix premature xmin advancement during fast forward decoding
Date
Msg-id CAD21AoBw_DX3L5mBN3kryc0mZ1TzbT-GFhWPjLubKAHRtVh-bg@mail.gmail.com
Whole thread Raw
In response to Re: Fix premature xmin advancement during fast forward decoding  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Fix premature xmin advancement during fast forward decoding
List pgsql-hackers
On Thu, Apr 24, 2025 at 9:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Apr 25, 2025 at 8:14 AM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Fri, Apr 25, 2025 at 5:44 AM Masahiko Sawada wrote:
> > > On Tue, Apr 22, 2025 at 12:06 AM Zhijie Hou (Fujitsu)
> > > <houzj.fnst@fujitsu.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > When analyzing some issues in another thread[1], I found a bug that
> > > > fast forward decoding could lead to premature advancement of
> > > > catalog_xmin, resulting in required catalog data being removed by vacuum.
> > > >
> > > > The issue arises because we do not build a base snapshot when decoding
> > > > changes during fast forward mode, preventing reference to the minimum
> > > > transaction ID that remains visible in the snapshot when determining
> > > > the candidate for catalog_xmin. As a result, catalog_xmin was directly
> > > > advanced to the oldest running transaction ID found in the latest
> > > running_xacts record.
> > > >
> > > > In code-level, SnapBuildProcessChange -> ReorderBufferSetBaseSnapshot
> > > > could not be reached during fast forward decoding, resulting in
> > > > rb->txns_by_base_snapshot_lsn being NULL. When advancing
> > > catalog_xmin,
> > > > rb->the
> > > > system attempts to refer to rb->txns_by_base_snapshot_lsn via
> > > > ReorderBufferGetOldestXmin(). However, since
> > > > rb->txns_by_base_snapshot_lsn is NULL, it defaults to directly using
> > > > running->oldestRunningXid as the candidate for catalog_xmin. For
> > > > reference, see the details in SnapBuildProcessRunningXacts.
> > >
> > > I agree with your analysis.
> > >
> > > > To fix this, I think we can allow the base snapshot to be built during
> > > > fast forward decoding, as implemented in the patch 0001 (We already
> > > > built base snapshot in fast-forward mode for logical message in
> > > logicalmsg_decode()).
> > > >
> > > > I also explored the possibility of further optimizing the fix to
> > > > reduce the overhead associated with building a snapshot in
> > > > fast-forward mode. E.g., to maintain a single base_snapshot_xmin
> > > > rather than generating a full snapshot for each transaction, and use
> > > > this base_snapshot_xmin when determining the candidate catalog_xmin.
> > > > However, I am not feeling confident about maintaining some
> > > > fast-forward dedicated fields in common structures and perhaps employing
> > > different handling for catalog_xmin.
> > > >
> > > > Moreover, I conducted a basic test[2] to test the patch's impact,
> > > > noting that advancing the slot incurs roughly a 4% increase in
> > > > processing time after applying the patch, which appears to be
> > > > acceptable.  Additionally, the cost associated with building the
> > > > snapshot via SnapBuildBuildSnapshot() did not show up in the profile.
> > >
> > > Where did the regression come from? I think that even with your patch
> > > SnapBuildBuildSnapshot() would be called only a few times in the workload.
> >
> > AFAICS, the primary cost arises from the additional function
> > invocations/executions. Functions such as SnapBuildProcessChange,
> > ReorderBufferSetBaseSnapshot, and ReorderBufferXidHasBaseSnapshot are invoked
> > more frequently after the patch. Although these functions aren't inherently
> > expensive, the scenario I tested involved numerous transactions starting with
> > just a single insert each. This resulted in a high frequency of calls to these
> > functions.

Ah, you're right. I looked at the wrong profile.

> >
>
> Yeah, I see below from the patched profile.
>
> |--4.65%--SnapBuildProcessChange
>                                                 |          |
> |          |
>                                                 |          |
> |          |--2.08%--ReorderBufferSetBaseSnapshot
>                                                 |          |
> |          |          |
>                                                 |          |
> |          |           --1.32%--__mcount_internal
>                                                 |          |
> |          |
>                                                 |          |
> |          |--1.29%--__mcount_internal
>                                                 |          |
> |          |
>                                                 |          |
> |           --0.72%--ReorderBufferXidHasBaseSnapshot
>

It looks like you've run the benchmark test with enabling assertions,
is that right? I can see AssertTXNLsnOrder() in the profiles.

> As the number of operations per transaction increases, this cost
> should reduce further as we will set base_snapshot only once per
> transaction. Now, we can try to micro-optimize this by passing
> ReorderBufferTXN, as that is computed before we invoke
> SnapBuildProcessChange, but not sure how much it will change. However,
> the bigger question is, is it really worth optimizing this code path?
>
> If we really want to optimize this code path for the test Hou-San did,
> I see that SnapBuildCommitTxn() has a bigger overhead, so we should
> investigate whether we really need it for the fast-forward mode, where
> we won't process changes. I am of the opinion that we need to pay this
> additional cost of setting base_snapshot for the correctness of the
> fast-forward mode.

I've done the same test in my environment and saw about 7.2% degradation:

HEAD: 773.538 ms
PATCHED: 833.906 ms

It depends on the workload and environment of course but I'm not sure
this is a reasonable cost we have to pay.

What I'm concerned about is the back branches. With this approach all
back branches will have such degradations and it doesn't make sense to
me to optimize SnapBuildCommitTxn() codes in back branches.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Fix premature xmin advancement during fast forward decoding
Next
From: Shlok Kyal
Date:
Subject: Re: long-standing data loss bug in initial sync of logical replication