Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array
Date
Msg-id CAD21AoD7=PU34+FHLQ6Y4Zj=_icCxk36r_-E6BmCz_t70NPsVQ@mail.gmail.com
Whole thread Raw
In response to Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array  (Xuneng Zhou <xunengzhou@gmail.com>)
Responses Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array
List pgsql-hackers
On Mon, Oct 20, 2025 at 11:13 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> Thanks for looking into this.
>
> On Tue, Oct 21, 2025 at 1:05 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
> >
> > On Tue, 21 Oct 2025 at 04:31, Michael Paquier <michael@paquier.xyz> wrote:
> > >
> > > On Sat, Oct 18, 2025 at 01:59:40PM +0500, Kirill Reshke wrote:
> > > > Indeed, these changes look correct.
> > > > I wonder why b89e151054a0 did this place this way, hope we do not miss
> > > > anything here.
> > >
> > > Perhaps a lack of time back in 2014?  It feels like an item where we
> > > would need to research a bit some of the past threads, and see if this
> > > has been discussed, or if there were other potential alternatives
> > > discussed.  This is not saying that what you are doing in this
> > > proposal is actually bad, but it's a bit hard to say what an
> > > "algorithm" should look like in this specific code path with XID
> > > manipulations.  Perhaps since 2014, we may have other places in the
> > > tree that share similar characteristics as what's done here.
> > >
> > > So it feels like this needs a bit more historical investigation first,
> > > rather than saying that your proposal is the best choice on the table.
> > > --
> > > Michael
> >
> > Sure
> >
> > Commit b89e151054a0 comes from [0]
> > Comment of SnapBuildPurgeCommittedTxn tracks to [1] (it was in form
> > "XXX: Neater algorithm?")
> >
> > Between these two messages, it was not disucccesseed...
> >
> > I will also study other related threads like [2], but i don't think
> > they will give more insight here.
> >
> > [0] https://www.postgresql.org/message-id/20140303162652.GB16654%40awork2.anarazel.de
> > [1] https://www.postgresql.org/message-id/20140115002223.GA17204%40awork2.anarazel.de
> > [2]
https://www.postgresql.org/message-id/flat/20130914204913.GA4071%40awork2.anarazel.de#:~:text=20130914204913.GA4071%40awork2.anarazel.de
>
> Introducing logical decoding was a major feature, and I assume there
> were more critical design decisions and trade-offs to consider at that
> time.
>
> I proposed this refactor not because it is the most significant
> optimization, but because it seems to be a low-hanging fruit with
> clear benefits. By using an in-place two-pointer compaction, we can
> eliminate the extra memory allocation and copy-back step without
> introducing risks to this well-tested code path.

I agree the proposed in-pace update is better than the current
copy-and-iterating approach. While the benefit might not be visible as
it's not a known hot-path, I find that the proposed patch makes sense
to improve the current codes. It could help some workloads where there
are many DDLs (e.g., creating temporary tables in many transactions).

>
> A comparable optimization exists in KnownAssignedXidsCompress()  which
> uses the same algorithm to remove stale XIDs without workspace
> allocation. That implementation also adds a lazy compaction heuristic
> that delays compaction until a threshold of removed entries is
> reached, amortizing the O(N) cost across multiple operations.
>
> The comment above the data structure mentions the trade-off of keeping
> the committed.xip array sorted versus unsorted. If the array were
> sorted, we could use a binary search combined with memmove to compact
> it efficiently, achieving O(log n + n) complexity for purging.
> However, that design would increase the complexity of
> SnapBuildAddCommittedTxn from O(1) to O(n) and "more complicated wrt
> wraparound".
>
> /*
> * Array of committed transactions that have modified the catalog.
> *
> * As this array is frequently modified we do *not* keep it in
> * xidComparator order. Instead we sort the array when building &
> * distributing a snapshot.
> *
> * TODO: It's unclear whether that reasoning has much merit. Every
> * time we add something here after becoming consistent will also
> * require distributing a snapshot. Storing them sorted would
> * potentially also make it easier to purge (but more complicated wrt
> * wraparound?). Should be improved if sorting while building the
> * snapshot shows up in profiles.
> */
> TransactionId *xip;
> } committed;
>
> /*
> * Keep track of a new catalog changing transaction that has committed.
> */
> static void
> SnapBuildAddCommittedTxn(SnapBuild *builder, TransactionId xid)
> {
> Assert(TransactionIdIsValid(xid));
>
> if (builder->committed.xcnt == builder->committed.xcnt_space)
> {
> builder->committed.xcnt_space = builder->committed.xcnt_space * 2 + 1;
>
> elog(DEBUG1, "increasing space for committed transactions to %u",
> (uint32) builder->committed.xcnt_space);
>
> builder->committed.xip = repalloc(builder->committed.xip,
> builder->committed.xcnt_space * sizeof(TransactionId));
> }
>
> /*
> * TODO: It might make sense to keep the array sorted here instead of
> * doing it every time we build a new snapshot. On the other hand this
> * gets called repeatedly when a transaction with subtransactions commits.
> */
> builder->committed.xip[builder->committed.xcnt++] = xid;
> }
>
> It might be worth profiling this function to evaluate whether
> maintaining a sorted array could bring potential benefits, although
> accurately measuring its end-to-end impact could be difficult if it
> isn’t a known hotspot. I also did a brief search on the mailing list
> and found no reports of performance concerns or related proposals to
> optimize this part of the code.

It might also be worth researching what kind of workloads would need a
better algorithm in terms of storing/updating xip and subxip arrays
since it would be the primary motivation. Also, otherwise we cannot
measure the real-world impact of a new algorithm. Having said that, I
find that it would be discussed and developed separately from the
proposed patch on this thread.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Quan Zongliang
Date:
Subject: Re: Include extension path on pg_available_extensions
Next
From: Michael Paquier
Date:
Subject: Re: Extended Statistics set/restore/clear functions.