Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array - Mailing list pgsql-hackers

From Xuneng Zhou
Subject Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array
Date
Msg-id CABPTF7X3w9zQbkakb9BmhVU1j0RNs4BoOr1sw9h0d=PU+WLm1Q@mail.gmail.com
Whole thread Raw
In response to Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
Hi Sawada-san, Michael,

Thanks for your comments on this patch.

On Thu, Oct 23, 2025 at 8:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Oct 20, 2025 at 11:13 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi,
> >
> > Thanks for looking into this.
> >
> > On Tue, Oct 21, 2025 at 1:05 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
> > >
> > > On Tue, 21 Oct 2025 at 04:31, Michael Paquier <michael@paquier.xyz> wrote:
> > > >
> > > > On Sat, Oct 18, 2025 at 01:59:40PM +0500, Kirill Reshke wrote:
> > > > > Indeed, these changes look correct.
> > > > > I wonder why b89e151054a0 did this place this way, hope we do not miss
> > > > > anything here.
> > > >
> > > > Perhaps a lack of time back in 2014?  It feels like an item where we
> > > > would need to research a bit some of the past threads, and see if this
> > > > has been discussed, or if there were other potential alternatives
> > > > discussed.  This is not saying that what you are doing in this
> > > > proposal is actually bad, but it's a bit hard to say what an
> > > > "algorithm" should look like in this specific code path with XID
> > > > manipulations.  Perhaps since 2014, we may have other places in the
> > > > tree that share similar characteristics as what's done here.
> > > >
> > > > So it feels like this needs a bit more historical investigation first,
> > > > rather than saying that your proposal is the best choice on the table.

Following these suggestions, I carefully searched the mailing list
archives and found no reports of performance issues directly related
to this code path. I also examined other parts of the codebase for
similar patterns. Components like integerset might share some
characteristics with SnapBuildPurgeOlderTxn, but they have constraints
that make them not directly applicable here. I am not very familiar
with the whole tree, so the investigation might not be exhaustive.

> > >
> > > Commit b89e151054a0 comes from [0]
> > > Comment of SnapBuildPurgeCommittedTxn tracks to [1] (it was in form
> > > "XXX: Neater algorithm?")
> > >
> > > Between these two messages, it was not disucccesseed...
> > >
> > > I will also study other related threads like [2], but i don't think
> > > they will give more insight here.
> > >
> > > [0] https://www.postgresql.org/message-id/20140303162652.GB16654%40awork2.anarazel.de
> > > [1] https://www.postgresql.org/message-id/20140115002223.GA17204%40awork2.anarazel.de
> > > [2]
https://www.postgresql.org/message-id/flat/20130914204913.GA4071%40awork2.anarazel.de#:~:text=20130914204913.GA4071%40awork2.anarazel.de
> >
> > Introducing logical decoding was a major feature, and I assume there
> > were more critical design decisions and trade-offs to consider at that
> > time.
> >
> > I proposed this refactor not because it is the most significant
> > optimization, but because it seems to be a low-hanging fruit with
> > clear benefits. By using an in-place two-pointer compaction, we can
> > eliminate the extra memory allocation and copy-back step without
> > introducing risks to this well-tested code path.
>
> I agree the proposed in-pace update is better than the current
> copy-and-iterating approach. While the benefit might not be visible as
> it's not a known hot-path, I find that the proposed patch makes sense
> to improve the current codes. It could help some workloads where there
> are many DDLs (e.g., creating temporary tables in many transactions).
>

I also agree this approach offers better efficiency within the scope of
current SnapBuildPurgeOlderTxn.

> >
> > A comparable optimization exists in KnownAssignedXidsCompress()  which
> > uses the same algorithm to remove stale XIDs without workspace
> > allocation. That implementation also adds a lazy compaction heuristic
> > that delays compaction until a threshold of removed entries is
> > reached, amortizing the O(N) cost across multiple operations.
> >
> > The comment above the data structure mentions the trade-off of keeping
> > the committed.xip array sorted versus unsorted. If the array were
> > sorted, we could use a binary search combined with memmove to compact
> > it efficiently, achieving O(log n + n) complexity for purging.
> > However, that design would increase the complexity of
> > SnapBuildAddCommittedTxn from O(1) to O(n) and "more complicated wrt
> > wraparound".
> >
> > /*
> > * Array of committed transactions that have modified the catalog.
> > *
> > * As this array is frequently modified we do *not* keep it in
> > * xidComparator order. Instead we sort the array when building &
> > * distributing a snapshot.
> > *
> > * TODO: It's unclear whether that reasoning has much merit. Every
> > * time we add something here after becoming consistent will also
> > * require distributing a snapshot. Storing them sorted would
> > * potentially also make it easier to purge (but more complicated wrt
> > * wraparound?). Should be improved if sorting while building the
> > * snapshot shows up in profiles.
> > */
> > TransactionId *xip;
> > } committed;
> >
> > /*
> > * Keep track of a new catalog changing transaction that has committed.
> > */
> > static void
> > SnapBuildAddCommittedTxn(SnapBuild *builder, TransactionId xid)
> > {
> > Assert(TransactionIdIsValid(xid));
> >
> > if (builder->committed.xcnt == builder->committed.xcnt_space)
> > {
> > builder->committed.xcnt_space = builder->committed.xcnt_space * 2 + 1;
> >
> > elog(DEBUG1, "increasing space for committed transactions to %u",
> > (uint32) builder->committed.xcnt_space);
> >
> > builder->committed.xip = repalloc(builder->committed.xip,
> > builder->committed.xcnt_space * sizeof(TransactionId));
> > }
> >
> > /*
> > * TODO: It might make sense to keep the array sorted here instead of
> > * doing it every time we build a new snapshot. On the other hand this
> > * gets called repeatedly when a transaction with subtransactions commits.
> > */
> > builder->committed.xip[builder->committed.xcnt++] = xid;
> > }
> >
> > It might be worth profiling this function to evaluate whether
> > maintaining a sorted array could bring potential benefits, although
> > accurately measuring its end-to-end impact could be difficult if it
> > isn’t a known hotspot. I also did a brief search on the mailing list
> > and found no reports of performance concerns or related proposals to
> > optimize this part of the code.
>
> It might also be worth researching what kind of workloads would need a
> better algorithm in terms of storing/updating xip and subxip arrays
> since it would be the primary motivation. Also, otherwise we cannot
> measure the real-world impact of a new algorithm. Having said that,
> find that it would be discussed and developed separately from the
> proposed patch on this thread.

+1 for researching the workloads that might need a sorted array and
more efficient algorithm. This exploration isn’t limited to the scope
of SnapBuildPurgeOlderTxn but relates more broadly to the overall
snapbuild process, which might be worth discussing in a separate
thread as suggested.

* TODO: It's unclear whether that reasoning has much merit. Every
* time we add something here after becoming consistent will also
* require distributing a snapshot. Storing them sorted would
* potentially also make it easier to purge (but more complicated wrt
* wraparound?). Should be improved if sorting while building the
* snapshot shows up in profiles.

I also constructed an artificial workload to try to surface the qsort
call in SnapBuildBuildSnapshot, though such a scenario seems very
unlikely to occur in production.

  for ((c=1; c<=DDL_CLIENTS; c++)); do
    (
      local seq=1
      while (( $(date +%s) < tB_end )); do
        local tbl="hp_ddl_${c}_$seq"
        "$psql" -h 127.0.0.1 -p "$PORT" -d postgres -c "
          BEGIN;
          CREATE TABLE ${tbl} (id int, data text);
          CREATE INDEX idx_${tbl} ON ${tbl} (id);
          INSERT INTO ${tbl} VALUES ($seq, 'd');
          DROP TABLE ${tbl};
          COMMIT;" >/dev/null 2>&1 || true
        seq=$((seq+1))
      done
    )

    97.02%     0.00%  postgres  postgres          [.] postmaster_child_launch
            |
            ---postmaster_child_launch
               |
               |--94.93%--BackendMain
               |          PostgresMain
               |          exec_replication_command
               |          StartLogicalReplication
               |          |
               |           --94.92%--WalSndLoop
               |                     |
               |                     |--92.24%--XLogSendLogical
               |                     |          |
               |                     |
--91.63%--LogicalDecodingProcessRecord
               |                     |                     |
               |                     |
|--89.55%--xact_decode
               |                     |                     |
|
               |                     |                     |
--89.23%--DecodeCommit
               |                     |                     |
          |
               |                     |                     |
          |--64.64%--SnapBuildCommitTxn
               |                     |                     |
          |          |
               |                     |                     |
          |          |--62.60%--SnapBuildBuildSnapshot
               |                     |                     |
          |          |          |
               |                     |                     |
          |          |           --62.33%--pg_qsort
               |                     |                     |
          |          |                     |
               |                     |                     |
          |          |                     |--56.84%--pg_qsort

Best,
Xuneng



pgsql-hackers by date:

Previous
From: Chao Li
Date:
Subject: Re: Optimize LISTEN/NOTIFY
Next
From: XYenon
Date:
Subject: Proposal: Allow excluding specific file patterns in pg_checksums