Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array - Mailing list pgsql-hackers

From Kirill Reshke
Subject Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array
Date
Msg-id CALdSSPhikUQp_p+0w=RnCo5BwcCQbJrbRxtS_4n2-V0mD7N7+g@mail.gmail.com
In response to Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array  (Xuneng Zhou <xunengzhou@gmail.com>)
Responses Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array
List pgsql-hackers
On Mon, 20 Oct 2025 at 08:08, Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi, thanks for looking into this.
>
> On Sat, Oct 18, 2025 at 4:59 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
> >
> > On Sat, 18 Oct 2025 at 12:50, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >
> > > Hi Hackers,
> >
> > Hi!
> >
> > > The SnapBuildPurgeOlderTxn function previously used a suboptimal
> > > method to remove old XIDs from the committed.xip array. It allocated a
> > > temporary workspace array, copied the surviving elements into it, and
> > > then copied them back, incurring unnecessary memory allocation and
> > > multiple data copies.
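For reference, the workspace-based purge described above can be sketched roughly as follows. This is a standalone approximation, not the actual snapbuild.c code. `TransactionId` is typedef'd locally as `uint32_t`. `xid_precedes()` is a hypothetical helper modeled on the wraparound-aware comparison done by `NormalTransactionIdPrecedes()`. `malloc()`/`free()` stand in for the memory-context allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: local stand-in for PostgreSQL's 32-bit TransactionId. */
typedef uint32_t TransactionId;

/* Hypothetical helper: wraparound-aware "a precedes b", in the style of
 * NormalTransactionIdPrecedes(); assumes both XIDs are normal. */
static int
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Workspace-based purge (sketch): copy every XID that does not precede
 * xmin into a temporary array, copy the survivors back, free the array.
 * Returns the number of surviving XIDs.
 */
static size_t
purge_with_workspace(TransactionId *xip, size_t xcnt, TransactionId xmin)
{
    TransactionId *workspace;
    size_t surviving = 0;

    assert(xip != NULL || xcnt == 0);
    if (xcnt == 0)
        return 0;

    /* Extra allocation sized to the whole array. */
    workspace = malloc(xcnt * sizeof(TransactionId));
    if (workspace == NULL)
        abort();

    /* First copy: survivors go into the workspace. */
    for (size_t off = 0; off < xcnt; off++)
    {
        if (!xid_precedes(xip[off], xmin))
            workspace[surviving++] = xip[off];
    }

    /* Second copy: move the survivors back into the original array. */
    memcpy(xip, workspace, surviving * sizeof(TransactionId));
    free(workspace);
    return surviving;
}
```

Every purge pays for one allocation sized to the full array plus a second copy of the survivors, which is the overhead the patch targets.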
> > >
> > > This patch refactors the logic to use a standard two-pointer, in-place
> > > compaction algorithm. The new approach filters the array in a single
> > > pass with no extra memory allocation, improving both CPU and memory
> > > efficiency.
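A minimal standalone sketch of the two-pointer compaction, not the patch itself. It assumes a local `TransactionId` typedef (`uint32_t`) and a hypothetical `xid_precedes()` helper modeled on `NormalTransactionIdPrecedes()`. `read` visits every element once, while `write` marks the end of the surviving prefix, so survivors are compacted toward the front of the same array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: local stand-in for PostgreSQL's 32-bit TransactionId. */
typedef uint32_t TransactionId;

/* Hypothetical helper: wraparound-aware "a precedes b", in the style of
 * NormalTransactionIdPrecedes(); assumes both XIDs are normal. */
static int
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * In-place two-pointer compaction (sketch): keep every XID that does not
 * precede xmin, preserving relative order, with no extra allocation.
 * Returns the number of surviving XIDs.
 */
static size_t
purge_in_place(TransactionId *xip, size_t xcnt, TransactionId xmin)
{
    size_t write = 0;

    assert(xip != NULL || xcnt == 0);

    for (size_t read = 0; read < xcnt; read++)
    {
        if (xid_precedes(xip[read], xmin))
            continue;           /* precedes xmin: drop it */

        /* write <= read always holds, so this never overwrites unread data */
        xip[write++] = xip[read];
    }
    return write;
}
```

Because `write` never exceeds `read`, each store lands on a slot that has already been read, which is why no temporary buffer is needed. Survivors keep their original relative order, so the result matches the copy-out/copy-back approach element for element; only the extra allocation and the second copy go away.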
> > >
> > > No behavioral changes are expected. This resolves a TODO comment
> > > asking for a more efficient algorithm.
> > >
> >
> > Indeed, these changes look correct.
> > I wonder why b89e151054a0 implemented this part this way; I hope we are
> > not missing anything here.
>
> I don't think this small refactor introduces behavioral changes or
> breaks any of the existing constraints.
>
> > Can we construct a microbenchmark here which will show some benefit?
> >
>
> I prepared a simple microbenchmark to evaluate the impact of the
> algorithm replacement. The attached results summarize the findings.
> An end-to-end benchmark was not included, as this function is unlikely
> to be a performance hotspot in typical decoding workloads: the array
> being cleaned is expected to be relatively small under normal
> operating conditions. However, its impact could become more noticeable
> in scenarios with long-running transactions and a large number of
> catalog-modifying DML or DDL operations.
>
> Hardware:
> AMD EPYC™ Genoa 9454P 48-core 4th generation
> DDR5 ECC reg
> NVMe SSD Datacenter Edition (Gen 4)
>
> Best,
> Xuneng

At first glance these results look satisfactory.

Can you please describe how you got your numbers? Perhaps share the
script or the steps to reproduce them, in case anyone is willing to...

--
Best regards,
Kirill Reshke


