Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array - Mailing list pgsql-hackers

From: Xuneng Zhou
Subject: Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array
Msg-id: CABPTF7VtK80J2uVayacWqjqzOwnFdFMgmvc5VUp5F3LqMyFWSA@mail.gmail.com
In response to: Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array (Kirill Reshke <reshkekirill@gmail.com>)
List: pgsql-hackers
Hi,
Kirill Reshke <reshkekirill@gmail.com> wrote on Mon, Oct 20, 2025, at 6:07 PM:
On Mon, 20 Oct 2025 at 13:47, Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Mon, Oct 20, 2025 at 11:36 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
> >
> > On Mon, 20 Oct 2025 at 08:08, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >
> > > Hi, thanks for looking into this.
> > >
> > > On Sat, Oct 18, 2025 at 4:59 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
> > > >
> > > > On Sat, 18 Oct 2025 at 12:50, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > > >
> > > > > Hi Hackers,
> > > >
> > > > Hi!
> > > >
> > > > > The SnapBuildPurgeOlderTxn function previously used a suboptimal
> > > > > method to remove old XIDs from the committed.xip array. It allocated a
> > > > > temporary workspace array, copied the surviving elements into it, and
> > > > > then copied them back, incurring unnecessary memory allocation and
> > > > > multiple data copies.
> > > > >
> > > > > This patch refactors the logic to use a standard two-pointer, in-place
> > > > > compaction algorithm. The new approach filters the array in a single
> > > > > pass with no extra memory allocation, improving both CPU and memory
> > > > > efficiency.
> > > > >
> > > > > No behavioral changes are expected. This resolves a TODO comment
> > > > > expecting a more efficient algorithm.
> > > > >
> > > >
> > > > Indeed, these changes look correct.
> > > > I wonder why b89e151054a0 implemented it this way; I hope we are not
> > > > missing anything here.
> > >
> > > I think this small refactor does not introduce behavioral changes or
> > > break any of the stated constraints.
> > >
> > > > Can we construct a microbenchmark here which will show some benefit?
> > > >
> > >
> > > I prepared a simple microbenchmark to evaluate the impact of the
> > > algorithm replacement. The attached results summarize the findings.
> > > An end-to-end benchmark was not included, as this function is unlikely
> > > to be a performance hotspot in typical decoding workloads—the array
> > > being cleaned is expected to be relatively small under normal
> > > operating conditions. However, its impact could become more noticeable
> > > in scenarios with long-running transactions and a large number of
> > > catalog-modifying DML or DDL operations.
> > >
> > > Hardware:
> > > AMD EPYC™ 9454P (Genoa, 4th generation), 48 cores
> > > DDR5 ECC reg
> > > NVMe SSD Datacenter Edition (Gen 4)
> > >
> > > Best,
> > > Xuneng
> >
> > At first glance these results look satisfactory.
> >
> > Can you please describe how you got your numbers? Maybe share the
> > scripts or steps to reproduce, in case anyone is willing to try...
> >
>
> Sure. Here is a brief description of this experimental benchmark:
>
> 1) what Tier 1 measures
>
> Function under test: committed.xip purge in SnapBuild
> (OLD=workspace+memcpy vs NEW=in-place compaction).
>
> Inputs:
> Array sizes: 100, 500, 1000, 2000, 5000, 10000
> Keep ratios: 0.9, 0.5, 0.1, 0.01
> Distributions: scattered (Fisher–Yates shuffle), contiguous
> Repetitions: 30 per scenario
>
> RNG and determinism: pg_prng_state with seed 42 per dataset ensures
> reproducibility.
>
> Metrics recorded per scenario:
> Time (ns): mean, median, p95
> Survivors (count)
> Memory traffic (bytes): bytes_read, bytes_written, bytes_total
> OLD: reads = (xcnt + survivors) × sizeof(XID); writes = 2 × survivors
> × sizeof(XID)
> NEW: reads = xcnt × sizeof(XID); writes = survivors × sizeof(XID)
>
> 2) The core components
>
> # C Extension (snapbuild_bench.c) - The actual benchmark implementation
>
> The C extension contains the actual benchmark implementation that runs
> inside the PostgreSQL backend process. It's designed to:
> - Mimic real PostgreSQL code paths as closely as possible
> - Use actual PostgreSQL data structures (`TransactionId`, `MemoryContext`)
> - Call real PostgreSQL functions (`NormalTransactionIdPrecedes`)
> - Measure with nanosecond precision using PostgreSQL's timing infrastructure
>
> # SQL Wrapper (snapbuild_bench--1.0.sql) - function definitions
>
> # Orchestration Scripts - Automated benchmark execution and analysis
> run_snapbuild_purge_bench.sh
>
> 3) Execution Flow
>
> 1. Extension Installation
> # Build and install
> export PG_CONFIG=$HOME/pg/vanilla/bin/pg_config
> make -C contrib_extension/snapbuild_bench clean install
> # Create extension in database
> CREATE EXTENSION snapbuild_bench;
>
> 2. Run full benchmark suite
> ./run_snapbuild_purge_bench.sh --clean --with-baseline <patch>
>
> 3. Data Analysis
> # Generate plots
> python3 plot_tier1_results.py --csv results/unit/base_unit.csv --out plots/
> # Compare baseline vs patched
> python3 compare_snapbuild_results.py vanilla/ patched/
Thank you.
> TBH, the performance improvement from this refactor is fairly
> straightforward, and it’s unlikely to introduce regressions.
Sure.
> The
> experimental benchmark is therefore more complex than necessary.
> Still, I treated it as a learning exercise — an opportunity to
> practice benchmarking methodology and hopefully to reuse some of these
> techniques when evaluating more performance-critical paths in the
> future. If anyone has suggestions or spots issues, I’d greatly
> appreciate your feedback as well.
>
> Best,
> Xuneng
I think this patch is in a committable state; should we change its CF
entry accordingly?
Feel free to do so if you agree. Thanks again for your review!
Best,
Xuneng