Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array - Mailing list pgsql-hackers
From | Kirill Reshke
---|---
Subject | Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array
Date |
Msg-id | CALdSSPgR7cWKpzMb-+kXiK0X_rYisUUCLj+2cpWAa9BSddq-CQ@mail.gmail.com
In response to | Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array (Xuneng Zhou <xunengzhou@gmail.com>)
Responses | Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array
List | pgsql-hackers
On Mon, 20 Oct 2025 at 13:47, Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Mon, Oct 20, 2025 at 11:36 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
> >
> > On Mon, 20 Oct 2025 at 08:08, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >
> > > Hi, thanks for looking into this.
> > >
> > > On Sat, Oct 18, 2025 at 4:59 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
> > > >
> > > > On Sat, 18 Oct 2025 at 12:50, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > > >
> > > > > Hi Hackers,
> > > >
> > > > Hi!
> > > >
> > > > > The SnapBuildPurgeOlderTxn function previously used a suboptimal
> > > > > method to remove old XIDs from the committed.xip array. It allocated a
> > > > > temporary workspace array, copied the surviving elements into it, and
> > > > > then copied them back, incurring unnecessary memory allocation and
> > > > > multiple data copies.
> > > > >
> > > > > This patch refactors the logic to use a standard two-pointer, in-place
> > > > > compaction algorithm. The new approach filters the array in a single
> > > > > pass with no extra memory allocation, improving both CPU and memory
> > > > > efficiency.
> > > > >
> > > > > No behavioral changes are expected. This resolves a TODO comment
> > > > > expecting a more efficient algorithm.
> > > >
> > > > Indeed, these changes look correct.
> > > > I wonder why b89e151054a0 did this place this way, hope we do not miss
> > > > anything here.
> > >
> > > I think this small refactor does not introduce behavioral changes or
> > > breaks given constraints.
> > >
> > > > Can we construct a microbenchmark here which will show some benefit?
> > >
> > > I prepared a simple microbenchmark to evaluate the impact of the
> > > algorithm replacement. The attached results summarize the findings.
> > > An end-to-end benchmark was not included, as this function is unlikely
> > > to be a performance hotspot in typical decoding workloads—the array
> > > being cleaned is expected to be relatively small under normal
> > > operating conditions. However, its impact could become more noticeable
> > > in scenarios with long-running transactions and a large number of
> > > catalog-modifying DML or DDL operations.
> > >
> > > Hardware:
> > > AMD EPYC™ Genoa 9454P 48-core 4th generation
> > > DDR5 ECC reg
> > > NVMe SSD Datacenter Edition (Gen 4)
> > >
> > > Best,
> > > Xuneng
> >
> > At first glance these results look satisfactory.
> >
> > Can you please describe, how did you get your numbers? Maybe more
> > script or steps to reproduce, if anyone will be willing to...
>
> Sure. Here is a brief description of this experimental benchmark:
>
> 1) what Tier 1 measures
>
> Function under test: committed.xip purge in SnapBuild
> (OLD=workspace+memcpy vs NEW=in-place compaction).
>
> Inputs:
> Array sizes: 100, 500, 1000, 2000, 5000, 10000
> Keep ratios: 0.9, 0.5, 0.1, 0.01
> Distributions: scattered (Fisher–Yates shuffle), contiguous
> Repetitions: 30 per scenario
>
> RNG and determinism: pg_prng_state with seed 42 per dataset ensures
> reproducibility.
>
> Metrics recorded per scenario:
> Time (ns): mean, median, p95
> Survivors (count)
> Memory traffic (bytes): bytes_read, bytes_written, bytes_total
> OLD: reads = (xcnt + survivors) × sizeof(XID); writes = 2 × survivors × sizeof(XID)
> NEW: reads = xcnt × sizeof(XID); writes = survivors × sizeof(XID)
>
> 2) The core components
>
> # C Extension (snapbuild_bench.c) - The actual benchmark implementation
>
> The C extension contains the actual benchmark implementation that runs
> inside the PostgreSQL backend process.
> It's designed to:
> - **Mimic real PostgreSQL code paths** as closely as possible
> - **Use actual PostgreSQL data structures** (`TransactionId`, `MemoryContext`)
> - **Call real PostgreSQL functions** (`NormalTransactionIdPrecedes`)
> - **Measure with nanosecond precision** using PostgreSQL's timing infrastructure
>
> # SQL Wrapper (snapbuild_bench--1.0.sql) - function definitions
>
> # Orchestration Scripts - Automated benchmark execution and analysis
> run_snapbuild_purge_bench
>
> 3) Execution Flow
>
> 1. Extension Installation
> # Build and install
> export PG_CONFIG=$HOME/pg/vanilla/bin/pg_config
> make -C contrib_extension/snapbuild_bench clean install
> # Create extension in database
> CREATE EXTENSION snapbuild_bench;
>
> 2. Run full benchmark suite
> ./run_snapbuild_purge_bench.sh --clean --with-baseline <patch>
>
> 3. Data Analysis
> # Generate plots
> python3 plot_tier1_results.py --csv results/unit/base_unit.csv --out plots/
> # Compare baseline vs patched
> python3 compare_snapbuild_results.py vanilla/ patched/

Thank you.

> TBH, the performance improvement from this refactor is fairly
> straightforward, and it’s unlikely to introduce regressions.

Sure.

> The experimental benchmark is therefore more complex than necessary.
> Still, I treated it as a learning exercise — an opportunity to
> practice benchmarking methodology and hopefully to reuse some of these
> techniques when evaluating more performance-critical paths in the
> future. If anyone has suggestions or spots issues, I’d greatly
> appreciate your feedback as well.
>
> Best,
> Xuneng

I think this patch is in committable state, should we change its CF
entry accordingly?

--
Best regards,
Kirill Reshke