Re: Parallel Apply - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: Parallel Apply
Msg-id: CAA4eK1JkZ1JNQ71eO9+0QwSncLNFQv_KauYERNzxRhNUGcYDTA@mail.gmail.com
In response to: Re: Parallel Apply (Konstantin Knizhnik <knizhnik@garret.ru>)
List: pgsql-hackers
On Mon, Aug 18, 2025 at 8:20 PM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
>
> On 18/08/2025 9:56 AM, Nisha Moond wrote:
> > On Wed, Aug 13, 2025 at 4:17 PM Zhijie Hou (Fujitsu)
> > <houzj.fnst@fujitsu.com> wrote:
> >> Here is the initial POC patch for this idea.
> >>
> > Thank you Hou-san for the patch.
> >
> > I did some performance benchmarking for the patch and overall, the
> > results show substantial performance improvements.
> > Please find the details as follows:
> >
> > Source code:
> > ----------------
> > pgHead (572c0f1b0e) and the v1-0001 patch
> >
> > Setup:
> > ---------
> > Pub --> Sub
> > - Two nodes created in a pub-sub logical replication setup.
> > - Both nodes have the same set of pgbench tables, created with scale=300.
> > - The Sub node is subscribed to all changes from the Pub node's
> >   pgbench tables.
> >
> > Workload Run:
> > --------------------
> > - Disable the subscription on the Sub node.
> > - Run default pgbench (read-write) only on the Pub node with
> >   #clients=40 and run duration=10 minutes.
> > - Enable the subscription on Sub once pgbench completes, then
> >   measure the time taken in replication.
> > ~~~
> >
> > Test-01: Measure replication lag
> > ----------------------------------------
> > Observations:
> > ---------------
> > - Replication time improved as the number of parallel workers
> >   increased with the patch.
> > - On pgHead, replicating a 10-minute publisher workload took ~46 minutes.
> > - With just 2 parallel workers (the default), replication time was cut
> >   in half, and with 8 workers it completed in ~13 minutes (3.5x faster).
> > - With 16 parallel workers, we achieved a ~3.7x speedup over pgHead.
> > - With 32 workers, performance gains plateaued slightly, likely because
> >   more workers were running on the machine while the work available to
> >   parallelize was not large enough to show further improvement.
> >
> > Detailed Result:
> > -----------------
> > Case                   Time_taken_in_replication(sec)  rep_time_in_minutes  faster_than_head
> > 1. pgHead              2760.791                        46.0132              -
> > 2. patched_#worker=2   1463.853                        24.3975              1.88 times
> > 3. patched_#worker=4   1031.376                        17.1896              2.68 times
> > 4. patched_#worker=8    781.007                        13.0168              3.54 times
> > 5. patched_#worker=16   741.108                        12.3518              3.73 times
> > 6. patched_#worker=32   787.203                        13.1201              3.51 times
> > ~~~~
> >
> > Test-02: Measure the number of transactions parallelized
> > -----------------------------------------------------
> > - Used a top-up patch to LOG the number of transactions applied by
> >   parallel workers, applied by the leader, and that are dependent.
> > - Example LOG output:
> > ```
> > LOG: parallelized_nxact: 11497254 dependent_nxact: 0 leader_applied_nxact: 600
> > ```
> > - parallelized_nxact: the number of parallelized transactions
> > - dependent_nxact: the number of dependent transactions
> > - leader_applied_nxact: the number of transactions applied by the leader worker
> > (The required top-up v1-0002 patch is attached.)
> >
> > Observations:
> > ----------------
> > - With 4 to 8 parallel workers, ~80%-98% of transactions are parallelized.
> > - As the number of workers increased, the parallelized percentage
> >   increased, reaching 99.99% with 32 workers.
> >
> > Detailed Result:
> > -----------------
> > case1: #parallel_workers = 2 (default)
> > #total_pgbench_txns = 24745648
> > parallelized_nxact = 14439480 (58.35%)
> > dependent_nxact = 16 (0.00006%)
> > leader_applied_nxact = 10306153 (41.64%)
> >
> > case2: #parallel_workers = 4
> > #total_pgbench_txns = 24776108
> > parallelized_nxact = 19666593 (79.37%)
> > dependent_nxact = 212 (0.0008%)
> > leader_applied_nxact = 5109304 (20.62%)
> >
> > case3: #parallel_workers = 8
> > #total_pgbench_txns = 24821333
> > parallelized_nxact = 24397431 (98.29%)
> > dependent_nxact = 282 (0.001%)
> > leader_applied_nxact = 423621 (1.71%)
> >
> > case4: #parallel_workers = 16
> > #total_pgbench_txns = 24938255
> > parallelized_nxact = 24937754 (99.99%)
> > dependent_nxact = 142 (0.0005%)
> > leader_applied_nxact = 360 (0.0014%)
> >
> > case5: #parallel_workers = 32
> > #total_pgbench_txns = 24769474
> > parallelized_nxact = 24769135 (99.99%)
> > dependent_nxact = 312 (0.0013%)
> > leader_applied_nxact = 28 (0.0001%)
> >
> > ~~~~~
> > The scripts used for the above tests are attached.
> >
> > Next, I plan to extend the testing to larger workloads by running
> > pgbench for 20-30 minutes.
> > We will also benchmark performance across different workload types to
> > evaluate the improvements once the patch has matured further.
> >
> > --
> > Thanks,
> > Nisha
>
> I also did some benchmarking of the proposed parallel apply patch and
> compared it with my prewarming approach.
> Parallel apply is significantly more efficient than prefetch (as
> expected).
>

Thanks to you and Nisha for doing some preliminary performance testing;
the results are really encouraging (more than a 3 to 4 times improvement
across multiple workloads). I hope we keep making progress on this patch
and make it ready for the next release.

--
With Regards,
Amit Kapila.
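[Editor's note: as a quick sanity check on the Test-01 table quoted above, the "faster_than_head" ratios can be recomputed from the reported replication times. This is a minimal sketch using only the timings copied from Nisha's table; the dictionary keys are illustrative labels, not anything from the patch.]

```python
# Recompute the Test-01 speedup factors from the reported
# replication times (seconds), with pgHead as the baseline.
times = {
    "pgHead": 2760.791,
    "workers=2": 1463.853,
    "workers=4": 1031.376,
    "workers=8": 781.007,
    "workers=16": 741.108,
    "workers=32": 787.203,
}

baseline = times["pgHead"]
for case, secs in times.items():
    speedup = baseline / secs
    print(f"{case:12s} {secs / 60:8.2f} min  {speedup:4.2f}x vs pgHead")
```

The computed ratios match the quoted table to within rounding (e.g. ~1.89x for 2 workers, ~3.53x for 8, ~3.73x for 16), and show the plateau at 32 workers (~3.51x) that the observations describe.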