Re: Parallel Apply - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel Apply
Msg-id CAA4eK1JkZ1JNQ71eO9+0QwSncLNFQv_KauYERNzxRhNUGcYDTA@mail.gmail.com
In response to Re: Parallel Apply  (Konstantin Knizhnik <knizhnik@garret.ru>)
List pgsql-hackers
On Mon, Aug 18, 2025 at 8:20 PM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
>
> On 18/08/2025 9:56 AM, Nisha Moond wrote:
> > On Wed, Aug 13, 2025 at 4:17 PM Zhijie Hou (Fujitsu)
> > <houzj.fnst@fujitsu.com> wrote:
> >> Here is the initial POC patch for this idea.
> >>
> > Thank you Hou-san for the patch.
> >
> > I did some performance benchmarking for the patch and overall, the
> > results show substantial performance improvements.
> > Please find the details as follows:
> >
> > Source code:
> > ----------------
> > pgHead (572c0f1b0e) and v1-0001 patch
> >
> > Setup:
> > ---------
> > Pub --> Sub
> >   - Two nodes created in pub-sub logical replication setup.
> >   - Both nodes have the same set of pgbench tables created with scale=300.
> >   - The sub node is subscribed to all the changes from the pub node's
> > pgbench tables.
> >
> > Workload Run:
> > --------------------
> >   - Disable the subscription on the Sub node.
> >   - Run the default pgbench (read-write) workload only on the Pub node
> > with #clients=40 for a duration of 10 minutes.
> >   - Once pgbench completes, enable the subscription on Sub and measure
> > the time taken for replication to catch up.
> > ~~~
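[Editor's note: the three workload steps above can be sketched as a small driver that emits the command sequence for one run. The subscription name `sub` and the database names `publisher`/`subscriber` are assumptions for illustration; the actual scripts are attached to the original mail.]

```python
# Sketch of the benchmark workload described above. The subscription name
# ("sub") and database names ("publisher"/"subscriber") are assumptions;
# the real scripts are attached to the original mail.

def workload_commands(clients=40, duration_sec=600, sub_name="sub"):
    """Return, in order, the shell commands for one benchmark run."""
    return [
        # 1. Stop apply on the subscriber so changes accumulate.
        f'psql -d subscriber -c "ALTER SUBSCRIPTION {sub_name} DISABLE;"',
        # 2. Run the default read-write pgbench workload on the publisher.
        f'pgbench -c {clients} -j {clients} -T {duration_sec} publisher',
        # 3. Re-enable the subscription; replication catch-up time is
        #    measured from this point until the subscriber is caught up.
        f'psql -d subscriber -c "ALTER SUBSCRIPTION {sub_name} ENABLE;"',
    ]

for cmd in workload_commands():
    print(cmd)
```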
> >
> > Test-01: Measure Replication lag
> > ----------------------------------------
> > Observations:
> > ---------------
> >   - Replication time improved as the number of parallel workers
> > increased with the patch.
> >   - On pgHead, replicating a 10-minute publisher workload took ~46 minutes.
> >   - With just 2 parallel workers (the default), replication time was
> > cut in half, and with 8 workers it completed in ~13 minutes (3.5x faster).
> >   - With 16 parallel workers, we achieved a ~3.7x speedup over pgHead.
> >   - With 32 workers, gains plateaued slightly, likely because the
> > remaining parallelizable work per worker is too small, and the extra
> > workers running on the machine add scheduling overhead.
> >
> > Detailed Result:
> > -----------------
> > Case                    Time_taken_in_replication(sec)  rep_time_in_minutes  faster_than_head
> > 1. pgHead                          2760.791                 46.01318333          -
> > 2. patched_#worker=2               1463.853                 24.3975              1.88 times
> > 3. patched_#worker=4               1031.376                 17.1896              2.68 times
> > 4. patched_#worker=8                781.007                 13.0168              3.54 times
> > 5. patched_#worker=16               741.108                 12.3518              3.73 times
> > 6. patched_#worker=32               787.203                 13.1201              3.51 times
> > ~~~~
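[Editor's note: the rep_time and speedup columns can be recomputed from the raw times in the table; small rounding differences from the quoted figures are expected.]

```python
# Recompute replication time in minutes and the speedup over pgHead
# from the measured times (seconds) in the table above.

head_sec = 2760.791
patched_sec = {2: 1463.853, 4: 1031.376, 8: 781.007, 16: 741.108, 32: 787.203}

for workers, sec in patched_sec.items():
    print(f"workers={workers:2d}: {sec / 60:8.4f} min, "
          f"{head_sec / sec:.2f}x faster than head")
```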
> >
> > Test-02: Measure number of transactions parallelized
> > -----------------------------------------------------
> >   - Used a top-up patch to LOG the number of transactions applied by
> > parallel workers, the number applied by the leader, and the number that
> > are dependent.
> >   - Example LOG output:
> >    ```
> > LOG:  parallelized_nxact: 11497254 dependent_nxact: 0 leader_applied_nxact: 600
> > ```
> >   - parallelized_nxact: the number of parallelized transactions
> >   - dependent_nxact: the number of dependent transactions
> >   - leader_applied_nxact: the number of transactions applied by the leader
> >   (the required top-up v1-002 patch is attached.)
> >
> >   Observations:
> > ----------------
> >   - With 4 to 8 parallel workers, ~80%-98% of transactions are parallelized.
> >   - As the number of workers increased, the parallelized percentage
> > also increased, reaching 99.99% with 32 workers.
> >
> > Detailed Result:
> > -----------------
> > case1: #parallel_workers = 2(default)
> >    #total_pgbench_txns = 24745648
> >      parallelized_nxact = 14439480 (58.35%)
> >      dependent_nxact    = 16 (0.00006%)
> >      leader_applied_nxact = 10306153 (41.64%)
> >
> > case2: #parallel_workers = 4
> >    #total_pgbench_txns = 24776108
> >      parallelized_nxact = 19666593 (79.37%)
> >      dependent_nxact    = 212 (0.0008%)
> >      leader_applied_nxact = 5109304 (20.62%)
> >
> > case3: #parallel_workers = 8
> >    #total_pgbench_txns = 24821333
> >      parallelized_nxact = 24397431 (98.29%)
> >      dependent_nxact    = 282 (0.001%)
> >      leader_applied_nxact = 423621 (1.71%)
> >
> > case4: #parallel_workers = 16
> >    #total_pgbench_txns = 24938255
> >      parallelized_nxact = 24937754 (99.99%)
> >      dependent_nxact    = 142 (0.0005%)
> >      leader_applied_nxact = 360 (0.0014%)
> >
> > case5: #parallel_workers = 32
> >    #total_pgbench_txns = 24769474
> >      parallelized_nxact = 24769135 (99.99%)
> >      dependent_nxact    = 312 (0.0013%)
> >      leader_applied_nxact = 28 (0.0001%)
> >
> > ~~~~~
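[Editor's note: the percentages in the five cases above can be reproduced directly from the LOG counters.]

```python
# Per-case breakdown of the LOG counters reported above:
# workers -> (total pgbench txns, parallelized, dependent, applied by leader).

cases = {
    2:  (24745648, 14439480, 16,  10306153),
    4:  (24776108, 19666593, 212, 5109304),
    8:  (24821333, 24397431, 282, 423621),
    16: (24938255, 24937754, 142, 360),
    32: (24769474, 24769135, 312, 28),
}

for workers, (total, par, dep, leader) in cases.items():
    print(f"workers={workers:2d}: parallelized={100 * par / total:6.2f}% "
          f"dependent={100 * dep / total:.4f}% "
          f"leader={100 * leader / total:6.2f}%")
```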
> > The scripts used for the above tests are attached.
> >
> > Next, I plan to extend the testing to larger workloads by running
> > pgbench for 20–30 minutes.
> > We will also benchmark performance across different workload types to
> > evaluate the improvements once the patch has matured further.
> >
> > --
> > Thanks,
> > Nisha
>
>
> I also did some benchmarking of the proposed parallel apply patch and
> compared it with my prewarming approach.
> Parallel apply is significantly more efficient than prefetch (as
> expected).
>

Thanks to you and Nisha for doing some preliminary performance
testing; the results are really encouraging (3-4x improvements in
multiple workloads). I hope we keep making progress on this patch and
make it ready for the next release.

--
With Regards,
Amit Kapila.


