Re: Parallel Apply - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel Apply
Msg-id CAA4eK1LoHr52JeXxt=vQoVoXwwPYAfWg1t7Lo0_7r9iajyjbkw@mail.gmail.com
In response to Re: Parallel Apply  (Andrei Lepikhov <lepihov@gmail.com>)
List pgsql-hackers
On Tue, Aug 12, 2025 at 12:04 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> On 11/8/2025 06:45, Amit Kapila wrote:
> > The core idea is that the leader apply worker ensures the following:
> > a. Identifies dependencies between transactions. b. Coordinates
> > parallel workers to apply independent transactions concurrently. c.
> > Ensures correct ordering for dependent transactions.
> Dependency detection may be quite an expensive operation. What about a
> 'positive' approach - deadlock detection on replica and, restart apply
> of a record that should be applied later? Have you thought about this
> way? What are the pros and cons here? Do you envision common cases where
> such a deadlock will be frequent?
>

It is not only about deadlocks; we could also wrongly apply some
transactions that should otherwise fail. For example, consider the
following case:
Pub: t1(c1 int unique key, c2 int)
Sub: t1(c1 int unique key, c2 int)
On Pub:
TXN-1
insert(1,11)
TXN-2
update (1,11) --> update (2,12)

On Sub:
table contains (1,11) before replication.
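Now, if we allow dependent transactions to run in parallel and TXN-2 is
applied before TXN-1, then instead of raising an ERROR on the insert,
the update will succeed (changing (1,11) to (2,12)) and the subsequent
insert will also succeed. The subscriber ends up with both (1,11) and
(2,12) while the publisher has only (2,12), i.e. the subscriber side
becomes inconsistent.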

Similarly consider another set of transactions:
On Pub:
TXN-1
insert(1,11)
TXN-2
Delete (1,11)

On the subscriber, if we allow TXN-2 to be applied before TXN-1, the
subscriber will apply both transactions successfully but will become
inconsistent with respect to the publisher.
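To make the two examples concrete, here is a toy model in Python, not PostgreSQL code. Assumptions: table t1 is a dict from the unique column c1 to c2; each transaction commits atomically, or the replay stops at the first duplicate-key error; an UPDATE or DELETE whose target row is missing is simply skipped (loosely modeled on how the apply worker treats a missing row); and in the second example the subscriber's t1 starts empty. The names `apply_op` and `replay` are hypothetical.

```python
class UniqueViolation(Exception):
    """Stand-in for the duplicate-key ERROR the subscriber would raise."""


def apply_op(table, op):
    """Apply one change to `table`, a dict mapping unique column c1 -> c2."""
    kind = op[0]
    if kind == "insert":
        _, c1, c2 = op
        if c1 in table:
            raise UniqueViolation(f"duplicate key c1={c1}")
        table[c1] = c2
    elif kind == "update":
        _, (old_c1, old_c2), (new_c1, new_c2) = op
        if table.get(old_c1) != old_c2:
            return  # target row missing: skipped (simplifying assumption)
        del table[old_c1]
        if new_c1 in table:
            raise UniqueViolation(f"duplicate key c1={new_c1}")
        table[new_c1] = new_c2
    elif kind == "delete":
        _, c1, c2 = op
        if table.get(c1) == c2:
            del table[c1]  # a missing row is skipped, as for UPDATE


def replay(initial, txns):
    """Apply transactions in the given order. Each one commits atomically;
    on the first error, stop and return the last committed state."""
    table = dict(initial)
    for txn in txns:
        work = dict(table)  # private copy, discarded if the txn fails
        try:
            for op in txn:
                apply_op(work, op)
        except UniqueViolation as exc:
            return table, str(exc)
        table = work
    return table, None


# Example 1: the subscriber already contains (1,11).
txn_1 = [("insert", 1, 11)]
txn_2 = [("update", (1, 11), (2, 12))]
print(replay({}, [txn_1, txn_2]))        # publisher: ({2: 12}, None)
print(replay({1: 11}, [txn_1, txn_2]))   # commit order: ({1: 11}, 'duplicate key c1=1')
print(replay({1: 11}, [txn_2, txn_1]))   # reordered: ({2: 12, 1: 11}, None)

# Example 2: both sides start empty.
txn_2b = [("delete", 1, 11)]
print(replay({}, [txn_1, txn_2b]))       # publisher: ({}, None)
print(replay({}, [txn_2b, txn_1]))       # reordered: ({1: 11}, None)
```

In both reordered runs every transaction "succeeds", which is exactly why an optimistic retry-on-failure scheme would not catch these cases: nothing fails, the data just silently diverges.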

My colleague has already built a POC based on this idea, and we
checked some initial numbers for non-dependent transactions: the
apply speed improved drastically. We will share the POC patch and
numbers in the next few days.

For workloads with dependent transactions, if we chose the deadlock
detection approach, there would be a lot of retries, which may not
lead to good apply improvements. Also, we may choose to make this form
of parallel apply optional for the reasons mentioned in my first
email, so if dependency tracking adds overhead, one can disable
parallel apply for those particular subscriptions.
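For illustration only, the leader's dependency identification (point (a) in the quoted summary) could in principle work by remembering which in-flight transaction last touched each unique-key value. This is a hypothetical sketch, not the POC mentioned above: `DependencyTracker`, `touched_keys` and the op tuples are invented names. It also tracks only a single unique column, whereas a real implementation would presumably have to consider the replica identity and every unique index on the table.

```python
from dataclasses import dataclass, field


def touched_keys(table, ops):
    """Collect the (table, unique-key) pairs a transaction touches.
    Hypothetical op shapes: ("insert", c1, c2), ("delete", c1, c2),
    ("update", (old_c1, old_c2), (new_c1, new_c2)); for an UPDATE both
    the old and the new key count."""
    keys = set()
    for op in ops:
        if op[0] in ("insert", "delete"):
            keys.add((table, op[1]))
        elif op[0] == "update":
            keys.add((table, op[1][0]))
            keys.add((table, op[2][0]))
    return keys


@dataclass
class DependencyTracker:
    """Hypothetical leader-side bookkeeping: for each key, the
    not-yet-committed transaction that touched it last."""
    last_writer: dict = field(default_factory=dict)  # (table, key) -> xid

    def register(self, xid, keys):
        """Return the in-flight transactions `xid` must wait for, then
        record `xid` as the latest writer of `keys`."""
        deps = {self.last_writer[k] for k in keys if k in self.last_writer}
        for k in keys:
            self.last_writer[k] = xid
        return deps

    def committed(self, xid):
        """A parallel worker finished `xid`: its keys no longer block anyone."""
        self.last_writer = {k: w for k, w in self.last_writer.items() if w != xid}


tracker = DependencyTracker()
print(tracker.register(1, touched_keys("t1", [("insert", 1, 11)])))             # set()
print(tracker.register(2, touched_keys("t1", [("update", (1, 11), (2, 12))])))  # {1}
print(tracker.register(3, touched_keys("t1", [("insert", 5, 55)])))             # set()
tracker.committed(1)
print(tracker.register(4, touched_keys("t1", [("delete", 2, 12)])))             # {2}
```

Remembering only the last writer per key is enough because each transaction already waits for its own predecessor, so ordering among dependent transactions is transitive; transaction 3 touches an unrelated key and can go to a parallel worker immediately.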

--
With Regards,
Amit Kapila.


