Re: Parallel Apply - Mailing list pgsql-hackers
From | Amit Kapila
---|---
Subject | Re: Parallel Apply
Date |
Msg-id | CAA4eK1+n=dMzxTXsEpw9oC25h-1Y6n6izF6R4DF8=HA4ovP-jw@mail.gmail.com
In response to | Re: Parallel Apply (Константин Книжник <knizhnik@garret.ru>)
List | pgsql-hackers
On Tue, Aug 12, 2025 at 9:22 PM Константин Книжник <knizhnik@garret.ru> wrote:
>
> Hi,
> This is something similar to what I had in mind when I started my experiments with LR apply speed improvements. I think that maintaining a full (RelationId, ReplicaIdentity) hash may be too expensive - there can be hundreds of active transactions updating millions of rows.
> I thought about something like a bloom filter. But frankly speaking I didn't go far in thinking about all the implementation details. Your proposal is much more concrete.
>

We can surely investigate a different hash_key if that works for all cases.

> But I decided to implement the first approach with prefetch, which is much simpler, similar to the prefetching currently used for physical replication, and still provides quite a significant improvement:
> https://www.postgresql.org/message-id/flat/84ed36b8-7d06-4945-9a6b-3826b3f999a6%40garret.ru#70b45c44814c248d3d519a762f528753
>
> There is one thing which I do not completely understand about your proposal: do you assume that the LR walsender at the publisher will use the reorder buffer to "serialize" transactions, or do you assume that streaming mode will be used (it is now possible to enforce parallel apply of short transactions using `debug_logical_replication_streaming`)?
>

The current proposal is based on reorderbuffer serializing transactions as we are doing now.

> It seems senseless to spend time and memory trying to serialize transactions at the publisher if we in any case want to apply them in parallel at the subscriber.
> But then there is another problem: at the publisher there can be hundreds of concurrent active transactions (limited only by `max_connections`) whose records are intermixed in WAL.
> If we try to apply them concurrently at the subscriber, we need a corresponding number of parallel apply workers. But usually the number of such workers is less than 10 (and the default is 2).
> So it looks like we need to serialize transactions at the subscriber side.
>
> Assume that there are 100 concurrent transactions T1..T100, i.e. before the first COMMIT record there are mixed records of 100 transactions.
> And there are just two parallel apply workers, W1 and W2. The main LR apply worker will send T1's records to W1, T2's records to W2, and ... there are no more vacant workers.
> It either has to spawn additional ones, which is not always possible because the total number of background workers is limited, or serialize all other transactions in memory or on disk until it reaches the COMMIT of T1 or T2.
> I am afraid that such serialization will eliminate any advantages of parallel apply.
>

Right, I also think so, and we will probably end up doing something like what we are doing now in the publisher.

> Certainly, if we do reordering of transactions at the publisher side, then there is no such problem. The subscriber receives all records for T1, then all records for T2, ... If there are no more vacant workers, it can just wait until any of these transactions is completed. But I am afraid that in this case the reorder buffer at the publisher will be a bottleneck.
>

This is a point to investigate if we observe it. But till now, in our internal testing, parallel apply gives a good improvement in pgbench-kind workloads.

--
With Regards,
Amit Kapila.
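
To make the (RelationId, ReplicaIdentity) table discussed up-thread concrete, here is a minimal sketch of what a leader-side dependency table could look like, built on the backend's dynahash API. All of the names here (ApplyDepKey, apply_dep_track, and so on) are illustrative assumptions, not taken from any posted patch, and the replica identity is assumed to be reduced to a uint32 hash of its column values before lookup:

```c
/* Illustrative sketch only; names are hypothetical, not from a patch. */
#include "postgres.h"
#include "access/transam.h"
#include "utils/hsearch.h"

typedef struct ApplyDepKey
{
	Oid			relid;		/* target relation on the subscriber */
	uint32		ridhash;	/* hash of the replica identity column values */
} ApplyDepKey;

typedef struct ApplyDepEntry
{
	ApplyDepKey key;		/* hash key; must be the first field */
	TransactionId last_xid;	/* last in-flight xact touching this row */
} ApplyDepEntry;

static HTAB *apply_dep_hash = NULL;

static void
init_apply_dep_hash(void)
{
	HASHCTL		ctl;

	ctl.keysize = sizeof(ApplyDepKey);
	ctl.entrysize = sizeof(ApplyDepEntry);
	apply_dep_hash = hash_create("parallel apply dependencies",
								 1024, &ctl, HASH_ELEM | HASH_BLOBS);
}

/*
 * Note that xid is about to modify the row (relid, ridhash).  Returns the
 * xid of a conflicting in-flight transaction the caller must order itself
 * after, or InvalidTransactionId if no in-flight xact touched this row.
 */
static TransactionId
apply_dep_track(Oid relid, uint32 ridhash, TransactionId xid)
{
	ApplyDepKey key;
	ApplyDepEntry *entry;
	bool		found;
	TransactionId conflict = InvalidTransactionId;

	key.relid = relid;
	key.ridhash = ridhash;

	entry = hash_search(apply_dep_hash, &key, HASH_ENTER, &found);
	if (found && !TransactionIdEquals(entry->last_xid, xid))
		conflict = entry->last_xid;
	entry->last_xid = xid;

	return conflict;
}
```

To keep this from growing without bound (Konstantin's "millions of rows" concern), a transaction's entries would have to be removed once its COMMIT is applied, which is where the per-transaction bookkeeping cost comes from.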
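The worker-starvation scenario (T1..T100 with two workers) likewise reduces to a routing decision per incoming change. The following is purely illustrative, assuming a fixed pool whose size tracks max_parallel_apply_workers_per_subscription (default 2); a slot is assumed to be reset to InvalidTransactionId when the COMMIT of its transaction has been applied:

```c
/* Illustrative sketch only; bookkeeping reduced to an array of xids. */
#include "postgres.h"
#include "access/transam.h"

#define N_APPLY_SLOTS 2			/* cf. max_parallel_apply_workers_per_subscription */

static TransactionId slot_xid[N_APPLY_SLOTS];	/* zero == InvalidTransactionId */

/* Returns a worker slot for xid, or -1 if the change must be spilled. */
static int
route_change(TransactionId xid)
{
	int			free_slot = -1;

	for (int i = 0; i < N_APPLY_SLOTS; i++)
	{
		if (TransactionIdEquals(slot_xid[i], xid))
			return i;			/* xact already has a worker */
		if (!TransactionIdIsValid(slot_xid[i]) && free_slot < 0)
			free_slot = i;
	}

	if (free_slot >= 0)
	{
		slot_xid[free_slot] = xid;	/* assign a vacant worker */
		return free_slot;
	}

	return -1;					/* all workers busy: spill to memory/disk */
}
```

With T1..T100 interleaved and only two slots, every change of T3..T100 returns -1 until T1 or T2 commits, so nearly the whole stream is spilled, which is exactly the serialization overhead discussed above.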