Re: Parallel Apply - Mailing list pgsql-hackers
From | Amit Kapila
---|---
Subject | Re: Parallel Apply
Date |
Msg-id | CAA4eK1+n=dMzxTXsEpw9oC25h-1Y6n6izF6R4DF8=HA4ovP-jw@mail.gmail.com
In response to | Re: Parallel Apply (Константин Книжник <knizhnik@garret.ru>)
List | pgsql-hackers
On Tue, Aug 12, 2025 at 9:22 PM Константин Книжник <knizhnik@garret.ru> wrote:
>
> Hi,
> This is something similar to what I had in mind when I started my experiments with LR apply speed improvements. I think that maintaining a full (RelationId, ReplicaIdentity) hash may be too expensive - there can be hundreds of active transactions updating millions of rows.
> I thought about something like a bloom filter. But frankly speaking I didn't go far in thinking about all the implementation details. Your proposal is much more concrete.
>

We can surely investigate a different hash_key if that works for all cases.

> But I decided to implement the first approach with prefetch, which is much simpler, similar to the prefetching currently used for physical replication, and still provides quite a significant improvement:
> https://www.postgresql.org/message-id/flat/84ed36b8-7d06-4945-9a6b-3826b3f999a6%40garret.ru#70b45c44814c248d3d519a762f528753
>
> There is one thing which I do not completely understand about your proposal: do you assume that the LR walsender at the publisher will use the reorder buffer to "serialize" transactions, or do you assume that streaming mode will be used (it is now possible to enforce parallel apply of short transactions using `debug_logical_replication_streaming`)?
>

The current proposal is based on reorderbuffer serializing transactions as we are doing now.

> It seems senseless to spend time and memory trying to serialize transactions at the publisher if we in any case want to apply them in parallel at the subscriber.
> But then there is another problem: at the publisher there can be hundreds of concurrent active transactions (limited only by `max_connections`) whose records are intermixed in WAL.
> If we try to apply them concurrently at the subscriber, we need a corresponding number of parallel apply workers. But usually the number of such workers is less than 10 (and the default is 2).
> So it looks like we need to serialize transactions at the subscriber side.
>
> Assume that there are 100 concurrent transactions T1..T100, i.e. before the first COMMIT record there are mixed records of 100 transactions.
> And there are just two parallel apply workers, W1 and W2. The main LR apply worker will send T1's records to W1, T2's records to W2, and ... there are no more vacant workers.
> It either has to spawn additional ones, which is not always possible because the total number of background workers is limited, or serialize all other transactions in memory or on disk until it reaches the COMMIT of T1 or T2.
> I am afraid that such serialization will eliminate any advantages of parallel apply.
>

Right, I also think so, and we will probably end up doing something like what we are doing now in the publisher.

> Certainly, if we do reordering of transactions at the publisher side, then there is no such problem. The subscriber receives all records for T1, then all records for T2, ... If there are no more vacant workers, it can just wait until any of these transactions is completed. But I am afraid that in this case the reorder buffer at the publisher will be a bottleneck.
>

This is a point to investigate if we observe it. But till now, in our internal testing, parallel apply gives a good improvement in pgbench-kind workloads.

--
With Regards,
Amit Kapila.
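
To make the (RelationId, ReplicaIdentity) table discussed up-thread concrete, here is a minimal sketch of what a leader-side dependency table could look like, built on the backend's dynahash API. All of the names here (ApplyDepKey, apply_dep_track, and so on) are illustrative assumptions, not taken from any posted patch, and the replica identity is assumed to be reduced to a uint32 hash of its column values before lookup:

```c
/* Illustrative sketch only; names are hypothetical, not from a patch. */
#include "postgres.h"
#include "access/transam.h"
#include "utils/hsearch.h"

typedef struct ApplyDepKey
{
	Oid			relid;		/* target relation on the subscriber */
	uint32		ridhash;	/* hash of the replica identity column values */
} ApplyDepKey;

typedef struct ApplyDepEntry
{
	ApplyDepKey key;		/* hash key; must be the first field */
	TransactionId last_xid;	/* last in-flight xact touching this row */
} ApplyDepEntry;

static HTAB *apply_dep_hash = NULL;

static void
init_apply_dep_hash(void)
{
	HASHCTL		ctl;

	ctl.keysize = sizeof(ApplyDepKey);
	ctl.entrysize = sizeof(ApplyDepEntry);
	apply_dep_hash = hash_create("parallel apply dependencies",
								 1024, &ctl, HASH_ELEM | HASH_BLOBS);
}

/*
 * Note that xid is about to modify the row (relid, ridhash).  Returns the
 * xid of a conflicting in-flight transaction the caller must order itself
 * after, or InvalidTransactionId if no in-flight xact touched this row.
 */
static TransactionId
apply_dep_track(Oid relid, uint32 ridhash, TransactionId xid)
{
	ApplyDepKey key;
	ApplyDepEntry *entry;
	bool		found;
	TransactionId conflict = InvalidTransactionId;

	key.relid = relid;
	key.ridhash = ridhash;

	entry = hash_search(apply_dep_hash, &key, HASH_ENTER, &found);
	if (found && !TransactionIdEquals(entry->last_xid, xid))
		conflict = entry->last_xid;
	entry->last_xid = xid;

	return conflict;
}
```

To keep this from growing without bound (Konstantin's "millions of rows" concern), a transaction's entries would have to be removed once its COMMIT is applied, which is where the per-transaction bookkeeping cost comes from.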
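The worker-starvation scenario (T1..T100 with two workers) likewise reduces to a routing decision per incoming change. The following is purely illustrative, assuming a fixed pool whose size tracks max_parallel_apply_workers_per_subscription (default 2); a slot is assumed to be reset to InvalidTransactionId when the COMMIT of its transaction has been applied:

```c
/* Illustrative sketch only; bookkeeping reduced to an array of xids. */
#include "postgres.h"
#include "access/transam.h"

#define N_APPLY_SLOTS 2			/* cf. max_parallel_apply_workers_per_subscription */

static TransactionId slot_xid[N_APPLY_SLOTS];	/* zero == InvalidTransactionId */

/* Returns a worker slot for xid, or -1 if the change must be spilled. */
static int
route_change(TransactionId xid)
{
	int			free_slot = -1;

	for (int i = 0; i < N_APPLY_SLOTS; i++)
	{
		if (TransactionIdEquals(slot_xid[i], xid))
			return i;			/* xact already has a worker */
		if (!TransactionIdIsValid(slot_xid[i]) && free_slot < 0)
			free_slot = i;
	}

	if (free_slot >= 0)
	{
		slot_xid[free_slot] = xid;	/* assign a vacant worker */
		return free_slot;
	}

	return -1;					/* all workers busy: spill to memory/disk */
}
```

With T1..T100 interleaved and only two slots, every change of T3..T100 returns -1 until T1 or T2 commits, so nearly the whole stream is spilled, which is exactly the serialization overhead discussed above.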