Re: Parallel Apply - Mailing list pgsql-hackers
| From | Dilip Kumar |
|---|---|
| Subject | Re: Parallel Apply |
| Date | |
| Msg-id | CAFiTN-ut-W1-SvD=txQk0EUXv5RM5c1YdkfJEgZp78yPTZX8BQ@mail.gmail.com |
| In response to | Re: Parallel Apply (Amit Kapila <amit.kapila16@gmail.com>) |
| Responses | Re: Parallel Apply |
| List | pgsql-hackers |
On Tue, Sep 16, 2025 at 3:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Sep 6, 2025 at 10:33 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > I suspect this might not be the most performant default strategy and
> > could frequently cause a performance dip. In general, we utilize
> > parallel apply workers, considering that the time taken to apply
> > changes is much costlier than reading and sending messages to workers.
> >
> > The current strategy involves the leader picking one transaction for
> > itself after distributing transactions to all apply workers, assuming
> > the apply task will take some time to complete. When the leader takes
> > on an apply task, it becomes a bottleneck for complete parallelism.
> > This is because it needs to finish applying previous messages before
> > accepting any new ones. Consequently, even as workers slowly become
> > free, they won't receive new tasks because the leader is busy applying
> > its own transaction.
> >
> > This type of strategy might be suitable in scenarios where users
> > cannot supply more workers due to resource limitations. However, on
> > high-end machines, it is more efficient to let the leader act solely
> > as a message transmitter and allow the apply workers to handle all
> > apply tasks. This could be a configurable parameter, determining
> > whether the leader also participates in applying changes. I believe
> > this should not be the default strategy; in fact, the default should
> > be for the leader to act purely as a transmitter.
> >
>
> I see your point but consider a scenario where we have two pa workers.
> pa-1 is waiting for some backend on unique_key insertion and pa-2 is
> waiting for pa-1 to complete its transaction as pa-2 has to perform
> some change which is dependent on pa-1's transaction. So, leader can
> either simply wait for a third transaction to be distributed or just
> apply it and process another change. If we follow the earlier then it
> is quite possible that the sender fills the network queue to send data
> and simply timed out.

Sorry I took a while to come back to this. I understand your point and
agree that it's a valid concern. However, I question whether limiting
this to a single choice is the optimal solution.

The core issue involves two distinct roles: work distribution and
applying changes. Work distribution is handled exclusively by the
leader, while any worker can apply the changes. This is essentially a
single-producer, multiple-consumer problem.

While it might seem efficient for the producer (leader) to assist the
consumers (workers) when there are only a few consumers, I believe this
isn't the best design. In such scenarios, it's generally better to let
the producer focus solely on its primary task, unless there's a severe
shortage of processing power. If computing resources are constrained,
allowing the producer to join the consumers in applying changes is
acceptable. However, if sufficient processing power is available, the
producer should ideally be left to its own duties.

The question then becomes: how do we make this decision? My suggestion
is to make this a configurable parameter. Users could then decide
whether the leader participates in applying changes. This would provide
flexibility: if there are enough workers, users can configure the leader
to focus only on its distribution task. OTOH, if processing power is
limited and only a few apply workers (e.g., two, as in your example) can
be set up, users would have the option to configure the leader to also
act as an apply worker when needed.

--
Regards,
Dilip Kumar
Google
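
To make the proposal above concrete, here is a minimal sketch of the dispatch decision under a configurable setting. It is a toy model and not code from the patch: the function name `choose_target`, the `leader_applies` flag, and the "wait" outcome are all hypothetical names invented for illustration.

```python
from typing import List


def choose_target(worker_busy: List[bool], leader_applies: bool) -> str:
    """Pick who handles the next transaction (toy model, hypothetical names).

    worker_busy[i] is True when parallel apply worker pa-(i+1) is occupied.
    leader_applies stands in for the proposed configurable parameter.
    """
    # Prefer any free parallel apply worker (the consumers).
    for i, busy in enumerate(worker_busy):
        if not busy:
            return f"pa-{i + 1}"
    # All workers are busy: either the leader applies the transaction
    # itself, or it stays a pure distributor and waits for a free worker.
    return "leader" if leader_applies else "wait"


# The two-worker scenario from the quoted reply: pa-1 and pa-2 are both blocked.
print(choose_target([True, True], leader_applies=True))   # leader
print(choose_target([True, True], leader_applies=False))  # wait
```

With the flag off, the leader never takes on apply work, so it can hand out a transaction as soon as a worker frees up. With the flag on, it keeps making progress when every worker is blocked, at the cost of not distributing anything until it finishes its own transaction.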