RE: Logical replication prefetch - Mailing list pgsql-hackers
From: Zhijie Hou (Fujitsu)
Subject: RE: Logical replication prefetch
Msg-id: OS3PR01MB5718853472499F36BF9CDFC69454A@OS3PR01MB5718.jpnprd01.prod.outlook.com
In response to: Re: Logical replication prefetch (Konstantin Knizhnik <knizhnik@garret.ru>)
List: pgsql-hackers
On Monday, July 14, 2025 2:36 PM Konstantin Knizhnik <knizhnik@garret.ru> wrote:

> On 14/07/2025 4:20 am, Zhijie Hou (Fujitsu) wrote:
> > Additionally, I was also exploring ways to improve performance and have
> > tried an alternative version of prefetch for experimentation. The
> > alternative design is that we assign each non-streaming transaction to
> > a parallel apply worker, while strictly maintaining the order of
> > commits. During parallel apply, if the transactions that need to be
> > committed before the current transaction are not yet finished, the
> > worker performs pre-fetch operations. Specifically, for updates and
> > deletes, the worker finds and caches the target local tuple to be
> > updated/deleted. Once all preceding transactions are committed, the
> > parallel apply worker uses these cached tuples to execute the actual
> > updates or deletes.
> >
> > What do you think about this alternative? I think the alternative might
> > offer more stability in scenarios where shared buffer eviction occurs
> > frequently, and it avoids leaving dead tuples in the buffer. However,
> > it also presents some drawbacks, such as the need to add wait events to
> > maintain commit order, compared to the approach discussed in this
> > thread.
>
> So as far as I understand, your PoC is doing the same as approach 1 in
> my proposal - prefetch of the replica identity - but it is done not by
> parallel prefetch workers, but by normal parallel apply workers when they
> have to wait until the previous transaction is committed. I consider it
> to be more complex but may be more efficient than my approach.
>
> The obvious drawback of both your approach and mine is that it prefetches
> only pages of the primary index (replica identity). If there are other
> indexes whose keys are changed by an update, then pages of such indexes
> will be read from disk when you apply the update. The same is also true
> for insert (in this case you always have to include the new tuple in all
> indexes) - this is why I have also implemented another approach: apply
> the operation in a prefetch worker and then rollback the transaction.

Thank you for your reply!

I agree that indexes other than the RI do not benefit from the pre-fetch.

Regarding the apply-and-rollback approach, I have some concerns about the
possible side effects, particularly the accumulation of dead tuples in the
shared buffer, because all changes are performed by the pre-fetch worker at
once before being aborted. I haven't delved deeply into this yet, but do
you think this could potentially introduce additional overhead?

> Also I do not quite understand how you handle invalidations?

During the pre-fetch phase of my patch, the execution of table_tuple_lock()
is postponed until all preceding transactions have been finalized. If the
cached tuple was modified by other transactions, table_tuple_lock() will
return TM_Updated, signifying that the cached tuple is no longer valid. In
these cases, the parallel apply worker will re-fetch the tuple.

> Assume that we have two transactions - T1 and T2:
>
> T1: ... W1 Commit
> T2: ... W1
>
> So T1 writes tuple 1 and then commits the transaction. Then T2 updates
> tuple 1. If I correctly understand your approach, the parallel apply
> worker for T2 will try to prefetch tuple 1 before T1 is committed.
>
> But in this case it will get the old version of the tuple. It is not a
> problem if the parallel apply worker repeats the lookup and does not just
> use the cached tuple.

Yes, it is done like that.
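To make the recheck concrete, the path looks roughly like the following
(a simplified sketch; the variable names and the wait helper are
placeholders rather than the actual patch code, and the wait helper itself
is sketched further below):

/* Declarations for the fragment below. */
TM_Result       res;
TM_FailureData  tmfd;
bool            found;

/* Wait until every transaction that must commit before us has done so. */
wait_for_preceding_commits(my_commit_seq);

/* Try to lock the local tuple cached during the pre-fetch phase. */
res = table_tuple_lock(rel, &cached_tid, GetLatestSnapshot(),
                       lockedslot, GetCurrentCommandId(false),
                       LockTupleExclusive, LockWaitBlock,
                       TUPLE_LOCK_FLAG_FIND_LAST_VERSION, &tmfd);

if (res == TM_Updated)
{
    /*
     * The cached tuple is stale (a preceding transaction modified it),
     * so repeat the replica identity lookup to find the live version.
     */
    found = RelationFindReplTupleByIndex(rel, idxoid, LockTupleExclusive,
                                         remoteslot, lockedslot);
}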
> One more moment. As far as you assign each non-streaming transaction to
> a parallel apply worker, the number of such transactions is limited by
> the number of background workers. Usually it is not so large (~10). So
> if there were 100 parallel transactions at the publisher, then at the
> subscriber you would still be able to execute concurrently no more than
> a few of them. In this sense my approach with separate prefetch workers
> is more flexible: each prefetch worker can prefetch as many operations
> as it can.

Yes, that's true.

I have been analyzing some performance issues in logical replication,
specifically under scenarios where both the publisher and subscriber are
subjected to high workloads. In these situations, the shared buffer is
frequently updated, which prompted me to consider the alternative approach
I mentioned.
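As for the wait-event cost mentioned earlier, the commit-order gate is
conceptually something like this (a hypothetical sketch only; none of these
names exist in core or in the patch, and the shared state layout is an
assumption):

/*
 * Block until the transaction ordered immediately before ours has
 * committed, so that local commits happen in publisher order.
 */
static void
wait_for_preceding_commits(uint64 my_commit_seq)
{
    ConditionVariablePrepareToSleep(&shared->commit_order_cv);

    while (pg_atomic_read_u64(&shared->last_committed_seq) != my_commit_seq - 1)
        ConditionVariableSleep(&shared->commit_order_cv,
                               WAIT_EVENT_PA_COMMIT_ORDER); /* new wait event */

    ConditionVariableCancelSleep();
}

After committing, a worker would advance last_committed_seq and wake the
others with ConditionVariableBroadcast(&shared->commit_order_cv).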
I plan to perform additional tests and analysis on these approaches, thanks!

Best Regards,
Hou zj