Re: Logical replication prefetch - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: Logical replication prefetch
Date
Msg-id facc2fa1-31f4-48d4-9588-1165ebafa620@garret.ru
In response to RE: Logical replication prefetch  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses RE: Logical replication prefetch
List pgsql-hackers


On 14/07/2025 4:20 am, Zhijie Hou (Fujitsu) wrote:
Thank you for the proposal! I find it to be a very interesting feature.

I tested the patch you shared in your original email and encountered potential
deadlocks when testing a pgbench TPC-B-like workload. Could you please provide an
updated patch version so that I can conduct further performance experiments?

Sorry, it was fixed in my repo: https://github.com/knizhnik/postgres/pull/3
Updated patch is attached.


Additionally, I was also exploring ways to improve performance and have tried an
alternative version of prefetch for experimentation. The alternative design is
that we assign each non-streaming transaction to a parallel apply worker, while
strictly maintaining the order of commits. During parallel apply, if the
transactions that need to be committed before the current transaction are not
yet finished, the worker performs prefetch operations. Specifically, for
updates and deletes, the worker finds and caches the target local tuple to be
updated/deleted. Once all preceding transactions are committed, the parallel
apply worker uses these cached tuples to execute the actual updates or deletes.
What do you think about this alternative? I think the alternative might offer
more stability in scenarios where shared buffer eviction occurs frequently,
and it avoids leaving dead tuples in the buffer. However, it also presents some
drawbacks, such as the need to add wait events to maintain commit order,
compared to the approach discussed in this thread.

So, as far as I understand, your PoC does the same as approach 1 in my proposal - prefetch of the replica identity - but it is done not by dedicated prefetch workers, but by the normal parallel apply workers while they have to wait until the previous transaction commits. I consider it more complex, but it may be more efficient than my approach.

The obvious drawback of both your approach and mine is that they prefetch only pages of the primary index (replica identity). If there are other indexes whose keys are changed by an update, pages of those indexes will still be read from disk when the update is applied. The same is true for inserts (in that case the new tuple always has to be included in all indexes) - this is why I also implemented another approach: apply the operation in a prefetch worker and then roll back the transaction.
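The difference can be illustrated with a toy model (hypothetical names and structures; a sketch of the idea, not PostgreSQL code): a replica-identity-only prefetch warms just the primary index and heap pages, while actually applying the update also touches every secondary index whose key changes.

```python
# Toy model (hypothetical, not PostgreSQL code): compare which "pages" are
# warmed by replica-identity prefetch vs. by actually applying the update.

class ToyTable:
    def __init__(self):
        self.heap = {}         # tid -> row
        self.pk_index = {}     # id -> tid       (replica identity)
        self.sec_index = {}    # email -> tid    (secondary index)
        self.touched = set()   # "pages" read or written

    def insert(self, tid, row):
        self.heap[tid] = row
        self.pk_index[row["id"]] = tid
        self.sec_index[row["email"]] = tid

    def prefetch_replica_identity(self, row_id):
        # Approach 1: look up the target tuple via the replica identity only.
        self.touched.add(("pk_index", row_id))
        tid = self.pk_index[row_id]
        self.touched.add(("heap", tid))

    def apply_update(self, row_id, new_email):
        # Applying the update touches the same pages as the prefetch above...
        self.touched.add(("pk_index", row_id))
        tid = self.pk_index[row_id]
        self.touched.add(("heap", tid))
        # ...plus every secondary index entry whose key changes.
        old_email = self.heap[tid]["email"]
        self.touched.add(("sec_index", old_email))
        self.touched.add(("sec_index", new_email))
        self.heap[tid]["email"] = new_email

t1 = ToyTable(); t1.insert(1, {"id": 42, "email": "a@x"})
t1.prefetch_replica_identity(42)

t2 = ToyTable(); t2.insert(1, {"id": 42, "email": "a@x"})
t2.apply_update(42, "b@x")

print(len(t1.touched), len(t2.touched))  # the full apply touches more "pages"
```

The pages in `t2.touched` but not in `t1.touched` are exactly the secondary-index pages that the apply-and-rollback approach would warm but a replica-identity prefetch would not.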

Also, I do not quite understand how you handle invalidations. Assume we have two transactions, T1 and T2:

T1: ... W1 Commit

T2: ...                     W1


So T1 writes tuple 1 and then commits the transaction. Then T2 updates tuple 1.

If I understand your approach correctly, the parallel apply worker for T2 will try to prefetch tuple 1 before T1 has committed.
But in this case it will get the old version of the tuple. That is not a problem as long as the parallel apply worker repeats the lookup rather than just using the cached tuple.
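The hazard can be shown with a toy sketch (hypothetical, not PostgreSQL code): a tuple cached before T1 commits is stale, so reusing it blindly would apply T2's update to the wrong version, while repeating the lookup after waiting for T1 is safe and still keeps the page-warming benefit of the prefetch.

```python
# Toy sketch (hypothetical, not PostgreSQL code): T2's worker prefetches
# tuple 1 before T1 commits, so the cached copy is the old version.

store = {"tuple1": "v0"}   # committed state visible to lookups

def lookup(key):
    return store[key]

# T2's parallel apply worker prefetches while T1 is still in progress:
cached = lookup("tuple1")  # sees the old version "v0"

# T1 writes tuple 1 and commits:
store["tuple1"] = "v1"

# Using the cached tuple now would operate on a stale version:
assert cached == "v0"      # stale!

# Safe variant: repeat the lookup once all preceding transactions have
# committed; the earlier prefetch still warmed the pages, which is the
# actual benefit.
fresh = lookup("tuple1")
assert fresh == "v1"
print(cached, fresh)
```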

One more point. Since you assign each non-streaming transaction to a parallel apply worker, the number of such transactions is limited by the number of background workers. Usually it is not so large (~10). So if there were 100 parallel transactions at the publisher, at the subscriber you would still be able to execute concurrently no more than a few of them. In this sense my approach with separate prefetch workers is more flexible: each prefetch worker can prefetch as many operations as it can.

