Re: Logical replication prefetch - Mailing list pgsql-hackers
From: Konstantin Knizhnik
Subject: Re: Logical replication prefetch
Msg-id: facc2fa1-31f4-48d4-9588-1165ebafa620@garret.ru
In response to: RE: Logical replication prefetch ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses: RE: Logical replication prefetch
List: pgsql-hackers
> Thank you for the proposal! I find it to be a very interesting feature. I tested the patch you shared in your original email and encountered potential deadlocks when testing a pgbench TPC-B-like workload. Could you please provide an updated patch version so that I can conduct further performance experiments?
Sorry, it was fixed in my repo: https://github.com/knizhnik/postgres/pull/3
An updated patch is attached.
> Additionally, I was also exploring ways to improve performance and have tried an alternative version of prefetch for experimentation. The alternative design is that we assign each non-streaming transaction to a parallel apply worker, while strictly maintaining the order of commits. During parallel apply, if the transactions that need to be committed before the current transaction are not yet finished, the worker performs prefetch operations. Specifically, for updates and deletes, the worker finds and caches the target local tuple to be updated/deleted. Once all preceding transactions are committed, the parallel apply worker uses these cached tuples to execute the actual updates or deletes. What do you think about this alternative? I think the alternative might offer more stability in scenarios where shared buffer eviction occurs frequently, and it avoids leaving dead tuples in the buffer. However, it also presents some drawbacks, such as the need to add wait events to maintain commit order, compared to the approach discussed in this thread.
So, as far as I understand, your PoC does the same as approach 1 in my proposal - prefetch of the replica identity - but it is done not by dedicated prefetch workers, but by normal parallel apply workers while they wait for the previous transaction to commit. I consider it more complex, but it may be more efficient than my approach.
The obvious drawback of both your approach and mine is that they prefetch only pages of the primary index (replica identity). If there are other indexes whose keys are changed by an update, pages of those indexes will still be read from disk when the update is applied. The same is true for inserts (in that case the new tuple always has to be added to all indexes) - this is why I also implemented another approach: apply the operation in a prefetch worker and then roll back the transaction.
Also, I do not quite understand how you handle invalidations. Assume that we have two transactions, T1 and T2:
T1: ... W1 Commit
T2: ... W1
So T1 writes tuple 1 and then commits. Then T2 updates tuple 1.
If I understand your approach correctly, the parallel apply worker for T2 will try to prefetch tuple 1 before T1 has committed.
But in this case it will get the old version of the tuple. This is not a problem as long as the parallel apply worker repeats the lookup at apply time instead of just using the cached tuple.
One more point. Since you assign each non-streaming transaction to a parallel apply worker, the number of such transactions is limited by the number of background workers. Usually it is not so large (~10). So if there were 100 parallel transactions at the publisher, at the subscriber you would still be able to execute concurrently no more than a few of them. In this sense my approach with separate prefetch workers is more flexible: each prefetch worker can prefetch as many operations as it can.