Re: Logical replication prefetch - Mailing list pgsql-hackers
From: Konstantin Knizhnik
Subject: Re: Logical replication prefetch
Date:
Msg-id: 825f05e2-2344-4f7c-9dba-5a0027708dc2@garret.ru
In response to: Re: Logical replication prefetch (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On 11/07/2025 11:52 am, Amit Kapila wrote:
> On Wed, Jul 9, 2025 at 12:08 AM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
>> On 08/07/2025 2:51 pm, Amit Kapila wrote:
>>
>> On Tue, Jul 8, 2025 at 12:06 PM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
>>
>> It is possible to enforce parallel apply of short transactions using
>> `debug_logical_replication_streaming`, but then performance is ~2x slower
>> than with sequential apply by a single worker.
>>
>> What is the reason for such a large slowdown? Is it because the amount
>> of network transfer has increased without giving any significant
>> advantage because of the serialization of commits?
>>
>> No, I do not think that network traffic is somehow increased.
>> If I remove locks (just by commenting out the bodies of the `pa_lock_stream`
>> and `pa_unlock_stream` functions and the call of `pa_wait_for_xact_finish`),
>> I get a 3x speed improvement (with 4 parallel apply workers) compared with
>> normal mode (when transactions are applied by the main logical replication
>> worker). So the main reason is lock overhead/contention and de-facto
>> serialization of transactions (in `top` I see that only one worker is active
>> most of the time).
>>
>> Even with a simulated 0.1 ms read delay, the results of the update tests are
>> the following:
>>
>> normal mode: 7:40
>> forced parallel mode: 8:30
>> forced parallel mode (no locks): 1:45
>>
>> By removing serialization by commits, it is possible to speed up apply 3x
>> and make the subscriber apply changes faster than the producer can produce
>> them, even with multiple clients. But this is possible only if transactions
>> are independent, and it can be enforced only by tracking dependencies, which
>> seems to be very non-trivial and invasive.
>>
>> I have still not completely given up on the dependency-tracking approach,
>> but decided first to try a simpler solution: prefetching.
>>
>> Sounds reasonable, but in the long run, we should track transaction
>> dependencies and allow parallel apply of all the transactions.
>>
>> I agree.
>> I see two different approaches:
>>
>> 1. Build a dependency graph: track dependencies between xids when a
>> transaction is executed at the publisher, and then include this graph in the
>> commit record.
>> 2. Calculate a hash of the replica identity key and check that the data sets
>> of transactions do not intersect (which certainly will not work if there are
>> some triggers).
>>
> I think it is better to compute transaction dependencies on the
> subscriber side because there could be many transactions that could be
> filtered because the containing tables or rows are not published.

It is certainly true. Also, tracking dependencies at the subscriber doesn't
require changing the protocol and makes it possible to do parallel apply for
old publishers. But on the other hand, it seems to be too late: it would be
nice if, just after receiving the first transaction statement, the apply
process could decide to which parallel apply worker it should be sent.
At the subscriber side it seems easier to calculate a hash of the replica
identity key and, based on this hash (or more precisely the set of hashes
representing the transaction's working set), decide whether the transaction
interleaves with some prior transaction or not, and schedule it accordingly.

>> But what about worst cases where these additional pre-fetches could
>> lead to removing some pages from shared_buffers, which are required by
>> the workload on the subscriber? I think you should try such workloads as
>> well.
>>
>> It is the common problem of all prefetch algorithms: if the cache where
>> prefetch results are stored (shared buffers, OS file cache, ...) is not
>> large enough to keep a prefetch result until it is used, then prefetch will
>> not provide any performance improvement and may even cause some degradation.
>> So it is a really challenging task to choose the optimal time for a prefetch
>> operation: too early, and its results will be thrown away before they are
>> requested; too late, and the executor has to wait for prefetch completion or
>> load the page itself. Certainly there is some kind of autotuning: a worker
>> performing prefetch has to wait for IO completion, while the executor, which
>> picks up pages from the cache, processes requests faster and so should catch
>> up with the prefetch workers. Then it has to perform IO itself and starts
>> falling behind the prefetch workers again.
>>
>> I understand that it is just a POC, so you haven't figured out all the
>> details, but it would be good to know the reason for these deadlocks.
>>
>> Will investigate it.

I found the reason: the conflict happens between the main apply worker and a
prefetch worker which was able to catch up with the main worker, so they both
try to apply the same statement. I fixed the problem by adding an extra
parameter to ExecSimpleRelationUpdate/Insert and handling prefetch as a kind
of speculative operation. With this change, results for the insert test are
the following:

no prefetch: 10 min
prefetch (identity): 8 min
prefetch (full): 3 min

> I think our case is a bit different, and prefetch could even be used
> when we are able to track dependencies and achieve true parallelism.
> We can consider using prefetch to speed up dependent transactions that
> can't be parallelized.

Makes sense.