Re: Logical replication prefetch - Mailing list pgsql-hackers
| From | Konstantin Knizhnik |
|---|---|
| Subject | Re: Logical replication prefetch |
| Msg-id | 26dcc7a3-c3c1-44a4-87e0-bfc68fe7901d@garret.ru |
| In response to | Logical replication prefetch (Konstantin Knizhnik <knizhnik@garret.ru>) |
| Responses | Re: Logical replication prefetch |
| List | pgsql-hackers |
On 08/07/2025 2:51 pm, Amit Kapila wrote:
> On Tue, Jul 8, 2025 at 12:06 PM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
>> There is a well-known Postgres problem: a logical replication
>> subscriber cannot catch up with the publisher because LR changes are
>> applied by a single worker, while at the publisher changes are made by
>> multiple concurrent backends. The problem is not specific to logical
>> replication: the physical replication stream is also handled by a single
>> walreceiver. But for physical replication Postgres now implements
>> prefetch: looking at WAL record blocks, it is quite easy to predict which
>> pages will be required for redo and prefetch them. With logical
>> replication the situation is much more complicated.
>>
>> My first idea was to implement parallel apply of transactions. But to do
>> it we need to track dependencies between transactions. Right now
>> Postgres can apply transactions in parallel, but only if they are
>> streamed (which is done only for large transactions), and it serializes
>> them by commits. It is possible to force parallel apply of short
>> transactions using `debug_logical_replication_streaming`, but then
>> performance is ~2x slower than sequential apply by a
>> single worker.
>>
> What is the reason for such a large slowdown? Is it because the amount
> of network transfer has increased without giving any significant
> advantage because of the serialization of commits?
This is not directly related to the subject, but I do not understand this code:
```
/*
 * Stop the worker if there are enough workers in the pool.
 *
 * XXX Additionally, we also stop the worker if the leader apply worker
 * serialize part of the transaction data due to a send timeout. This is
 * because the message could be partially written to the queue and there
 * is no way to clean the queue other than resending the message until it
 * succeeds. Instead of trying to send the data which anyway would have
 * been serialized and then letting the parallel apply worker deal with
 * the spurious message, we stop the worker.
 */
if (winfo->serialize_changes ||
    list_length(ParallelApplyWorkerPool) >
    (max_parallel_apply_workers_per_subscription / 2))
{
    logicalrep_pa_worker_stop(winfo);
    pa_free_worker_info(winfo);
    return;
}
```
It stops the worker if the number of workers in the pool is more than half
of `max_parallel_apply_workers_per_subscription`.
What I see is that `pa_launch_parallel_worker` spawns a new worker, which
is then terminated immediately after the transaction completes.
This actually leads to an awful slowdown of the apply process.
If I just disable this check, so that all
`max_parallel_apply_workers_per_subscription` workers are actually used for
applying transactions, then the time of parallel apply with 4 workers is 6
minutes, compared with 10 minutes for applying all transactions by the main
apply worker. It is still not such a large improvement, but at least it is
an improvement and not a degradation.