Re: Logical replication prefetch - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: Logical replication prefetch
Date
Msg-id c41bd9ea-2812-40cf-be6b-4691795f09b9@garret.ru
In response to Re: Logical replication prefetch  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On 15/07/2025 2:31 PM, Amit Kapila wrote:
> If you are interested, I would like to know your opinion on a somewhat
> related topic, which has triggered my interest in your patch. We are
> working on an update_delete conflict detection patch. The exact
> problem was explained in the initial email [1]. The basic idea to
> resolve the problem is that on the subscriber, we maintain a slot that
> will help in retaining dead tuples for a certain period of time till
> the concurrent transactions have been applied to the subscriber. You
> can read the commit message of the first patch in email [2]. Now, the
> problem we are facing is that, because of replication LAG in a scenario
> similar to what we are discussing here (many clients on the publisher
> and a single apply worker on the subscriber), the slot takes more time
> to get advanced. This will lead to retention
> of dead tuples, which further slows down apply worker especially for
> update workloads. Apart from apply, the other transactions running on
> the system (say pgbench kind of workload on the subscriber) also
> became slower because of the retention of dead tuples.
>
> Now, for the workloads where the LAG is not there, like when one
> splits the workload with options mentioned above (split workload among
> pub-sub in some way) or the workload doesn't consist of a large number
> of clients operating on the publisher and subscriber at the same time,
> etc. we don't observe any major slowdown on the subscriber.
>
> We would like to solicit your opinion as you seem to have some
> experience with LR users, whether one can use this feature in cases
> where required by enabling it at the subscription level. They will
> have the facility to disable it if they face any performance
> regression or additional bloat. Now, after having that feature, we can
> work on additional features such as prefetch or parallel apply that
> will reduce the chances of LAG, making the feature more broadly used.
> Does that sound reasonable to you? Feel free to ignore giving your
> opinion if you are not interested in that work.
>
> [1] -
https://www.postgresql.org/message-id/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2%40OS0PR01MB5716.jpnprd01.prod.outlook.com
> [2] -
https://www.postgresql.org/message-id/OS0PR01MB5716ECC539008C85E7AB65C5944FA%40OS0PR01MB5716.jpnprd01.prod.outlook.com
>

I am very sorry for the delay with my answer - it was a very busy week.
I hope that I understand the problem and the proposed approach to solve it
(it actually seems quite straightforward and similar to
`hot_standby_feedback`), and it definitely suffers from the same problem:
a bloated database because of lagging slots.
But it is really hard to propose any other solution (other than a backward
scan of the WAL, which seems to be completely unacceptable).
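
Just to check that I read the proposal correctly, below is how I imagine
the user-facing switch would look. The option name here is only a
placeholder for illustration - I have not checked the actual syntax used
in your patch:

    -- enable retention of dead tuples needed for conflict detection
    -- (option name is a placeholder, not necessarily the one from the patch)
    CREATE SUBSCRIPTION sub
        CONNECTION 'host=pub dbname=postgres'
        PUBLICATION pub
        WITH (retain_conflict_info = on);

    -- a user who runs into bloat or apply slowdown can turn it off again:
    ALTER SUBSCRIPTION sub SET (retain_conflict_info = off);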

Concerning users' experience... First of all, a disclaimer: I am primarily
a programmer, not a DBA. Yes, I have investigated many support cases, but
it is still hard to claim that I have the full picture. There is not even
consensus concerning `hot_standby_feedback`! It is still disabled by
default in Postgres and in Neon. That makes sense for vanilla Postgres,
where replicas are used first of all for HA and only secondarily for load
balancing of read-only queries.
But in Neon, HA is provided in a different way, and the only reason for
creating RO replicas is load balancing (mostly for OLAP queries).
Executing heavy OLAP queries without `hot_standby_feedback` is a kind of
"Russian roulette", because the probability of a conflict with recovery is
very high. But we still use the Postgres default.
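
For reference, this is the knob I mean (it is off by default and has to be
enabled explicitly on the replica):

    -- on the replica: check whether feedback is enabled (off by default)
    SHOW hot_standby_feedback;

    -- to enable it, set in the standby's postgresql.conf and reload:
    --   hot_standby_feedback = on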

But the situation with feedback slots may be different: as far as I
understand, they are mostly needed for bidirectional replication and
automatic conflict resolution.
So they are assumed to be part of some distributed system (like BDR),
rather than a feature used directly by Postgres users.

I am still a little bit depressed by the complexity of LR and all related
aspects. But it is unlikely that something more elegant and simpler can be
invented :)




