Re: Logical replication prefetch - Mailing list pgsql-hackers
From: Konstantin Knizhnik
Subject: Re: Logical replication prefetch
Msg-id: c41bd9ea-2812-40cf-be6b-4691795f09b9@garret.ru
In response to: Re: Logical replication prefetch (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On 15/07/2025 2:31 PM, Amit Kapila wrote:
> If you are interested, I would like to know your opinion on a somewhat
> related topic, which has triggered my interest in your patch. We are
> working on an update_delete conflict detection patch. The exact
> problem was explained in the initial email [1]. The basic idea to
> resolve the problem is that on the subscriber, we maintain a slot that
> will help in retaining dead tuples for a certain period of time till
> the concurrent transactions have been applied to the subscriber. You
> can read the commit message of the first patch in email [2]. Now, the
> problem we are facing is that because of replication LAG in a scenario
> similar to what we are discussing here, such that when there are many
> clients on the publisher and a single apply worker on the subscriber,
> the slot takes more time to get advanced. This will lead to retention
> of dead tuples, which further slows down the apply worker, especially for
> update workloads. Apart from apply, the other transactions running on
> the system (say a pgbench kind of workload on the subscriber) also
> become slower because of the retention of dead tuples.
>
> Now, for the workloads where the LAG is not there, like when one
> splits the workload with the options mentioned above (split workload among
> pub-sub in some way) or the workload doesn't consist of a large number
> of clients operating on the publisher and subscriber at the same time,
> etc., we don't observe any major slowdown on the subscriber.
>
> We would like to solicit your opinion, as you seem to have some
> experience with LR users, on whether one can use this feature in cases
> where required by enabling it at the subscription level. Users will
> have the facility to disable it if they face any performance
> regression or additional bloat. Now, after having that feature, we can
> work on additional features such as prefetch or parallel apply that
> will reduce the chances of LAG, making the feature more broadly used.
> Does that sound reasonable to you? Feel free to ignore giving your
> opinion if you are not interested in that work.
>
> [1] - https://www.postgresql.org/message-id/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2%40OS0PR01MB5716.jpnprd01.prod.outlook.com
> [2] - https://www.postgresql.org/message-id/OS0PR01MB5716ECC539008C85E7AB65C5944FA%40OS0PR01MB5716.jpnprd01.prod.outlook.com

I am very sorry for the delay in answering - it was a very busy week.

I hope I understand the problem and the proposed approach to solving it (it actually seems quite straightforward and similar to `hot_standby_feedback`), and it definitely suffers from the same problem: a bloated database because of lagging slots. But it is really hard to propose any other solution (other than a backward scan of WAL, which seems completely unacceptable).

Concerning the user experience... First of all, a disclaimer: I am first of all a programmer and not a DBA. Yes, I have investigated many support cases, but it is still hard to claim that I have the full picture.

There is not even a consensus concerning `hot_standby_feedback`! It is still disabled by default, both in Postgres and in Neon. That makes sense for vanilla Postgres, where replicas are used first of all for HA and only secondarily for load balancing of read-only queries. But in Neon, HA is provided in a different way, and the only reason for creating RO replicas is load balancing (mostly for OLAP queries). Executing heavy OLAP queries without `hot_standby_feedback` is a kind of "Russian roulette", because the probability of a conflict with recovery is very high. But we are still using the Postgres default.

The situation with feedback slots may be different, though: as far as I understand, they are mostly needed for bidirectional replication and automatic conflict resolution. So they are assumed to be part of some distributed system (like BDR), rather than a feature used directly by Postgres users.

I am still a little bit depressed by the complexity of LR and all related aspects.
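For reference, the physical-replication analogue mentioned above is controlled by a single documented GUC on the standby; a minimal sketch (this is the standard setting, shown only to make the analogy concrete):

```
# postgresql.conf on a physical standby:
# ask the primary to retain dead tuples still visible to standby queries,
# at the cost of possible bloat on the primary when the standby lags
hot_standby_feedback = on
```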
But it is unlikely that anything more elegant and simpler can be invented :)
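As an aside, the bloat caused by a lagging slot (whether the proposed feedback slot or an ordinary logical slot) can at least be observed with the existing catalog; a sketch of a monitoring query using the documented `pg_replication_slots` view (the `age()` interpretation as "retained horizon" is my framing, not an official metric):

```sql
-- Show how far each logical slot's xmin horizon lags behind the current
-- transaction horizon, i.e. roughly how much dead-tuple retention the
-- slot may be causing on this node.
SELECT slot_name,
       xmin,
       age(xmin)           AS xmin_age,   -- transactions held back by this slot
       catalog_xmin,
       restart_lsn,
       confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';
```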