Re: Logical replication prefetch - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Logical replication prefetch
Msg-id CAA4eK1JuKQX397YNVWDgig6B_QVeb8eOn4UMruKewx8=2XUv4w@mail.gmail.com
In response to Re: Logical replication prefetch  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Mon, Jul 14, 2025 at 3:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Jul 13, 2025 at 6:06 PM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
> >
> > On 13/07/2025 1:28 pm, Amit Kapila wrote:
> > > On Tue, Jul 8, 2025 at 12:06 PM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
> > >> There is a well-known Postgres problem: a logical replication subscriber
> > >> cannot catch up with the publisher because LR changes are applied by a
> > >> single worker, while on the publisher changes are made by
> > >> multiple concurrent backends.
> > >>
> > > BTW, do you know how users deal with this lag? For example, one can
> > > imagine creating multiple pub-sub pairs for different sets of tables
> > > so that the workload on the subscriber could also be shared by
> > > multiple apply workers. I can also think of splitting the workload
> > > among multiple pub-sub pairs by using row filters.
> >
> >
> > Yes, I have seen users start several subscriptions/publications to
> > receive and apply changes in parallel.
> > But it cannot be considered a universal solution:
> > 1. There are not always multiple tables (or partitions of one table)
> > that can be split between multiple publications.
> > 2. It violates transactional behavior (consistency): if transactions
> > update several tables included in different publications, then because
> > these changes are applied independently, we can observe on the replica
> > that one table is updated and another is not. The same is true for row filters.
> > 3. Each walsender has to scan WAL, so with N subscriptions we
> > have to read and decode WAL N times.
> >
>
> I agree that it is not a solution that can be applied in all cases,
> nor do I want to say that we shouldn't pursue the idea of
> prefetch or parallel apply to improve the speed of apply. It was just
> to know/discuss how users try to work around lag in cases where the
> lag is large.
>

If you are interested, I would like to know your opinion on a somewhat
related topic, which has triggered my interest in your patch. We are
working on an update_delete conflict detection patch. The exact
problem was explained in the initial email [1]. The basic idea to
resolve the problem is that on the subscriber, we maintain a slot that
will help in retaining dead tuples for a certain period of time till
the concurrent transactions have been applied to the subscriber. You
can read the commit message of the first patch in email [2]. Now, the
problem we are facing is replication lag in a scenario similar to the
one we are discussing here: when there are many clients on the
publisher and a single apply worker on the subscriber, the slot takes
more time to advance. This leads to retention of dead tuples, which
further slows down the apply worker, especially for update workloads.
Apart from apply, the other transactions running on the system (say, a
pgbench kind of workload on the subscriber) also become slower because
of the retention of dead tuples.

Now, for workloads where there is no lag, such as when one splits the
workload with the options mentioned above (splitting it among pub-sub
pairs in some way), or when the workload doesn't involve a large number
of clients operating on the publisher and subscriber at the same time,
we don't observe any major slowdown on the subscriber.
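
For illustration only, the "split workload among pub-sub" option using
row filters could be scripted roughly as below. This is a minimal
sketch and not part of any patch: the helper name, the table and column
names, and the pub_part_ prefix are all hypothetical. It assumes a
non-negative integer key column that is covered by the replica identity
(which row filters require when UPDATE or DELETE is published).

```python
def split_publication_sql(table, key, n_parts, pub_prefix="pub_part"):
    """Build one CREATE PUBLICATION statement per slice of the key space.

    Hypothetical helper: each publication carries a row filter that
    selects rows whose key falls into one of n_parts modulo buckets, so
    each slice can be consumed by its own subscription and apply worker.
    Assumes `key` is a non-negative integer column in the replica identity.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be >= 1")
    statements = []
    for i in range(n_parts):
        statements.append(
            f"CREATE PUBLICATION {pub_prefix}_{i} "
            f"FOR TABLE {table} WHERE ({key} % {n_parts} = {i});"
        )
    return statements


if __name__ == "__main__":
    # Example with a pgbench-style table name (illustrative only).
    for stmt in split_publication_sql("pgbench_accounts", "aid", 4):
        print(stmt)
```

Each generated publication would then need its own subscription on the
subscriber. As noted earlier in the thread, this loses cross-slice
transactional consistency and makes every walsender decode the WAL
separately.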

As you seem to have some experience with LR users, we would like to
solicit your opinion on whether users could adopt this feature, where
they need it, by enabling it at the subscription level. They will
have the facility to disable it if they face any performance
regression or additional bloat. Once that feature is in, we can
work on additional features such as prefetch or parallel apply that
will reduce the chances of lag, making the feature more broadly usable.
Does that sound reasonable to you? Feel free to skip this if you are
not interested in that work.

[1] -
https://www.postgresql.org/message-id/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[2] -
https://www.postgresql.org/message-id/OS0PR01MB5716ECC539008C85E7AB65C5944FA%40OS0PR01MB5716.jpnprd01.prod.outlook.com



--
With Regards,
Amit Kapila.


