Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date
Msg-id CA+fd4k4ZO2mR34fZOrG_DFp4kr1sMXtSEP5_5MrG3SxVOW8XBA@mail.gmail.com
In response to Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
List pgsql-hackers
On Tue, 24 Dec 2019 at 17:21, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Dec 24, 2019 at 11:17 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Fri, 20 Dec 2019 at 22:30, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > The main aim of this feature is to reduce apply lag.  Because if we
> > > send all the changes together it can delay their apply because of
> > > network delay, whereas if most of the changes are already sent, then
> > > we will save the effort on sending the entire data at commit time.
> > > This in itself gives us decent benefits.  Sure, we can further improve
> > > it by having separate workers (dedicated to apply the changes) as you
> > > are suggesting and in fact, there is a patch for that as well (see the
> > > performance results and bgworker patch at [1]), but if we try to shove in
> > > all the things in one go, then it will be difficult to get this patch
> > > committed (there are already enough things and the patch is quite big
> > > that to get it right takes a lot of energy).  So, the plan is
> > > something like that first we get the basic feature and then try to
> > > improve by having dedicated workers or things like that.  Does this
> > > make sense to you?
> > >
> >
> > Thank you for the explanation. The plan makes sense. But I think in the
> > current design it's a problem that the logical replication worker doesn't
> > receive changes (and doesn't check interrupts) while applying
> > committed changes, even if we don't have a worker dedicated to
> > applying. I think the worker should continue to receive changes and
> > save them to temporary files even while applying changes.
> >
>
> Won't it defeat the purpose of this feature, which is to reduce the apply
> lag?  Basically, it can happen that while applying a commit, it
> constantly gets changes of other transactions, which will delay the
> apply of the current transaction.

You're right. But it seems to me that it optimizes the apply lag only
for the transaction that made many changes. On the other hand, if a
transaction made many changes, applying the subsequent changes is
delayed.
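
To put rough numbers on this, here is a toy model (not PostgreSQL code).
It assumes a single worker that receives and applies serially, a fixed
per-change send and apply cost, and that with streaming all changes of
the large transaction have already been sent before its commit. The
function name and the numbers are made up for illustration.

```python
def apply_lags_ms(n_changes, send_ms, apply_ms, streaming):
    """Toy model: one worker receives and applies everything serially.

    Returns (lag of the large transaction, lag of a one-change transaction
    that commits right after it), both measured from the large commit.
    """
    # Assumption: with streaming, all changes of the large transaction
    # were already sent before its commit, so no transfer time is left.
    transfer = 0 if streaming else n_changes * send_ms
    big_lag = transfer + n_changes * apply_ms
    # The small transaction cannot be received or applied until the worker
    # has finished with the large one.
    small_lag = big_lag + send_ms + apply_ms
    return big_lag, small_lag


print(apply_lags_ms(1000, 1, 2, streaming=False))  # (3000, 3003)
print(apply_lags_ms(1000, 1, 2, streaming=True))   # (2000, 2003)
```

In this model streaming removes the transfer time, but the transaction
queued behind the large one still waits for the whole apply.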

>  Also, won't it create some further
> work to identify the order of commits?  Say while applying commit-1,
> it receives 5 other commits that are written to separate temporary
> files.  How will we later identify which transaction's WAL we need to
> apply first?  We might deduce it from the LSNs, but I think that could be
> tricky.  Another thing is that I think it could lead to some design
> complications as well because while applying commit, you need some
> sort of callback or something like that to receive and flush totally
> unrelated changes.  It could lead to another kind of failure mode
> wherein while applying commit if it tries to receive another
> transaction data and some failure happens while writing the data of
> that transaction.  I am not sure if it is a good idea to try something
> like that.

It's just an idea, but we might want to first have new workers dedicated
to applying changes and then add the streaming option later. That
way we can reduce the flush lag depending on the use case. The commit
order can be determined by the receiver and shared with the applier
in shared memory. Once we have separated the workers, the streaming
option can be introduced without such a downside.
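
A minimal sketch of that split, only to illustrate the idea (this is not
PostgreSQL code; Receiver, Applier and demo are hypothetical names). A
dict stands in for the per-transaction temporary files and a deque for
the shared-memory queue, and it assumes commit messages reach the
receiver in commit order.

```python
from collections import defaultdict, deque


class Receiver:
    """Hypothetical receiver: spools changes and records commit order."""

    def __init__(self):
        # xid -> buffered changes (stand-in for a temporary file)
        self.spool = defaultdict(list)
        # (commit_lsn, xid) entries (stand-in for a shared-memory queue)
        self.commit_queue = deque()

    def on_change(self, xid, change):
        self.spool[xid].append(change)

    def on_commit(self, xid, commit_lsn):
        # Assumption: commits arrive in commit order, so appending keeps it.
        self.commit_queue.append((commit_lsn, xid))


class Applier:
    """Hypothetical applier: replays spooled transactions in commit order."""

    def __init__(self, receiver):
        self.receiver = receiver
        self.applied = []

    def apply_next(self):
        if not self.receiver.commit_queue:
            return False
        _commit_lsn, xid = self.receiver.commit_queue.popleft()
        for change in self.receiver.spool.pop(xid, []):
            self.applied.append((xid, change))
        return True


def demo():
    rx = Receiver()
    rx.on_change(100, "INSERT a")
    rx.on_change(101, "INSERT b")
    rx.on_change(100, "UPDATE a")
    rx.on_commit(101, commit_lsn=500)  # 101 commits first
    rx.on_commit(100, commit_lsn=520)
    ap = Applier(rx)
    while ap.apply_next():
        pass
    return ap.applied
```

Here demo() applies transaction 101 before 100 even though 100 started
sending changes first, because the applier only follows the commit
order recorded by the receiver.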

>
> > Otherwise
> > the buffer would easily fill up and replication would get stuck.
> >
>
> Are you talking about the network buffer?

Yes.

>   I think the best way as
> discussed is to launch new workers for streamed transactions, but we
> can do that as an additional feature. Anyway, as proposed, users can
> choose the streaming mode for subscriptions, so there is an option to
> turn this selectively.

Yes. But users who want to use this feature would want to replicate
many changes, so I guess the side effect is quite big for them. I think
that at least we need to make logical replication tolerate such a
situation.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


