Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date
Msg-id CAA4eK1L-KYycdTYanqo3nDzw=XWvADOuerHtbBSnBiRejmE3Qg@mail.gmail.com
In response to Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Responses Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
List pgsql-hackers
On Tue, Dec 24, 2019 at 11:17 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Fri, 20 Dec 2019 at 22:30, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > The main aim of this feature is to reduce apply lag.  If we send
> > all the changes together, network delay can hold up their apply,
> > whereas if most of the changes have already been sent, we save the
> > effort of sending the entire data at commit time.  This in itself
> > gives us decent benefits.  Sure, we can further improve it by having
> > separate workers (dedicated to applying the changes) as you are
> > suggesting, and in fact there is a patch for that as well (see the
> > performance results and bgworker patch at [1]), but if we try to
> > shove in everything in one go, it will be difficult to get this
> > patch committed (there is already a lot in it, and the patch is big
> > enough that getting it right takes a lot of energy).  So the plan is
> > to first get the basic feature in and then try to improve it with
> > dedicated workers or similar.  Does this make sense to you?
> >
>
> Thank you for the explanation. The plan makes sense. But I think it's
> a problem in the current design that the logical replication worker
> doesn't receive changes (and doesn't check interrupts) while applying
> committed changes, even if we don't have a worker dedicated to
> applying. I think the worker should continue to receive changes and
> save them to temporary files even while applying changes.
>

Won't it defeat the purpose of this feature, which is to reduce the
apply lag?  Basically, it can happen that while applying a commit, the
worker constantly receives changes of other transactions, which will
delay the apply of the current transaction.  Also, won't it create
further work to identify the order of commits?  Say while applying
commit-1, it receives 5 other commits that are written to separate
temporary files.  How will we later identify which transaction's WAL
we need to apply first?  We might deduce it from the LSNs, but I think
that could be tricky.  Another thing is that I think it could lead to
some design complications as well, because while applying a commit you
need some sort of callback or something like that to receive and flush
totally unrelated changes.  It could also lead to another kind of
failure mode, wherein while applying a commit the worker tries to
receive another transaction's data and some failure happens while
writing the data of that transaction.  I am not sure if it is a good
idea to try something like that.
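
To make the ordering question concrete, here is a minimal sketch of
the "deduce by LSN" idea, under stated assumptions: each transaction
spooled to a temporary file is assumed to carry the LSN of its commit
record, and apply order is recovered by sorting on that value.  The
names CommitLsn, SpooledXact and order_spooled_xacts are hypothetical
and are not taken from the patch.  It only illustrates the easy part;
it says nothing about transactions whose commit has not yet been
received, or about failures while spooling, which is where I expect
the tricky cases to be.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a 64-bit WAL position (an LSN). */
typedef uint64_t CommitLsn;

/* Hypothetical record describing one transaction spooled to a temp file. */
typedef struct SpooledXact
{
    uint32_t    xid;            /* transaction id of the spooled transaction */
    CommitLsn   commit_lsn;     /* LSN of its commit record */
} SpooledXact;

/* qsort comparator: ascending commit LSN, i.e. commit order. */
static int
spooled_xact_cmp(const void *a, const void *b)
{
    const SpooledXact *xa = a;
    const SpooledXact *xb = b;

    if (xa->commit_lsn < xb->commit_lsn)
        return -1;
    if (xa->commit_lsn > xb->commit_lsn)
        return 1;
    return 0;
}

/* Reorder the spooled transactions in place so they can be applied in commit order. */
static void
order_spooled_xacts(SpooledXact *xacts, size_t n)
{
    assert(xacts != NULL || n == 0);

    if (n > 1)
        qsort(xacts, n, sizeof(SpooledXact), spooled_xact_cmp);
}
```

Sorting on the commit LSN rather than on the xid matters here, since
xids are assigned when a transaction starts and need not match the
order in which the transactions commit.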

> Otherwise
> the buffer would easily fill up and replication would get stuck.
>

Are you talking about the network buffer?  I think the best way, as
discussed, is to launch new workers for streamed transactions, but we
can do that as an additional feature.  Anyway, as proposed, users can
choose the streaming mode for subscriptions, so there is an option to
turn this on selectively.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
