Re: logical replication empty transactions - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: logical replication empty transactions
Msg-id CAA4eK1KnPYE9tsa-E+KVj7HFaZOTSexSqqkODqg7ARmXyrc-9A@mail.gmail.com
In response to Re: logical replication empty transactions  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Tue, Mar 3, 2020 at 2:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Mar 3, 2020 at 1:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Mar 3, 2020 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Mon, Mar 2, 2020 at 4:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > >
> > > > One thing that is not clear to me is how will we advance restart_lsn
> > > > if we don't send any empty xact in a system where there are many such
> > > > xacts?  IIRC, the restart_lsn is advanced based on confirmed_flush lsn
> > > > sent by subscriber.  After this change, the subscriber won't be able
> > > > to send the confirmed_flush and for a long time, we won't be able to
> > > > advance restart_lsn.  Is that correct?  If so, why do we think that is
> > > > acceptable?  One might argue that restart_lsn will be advanced as soon
> > > > as we send the first non-empty xact, but not sure if that is good
> > > > enough.  What do you think?
> > >
> > > It seems like a valid point.  One idea could be that we can track the
> > > last commit LSN which we streamed and if the confirmed flush location
> > > is already greater than that, then even if we skip sending the
> > > commit message we can increase the confirmed flush location locally.
> > > Logically, it should not cause any problem because we have already got
> > > the confirmation for whatever we have streamed so far.  So for other
> > > commits (which we are skipping), we can advance it locally because
> > > we are sure that we don't have any streamed commit which is not yet
> > > confirmed by the subscriber.
> > >
> >
> > Will this work after restart?  Do you want to persist the information
> > of last streamed commit LSN?
>
> We will not persist the last streamed commit LSN; this variable lives
> in memory just to track whether we have got confirmation up to that
> location or not.  Once we have confirmation up to that location, and
> if we are not streaming any transaction (because those are empty
> transactions), we can just advance the confirmed flush location,
> and based on that we can update the restart point as well; those
> will be persisted.  Basically, "last streamed commit LSN" is just a
> marker that there is still something pending to be confirmed by the
> subscriber, so until then we cannot simply advance the confirmed flush
> location or restart point based on the empty transactions.  But, if
> there is nothing pending to be confirmed we can advance.  So if we are
> streaming then we will get confirmation from subscriber otherwise we
> can advance it locally.  So, in either case, the confirmed flush
> location and restart point will keep moving.
>

Okay, so this might work out, but it might look a bit ad-hoc.
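
The scheme Dilip describes can be modelled roughly as follows. This is a minimal Python sketch and is not PostgreSQL code. It treats LSNs as plain integers and models only confirmed_flush, leaving out how the server derives restart_lsn from it. The names SlotState, EmptyXactTracker, on_commit and on_subscriber_feedback are hypothetical. The sketch shows the in-memory "last streamed commit LSN" marker gating local advancement. It also shows that losing the marker on restart is harmless, because nothing is pending right after a restart.

```python
from dataclasses import dataclass

INVALID_LSN = 0  # stand-in for PostgreSQL's InvalidXLogRecPtr


@dataclass
class SlotState:
    """Hypothetical slot: confirmed_flush is the only persisted field modelled."""
    confirmed_flush: int = INVALID_LSN


class EmptyXactTracker:
    """Hypothetical model of skipping empty xacts while still advancing
    confirmed_flush locally.  Not PostgreSQL code."""

    def __init__(self, slot: SlotState) -> None:
        self.slot = slot
        # In-memory only.  After a restart nothing has been streamed yet,
        # so starting from INVALID_LSN means "nothing pending".
        self.last_streamed_commit = INVALID_LSN

    def on_commit(self, commit_lsn: int, is_empty: bool) -> bool:
        """Return True if the transaction is sent to the subscriber."""
        if not is_empty:
            # Streamed: the subscriber must confirm this LSN before any
            # local advancement is allowed.
            self.last_streamed_commit = commit_lsn
            return True
        # Empty xact is skipped.  Advance locally only if every streamed
        # commit has already been confirmed by the subscriber.
        if self.slot.confirmed_flush >= self.last_streamed_commit:
            self._advance(commit_lsn)
        return False

    def on_subscriber_feedback(self, flush_lsn: int) -> None:
        """Subscriber reports it has flushed up to flush_lsn."""
        self._advance(flush_lsn)

    def _advance(self, lsn: int) -> None:
        # confirmed_flush never moves backwards.
        if lsn > self.slot.confirmed_flush:
            self.slot.confirmed_flush = lsn


# Usage: LSNs are illustrative integers.
slot = SlotState()
tracker = EmptyXactTracker(slot)
tracker.on_commit(100, is_empty=False)  # sent; 100 now awaits confirmation
tracker.on_commit(200, is_empty=True)   # skipped; slot.confirmed_flush stays 0
tracker.on_subscriber_feedback(100)     # slot.confirmed_flush becomes 100
tracker.on_commit(300, is_empty=True)   # skipped; advanced locally to 300

# Simulated restart: the in-memory marker is lost, the slot is not.
tracker = EmptyXactTracker(slot)
tracker.on_commit(400, is_empty=True)   # skipped; advanced locally to 400
```

The only extra state is one in-memory LSN. The commit at 200 is never confirmed on its own, but the later local advance to 300 moves past it anyway.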

> >
> > >   This is just my thought, but if we
> > > think about it from the code and design perspective, it might complicate
> > > things and sound hackish.
> > >
> >
> > Another idea could be that we stream the transaction after some
> > threshold number (say 100 or anything we think is reasonable) of empty
> > xacts.  This will reduce the traffic without tinkering with the core
> > design too much.
>
> Yeah, this could be also an option.
>

Okay.
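
The threshold alternative can be sketched the same way. This minimal Python model uses the example value of 100 from the mail. The class and method names are hypothetical. The sketch assumes one possible reading of the proposal: after `threshold` consecutive empty xacts have been skipped, the next one is sent, and any sent xact resets the counter.

```python
class ThresholdEmptyXactFilter:
    """Hypothetical model of the threshold idea: skip empty xacts, but send
    one after `threshold` consecutive ones were skipped, so the subscriber's
    reply still lets confirmed_flush (and hence restart_lsn) move."""

    def __init__(self, threshold: int = 100) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.skipped = 0  # consecutive empty xacts skipped since the last send

    def should_send(self, is_empty: bool) -> bool:
        if not is_empty:
            # Real changes are always sent; their confirmation covers
            # every empty xact skipped before them.
            self.skipped = 0
            return True
        if self.skipped < self.threshold:
            self.skipped += 1
            return False
        # Threshold reached: send this empty xact to get feedback.
        self.skipped = 0
        return True


f = ThresholdEmptyXactFilter(threshold=100)
sent = sum(f.should_send(is_empty=True) for _ in range(1000))
# sent == 9: one empty xact in every 101 reaches the subscriber
```

The counter lives only in memory, so a restart merely resets it and nothing new has to be persisted.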

Peter E, Petr J, others, do you have any opinion on what is the best
way forward for this thread?  I think it would be really good if we
can reduce the network traffic due to these empty transactions.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com