Re: logical replication empty transactions - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: logical replication empty transactions
Date
Msg-id CAFiTN-uxKFEV+AEobAUeY2AG=SQWkryt1C46FWutONNe0SiYcQ@mail.gmail.com
Whole thread Raw
In response to Re: logical replication empty transactions  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: logical replication empty transactions  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Mar 4, 2020 at 10:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Mar 4, 2020 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Mar 4, 2020 at 7:17 AM Euler Taveira
> > > <euler.taveira@2ndquadrant.com> wrote:
> > > >
> > > > On Tue, 3 Mar 2020 at 05:24, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >>
> > > >>
> > > >> Another idea could be that we stream the transaction after some
> > > >> threshold number (say 100 or anything we think is reasonable) of empty
> > > >> xacts.  This will reduce the traffic without tinkering with the core
> > > >> design too much.
> > > >>
> > > >>
> > > > Amit, I suggest an interval to control this setting. Time is something we have control; transactions aren't
(dependingon workload). pg_stat_replication query interval usually is not milliseconds, however, you can execute
thousandsof transactions in a second. If we agree on that idea I can add it to the patch. 
> > > >
> > >
> > > Do you mean to say that if for some threshold interval we didn't
> > > stream any transaction, then we can send the next empty transaction to
> > > the subscriber?  If so, then isn't it possible that the empty xacts
> > > happen irregularly after the specified interval and then we still end
> > > up sending them all.  I might be missing something here, so can you
> > > please explain your idea in detail?  Basically, how will it work and
> > > how will it solve the problem.
> >
> > IMHO, the threshold should be based on the commit LSN.  Our main
> > reason we want to send empty transactions after a certain
> > transaction/duration is that we want the restart_lsn to be moving
> > forward so that if we need to restart the replication slot we don't
> > need to process a lot of extra WAL.  So assume we set the threshold
> > based on transaction count then there is still a possibility that we
> > might process a few very big transactions then we will have to process
> > them again after the restart.
> >
>
> Won't the subscriber eventually send the flush location for the large
> transactions which will move the restart_lsn?

I meant large empty transactions (basically we can not send anything
to the subscriber).  So my point was if there are only large
transactions in the system which we can not stream because those
tables are not published.  Then keeping threshold based on transaction
count will not help much because even if we don't reach the
transaction count threshold, we still might need to process a lot of
data if we don't stream the commit for the empty transactions.  So
instead of tracking transaction count can we track LSN,   and LSN
different since we last stream some change cross the threshold then we
will stream the next empty transaction.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: John Naylor
Date:
Subject: Re: Unicode escapes with any backend encoding
Next
From: Tom Lane
Date:
Subject: Re: Cast to uint16 in pg_checksum_page()