Re: replication_origin and replication_origin_lsn usage on subscriber - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: replication_origin and replication_origin_lsn usage on subscriber
Msg-id CAFiTN-sS+sR5fKfHhmc_3QPHK+9Wp5D79OeZpuWeFCLb+ApS_Q@mail.gmail.com
In response to Re: replication_origin and replication_origin_lsn usage on subscriber  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: replication_origin and replication_origin_lsn usage on subscriber
List pgsql-hackers
On Thu, Jul 9, 2020 at 6:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jul 9, 2020 at 6:14 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
> >
> > Hi,
> >
> > On 09/07/2020 14:34, Amit Kapila wrote:
> > > On Thu, Jul 9, 2020 at 5:16 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
> > >>
> > >> On 09/07/2020 13:10, Amit Kapila wrote:
> > >>> On Thu, Feb 6, 2020 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >>>>
> > >>>> During logical decoding, we send replication_origin and
> > >>>> replication_origin_lsn when we decode commit.  In pgoutput_begin_txn,
> > >>>> we send values for these two but never used on the subscriber side.
> > >>>> Though we have provided a function (logicalrep_read_origin) to read
> > >>>> these two values but that is not used in code anywhere.
> > >>>>
> > >>
> > >> We don't use the origin message anywhere really because we don't support
> > >> origin forwarding in the built-in replication yet. That part I left out
> > >> intentionally in the original PG10 patchset as it's mostly useful for
> > >> circular replication detection when you want to replicate both ways.
> > >> However that's relatively useless without also having some kind of
> > >> conflict detection which would be another huge pile of code and I
> > >> expected we would end up not getting logical replication in PG10 at all
> > >> if I tried to push conflict detection as well :)
> > >>
> > >
> > > Fair enough.  However, without tests and more documentation about this
> > > concept, it is likely that future development might break it.  It is
> > > good that you and others who know this part well are there to respond
> > > but still, the more documentation and tests would be preferred.
> > >
> >
> > Honestly that part didn't even need to be committed given it's unused.
> > Protocol supports versioning so it could have been added at later time.
> >
> > >>>
> > >>> For the purpose of decoding in-progress transactions, I think we can
> > >>> send replication_origin in the first 'start' message as it is present
> > >>> with each WAL record, however replication_origin_lsn is only logged at
> > >>> commit time, so can't send it before commit.  The
> > >>> replication_origin_lsn is set by pg_replication_origin_xact_setup()
> > >>> but it is not clear how and when that function can be used.  Do we
> > >>> really need replication_origin_lsn before we decode the commit record?
> > >>>
> > >>
> > >> That's the SQL interface, C interface does not require that and I don't
> > >> think we need to do that.
> > >>
> > >
> > > I think when you are saying SQL interface, you referred to
> > > pg_replication_origin_xact_setup() but I am not sure which C interface
> > > you are referring to in the above sentence?
> > >
> >
> > All the stuff pg_replication_origin_xact_setup does internally.
> >
> > >> The existing apply code sets the
> > >> replorigin_session_origin_lsn only when processing commit message IIRC.
> > >>
> > >
> > > That's correct.  However, we do send it via 'begin' callback which
> > > won't be possible with the streaming of in-progress transactions.  Do
> > > we need to send this origin related information (origin, origin_lsn)
> > > while streaming of in-progress transactions?  If so, when?  As far as
> > > I can see, the origin_id can be sent with the first 'start' message.
> > > The origin_lsn and origin_commit can be sent with the last 'start' of
> > > streaming commit if we want but not sure if that is of use.  If we
> > > need to send origin_lsn earlier than that then we need to record it
> > > with other WAL records (other than Commit WAL record).
> > >
> >
> > If we were to support the origin forwarding, then strictly speaking we
> > need everything only at commit time from correctness perspective,
> >
>
> Okay.  Anyway streaming mode is optional, so in such cases, we can keep it 'off'
>
> > but
> > ideally origin_id would be best sent with first message as it can be
> > used to filter out changes at decoding stage rather than while we
> > process the commit so having it set early improves performance of decoding.
> >
>
> Yeah, makes sense.  So, we will just send origin_id (with first
> streaming start message) and leave others.

So IIUC, we currently send the latest origin_id, which is set at
commit time.  In our case, when we start streaming, we will send the
origin_id of the latest change in the current stream, right?  I think
we will also always have to remember the latest origin id in the
top-level ReorderBufferTXN.
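
A minimal sketch of that idea, under stated assumptions: this is not
PostgreSQL code. SketchTXN and the sketch_* helpers are hypothetical
names used only for illustration; only RepOriginId (a 16-bit id) and
InvalidRepOriginId (0) are meant to mirror the real definitions. Each
decoded change updates the origin on its own transaction and on the
top-level one, so the stream start message can read it from one place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the real PostgreSQL definitions (assumption: same shape). */
typedef uint16_t RepOriginId;
#define InvalidRepOriginId 0

/* Hypothetical, stripped-down transaction entry; not ReorderBufferTXN. */
typedef struct SketchTXN
{
    struct SketchTXN *toplevel;  /* NULL when this is a top-level txn */
    RepOriginId origin_id;       /* origin of the latest change seen */
} SketchTXN;

/*
 * Remember the origin of a decoded change on the transaction itself and,
 * for a subtransaction, on its top-level transaction as well.
 */
static void
sketch_note_change_origin(SketchTXN *txn, RepOriginId origin_id)
{
    assert(txn != NULL);

    txn->origin_id = origin_id;
    if (txn->toplevel != NULL)
        txn->toplevel->origin_id = origin_id;
}

/*
 * Origin to put in the stream start message: always taken from the
 * top-level transaction, i.e. the origin of the latest change so far.
 */
static RepOriginId
sketch_stream_start_origin(const SketchTXN *txn)
{
    const SketchTXN *top;

    assert(txn != NULL);

    top = (txn->toplevel != NULL) ? txn->toplevel : txn;
    return top->origin_id;
}
```

With this shape, the code that emits the stream start message never has
to walk subtransactions to find the origin; whether the real structures
should carry it this way is exactly the open question above.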

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


