Re: replication_origin and replication_origin_lsn usage on subscriber - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: replication_origin and replication_origin_lsn usage on subscriber |
Date | |
Msg-id | CAFiTN-sS+sR5fKfHhmc_3QPHK+9Wp5D79OeZpuWeFCLb+ApS_Q@mail.gmail.com Whole thread Raw |
In response to | Re: replication_origin and replication_origin_lsn usage on subscriber (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: replication_origin and replication_origin_lsn usage on subscriber
|
List | pgsql-hackers |
On Thu, Jul 9, 2020 at 6:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jul 9, 2020 at 6:14 PM Petr Jelinek <petr@2ndquadrant.com> wrote: > > > > Hi, > > > > On 09/07/2020 14:34, Amit Kapila wrote: > > > On Thu, Jul 9, 2020 at 5:16 PM Petr Jelinek <petr@2ndquadrant.com> wrote: > > >> > > >> On 09/07/2020 13:10, Amit Kapila wrote: > > >>> On Thu, Feb 6, 2020 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > >>>> > > >>>> During logical decoding, we send replication_origin and > > >>>> replication_origin_lsn when we decode commit. In pgoutput_begin_txn, > > >>>> we send values for these two but never used on the subscriber side. > > >>>> Though we have provided a function (logicalrep_read_origin) to read > > >>>> these two values but that is not used in code anywhere. > > >>>> > > >> > > >> We don't use the origin message anywhere really because we don't support > > >> origin forwarding in the built-in replication yet. That part I left out > > >> intentionally in the original PG10 patchset as it's mostly useful for > > >> circular replication detection when you want to replicate both ways. > > >> However that's relatively useless without also having some kind of > > >> conflict detection which would be another huge pile of code and I > > >> expected we would end up not getting logical replication in PG10 at all > > >> if I tried to push conflict detection as well :) > > >> > > > > > > Fair enough. However, without tests and more documentation about this > > > concept, it is likely that future development might break it. It is > > > good that you and others who know this part well are there to respond > > > but still, the more documentation and tests would be preferred. > > > > > > > Honestly that part didn't even need to be committed given it's unused. > > Protocol supports versioning so it could have been added at later time. > > > > >>> > > >>> For the purpose of decoding in-progress transactions, I think we can > > >>> send replication_origin in the first 'start' message as it is present > > >>> with each WAL record, however replication_origin_lsn is only logged at > > >>> commit time, so can't send it before commit. The > > >>> replication_origin_lsn is set by pg_replication_origin_xact_setup() > > >>> but it is not clear how and when that function can be used. Do we > > >>> really need replication_origin_lsn before we decode the commit record? > > >>> > > >> > > >> That's the SQL interface, C interface does not require that and I don't > > >> think we need to do that. > > >> > > > > > > I think when you are saying SQL interface, you referred to > > > pg_replication_origin_xact_setup() but I am not sure which C interface > > > you are referring to in the above sentence? > > > > > > > All the stuff pg_replication_origin_xact_setup does internally. > > > > >> The existing apply code sets the > > >> replorigin_session_origin_lsn only when processing commit message IIRC. > > >> > > > > > > That's correct. However, we do send it via 'begin' callback which > > > won't be possible with the streaming of in-progress transactions. Do > > > we need to send this origin related information (origin, origin_lsn) > > > while streaming of in-progress transactions? If so, when? As far as > > > I can see, the origin_id can be sent with the first 'start' message. > > > The origin_lsn and origin_commit can be sent with the last 'start' of > > > streaming commit if we want but not sure if that is of use. If we > > > need to send origin_lsn earlier than that then we need to record it > > > with other WAL records (other than Commit WAL record). > > > > > > > If we were to support the origin forwarding, then strictly speaking we > > need everything only at commit time from correctness perspective, > > > > Okay. Anyway streaming mode is optional, so in such cases, we can keep it 'off' > > > but > > ideally origin_id would be best sent with first message as it can be > > used to filter out changes at decoding stage rather than while we > > process the commit so having it set early improves performance of decoding. > > > > Yeah, makes sense. So, we will just send origin_id (with first > streaming start message) and leave others. So IIUC, currently we are sending the latest origin_id which is set during the commit time. So in our case, while we start streaming we will send the origin_id of the latest change in the current stream right? I think we will always have to remember the latest origin id in top-level ReorderBufferTXN as well. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: