Re: logical decoding and replication of sequences, take 2 - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: logical decoding and replication of sequences, take 2
Msg-id CAFiTN-vVGOF-jtQ-JQzBGzVoYpJh=GjCNHGUPNYSxN_SwEK=9Q@mail.gmail.com
In response to Re: logical decoding and replication of sequences, take 2  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On Wed, Dec 6, 2023 at 7:17 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 12/6/23 12:05, Dilip Kumar wrote:
> > On Wed, Dec 6, 2023 at 3:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>
> >>> Why can't we use the same concept of
> >>> SnapBuildDistributeNewCatalogSnapshot(), I mean we keep queuing the
> >>> non-transactional changes (have some base snapshot before the first
> >>> change), and whenever there is any catalog change, queue new snapshot
> >>> change also in the queue of the non-transactional sequence change so
> >>> that while sending it to downstream whenever it is necessary we will
> >>> change the historic snapshot?
> >>>
> >>
> >> Oh, do you mean maintain different historic snapshots and then switch
> >> based on the change we are processing? I guess the other thing we need
> >> to consider is the order of processing the changes if we maintain
> >> separate queues that need to be processed.
> >
> > I mean we will not specifically maintain the historic changes, but if
> > there is any catalog change where we are pushing the snapshot to all
> > the transaction's change queue, at the same time we will push this
> > snapshot in the non-transactional sequence queue as well.  I am not
> > sure what is the problem with the ordering? because we will be
> > queueing all non-transactional sequence changes in a separate queue in
> > the order they arrive and as soon as we process the next commit we
> > will process all the non-transactional changes at that time.  Do you
> > see issue with that?
> >
>
> Isn't this (in principle) the idea of queuing the non-transactional
> changes and then applying them on the next commit?

Yes, it is.

> Yes, I didn't get
> very far with that, but I got stuck exactly on tracking which snapshot
> to use, so if there's a way to do that, that'd fix my issue.

Thinking more about the snapshot issue, do we even need to bother about
changing the snapshot at all while streaming the non-transactional
sequence changes, or can we send all the non-transactional changes with
a single snapshot? Logically, the snapshot changes only due to these
two events. Case 1: a transaction that has performed catalog operations
commits (this changes the global snapshot). Case 2: within a
transaction, there is some catalog change (this just updates the
'curcid' in the base snapshot of the transaction).

Now, if we stream all the non-transactional sequence changes right
before the next commit, then case 1 does not concern us at all, because
all the changes we have queued so far precede that commit. As for
case 2: if the catalog change is on the sequence itself, the following
changes on the same sequence will be considered transactional; and if
the change is on some other catalog (not relevant to our sequence
operation), we need not worry about the command_id change either,
because the visibility of the catalog lookups for our sequence is
unaffected by it.

In short, I am trying to say that we can safely queue the
non-transactional sequence changes and stream them using the snapshot
we obtained when decoding the first change. As long as we plan to
stream them just before the next commit (or the next in-progress
stream), we never need to update the snapshot.

> Also, would this mean we don't need to track the relfilenodes, if we're
> able to query the catalog? Would we be able to check if the relfilenode
> was created by the current xact?

I think we should be able to figure that out by querying the catalog
and checking the xmin, but isn't that costlier than looking up the
relfilenode in a hash table? Just to identify whether a change is
transactional or non-transactional, we would have to query the catalog
for each change before deciding whether to add it to the transaction's
change queue or to the non-transactional change queue. That means we
would have to start/stop a transaction for every change, wouldn't we?
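For comparison, the relfilenode lookup mentioned above can be modeled as a plain hash map from relfilenode to the xid of the in-progress transaction that created it. This is a hypothetical sketch: the class and method names are made up, subtransactions are ignored, and it is not the patch's actual data structure. It only illustrates that classifying a change costs one O(1) lookup with no catalog access.

```python
from typing import Dict, Optional

class RelfilenodeTracker:
    """Hypothetical map: relfilenode -> xid of the in-progress transaction
    that created it."""

    def __init__(self) -> None:
        self._created_by: Dict[int, int] = {}

    def on_new_relfilenode(self, relfilenode: int, xid: int) -> None:
        # Hypothetical hook: called when the decoder sees an in-progress
        # transaction create a new relfilenode for a sequence.
        self._created_by[relfilenode] = xid

    def on_xact_end(self, xid: int) -> None:
        # Once the creating transaction commits or aborts, drop its entries.
        self._created_by = {r: x for r, x in self._created_by.items() if x != xid}

    def route(self, relfilenode: int) -> Optional[int]:
        """Return the xid whose change queue should receive the change,
        or None if the change is non-transactional. A single dict lookup:
        no catalog access, no transaction start/stop."""
        return self._created_by.get(relfilenode)
```

A change whose relfilenode maps to an xid goes to that transaction's queue. Any other change goes to the non-transactional queue.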

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
