Re: logical decoding and replication of sequences, take 2 - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: logical decoding and replication of sequences, take 2 |
Date | |
Msg-id | CAFiTN-vVGOF-jtQ-JQzBGzVoYpJh=GjCNHGUPNYSxN_SwEK=9Q@mail.gmail.com |
In response to | Re: logical decoding and replication of sequences, take 2 (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
List | pgsql-hackers |
On Wed, Dec 6, 2023 at 7:17 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> On 12/6/23 12:05, Dilip Kumar wrote:
> > On Wed, Dec 6, 2023 at 3:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>
> >>> Why can't we use the same concept of
> >>> SnapBuildDistributeNewCatalogSnapshot(), I mean we keep queuing the
> >>> non-transactional changes (have some base snapshot before the first
> >>> change), and whenever there is any catalog change, queue new snapshot
> >>> change also in the queue of the non-transactional sequence change so
> >>> that while sending it to downstream whenever it is necessary we will
> >>> change the historic snapshot?
> >>>
> >>
> >> Oh, do you mean maintain different historic snapshots and then switch
> >> based on the change we are processing? I guess the other thing we need
> >> to consider is the order of processing the changes if we maintain
> >> separate queues that need to be processed.
> >
> > I mean we will not specifically maintain the historic changes, but if
> > there is any catalog change where we are pushing the snapshot to all
> > the transaction's change queue, at the same time we will push this
> > snapshot in the non-transactional sequence queue as well. I am not
> > sure what is the problem with the ordering? because we will be
> > queueing all non-transactional sequence changes in a separate queue in
> > the order they arrive and as soon as we process the next commit we
> > will process all the non-transactional changes at that time. Do you
> > see issue with that?
> >
>
> Isn't this (in principle) the idea of queuing the non-transactional
> changes and then applying them on the next commit? Yes, it is. Yes, I
> didn't get very far with that, but I got stuck exactly on tracking
> which snapshot to use, so if there's a way to do that, that'd fix my
> issue.

Thinking more about the snapshot issue: do we even need to bother about changing the snapshot while streaming the non-transactional sequence changes, or can we send all the non-transactional changes with a single snapshot?

The snapshot logically changes due to two events. case1: a transaction that has done a catalog operation gets committed (this changes the global snapshot). case2: there is some catalog change within a transaction (this just updates the 'curcid' in the base snapshot of the transaction).

Now, if we stream all the non-transactional sequence changes right before the next commit, then we need not bother about case1 at all, because all the changes we have queued so far are before this commit. As for case2: if the catalog change is on the sequence itself, then the following changes on the same sequence will be considered transactional; and if the change is on some other catalog (not relevant to our sequence operation), then we also need not worry about the command_id change, because the visibility of the catalog lookup for our sequence is unaffected by it.

In short, I am trying to say that we can safely queue the non-transactional sequence changes and stream them based on the snapshot we got when we decoded the first change, and as long as we stream them just before the next commit (or next in-progress stream), we never need to update the snapshot.

> Also, would this mean we don't need to track the relfilenodes, if we're
> able to query the catalog? Would we be able to check if the relfilenode
> was created by the current xact?

I think by querying the catalog and checking the xmin we should be able to figure that out, but isn't that costlier than looking up the relfilenode in the hash? Just to identify whether a change is transactional or non-transactional you would have to query the catalog. That means that for each change, before we decide whether to add it to the transaction's change queue or to the non-transactional change queue, we would have to query the catalog, i.e. start/stop a transaction?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
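
To make the queue-and-flush idea in this message concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not PostgreSQL code: `ToyDecoder` and all of its fields and methods are hypothetical names, the "snapshot" is reduced to an integer that is bumped by each commit that changed the catalog (case1), and in-progress streaming is not modeled. It shows a change being classified by a hash lookup on the relfilenode, non-transactional changes being queued with the snapshot taken at the first queued change, and the queue being flushed right before the next commit is processed.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDecoder:
    """Toy model of the proposal; every name here is hypothetical."""
    # relfilenode -> xid of the in-progress transaction that created it
    new_relfilenodes: dict = field(default_factory=dict)
    # per-transaction change queues, keyed by xid
    txn_queues: dict = field(default_factory=dict)
    # queued non-transactional sequence changes, in arrival order
    nontxn_queue: list = field(default_factory=list)
    # snapshot captured when the first non-transactional change is queued
    nontxn_snapshot: object = None
    # stand-in for the global snapshot; bumped by catalog-changing commits
    current_snapshot: int = 0
    # what would be sent downstream: (kind, relfilenode, value, snapshot)
    output: list = field(default_factory=list)

    def create_sequence_relfilenode(self, xid, relfilenode):
        # A catalog change on the sequence itself: later changes to this
        # relfilenode made by the same xact are treated as transactional.
        self.new_relfilenodes[relfilenode] = xid

    def sequence_change(self, xid, relfilenode, value):
        # Classification is a plain hash lookup, no catalog access needed.
        if self.new_relfilenodes.get(relfilenode) == xid:
            self.txn_queues.setdefault(xid, []).append((relfilenode, value))
            return
        if not self.nontxn_queue:
            # First queued change: remember the snapshot and never update it.
            self.nontxn_snapshot = self.current_snapshot
        self.nontxn_queue.append((relfilenode, value))

    def flush_nontxn(self):
        for relfilenode, value in self.nontxn_queue:
            self.output.append(
                ("nontxn", relfilenode, value, self.nontxn_snapshot))
        self.nontxn_queue.clear()
        self.nontxn_snapshot = None

    def commit(self, xid, catalog_changed=False):
        # Flush before the commit takes effect, so everything queued so far
        # is still correctly described by the snapshot of the first change.
        self.flush_nontxn()
        for relfilenode, value in self.txn_queues.pop(xid, []):
            # Snapshot handling for transactional changes is out of scope.
            self.output.append(("txn", relfilenode, value, None))
        # Forget relfilenodes created by the committed transaction.
        self.new_relfilenodes = {
            r: x for r, x in self.new_relfilenodes.items() if x != xid}
        if catalog_changed:
            self.current_snapshot += 1  # case1: global snapshot changes


d = ToyDecoder()
d.sequence_change(xid=10, relfilenode=1000, value=33)    # non-transactional
d.create_sequence_relfilenode(xid=11, relfilenode=2000)
d.sequence_change(xid=11, relfilenode=2000, value=1)     # transactional
d.sequence_change(xid=12, relfilenode=1000, value=65)    # non-transactional
d.commit(xid=11, catalog_changed=True)
d.sequence_change(xid=12, relfilenode=1000, value=97)    # queued after commit
d.commit(xid=12)
```

In this model the two changes queued before the first commit are sent with the snapshot captured at the first of them, and the change queued after the catalog-changing commit picks up the newer snapshot when it starts a fresh queue. The alternative discussed above, querying the catalog and checking xmin for every change, would replace the dictionary lookup in `sequence_change` with a per-change catalog access.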