Thread: Re: Tablesync early exit
Patch v2 is the same; it only needed re-basing to the latest HEAD.

----
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Aug 30, 2021 at 8:50 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Patch v2 is the same; it only needed re-basing to the latest HEAD.
>

Why do you think it is correct to exit before trying to receive any message? How will we ensure whether the apply worker has processed any message? At the beginning of function LogicalRepApplyLoop(), last_received is the LSN where the copy has finished in the case of a tablesync worker. I think we need to receive the message before trying to check whether we have synced with the apply worker or not.

--
With Regards,
Amit Kapila.
On Wed, Mar 16, 2022 at 4:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Aug 30, 2021 at 8:50 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Patch v2 is the same; it only needed re-basing to the latest HEAD.
> >
>
> Why do you think it is correct to exit before trying to receive any
> message?

I think the STATE_CATCHUP state guarantees the apply worker must have received (or tried to receive) a message. See the next answer.

> How will we ensure whether the apply worker has processed any
> message?

All this patch code does is call process_syncing_tables, which delegates to process_syncing_tables_for_sync (because the call is from a tablesync worker). That function can't do anything unless the tablesync worker is in STATE_CATCHUP state, and that cannot happen unless it was explicitly set to that state by the apply worker.

On the other side of the coin, the apply worker can only set syncworker->relstate = SUBREL_STATE_CATCHUP from within function process_syncing_tables_for_apply, and AFAIK that function is only called when the apply worker has either handled a message or the walrcv_receive in LogicalRepApplyLoop received nothing.

So I think the STATE_CATCHUP mechanism itself ensures the apply worker *must* have already processed a message (or there was no message to process).

> At the beginning of function LogicalRepApplyLoop(),
> last_received is the LSN where the copy has finished in the case of
> tablesync worker. I think we need to receive the message before trying
> to ensure whether we have synced with the apply worker or not.
>

I think STATE_CATCHUP guarantees the apply worker must have received (or tried to receive) a message. See the previous answer.

~~~

AFAIK this patch is OK, but since it is not particularly urgent I've bumped it to the next CommitFest [1] instead of trying to jam it into PG15 at the last minute.

BTW - there were some useful logfiles I captured a very long time ago [2]. They show the behaviour without/with this patch.

------
[1] https://commitfest.postgresql.org/37/3062/
[2] https://www.postgresql.org/message-id/CAHut+Ptjk-Qgd3R1a1_tr62CmiswcYphuv0pLmVA-+2s8r0Bkw@mail.gmail.com

Kind Regards,
Peter Smith
Fujitsu Australia
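The STATE_CATCHUP handshake described above can be sketched as a toy model. This is illustrative Python, not the server's C code; the names merely mirror symbols in tablesync.c and worker.c, and the model makes one simplifying assumption: only the apply worker promotes a tablesync worker to CATCHUP, and it does so only after handling (or trying to receive) a message.

```python
# Toy model of the STATE_CATCHUP handshake between the apply worker and
# a tablesync worker.  Names mirror PostgreSQL's C symbols for
# readability, but this is a sketch, not server code.

SYNCWAIT, CATCHUP, SYNCDONE = "syncwait", "catchup", "syncdone"

class SyncSlot:
    """Shared state for one tablesync worker."""
    def __init__(self):
        self.relstate = SYNCWAIT
        self.relstate_lsn = 0

def apply_worker_step(slot, last_processed_lsn):
    """Apply-worker side (cf. process_syncing_tables_for_apply):
    promote a waiting tablesync worker to CATCHUP, recording the LSN
    the tablesync worker must reach."""
    if slot.relstate == SYNCWAIT:
        slot.relstate = CATCHUP
        slot.relstate_lsn = last_processed_lsn

def sync_worker_try_exit(slot, copy_done_lsn):
    """Tablesync side (cf. process_syncing_tables_for_sync): the early
    exit is only possible once CATCHUP has been set and the LSN where
    the copy finished already covers the requested catch-up point."""
    if slot.relstate == CATCHUP and copy_done_lsn >= slot.relstate_lsn:
        slot.relstate = SYNCDONE
        return True
    return False

slot = SyncSlot()
assert not sync_worker_try_exit(slot, 100)  # apply worker hasn't acted yet
apply_worker_step(slot, 90)                 # apply worker processed up to LSN 90
assert sync_worker_try_exit(slot, 100)      # copy LSN 100 >= 90: exit allowed
assert slot.relstate == SYNCDONE
```

In this model the early exit can never fire before the apply worker has had its turn, which is the guarantee the argument above leans on.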
On Fri, Apr 1, 2022 at 1:52 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Wed, Mar 16, 2022 at 4:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I think the STATE_CATCHUP guarantees the apply worker must have
> received (or tried to receive) a message. See the previous answer.
>

Sorry, I meant to say until the sync worker has received a message. The point is that the LSN at which the copy finished might actually be later than some of the in-progress transactions on the server. It may not be a good idea to blindly skip those changes if the apply worker has already received them (say, via 'streaming' mode). Today, all such changes would be written to a file and applied at commit time, but tomorrow we could have an implementation that applies such changes (via some background worker) by skipping changes related to the table for which the tablesync worker is in progress. In such a scenario, unless we allow the tablesync worker to process more messages, we will end up losing some changes for that particular table.

As per my understanding, this is safe as per the current code, but it can't be guaranteed for future implementations, and the extra work is only to receive the messages for one transaction. I still don't think it is a good idea to pursue this patch.

--
With Regards,
Amit Kapila.
On Sat, Apr 2, 2022 at 5:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Apr 1, 2022 at 1:52 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Wed, Mar 16, 2022 at 4:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > I think the STATE_CATCHUP guarantees the apply worker must have
> > received (or tried to receive) a message. See the previous answer.
> >
>
> Sorry, I meant to say until the sync worker has received a message.
> The point is that the LSN at which the copy finished might actually
> be later than some of the in-progress transactions on the server. It
> may not be a good idea to blindly skip those changes if the apply
> worker has already received them (say, via 'streaming' mode). Today,
> all such changes would be written to a file and applied at commit
> time, but tomorrow we could have an implementation that applies such
> changes (via some background worker) by skipping changes related to
> the table for which the tablesync worker is in progress. In such a
> scenario, unless we allow the tablesync worker to process more
> messages, we will end up losing some changes for that particular
> table.
>
> As per my understanding, this is safe as per the current code, but it
> can't be guaranteed for future implementations, and the extra work is
> only to receive the messages for one transaction. I still don't think
> it is a good idea to pursue this patch.

IIUC you are saying that my patch is good today, but it may cause problems in a hypothetical future if the rest of the replication logic is implemented differently.

Anyway, it seems there is no chance of this getting committed, so it is time for me to stop flogging this dead horse. I will remove this from the CF.

------
Kind Regards,
Peter Smith
Fujitsu Australia
On Tue, Apr 5, 2022 at 9:37 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Sat, Apr 2, 2022 at 5:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Apr 1, 2022 at 1:52 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > On Wed, Mar 16, 2022 at 4:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > I think the STATE_CATCHUP guarantees the apply worker must have
> > > received (or tried to receive) a message. See the previous answer.
> > >
> >
> > Sorry, I meant to say until the sync worker has received a message.
> > The point is that the LSN at which the copy finished might actually
> > be later than some of the in-progress transactions on the server. It
> > may not be a good idea to blindly skip those changes if the apply
> > worker has already received them (say, via 'streaming' mode). Today,
> > all such changes would be written to a file and applied at commit
> > time, but tomorrow we could have an implementation that applies such
> > changes (via some background worker) by skipping changes related to
> > the table for which the tablesync worker is in progress. In such a
> > scenario, unless we allow the tablesync worker to process more
> > messages, we will end up losing some changes for that particular
> > table.
> >
> > As per my understanding, this is safe as per the current code, but it
> > can't be guaranteed for future implementations, and the extra work is
> > only to receive the messages for one transaction. I still don't think
> > it is a good idea to pursue this patch.
>
> IIUC you are saying that my patch is good today, but it may cause
> problems in a hypothetical future if the rest of the replication logic
> is implemented differently.
>

The approach I have alluded to above was already proposed earlier on -hackers [1] to make streaming transactions perform better, so it is not completely hypothetical.
[1] - https://www.postgresql.org/message-id/8eda5118-2dd0-79a1-4fe9-eec7e334de17%40postgrespro.ru

--
With Regards,
Amit Kapila.