Re: Time delayed LR (WAS Re: logical replication restrictions) - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Time delayed LR (WAS Re: logical replication restrictions) |
Date | |
Msg-id | CAA4eK1+Zfk=dvp+sxsQkHfxp_GtpxqFi46_fiuqGMfiaRzeYFQ@mail.gmail.com Whole thread Raw |
In response to | Re: Time delayed LR (WAS Re: logical replication restrictions) (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Time delayed LR (WAS Re: logical replication restrictions)
|
List | pgsql-hackers |
On Fri, May 12, 2023 at 7:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, May 11, 2023 at 2:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Apr 28, 2023 at 2:35 PM Hayato Kuroda (Fujitsu) > > <kuroda.hayato@fujitsu.com> wrote: > > > > > > Dear hackers, > > > > > > I rebased and refined my PoC. Followings are the changes: > > > > > > > 1. Is my understanding correct that this patch creates the delay files > > for each transaction? If so, did you consider other approaches such as > > using one file to avoid creating many files? > > 2. For streaming transactions, first the changes are written in the > > temp file and then moved to the delay file. It seems like there is a > > double work. Is it possible to unify it such that when min_apply_delay > > is specified, we just use the delay file without sacrificing the > > advantages like stream sub-abort can truncate the changes? > > 3. Ideally, there shouldn't be a performance impact of this feature on > > regular transactions because the delay file is created only when > > min_apply_delay is active but better to do some testing of the same. > > > > In addition to the points Amit raised, if the 'required_schema' option > is specified in START_REPLICATION, the publisher sends schema > information for every change. I think it leads to significant > overhead. Did you consider alternative approaches such as sending the > schema information for every transaction or the subscriber requests > the publisher to send it? > Why do we need this new flag? I can't see any comments in the related code which explain its need. > > Overall, I think such an approach can address comments by Sawada-San > > [1] but not sure if Sawada-San or others have any better ideas to > > achieve this feature. It would be good to see what others think of > > this approach. > > > > I agree with this approach. > > When it comes to the idea of writing logical changes to permanent > files, I think it would also be a good idea (and perhaps could be a > building block of this feature) that we write streamed changes to a > permanent file so that the apply worker can retry to apply them > without retrieving the same changes again from the publisher. > I think we anyway won't be able to send confirmation till we write or process the commit. If it gets interrupted anytime in between we need to get all the changes again. I think using Fileset with temp files has quite a few advantages for streaming as are noted in the header comments of worker.c. We can investigate to replace that with permanent files but I don't see that the advantages outweigh the change. Also, after parallel apply, I am expecting, most users would prefer that mode for large transactions, so making changes in the serialized path doesn't seem like a good idea to me. Having said that, I also thought that it would be a good idea if both streaming and time-delayed can use the same code path in some way w.r.t writing to files but couldn't come up with any good idea without more downsides. I see that Kuroda-San has tried to keep the code path isolated for this feature but still see that one can question the implementation approach. -- With Regards, Amit Kapila.
pgsql-hackers by date: