On Tuesday, April 2, 2024 3:21 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> On Tuesday, April 2, 2024 8:35 AM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Monday, April 1, 2024 7:30 PM Amit Kapila <amit.kapila16@gmail.com>
> > wrote:
> > >
> > > On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu)
> > > <houzj.fnst@fujitsu.com>
> > > wrote:
> > > >
> > > > On Friday, March 29, 2024 2:50 PM Amit Kapila
> > > > <amit.kapila16@gmail.com>
> > > wrote:
> > > > >
> > > >
> > > > >
> > > > >
> > > > > 2.
> > > > > +extern XLogRecPtr
> > > > > +pg_logical_replication_slot_advance(XLogRecPtr
> > > moveto,
> > > > > + bool *found_consistent_point);
> > > > > +
> > > > >
> > > > > This API looks a bit awkward as the functionality doesn't match
> > > > > the name. How about having a function with name
> > > > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto,
> > > > > ready_for_decoding) with the same functionality as your patch
> > > > > has for
> > > > > pg_logical_replication_slot_advance() and then invoke it both
> > > > > from pg_logical_replication_slot_advance and slotsync.c. The
> > > > > function name is too big, we can think of a shorter name. Any ideas?
> > > >
> > > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just
> > > > LogicalSlotAdvanceAndCheckDecoding()?
> > > >
> > >
> > > It is about snapbuild state, so how about naming the function as
> > > LogicalSlotAdvanceAndCheckSnapState()?
> >
> > It looks better to me, so changed.
> >
> > >
> > > I have made quite a few cosmetic changes in comments and code. See
> > > attached. This is atop your latest patch. Can you please review and
> > > include these changes in the next version?
> >
> > Thanks, I have reviewed and merged them.
> > Attach the V5 patch set which addressed above comments and ran pgindent.
>
> I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which can
> reproduce the data loss issue consistently on my machine. It may not
> reproduce in some rare cases if concurrent xl_running_xacts are written by
> bgwriter, but I think it's still valuable if it can verify the fix in most cases. The test
> will fail if directly applied on HEAD, and will pass after applying atop of 0001.
CFbot[1] complained about one query result's order in the tap-test, so I am
attaching a V7 patch set which fixed this. There are no changes in 0001.
[1] https://cirrus-ci.com/task/6375962162495488
Best Regards,
Hou zj