Re: A failure of standby to follow timeline switch - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: A failure of standby to follow timeline switch
Date
Msg-id CAHGQGwGtAqCd0SaAiFSS78imhxuLzSzu0oQ7oFbHjqOD_1zxEA@mail.gmail.com
Whole thread Raw
In response to Re: A failure of standby to follow timeline switch  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: A failure of standby to follow timeline switch  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-hackers
On Wed, Jan 13, 2021 at 10:48 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Tue, 12 Jan 2021 10:47:21 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
> > On Sat, Jan 9, 2021 at 5:08 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > >
> > > Masao-san: Are you intending to act as committer for these?  Since the
> > > bug is mine I can look into it, but since you already did all the
> > > reviewing work, I'm good with you giving it the final push.
> >
> > Thanks! I'm thinking to push the patch.
> >
> >
> > > 0001 looks good to me; let's get that one committed quickly so that we
> > > can focus on the interesting stuff.  While the implementation of
> > > find_in_log is quite dumb (not this patch's fault), it seems sufficient
> > > to deal with small log files.  We can improve the implementation later,
> > > if needed, but we have to get the API right on the first try.
> > >
> > > 0003: The fix looks good to me.  I verified that the test fails without
> > > the fix, and it passes with the fix.
> >
> > Yes.
> >
> >
> > > The test added in 0002 is a bit optimistic regarding timing, as well as
> > > potentially slow; it loops 1000 times and sleeps 100 milliseconds each
> > > time.  In a very slow server (valgrind or clobber_cache animals) this
> > > could not be sufficient time, while on fast servers it may end up
> > > waiting longer than needed.  Maybe we can do something like this:
> >
> > On second thought, I think that the regression test should be in
> > 004_timeline_switch.pl instead of 001_stream_rep.pl because it's
>
> Agreed. It's definitely the right place.
>
> > the test about timeline switch. Also I'm thinking that it's better to
> > test the timeline switch by checking whether some data is successfully
> > replicatead like the existing regression test for timeline switch in
> > 004_timeline_switch.pl does, instead of finding the specific message
> > in the log file. I attached the POC patch. Thought?
>
> It's practically a check on this issue, and looks better. The 180s
> timeout in the failure case seems a bit annoying but it's the way all
> of this kind of test follow.

Yes.

>
> The last check on table content is actually useless but it might make
> sense to confirm that replication is actually working. However, I
> don't think the test don't need to insert as many as 1000 tuples. Just
> a single tuple would suffice.

Thanks for the review!
I'm ok with this change (i.e., insert only single row).
Attached is the updated version of the patch.

Regards,

-- 
Fujii Masao

Attachment

pgsql-hackers by date:

Previous
From: "Hou, Zhijie"
Date:
Subject: remove unneeded pstrdup in fetch_table_list
Next
From: Craig Ringer
Date:
Subject: Re: Logical decoding without slots: decoding in lockstep with recovery