Re: A failure of standby to follow timeline switch - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: A failure of standby to follow timeline switch
Date
Msg-id 20210113.104852.2302931744657595046.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: A failure of standby to follow timeline switch  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: A failure of standby to follow timeline switch  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
At Tue, 12 Jan 2021 10:47:21 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in 
> On Sat, Jan 9, 2021 at 5:08 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > Masao-san: Are you intending to act as committer for these?  Since the
> > bug is mine I can look into it, but since you already did all the
> > reviewing work, I'm good with you giving it the final push.
> 
> Thanks! I'm thinking to push the patch.
> 
> 
> > 0001 looks good to me; let's get that one committed quickly so that we
> > can focus on the interesting stuff.  While the implementation of
> > find_in_log is quite dumb (not this patch's fault), it seems sufficient
> > to deal with small log files.  We can improve the implementation later,
> > if needed, but we have to get the API right on the first try.
> >
> > 0003: The fix looks good to me.  I verified that the test fails without
> > the fix, and it passes with the fix.
> 
> Yes.
> 
> 
> > The test added in 0002 is a bit optimistic regarding timing, as well as
> > potentially slow; it loops 1000 times and sleeps 100 milliseconds each
> > time.  In a very slow server (valgrind or clobber_cache animals) this
> > could not be sufficient time, while on fast servers it may end up
> > waiting longer than needed.  Maybe we can do something like this:
> 
> On second thought, I think that the regression test should be in
> 004_timeline_switch.pl instead of 001_stream_rep.pl because it's

Agreed. It's definitely the right place.

> the test about timeline switch. Also I'm thinking that it's better to
> test the timeline switch by checking whether some data is successfully
> replicatead like the existing regression test for timeline switch in
> 004_timeline_switch.pl does, instead of finding the specific message
> in the log file. I attached the POC patch. Thought?

It's practically a check on this issue, and looks better. The 180s
timeout in the failure case seems a bit annoying but it's the way all
of this kind of test follow.

The last check on table content is actually useless but it might make
sense to confirm that replication is actually working. However, I
don't think the test don't need to insert as many as 1000 tuples. Just
a single tuple would suffice.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: outdated references to replication timeout
Next
From: "kuroda.hayato@fujitsu.com"
Date:
Subject: RE: ResourceOwner refactoring