Re: pg_rewind vs superuser - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: pg_rewind vs superuser
Msg-id 20190408.161442.167396698.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: pg_rewind vs superuser  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
At Mon, 8 Apr 2019 15:17:25 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20190408061725.GF2712@paquier.xyz>
> On Sun, Apr 07, 2019 at 03:06:56PM +0200, Magnus Hagander wrote:
> > So can we *detect* that this is the case? Because if so, we could perhaps
> > just wait for it to be done? Because there will always be one?
> 
> Yes, this one is technically possible.  We could add a timeout option
> which checks each N seconds the control file of the online source and
> sees if its timeline differs or not with the target, waiting for the
> change to happen.  If we do that, we may want to revisit the behavior
> of not issuing an error if the source and the target are detected as
> being on the same timeline, and consider it as a failure.
> 
> > The main point is -- we know from experience that it's pretty fragile to
> > assume the user read the documentation :) So if we can find *any* way to
> > handle this in code rather than docs, that'd be great. We would still
> > absolutely want the docs change for back branches of course.
> 
> Any veeeeery recent experience on the matter perhaps? :)
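
For reference, the polling idea quoted above could be sketched roughly
as follows. This is only an illustration and not a proposed patch:
read_source_tli is a hypothetical callable standing in for however the
source's control file would actually be read, and the function name,
interval and timeout parameters are assumptions of this sketch, not
existing pg_rewind options.

```python
import time
from typing import Callable


def wait_for_timeline_change(
    read_source_tli: Callable[[], int],   # hypothetical: returns the source's current TLI
    target_tli: int,
    interval: float = 1.0,                # seconds between checks (the "N seconds" above)
    timeout: float = 30.0,                # give up after this many seconds
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the source's timeline differs from the target's.

    Returns True as soon as the timelines differ, and False if the
    timeout expires while they are still the same.
    """
    deadline = clock() + timeout
    while True:
        if read_source_tli() != target_tli:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would treat a False result as a failure, in line with the
suggestion to stop silently accepting "same timeline". The sleep and
clock parameters exist only so the loop can be exercised without real
waiting.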

I (not Magnus) saw a similar but slightly different case. Just
after the master's promotion, the standby was killed in immediate
mode after it had caught up to the master's latest TLI but before
the restartpoint finished. The control data showed them on different
TLIs, so *the tool* decided to try pg_rewind. The restart->shutdown
(*1) sequence run for cleanup brought the standby up to the master's
TLI, but their histories had diverged from each other within that
latest TLI. Of course, pg_rewind said "no need to rewind since
they're on the same TLI". The subsequent replication started from
the beginning of the segment and overwrote WAL records already
applied on the standby. The result was a broken database. I suspect
this was the result of some kind of misoperation and that sane
operation would not cause the situation, but such a situation could
be "cleaned up" if pg_rewind did its work even for a replication set
on the same TLI.

I haven't yet found out exactly what happened in this case.

*1: It is somewhat strange that recovery reached the next TLI
 even though I heard that the restart was in non-standby,
 non-recovery mode. Something must be wrong.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



