On Tue, May 18, 2021 at 12:22 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> And finally I think I could reach the situation the commit wanted to fix.
>
> I took a basebackup from a standby just before replaying the first
> checkpoint of the new timeline (by using debugger), without copying
> pg_wal. In this backup, the control file contains checkPointCopy of
> the previous timeline.
>
> I modified StartXLOG so that expectedTLEs is set just after first
> determining recoveryTargetTLI, then started the grandchild node. I
> have the following error and the server fails to continue replication.
> [postmaster] LOG: starting PostgreSQL 14beta1 on x86_64-pc-linux-gnu...
> [startup] LOG: database system was interrupted while in recovery at log...
> [startup] LOG: set expectedtles tli=6, length=1
> [startup] LOG: Probing history file for TLI=7
> [startup] LOG: entering standby mode
> [startup] LOG: scanning segment 3 TLI 6, source 0
> [startup] LOG: Trying fetching history file for TLI=6
> [walreceiver] LOG: fetching timeline history file for timeline 5 from pri...
> [walreceiver] LOG: fetching timeline history file for timeline 6 from pri...
> [walreceiver] LOG: started streaming ... primary at 0/3000000 on timeline 5
> [walreceiver] DETAIL: End of WAL reached on timeline 5 at 0/30006E0.
> [startup] LOG: unexpected timeline ID 1 in log segment 000000050000000000000003, offset 0
> [startup] LOG: Probing history file for TLI=7
> [startup] LOG: scanning segment 3 TLI 6, source 0
> (repeats forever)

So IIUC, this log shows that
"ControlFile->checkPointCopy.ThisTimeLineID" is 6 but the
"ControlFile->checkPoint" record is on TL 5? I think if you had the
old version of the code (before the commit), or the code below [1]
right after initializing expectedTLEs, then you would have hit the
FATAL that the patch fixed.

While debugging, did you check what the "ControlFile->checkPoint"
LSN was vs. the first LSN of the first segment with TL6?

[1]
expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
if (tliOfPointInHistory(ControlFile->checkPoint, expectedTLEs) !=
    ControlFile->checkPointCopy.ThisTimeLineID)
{
    report(FATAL..
}
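
To make the check in [1] concrete, here is a small standalone sketch of
the lookup it relies on. This is not the PostgreSQL source: the names
(LsnSketch, TleSketch, tli_of_point, checkpoint_tli_matches) are made up
for illustration, and the history is a plain array instead of a List.
The begin/end comparison is modeled on how I believe
tliOfPointInHistory() walks the history entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for XLogRecPtr / TimeLineID; not PostgreSQL types. */
typedef uint64_t LsnSketch;
typedef uint32_t TliSketch;

#define LSN_INVALID ((LsnSketch) 0)

/* One timeline history entry: the timeline covers [begin, end). */
typedef struct
{
    TliSketch tli;
    LsnSketch begin;    /* inclusive; LSN_INVALID means "from the start" */
    LsnSketch end;      /* exclusive; LSN_INVALID means "still open" */
} TleSketch;

/*
 * Return the timeline whose range contains ptr, or 0 if no entry covers it.
 * (The real function reports an error in that case instead of returning 0;
 * returning 0 is a simplification made for this sketch.)
 */
static TliSketch
tli_of_point(LsnSketch ptr, const TleSketch *history, size_t n)
{
    assert(history != NULL);

    for (size_t i = 0; i < n; i++)
    {
        const TleSketch *e = &history[i];

        if ((e->begin == LSN_INVALID || e->begin <= ptr) &&
            (e->end == LSN_INVALID || ptr < e->end))
            return e->tli;
    }
    return 0;
}

/*
 * The sanity check from [1]: does the timeline that the history assigns to
 * the checkpoint LSN agree with the timeline stored in the control file?
 * Returns nonzero when they agree.
 */
static int
checkpoint_tli_matches(LsnSketch checkpoint, TliSketch control_tli,
                       const TleSketch *history, size_t n)
{
    return tli_of_point(checkpoint, history, n) == control_tli;
}
```

For example, assuming TL 5 ends and TL 6 begins at 0/30006E0 (the
end-of-WAL position in the log above) and taking a made-up checkpoint
LSN of 0/3000028, the history { {6, 0x30006E0, LSN_INVALID},
{5, LSN_INVALID, 0x30006E0} } maps that checkpoint to TL 5, so the
check fails when the control file says ThisTimeLineID is 6.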
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com