Re: Standby recovers records from wrong timeline - Mailing list pgsql-hackers

From Ants Aasma
Subject Re: Standby recovers records from wrong timeline
Date
Msg-id CANwKhkPozUvyfuy1sz0fKN4=CC3TPQOF0Tr+uEVO_XX6yqDHpA@mail.gmail.com
In response to Re: Standby recovers records from wrong timeline  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Standby recovers records from wrong timeline  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
On Thu, 20 Oct 2022 at 11:30, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>
> primary_restored did a time-travel to past a bit because of the
> recovery_target=immediate. In other words, the primary_restored and
> the replica diverge. I don't think it is legit to connect a diverged
> standby to a primary.

primary_restored did time travel to the past; since we're doing PITR on
the primary, that's the expected behavior. However, the replica has not
diverged: it's a copy of the exact same base backup. The use case is
restoring a cluster from backup using PITR and using the same backup to
create a standby. Currently this breaks when the primary has not yet
archived any segments.

> So, about the behavior in doubt, it is the correct behavior to
> seemingly ignore the history file in the archive. Recovery assumes
> that the first half of the first segment of the new timeline is the
> same with the same segment of the old timeline (.partial) so it is
> legit to read the <tli=1,seg=2> file til the end and that causes the
> replica goes beyond the divergence point.

What is happening is that primary_restored has a timeline switch at
tli 2, lsn 0/2000100, and the next insert record starts in the same
segment. The replica starts from the same backup on timeline 1 and tries
to find tli 2 seg 2, which is not archived yet, so it falls back to
tli 1 seg 2. It replays tli 1 seg 2, continues to tli 1 seg 3, then
connects to the primary and starts applying WAL from tli 2 seg 4. To me
that seems completely broken.
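
To make the sequence above concrete, here is a rough Python sketch of the
lookup as I described it. It is a toy model, not the actual recovery code:
it assumes the default 16MB segment size, the archive contents are made up
to match this scenario, and pick_segment() and
overlaps_old_timeline() are hypothetical helper names.

```python
# Toy model of the segment lookup described above; not PostgreSQL source.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024            # assumption: default 16MB segments
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE


def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' (hex) into an integer byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def segment_of(lsn):
    """Segment number that contains the given LSN."""
    return lsn // WAL_SEGMENT_SIZE


def wal_file_name(tli, segno):
    """24-hex-digit WAL file name for a timeline and segment number."""
    return "%08X%08X%08X" % (
        tli,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )


def pick_segment(archive, timelines, segno):
    """Hypothetical helper: try the newest timeline first, then older ones."""
    for tli in timelines:
        name = wal_file_name(tli, segno)
        if name in archive:
            return tli, name
    return None


def overlaps_old_timeline(segno, switch_lsn):
    """Hypothetical helper: does any part of the segment precede the switch?"""
    return segno * WAL_SEGMENT_SIZE < switch_lsn


switch_lsn = parse_lsn("0/2000100")

# Made-up archive state: timeline 1 segments and the timeline 2 history file
# are present, but no timeline 2 segment has been archived yet.
archive = {wal_file_name(1, 2), wal_file_name(1, 3), "00000002.history"}

for segno in (2, 3):
    tli, name = pick_segment(archive, [2, 1], segno)
    print(segno, tli, name, overlaps_old_timeline(segno, switch_lsn))
```

In this model seg 2 on tli 1 is shared history only up to the switch
point, while seg 3 on tli 1 starts after the switch point, so none of it
belongs to the history of tli 2. The fallback hands it to the replica
anyway.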

> As you know, when new primary starts a diverged history, the
> recommended way is to blow (or stash) away the archive, then take a
> new backup from the running primary.

My understanding is that backup archives are supposed to remain valid
even after PITR or, equivalently, after a lagging standby is promoted.

--
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com


