Thread: Fix primary crash continually with invalid checkpoint after promote
A newly promoted primary may be left with an invalid checkpoint.
In CreateRestartPoint(), the control file is updated and old WAL files are removed. In some situations, however, the control file is not updated while the old WAL files are still removed. This produces an invalid checkpoint that points at a WAL segment which no longer exists. The crucial log messages are "invalid primary checkpoint record" and "could not locate a valid checkpoint record".
The following timeline reproduces the situation:
tl1: the standby begins to create a restartpoint (triggered by time or by WAL volume).
tl2: the standby is promoted and the control file state is changed to DB_IN_PRODUCTION. The restartpoint therefore no longer updates the control file (xlog.c:9690), but old WAL files are still removed (xlog.c:9719); see the code sketch below the timeline.
tl3: the standby becomes the primary. The primary may crash before the next checkpoint completes (an OOM kill in my case), and it then crashes continually at startup because of the invalid checkpoint.
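For reference, here is a simplified excerpt of the relevant part of CreateRestartPoint() (paraphrased from xlog.c, not the exact source; the variable names follow the real code, but surrounding logic and the exact RemoveOldXlogFiles() arguments are elided). It shows how the control-file update is conditional on the recovery state while the WAL removal below it is not:

    /* Simplified sketch of CreateRestartPoint(), paraphrased from xlog.c */
    LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
    if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
        ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
    {
        /*
         * Skipped after promotion: the state is already DB_IN_PRODUCTION,
         * so the control file keeps pointing at the previous, older
         * checkpoint.
         */
        ControlFile->checkPoint = lastCheckPointRecPtr;
        ControlFile->checkPointCopy = lastCheckPoint;
        UpdateControlFile();
    }
    LWLockRelease(ControlFileLock);

    /*
     * ...but old WAL segments are removed unconditionally, based on the new
     * restartpoint's redo location.  This can remove the segment that the
     * stale checkpoint in the control file still references.
     */
    RemoveOldXlogFiles(_logSegNo, RedoRecPtr, endptr);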
The attached patch reproduces this problem with a standard PostgreSQL TAP (Perl) test, which you can run with:
./configure --enable-tap-tests; make -j; make -C src/test/recovery/ check PROVE_TESTS=t/027_invalid_checkpoint_after_promote.pl
The attached patch also fixes the problem by ensuring that old WAL files are removed only after the control file has been updated, as sketched below.
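A minimal sketch of that idea (an illustration only, not the patch itself; the local flag name is made up): remember whether the control file was actually updated during the restartpoint, and skip the WAL removal otherwise.

    bool    control_file_updated = false;   /* hypothetical local flag */

    LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
    if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
        ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
    {
        /* advance checkPoint / checkPointCopy as before ... */
        UpdateControlFile();
        control_file_updated = true;
    }
    LWLockRelease(ControlFileLock);

    /*
     * Remove old WAL only when the control file now points at the new
     * restartpoint; otherwise keep the segments still needed by the
     * checkpoint recorded in the control file.
     */
    if (control_file_updated)
        RemoveOldXlogFiles(_logSegNo, RedoRecPtr, endptr);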
On Tue, Apr 26, 2022 at 03:16:13PM +0800, Zhao Rui wrote:
> In CreateRestartPoint(), the control file is updated and old WAL files
> are removed. In some situations, however, the control file is not
> updated while the old WAL files are still removed. This produces an
> invalid checkpoint that points at a WAL segment which no longer exists.
> The crucial log messages are "invalid primary checkpoint record" and
> "could not locate a valid checkpoint record".

I think this is the same issue tracked here: [0].

[0] https://postgr.es/m/20220316.102444.2193181487576617583.horikyota.ntt%40gmail.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
"=?ISO-8859-1?B?WmhhbyBSdWk=?=" <875941708@qq.com> writes: > Newly promoted primary may leave an invalid checkpoint. > In function CreateRestartPoint, control file is updated and old wals are removed. But in some situations, control fileis not updated, old wals are still removed. Thus produces an invalid checkpoint with nonexistent wal. Crucial log: "invalidprimary checkpoint record", "could not locate a valid checkpoint record". I believe this is the same issue being discussed here: https://www.postgresql.org/message-id/flat/20220316.102444.2193181487576617583.horikyota.ntt%40gmail.com but Horiguchi-san's proposed fix looks quite different from yours. regards, tom lane
Re: Fix primary crash continually with invalid checkpoint after promote
At Tue, 26 Apr 2022 15:47:13 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
> "Zhao Rui" <875941708@qq.com> writes:
> > A newly promoted primary may be left with an invalid checkpoint.
> > In CreateRestartPoint(), the control file is updated and old WAL files
> > are removed. In some situations, however, the control file is not
> > updated while the old WAL files are still removed. This produces an
> > invalid checkpoint that points at a WAL segment which no longer exists.
>
> I believe this is the same issue being discussed here:
>
> https://www.postgresql.org/message-id/flat/20220316.102444.2193181487576617583.horikyota.ntt%40gmail.com
>
> but Horiguchi-san's proposed fix looks quite different from yours.

The root cause is that CreateRestartPoint() omits updating the last checkpoint in the control file if archive recovery ends at an unfortunate timing. My proposal fixes that root cause.

Zhao Rui's proposal instead retains WAL files according to the (wrong) contents of the control file.

Aside from the fact that it may let replication slots be invalidated earlier, it is not great that an actually performed restartpoint is forgotten, which may cause the next crash recovery to start from a checkpoint that has already been performed.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
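To make the contrast concrete, the root-cause direction described above could look roughly like the following (an illustrative sketch only, not Horiguchi-san's actual patch): keep recording the restartpoint that was actually performed even when the state has already left DB_IN_ARCHIVE_RECOVERY, so that the control file never points at WAL that is about to be removed.

    LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
    if (ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
    {
        /*
         * Record the restartpoint even if the server has been promoted in
         * the meantime; only the recovery-specific fields remain restricted
         * to the DB_IN_ARCHIVE_RECOVERY state.
         */
        ControlFile->checkPoint = lastCheckPointRecPtr;
        ControlFile->checkPointCopy = lastCheckPoint;

        if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY)
        {
            /* ... update minRecoveryPoint as before ... */
        }
        UpdateControlFile();
    }
    LWLockRelease(ControlFileLock);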
On Wed, Apr 27, 2022 at 11:24:11AM +0900, Kyotaro Horiguchi wrote:
> Zhao Rui's proposal instead retains WAL files according to the (wrong)
> contents of the control file.
>
> Aside from the fact that it may let replication slots be invalidated
> earlier, it is not great that an actually performed restartpoint is
> forgotten, which may cause the next crash recovery to start from a
> checkpoint that has already been performed.

Yeah, I was analyzing this problem and took a look at what's proposed here, and I agree that the approach proposed on this thread would just cause some unnecessary work: crash recovery would replay from a point earlier than necessary, i.e. from before a checkpoint that had in fact already completed.

-- 
Michael