Re: Possible corruption by CreateRestartPoint at promotion - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Possible corruption by CreateRestartPoint at promotion
Msg-id 20220426183349.GA3002960@nathanxps13
In response to Possible corruption by CreateRestartPoint at promotion  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Possible corruption by CreateRestartPoint at promotion  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
On Wed, Mar 16, 2022 at 10:24:44AM +0900, Kyotaro Horiguchi wrote:
> While discussing additional LSNs in the checkpoint log message,
> Fujii-san pointed out [2] that there is a case where
> CreateRestartPoint leaves an unrecoverable database when a concurrent
> promotion happens. That corruption is "fixed" by the next checkpoint,
> so it is not a severe corruption.

I suspect we'll start seeing this problem more often once end-of-recovery
checkpoints are removed [0].  Would you mind creating a commitfest entry
for this thread?  I didn't see one.

> AFAICS, since 9.5 no checkpoint or restartpoint runs concurrently with
> a restartpoint [3].  So I propose to remove the code path as attached.

Yeah, this "quick hack" has been around for some time (2de48a8), and I
believe much has changed since then, so something like what you're
proposing is probably the right thing to do.

>      /* Also update the info_lck-protected copy */
>      SpinLockAcquire(&XLogCtl->info_lck);
> -    XLogCtl->RedoRecPtr = lastCheckPoint.redo;
> +    XLogCtl->RedoRecPtr = RedoRecPtr;
>      SpinLockRelease(&XLogCtl->info_lck);
>  
>      /*
> @@ -6984,7 +6987,10 @@ CreateRestartPoint(int flags)
>      /* Update the process title */
>      update_checkpoint_display(flags, true, false);
>  
> -    CheckPointGuts(lastCheckPoint.redo, flags);
> +    CheckPointGuts(RedoRecPtr, flags);

I don't understand the purpose of these changes.  Are these related to the
fix, or is this just tidying up?
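
The "+" lines in the quoted hunk make both the info_lck-protected shared copy and CheckPointGuts() use the same process-local RedoRecPtr value instead of lastCheckPoint.redo. Below is a minimal standalone C sketch of that publish-then-flush pattern, for illustration only. It does not say why the patch makes the change. All names (SharedCtl, WalPtr, checkpoint_guts(), restartpoint_publish_and_flush(), read_shared_redo()) are hypothetical stand-ins, not PostgreSQL's API, and a C11 atomic_flag test-and-set spinlock is assumed in place of info_lck.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for XLogRecPtr: a 64-bit WAL position. */
typedef uint64_t WalPtr;
static_assert(sizeof(WalPtr) == 8, "WAL positions are modelled as 64-bit");

/* Hypothetical, simplified model of the shared control structure. */
typedef struct SharedCtl
{
    atomic_flag info_lck;      /* test-and-set spinlock standing in for info_lck */
    WalPtr      redo_rec_ptr;  /* shared copy, only accessed while holding info_lck */
} SharedCtl;

static void
spin_acquire(atomic_flag *lck)
{
    /* Busy-wait until the flag was previously clear, i.e. we now own it. */
    while (atomic_flag_test_and_set_explicit(lck, memory_order_acquire))
        ;
}

static void
spin_release(atomic_flag *lck)
{
    atomic_flag_clear_explicit(lck, memory_order_release);
}

static void
shared_ctl_init(SharedCtl *ctl, WalPtr initial)
{
    atomic_flag_clear(&ctl->info_lck);  /* start unlocked */
    ctl->redo_rec_ptr = initial;
}

/* Hypothetical stand-in for CheckPointGuts(): records the redo pointer it got. */
static WalPtr guts_redo_seen = 0;

static void
checkpoint_guts(WalPtr redo)
{
    guts_redo_seen = redo;
}

/*
 * Mirrors the "+" lines of the quoted hunk: publish the process-local redo
 * pointer to the shared copy under the lock, then flush with that same value.
 */
static void
restartpoint_publish_and_flush(SharedCtl *ctl, WalPtr local_redo)
{
    spin_acquire(&ctl->info_lck);
    ctl->redo_rec_ptr = local_redo;
    spin_release(&ctl->info_lck);

    checkpoint_guts(local_redo);
}

/* How another process would read the shared copy: always under the lock. */
static WalPtr
read_shared_redo(SharedCtl *ctl)
{
    WalPtr value;

    spin_acquire(&ctl->info_lck);
    value = ctl->redo_rec_ptr;
    spin_release(&ctl->info_lck);
    return value;
}
```

For example, after shared_ctl_init(&ctl, 100) and restartpoint_publish_and_flush(&ctl, 250), both read_shared_redo(&ctl) and guts_redo_seen are 250. The shared copy and the value handed to the flush step cannot diverge, because both come from the single local argument.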

[0] https://postgr.es/m/CA%2BTgmoY%2BSJLTjma4Hfn1sA7S6CZAgbihYd%3DKzO6srd7Ut%3DXVBQ%40mail.gmail.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com