Re: Possible corruption by CreateRestartPoint at promotion - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Possible corruption by CreateRestartPoint at promotion
Date
Msg-id YmjRkYo2Ce+mO5WO@paquier.xyz
Whole thread Raw
In response to Re: Possible corruption by CreateRestartPoint at promotion  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: Possible corruption by CreateRestartPoint at promotion  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
On Tue, Apr 26, 2022 at 08:26:09PM -0700, Nathan Bossart wrote:
> On Wed, Apr 27, 2022 at 10:43:53AM +0900, Kyotaro Horiguchi wrote:
>> At Tue, 26 Apr 2022 11:33:49 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in
>>> I suspect we'll start seeing this problem more often once end-of-recovery
>>> checkpoints are removed [0].  Would you mind creating a commitfest entry
>>> for this thread?  I didn't see one.
>>
>> I'm not sure the patch makes any change here, because restart points
>> don't run while crash recovery, since no checkpoint records seen
>> during a crash recovery.  Anyway the patch doesn't apply anymore so
>> rebased, but only the one for master for the lack of time for now.
>
> Thanks for the new patch!  Yeah, it wouldn't affect crash recovery, but
> IIUC Robert's patch also applies to archive recovery.
>
>> +    /* Update pg_control */
>> +    LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
>> +
>>      /*
>>       * Remember the prior checkpoint's redo ptr for
>>       * UpdateCheckPointDistanceEstimate()
>>       */
>>      PriorRedoPtr = ControlFile->checkPointCopy.redo;
>
> nitpick: Why move the LWLockAcquire() all the way up here?

Yeah, that should not be necessary.  InitWalRecovery() is the only
place outside the checkpointer that would touch this field, but that
happens far too early in the startup sequence to matter with the
checkpointer.

>> +    Assert (PriorRedoPtr < RedoRecPtr);
>
> I think this could use a short explanation.

That's just to make sure that the current redo LSN is always older
than the one prior that.  It does not seem really necessary to me to
add that.

>> +        /*
>> +         * Aarchive recovery has ended. Crash recovery ever after should
>> +         * always recover to the end of WAL
>> +         */

s/Aarchive/Archive/.

>> +        ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
>> +        ControlFile->minRecoveryPointTLI = 0;
>> +
>> +        /* also update local copy */
>> +        LocalMinRecoveryPoint = InvalidXLogRecPtr;
>> +        LocalMinRecoveryPointTLI = 0;
>
> Should this be handled by the code that changes the control file state to
> DB_IN_PRODUCTION instead?  It looks like this is ordinarily done in the
> next checkpoint.  It's not clear to me why it is done this way.

Anyway, that would be the work of the end-of-recovery checkpoint
requested at the end of StartupXLOG() once a promotion happens or of
the checkpoint requested by PerformRecoveryXLogAction() in the second
case, no?  So, I don't quite see why we need to update
minRecoveryPoint and minRecoveryPointTLI in the control file here, as
much as this does not have to be part of the end-of-recovery code
that switches the control file to DB_IN_PRODUCTION.

-   if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
-       ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
-   {
7ff23c6 has removed the last call to CreateCheckpoint() outside the
checkpointer, meaning that there is one less concurrent race to worry
about, but I have to admit that this change, to update the control
file's checkPoint and checkPointCopy even if we don't check after
ControlFile->checkPointCopy.redo < lastCheckPoint.redo would make the
code less robust in ~14.  So I am questioning whether a backpatch
is actually worth the risk here.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Jian He
Date:
Subject: Fwd: range of composite types!
Next
From: Tom Lane
Date:
Subject: Re: Fwd: range of composite types!