Re: Add checkpoint and redo LSN to LogCheckpointEnd log message - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: Add checkpoint and redo LSN to LogCheckpointEnd log message |
Date | |
Msg-id | 20220225.153112.120893179127555595.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: Add checkpoint and redo LSN to LogCheckpointEnd log message (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
Responses |
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
|
List | pgsql-hackers |
At Tue, 22 Feb 2022 17:44:01 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > At Tue, 22 Feb 2022 01:59:45 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > > > > > Of the following, I think we should do (a) and (b) to make future > > > backpatchings easier. > > > a) Use RedoRecPtr and PriorRedoPtr after they are assigned. > > > b) Move assignment to PriorRedoPtr into the ControlFileLock section. > > > > I failed to understand how (a) and (b) can make the backpatching > > easier. How easy to backpatch seems the same whether we apply (a) and > > (b) or not... > > That premises that the patch applied to master contains (a) and (b). > So if it doesn't, those are not need by older branches. I was once going to remove them. But according the discussion below, the patch for back-patching is now quite close to that for the master branch. So I left them alone. > > > d) Skip call to UpdateCheckPointDistanceEstimate() when RedoRecPtr <= > > > PriorRedoPtr. > > > > But "RedoRecPtr <= PriorRedoPtr" will never happen, will it? Because a > > I didn't believe that it happens. (So, it came from my > convervativeness, or laziness, or both:p) The code dates from 2009 and > StartupXLOG makes a concurrent checkpoint with bgwriter. But as of at > least 9.5, StartupXLOG doesn't directly call CreateCheckPoint. So I > think that won't happen. > > So, in short, I agree to remove it or turn it into Assert(). It was a bit out of point. If we assume RedoRecPtr is always larger than PriorRedoPtr and then we don't need to check that there, we should also remove the "if (PriorRedoPtr < RedoRecPtr)" branch just above, which means the patch for back-branches gets very close to that for the master. Do we make such a large change on back branches? Anyways this version once takes that way. > > if (flags & CHECKPOINT_IS_SHUTDOWN) > > ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY; > > > > Same as above. IMO it's safer and better to always update the state > > (whether the state is DB_IN_ARCHIVE_RECOVERY or not) if > > CHECKPOINT_IS_SHUTDOWN flag is passed. > > That means we may exit recovery mode after ShutdownXLOG called > CreateRestartPoint. I don't think that may happen. So I'd rather add > Assert ((flags&CHECKPOINT_IS_SHUTDOWN) == 0) there instaed. So this version for v14 gets updated in the following points. Completely removed the code path for the case some other process runs simultaneous checkpoint. Removed the condition (RedoRecPtr > PriorRedoPtr) for UpdateCheckPointDistanceEstimate() call. Added an assertion to the recoery-end path. # Honestly I feel this is a bit too much for back-patching, though. While making patches for v12, I see a test failure of pg_rewind for uncertain reason. I'm investigating that but I post this for discussion. regards. -- Kyotaro Horiguchi NTT Open Source Software Center From e983f3d4c2dbeea742aed0ef1e209e7821f6687f Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Mon, 14 Feb 2022 13:04:33 +0900 Subject: [PATCH v2] Correctly update contfol file at the end of archive recovery CreateRestartPoint runs WAL file cleanup basing on the checkpoint just have finished in the function. If the database has exited DB_IN_ARCHIVE_RECOVERY state when the function is going to update control file, the function refrains from updating the file at all then proceeds to WAL cleanup having the latest REDO LSN, which is now inconsistent with the control file. As the result, the succeeding cleanup procedure overly removes WAL files against the control file and leaves unrecoverable database until the next checkpoint finishes. Along with that fix, we remove a dead code path for the case some other process ran a simultaneous checkpoint. It seems like just a preventive measure but it's no longer useful because we are sure that checkpoint is performed only by checkpointer except single process mode. --- src/backend/access/transam/xlog.c | 73 ++++++++++++++++++++----------- 1 file changed, 47 insertions(+), 26 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 6208e123e5..ff4a90eacc 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9587,6 +9587,9 @@ CreateRestartPoint(int flags) XLogSegNo _logSegNo; TimestampTz xtime; + /* we don't assume concurrent checkpoint/restartpoint to run */ + Assert (!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER); + /* Get a local copy of the last safe checkpoint record. */ SpinLockAcquire(&XLogCtl->info_lck); lastCheckPointRecPtr = XLogCtl->lastCheckPointRecPtr; @@ -9653,7 +9656,7 @@ CreateRestartPoint(int flags) /* Also update the info_lck-protected copy */ SpinLockAcquire(&XLogCtl->info_lck); - XLogCtl->RedoRecPtr = lastCheckPoint.redo; + XLogCtl->RedoRecPtr = RedoRecPtr; SpinLockRelease(&XLogCtl->info_lck); /* @@ -9672,7 +9675,10 @@ CreateRestartPoint(int flags) /* Update the process title */ update_checkpoint_display(flags, true, false); - CheckPointGuts(lastCheckPoint.redo, flags); + CheckPointGuts(RedoRecPtr, flags); + + /* Update pg_control */ + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); /* * Remember the prior checkpoint's redo ptr for @@ -9680,31 +9686,29 @@ CreateRestartPoint(int flags) */ PriorRedoPtr = ControlFile->checkPointCopy.redo; + Assert (PriorRedoPtr < RedoRecPtr); + + ControlFile->checkPoint = lastCheckPointRecPtr; + ControlFile->checkPointCopy = lastCheckPoint; + + /* Update control file using current time */ + ControlFile->time = (pg_time_t) time(NULL); + /* - * Update pg_control, using current time. Check that it still shows - * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing; - * this is a quick hack to make sure nothing really bad happens if somehow - * we get here after the end-of-recovery checkpoint. + * Ensure minRecoveryPoint is past the checkpoint record while archive + * recovery is still ongoing. Normally, this will have happened already + * while writing out dirty buffers, but not necessarily - e.g. because no + * buffers were dirtied. We do this because a non-exclusive base backup + * uses minRecoveryPoint to determine which WAL files must be included in + * the backup, and the file (or files) containing the checkpoint record + * must be included, at a minimum. Note that for an ordinary restart of + * recovery there's no value in having the minimum recovery point any + * earlier than this anyway, because redo will begin just after the + * checkpoint record. This is a quick hack to make sure nothing really bad + * happens if somehow we get here after the end-of-recovery checkpoint. */ - LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); - if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY && - ControlFile->checkPointCopy.redo < lastCheckPoint.redo) + if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY) { - ControlFile->checkPoint = lastCheckPointRecPtr; - ControlFile->checkPointCopy = lastCheckPoint; - ControlFile->time = (pg_time_t) time(NULL); - - /* - * Ensure minRecoveryPoint is past the checkpoint record. Normally, - * this will have happened already while writing out dirty buffers, - * but not necessarily - e.g. because no buffers were dirtied. We do - * this because a non-exclusive base backup uses minRecoveryPoint to - * determine which WAL files must be included in the backup, and the - * file (or files) containing the checkpoint record must be included, - * at a minimum. Note that for an ordinary restart of recovery there's - * no value in having the minimum recovery point any earlier than this - * anyway, because redo will begin just after the checkpoint record. - */ if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr) { ControlFile->minRecoveryPoint = lastCheckPointEndPtr; @@ -9716,8 +9720,25 @@ CreateRestartPoint(int flags) } if (flags & CHECKPOINT_IS_SHUTDOWN) ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY; - UpdateControlFile(); } + else + { + /* recovery mode is not supposed to end during shutdown restartpoint */ + Assert((flags & CHECKPOINT_IS_SHUTDOWN) == 0); + + /* + * Aarchive recovery has ended. Crash recovery ever after should + * always recover to the end of WAL + */ + ControlFile->minRecoveryPoint = InvalidXLogRecPtr; + ControlFile->minRecoveryPointTLI = 0; + + /* also update local copy */ + minRecoveryPoint = InvalidXLogRecPtr; + minRecoveryPointTLI = 0; + } + + UpdateControlFile(); LWLockRelease(ControlFileLock); /* @@ -9804,7 +9825,7 @@ CreateRestartPoint(int flags) xtime = GetLatestXTime(); ereport((log_checkpoints ? LOG : DEBUG2), (errmsg("recovery restart point at %X/%X", - LSN_FORMAT_ARGS(lastCheckPoint.redo)), + LSN_FORMAT_ARGS(RedoRecPtr)), xtime ? errdetail("Last completed transaction was at log time %s.", timestamptz_to_str(xtime)) : 0)); -- 2.27.0 From 13329169b996509a3a853afb9c283c3b27e0eab7 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Fri, 25 Feb 2022 14:46:41 +0900 Subject: [PATCH v2] Correctly update contfol file at the end of archive recovery CreateRestartPoint runs WAL file cleanup basing on the checkpoint just have finished in the function. If the database has exited DB_IN_ARCHIVE_RECOVERY state when the function is going to update control file, the function refrains from updating the file at all then proceeds to WAL cleanup having the latest REDO LSN, which is now inconsistent with the control file. As the result, the succeeding cleanup procedure overly removes WAL files against the control file and leaves unrecoverable database until the next checkpoint finishes. Along with that fix, we remove a dead code path for the case some other process ran a simultaneous checkpoint. It seems like just a preventive measure but it's no longer useful because we are sure that checkpoint is performed only by checkpointer except single process mode. --- src/backend/access/transam/xlog.c | 73 ++++++++++++++++++++----------- 1 file changed, 47 insertions(+), 26 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 3d76fad128..3670ff81e7 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9376,6 +9376,9 @@ CreateRestartPoint(int flags) */ LWLockAcquire(CheckpointLock, LW_EXCLUSIVE); + /* we don't assume concurrent checkpoint/restartpoint to run */ + Assert (!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER); + /* Get a local copy of the last safe checkpoint record. */ SpinLockAcquire(&XLogCtl->info_lck); lastCheckPointRecPtr = XLogCtl->lastCheckPointRecPtr; @@ -9445,7 +9448,7 @@ CreateRestartPoint(int flags) /* Also update the info_lck-protected copy */ SpinLockAcquire(&XLogCtl->info_lck); - XLogCtl->RedoRecPtr = lastCheckPoint.redo; + XLogCtl->RedoRecPtr = RedoRecPtr; SpinLockRelease(&XLogCtl->info_lck); /* @@ -9461,7 +9464,10 @@ CreateRestartPoint(int flags) if (log_checkpoints) LogCheckpointStart(flags, true); - CheckPointGuts(lastCheckPoint.redo, flags); + CheckPointGuts(RedoRecPtr, flags); + + /* Update pg_control */ + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); /* * Remember the prior checkpoint's redo ptr for @@ -9469,31 +9475,29 @@ CreateRestartPoint(int flags) */ PriorRedoPtr = ControlFile->checkPointCopy.redo; + Assert (PriorRedoPtr < RedoRecPtr); + + ControlFile->checkPoint = lastCheckPointRecPtr; + ControlFile->checkPointCopy = lastCheckPoint; + + /* Update control file using current time */ + ControlFile->time = (pg_time_t) time(NULL); + /* - * Update pg_control, using current time. Check that it still shows - * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing; - * this is a quick hack to make sure nothing really bad happens if somehow - * we get here after the end-of-recovery checkpoint. + * Ensure minRecoveryPoint is past the checkpoint record while archive + * recovery is still ongoing. Normally, this will have happened already + * while writing out dirty buffers, but not necessarily - e.g. because no + * buffers were dirtied. We do this because a non-exclusive base backup + * uses minRecoveryPoint to determine which WAL files must be included in + * the backup, and the file (or files) containing the checkpoint record + * must be included, at a minimum. Note that for an ordinary restart of + * recovery there's no value in having the minimum recovery point any + * earlier than this anyway, because redo will begin just after the + * checkpoint record. This is a quick hack to make sure nothing really bad + * happens if somehow we get here after the end-of-recovery checkpoint. */ - LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); - if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY && - ControlFile->checkPointCopy.redo < lastCheckPoint.redo) + if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY) { - ControlFile->checkPoint = lastCheckPointRecPtr; - ControlFile->checkPointCopy = lastCheckPoint; - ControlFile->time = (pg_time_t) time(NULL); - - /* - * Ensure minRecoveryPoint is past the checkpoint record. Normally, - * this will have happened already while writing out dirty buffers, - * but not necessarily - e.g. because no buffers were dirtied. We do - * this because a non-exclusive base backup uses minRecoveryPoint to - * determine which WAL files must be included in the backup, and the - * file (or files) containing the checkpoint record must be included, - * at a minimum. Note that for an ordinary restart of recovery there's - * no value in having the minimum recovery point any earlier than this - * anyway, because redo will begin just after the checkpoint record. - */ if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr) { ControlFile->minRecoveryPoint = lastCheckPointEndPtr; @@ -9505,8 +9509,25 @@ CreateRestartPoint(int flags) } if (flags & CHECKPOINT_IS_SHUTDOWN) ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY; - UpdateControlFile(); } + else + { + /* recovery mode is not supposed to end during shutdown restartpoint */ + Assert((flags & CHECKPOINT_IS_SHUTDOWN) == 0); + + /* + * Aarchive recovery has ended. Crash recovery ever after should + * always recover to the end of WAL + */ + ControlFile->minRecoveryPoint = InvalidXLogRecPtr; + ControlFile->minRecoveryPointTLI = 0; + + /* also update local copy */ + minRecoveryPoint = InvalidXLogRecPtr; + minRecoveryPointTLI = 0; + } + + UpdateControlFile(); LWLockRelease(ControlFileLock); /* @@ -9590,7 +9611,7 @@ CreateRestartPoint(int flags) xtime = GetLatestXTime(); ereport((log_checkpoints ? LOG : DEBUG2), (errmsg("recovery restart point at %X/%X", - (uint32) (lastCheckPoint.redo >> 32), (uint32) lastCheckPoint.redo), + (uint32) (RedoRecPtr >> 32), (uint32) RedoRecPtr), xtime ? errdetail("Last completed transaction was at log time %s.", timestamptz_to_str(xtime)) : 0)); -- 2.27.0
pgsql-hackers by date: