Re: Add checkpoint and redo LSN to LogCheckpointEnd log message - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: Add checkpoint and redo LSN to LogCheckpointEnd log message |
Date | |
Msg-id | CALj2ACVVMAGDvourutDvtDUOhTqS_yTtfrdCjtP75-RrQU0dFA@mail.gmail.com Whole thread Raw |
In response to | Re: Add checkpoint and redo LSN to LogCheckpointEnd log message (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
(Fujii Masao <masao.fujii@oss.nttdata.com>)
|
List | pgsql-hackers |
On Tue, Feb 1, 2022 at 9:39 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > I found that CreateRestartPoint() already reported the redo lsn as follows after emitting the restartpoint log message.To avoid duplicated logging of the same information, we should update this code? > > ereport((log_checkpoints ? LOG : DEBUG2), > (errmsg("recovery restart point at %X/%X", > LSN_FORMAT_ARGS(lastCheckPoint.redo)), > xtime ? errdetail("Last completed transaction was at log time %s.", > timestamptz_to_str(xtime)) : 0)); > > This code reports lastCheckPoint.redo as redo lsn. OTOH, with the patch, LogCheckpointEnd() reports ControlFile->checkPointCopy.redo.They may be different, for example, when the current DB state is not DB_IN_ARCHIVE_RECOVERY.In this case, which lsn should we report as redo lsn? Do we ever reach CreateRestartPoint when ControlFile->stat != DB_IN_ARCHIVE_RECOVERY? Assert(ControlFile->state == DB_IN_ARCHIVE_RECOVERY); in CreateRestartPoint doesn't fail any regression tests. Here's what can happen: lastCheckPoint.redo is 100 and ControlFile->checkPointCopy.redo is 105, so, "skipping restartpoint, already performed at %X/%X" LogCheckpointEnd isn't reached lastCheckPoint.redo is 105 and ControlFile->checkPointCopy.redo is 100 and ControlFile->state == DB_IN_ARCHIVE_RECOVERY, then the control file gets updated and LogCheckpointEnd prints the right redo lsn. lastCheckPoint.redo is 105 and ControlFile->checkPointCopy.redo is 100 and ControlFile->state != DB_IN_ARCHIVE_RECOVERY, the the control file doesn't get updated and LogCheckpointEnd just prints the control redo lsn. Looks like this case is rare given Assert(ControlFile->state == DB_IN_ARCHIVE_RECOVERY); doesn't fail any tests. I think we should just let LogCheckpointEnd print the redo lsn from the control file. We can just remove the above errmsg("recovery restart point at %X/%X" message altogether or just print it only in the rare scenario, something like below: if (ControlFile->state != DB_IN_ARCHIVE_RECOVERY) { ereport((log_checkpoints ? LOG : DEBUG2), (errmsg("performed recovery restart point at %X/%X while the database state is %s", LSN_FORMAT_ARGS(lastCheckPoint.redo)), getDBState(ControlFile->state))); } And the last commit/abort records's timestamp will always get logged even before we reach here in the main redo loop (errmsg("last completed transaction was at log time %s"). Or another way is to just pass the redo lsn to LogCheckpointEnd and pass the lastCheckPoint.redo in if (ControlFile->state != DB_IN_ARCHIVE_RECOVERY) cases or when control file isn't updated but restart point happened. Thoughts? Regards, Bharath Rupireddy.
pgsql-hackers by date: