Re: Add checkpoint and redo LSN to LogCheckpointEnd log message - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date
Msg-id CALj2ACVVMAGDvourutDvtDUOhTqS_yTtfrdCjtP75-RrQU0dFA@mail.gmail.com
Whole thread Raw
In response to Re: Add checkpoint and redo LSN to LogCheckpointEnd log message  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: Add checkpoint and redo LSN to LogCheckpointEnd log message  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers
On Tue, Feb 1, 2022 at 9:39 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> I found that CreateRestartPoint() already reported the redo lsn as follows after emitting the restartpoint log
message.To avoid duplicated logging of the same information, we should update this code? 
>
>         ereport((log_checkpoints ? LOG : DEBUG2),
>                         (errmsg("recovery restart point at %X/%X",
>                                         LSN_FORMAT_ARGS(lastCheckPoint.redo)),
>                          xtime ? errdetail("Last completed transaction was at log time %s.",
>                                                            timestamptz_to_str(xtime)) : 0));
>
> This code reports lastCheckPoint.redo as redo lsn. OTOH, with the patch, LogCheckpointEnd() reports
ControlFile->checkPointCopy.redo.They may be different, for example, when the current DB state is not
DB_IN_ARCHIVE_RECOVERY.In this case, which lsn should we report as redo lsn? 

Do we ever reach CreateRestartPoint when ControlFile->stat !=
DB_IN_ARCHIVE_RECOVERY? Assert(ControlFile->state ==
DB_IN_ARCHIVE_RECOVERY); in CreateRestartPoint doesn't fail any
regression tests.

Here's what can happen:
lastCheckPoint.redo is 100 and ControlFile->checkPointCopy.redo is
105, so, "skipping restartpoint, already performed at %X/%X"
LogCheckpointEnd isn't reached
lastCheckPoint.redo is 105 and ControlFile->checkPointCopy.redo is 100
and ControlFile->state == DB_IN_ARCHIVE_RECOVERY, then the control
file gets updated and LogCheckpointEnd prints the right redo lsn.
lastCheckPoint.redo is 105 and ControlFile->checkPointCopy.redo is 100
and ControlFile->state != DB_IN_ARCHIVE_RECOVERY, the the control file
doesn't get updated and LogCheckpointEnd just prints the control redo
lsn. Looks like this case is rare given Assert(ControlFile->state ==
DB_IN_ARCHIVE_RECOVERY); doesn't fail any tests.

I think we should just let LogCheckpointEnd print the redo lsn from
the control file. We can just remove the above errmsg("recovery
restart point at %X/%X" message altogether or just print it only in
the rare scenario, something like below:

if (ControlFile->state != DB_IN_ARCHIVE_RECOVERY)
{
    ereport((log_checkpoints ? LOG : DEBUG2),
            (errmsg("performed recovery restart point at %X/%X while
the database state is %s",
                    LSN_FORMAT_ARGS(lastCheckPoint.redo)),
getDBState(ControlFile->state)));
}

And the last commit/abort records's timestamp will always get logged
even before we reach here in the main redo loop (errmsg("last
completed transaction was at log time %s").

Or another way is to just pass the redo lsn to LogCheckpointEnd and
pass the lastCheckPoint.redo in if (ControlFile->state !=
DB_IN_ARCHIVE_RECOVERY) cases or when control file isn't updated but
restart point happened.

Thoughts?

Regards,
Bharath Rupireddy.



pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Ensure that STDERR is empty during connect_ok
Next
From: Alvaro Herrera
Date:
Subject: Re: Ensure that STDERR is empty during connect_ok