Re: Add checkpoint and redo LSN to LogCheckpointEnd log message - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date
Msg-id 7bfad665-db9c-0c2a-2604-9f54763c5f9e@oss.nttdata.com
Whole thread Raw
In response to Re: Add checkpoint and redo LSN to LogCheckpointEnd log message  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Add checkpoint and redo LSN to LogCheckpointEnd log message  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers

On 2022/02/02 23:46, Bharath Rupireddy wrote:
> On Tue, Feb 1, 2022 at 9:39 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> I found that CreateRestartPoint() already reported the redo lsn as follows after emitting the restartpoint log
message.To avoid duplicated logging of the same information, we should update this code?
 
>>
>>          ereport((log_checkpoints ? LOG : DEBUG2),
>>                          (errmsg("recovery restart point at %X/%X",
>>                                          LSN_FORMAT_ARGS(lastCheckPoint.redo)),
>>                           xtime ? errdetail("Last completed transaction was at log time %s.",
>>                                                             timestamptz_to_str(xtime)) : 0));
>>
>> This code reports lastCheckPoint.redo as redo lsn. OTOH, with the patch, LogCheckpointEnd() reports
ControlFile->checkPointCopy.redo.They may be different, for example, when the current DB state is not
DB_IN_ARCHIVE_RECOVERY.In this case, which lsn should we report as redo lsn?
 
> 
> Do we ever reach CreateRestartPoint when ControlFile->stat !=
> DB_IN_ARCHIVE_RECOVERY? Assert(ControlFile->state ==
> DB_IN_ARCHIVE_RECOVERY); in CreateRestartPoint doesn't fail any
> regression tests.

ISTM that CreateRestartPoint() can reach the condition ControlFile->state != DB_IN_ARCHIVE_RECOVERY. Please imagine the
casewhere CreateRestartPoint() has already started and calls CheckPointGuts(). If the standby server is promoted while
CreateRestartPoint()is flushing the data to disk at CheckPointGuts(), the state would be updated to DB_IN_PRODUCTION
andCreateRestartPoint() can see the state != DB_IN_ARCHIVE_RECOVERY later.
 

As far as I read the code, this case seems to be able to make the server unrecoverable. If this case happens, since
pg_controlis not updated, pg_control still indicates the REDO LSN of last valid restartpoint. But CreateRestartPoint()
seemsto delete old WAL files based on its "current" REDO LSN not pg_control's REDO LSN. That is, WAL files required for
thecrash recovery starting from pg_control's REDO LSN would be removed.
 

If this understanding is right, to address this issue, probably we need to make CreateRestartPoint() do nothing (return
immediately)when the state != DB_IN_ARCHIVE_RECOVERY?
 

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: 2022-01 Commitfest
Next
From: Julien Rouhaud
Date:
Subject: Re: Extensible Rmgr for Table AMs