Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint? - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
Date
Msg-id 20211207.095800.2146783737320451942.horikyota.ntt@gmail.com
In response to Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?  ("Bossart, Nathan" <bossartn@amazon.com>)
List pgsql-hackers
At Mon, 6 Dec 2021 19:28:03 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in 
> On 12/6/21, 4:34 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> > While the database is performing end-of-recovery checkpoint, the
> > control file gets updated with db state as "shutting down" in
> > CreateCheckPoint (see the code snippet at [1]) and at the end it sets
> > it back to "shut down" for a brief moment and then finally to "in
> > production". If the end-of-recovery checkpoint takes a lot of time or
> > the db goes down during the end-of-recovery checkpoint for whatever
> > reasons, the control file ends up having the wrong db state.
> >
> > Should we add a new db state something like
> > DB_IN_END_OF_RECOVERY_CHECKPOINT/"in end-of-recovery checkpoint" or
> > something else to represent the correct state?
> 
> This seems like a reasonable change to me.  From a quick glance, it
> looks like it should be a simple fix that wouldn't add too much
> divergence between the shutdown and end-of-recovery checkpoint code
> paths.

Technically, an end-of-crash-recovery checkpoint is a kind of
shutdown checkpoint. In other words, a server that needs to run
crash recovery is effectively shut down once and then enters normal
operation mode internally.  So if the server crashes after the
end-of-recovery checkpoint has finished but before it enters the
DB_IN_PRODUCTION state, it will start up cleanly, without recovery,
next time.  We could treat DB_IN_END_OF_RECOVERY_CHECKPOINT as a safe
state that allows skipping recovery, but I don't think we need to
preserve that behavior.
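
To make the state sequence concrete, here is a toy model, not the
actual xlog.c code. The first seven enum names mirror DBState in
pg_control.h; DB_IN_END_OF_RECOVERY_CHECKPOINT is the state proposed
in this thread and is hypothetical, as are the Toy*/toy_* names. The
rule that only DB_SHUTDOWNED lets startup skip recovery is a
simplifying assumption, and so is the choice that the new state goes
straight to DB_IN_PRODUCTION.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy copy of the control-file states.  The last member is the
 * hypothetical state proposed in this thread; it does not exist today.
 */
typedef enum ToyDBState
{
    DB_STARTUP = 0,
    DB_SHUTDOWNED,
    DB_SHUTDOWNED_IN_RECOVERY,
    DB_SHUTDOWNING,
    DB_IN_CRASH_RECOVERY,
    DB_IN_ARCHIVE_RECOVERY,
    DB_IN_PRODUCTION,
    DB_IN_END_OF_RECOVERY_CHECKPOINT    /* hypothetical */
} ToyDBState;

/*
 * Simplifying assumption: only a cleanly shut-down control file lets
 * the next startup skip crash recovery.
 */
static bool
toy_can_skip_recovery(ToyDBState state)
{
    return state == DB_SHUTDOWNED;
}

/*
 * Write into 'out' the states the control file passes through during
 * an end-of-recovery checkpoint and return how many were written.
 *
 * use_new_state == false: the sequence described in the quoted mail
 *   ("shutting down", briefly "shut down", then "in production").
 * use_new_state == true: an assumed sequence with the proposed state,
 *   which no longer passes through "shut down".
 */
static size_t
toy_end_of_recovery_states(bool use_new_state, ToyDBState *out, size_t cap)
{
    size_t      n = 0;

    assert(out != NULL && cap >= 3);

    if (use_new_state)
        out[n++] = DB_IN_END_OF_RECOVERY_CHECKPOINT;
    else
    {
        out[n++] = DB_SHUTDOWNING;
        out[n++] = DB_SHUTDOWNED;
    }
    out[n++] = DB_IN_PRODUCTION;

    return n;
}
```

In this model, the existing sequence contains one state for which
toy_can_skip_recovery() returns true (the brief DB_SHUTDOWNED), while
the sequence with the new state contains none.  That is the behavior
I am saying we need not preserve.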

In other places, specifically the server log and the ps display, we
already distinguish between end-of-recovery checkpoints and shutdown
checkpoints.

Finally, I agree with Nathan that it should be simple enough.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


