Re: please update ps display for recovery checkpoint - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: please update ps display for recovery checkpoint
Date
Msg-id 20200919160031.GC30557@telsasoft.com
Whole thread Raw
In response to Re: please update ps display for recovery checkpoint  (Michael Paquier <michael@paquier.xyz>)
Responses Re: please update ps display for recovery checkpoint  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Thu, Sep 10, 2020 at 01:37:10PM +0900, Michael Paquier wrote:
> On Wed, Sep 09, 2020 at 09:00:50PM -0500, Justin Pryzby wrote:
> > What would you want the checkpointer's ps to say ?
> > 
> > Normally it just says:
> > postgres  3468  3151  0 Aug27 ?        00:20:57 postgres: checkpointer                                
> 
> Note that CreateCheckPoint() can also be called from the startup
> process if the bgwriter has not been launched once recovery finishes.
> 
> > Or do you mean do the same thing as now, but one layer lower, like:
> >
> > @@ -8728,6 +8725,9 @@ CreateCheckPoint(int flags)
> > +       if (flags & CHECKPOINT_END_OF_RECOVERY)
> > +               set_ps_display("recovery checkpoint");
> 
> For the use-case discussed here, that would be fine.  Now the
> difficult point is how much information we can actually display
> without bloating ps while still have something meaningful.  Showing
> all the information from LogCheckpointStart() would bloat the output a
> lot for example.  So, thinking about that, my take would be to have ps
> display the following at the beginning of CreateCheckpoint() and
> CreateRestartPoint():
> - restartpoint or checkpoint
> - shutdown
> - end-of-recovery
> 
> The output also needs to be cleared once the routines finish or if
> there is a skip, of course.

In my inital request, I *only* care about the startup process' recovery
checkpoint.  AFAIK, this exits when it's done, so there may be no need to
"revert" to the previous "ps".  However, one could argue that it's currently a
bug that the "recovering NNN" portion isn't updated after finishing the WAL
files.

StartupXLOG -> xlogreader -> XLogPageRead -> WaitForWALToBecomeAvailable -> XLogFileReadAnyTLI -> XLogFileRead
                                          -> CreateCheckPoint

Maybe it's a bad idea if the checkpointer is continuously changing its display.
I don't see the utility in it, since log_checkpoints does more than ps could
ever do.  I'm concerned that would break things for someone using something
like pgrep.
|$ ps -wwf `pgrep -f 'checkpointer *$'`
|UID        PID  PPID  C STIME TTY      STAT   TIME CMD
|postgres  9434  9418  0 Aug20 ?        Ss   214:25 postgres: checkpointer   

|pryzbyj  23010 23007  0 10:33 ?        00:00:00 postgres: checkpointer checkpoint

I think this one is by far the most common, but somewhat confusing, since it's
only one word.  This led me to put parenthesis around it:

|pryzbyj  26810 26809 82 10:53 ?        00:00:12 postgres: startup (end-of-recovery checkpoint)

Related: I have always thought that this message meant "recovery will complete
Real Soon", but I now understand it to mean "beginning the recovery checkpoint,
which is flagged CHECKPOINT_IMMEDIATE" (and may take a long time).

|2020-09-19 10:53:26.345 CDT [26810] LOG:  checkpoint starting: end-of-recovery immediate

-- 
Justin

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."
Next
From: Andrew Dunstan
Date:
Subject: Re: XversionUpgrade tests broken by postfix operator removal