Thread: please update ps display for recovery checkpoint

please update ps display for recovery checkpoint

From
Justin Pryzby
Date:
During crash recovery, the server writes this to log:

< 2020-08-16 08:46:08.601 -03  >LOG:  redo done at 2299C/1EC6BA00
< 2020-08-16 08:46:08.877 -03  >LOG:  checkpoint starting: end-of-recovery immediate

But runs a checkpoint, which can take a long time, while the "ps" display still
says "recovering NNNNNNNN".

Please change to say "recovery checkpoint" or similar, as I mentioned here.
https://www.postgresql.org/message-id/20200118201111.GP26045@telsasoft.com

-- 
Justin

Attachment

RE: please update ps display for recovery checkpoint

From
"k.jamison@fujitsu.com"
Date:
On Wednesday, August 19, 2020 7:53 AM (GMT+9), Justin Pryzby wrote:

Hi,

All the patches apply, although when applying them the following appears:
   (Stripping trailing CRs from patch; use --binary to disable.)

> During crash recovery, the server writes this to log:
>
> < 2020-08-16 08:46:08.601 -03  >LOG:  redo done at 2299C/1EC6BA00 <
> 2020-08-16 08:46:08.877 -03  >LOG:  checkpoint starting: end-of-recovery
> immediate
>
> But runs a checkpoint, which can take a long time, while the "ps" display still says
> "recovering NNNNNNNN".
>
> Please change to say "recovery checkpoint" or similar, as I mentioned here.
> https://www.postgresql.org/message-id/20200118201111.GP26045@telsasoft.c
> om

Yes, I agree that it is helpful to tell users about that.

About 0003 patch, there are similar phrases in bgwriter_flush_after and
backend_flush_after. Should those be updated too?

--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3170,7 +3170,7 @@ include_dir 'conf.d'
         limit the amount of dirty data in the kernel's page cache, reducing
         the likelihood of stalls when an <function>fsync</function> is issued at the end of the
         checkpoint, or when the OS writes data back in larger batches in the
-        background.  Often that will result in greatly reduced transaction
+        background.  This feature will often result in greatly reduced transaction
         latency, but there also are some cases, especially with workloads
         that are bigger than <xref linkend="guc-shared-buffers"/>, but smaller
         than the OS's page cache, where performance might degrade.  This


Regards,
Kirk Jamison



Re: please update ps display for recovery checkpoint

From
Michael Paquier
Date:
On Wed, Aug 19, 2020 at 12:20:50AM +0000, k.jamison@fujitsu.com wrote:
> On Wednesday, August 19, 2020 7:53 AM (GMT+9), Justin Pryzby wrote:
>> During crash recovery, the server writes this to log:
>> Please change to say "recovery checkpoint" or similar, as I mentioned here.
>> https://www.postgresql.org/message-id/20200118201111.GP26045@telsasoft.c
>> om
>
> Yes, I agree that it is helpful to tell users about that.

That could be helpful.  Wouldn't it be better to use "end-of-recovery
checkpoint" instead?  That's the common wording in the code comments.

I don't see the point of patch 0002.  In the same paragraph, we
already know that this applies to any checkpoints.

> About 0003 patch, there are similar phrases in bgwriter_flush_after and
> backend_flush_after. Should those be updated too?

Yep, makes sense.
--
Michael

Attachment

Re: please update ps display for recovery checkpoint

From
Michael Paquier
Date:
On Thu, Aug 20, 2020 at 05:09:05PM +0900, Michael Paquier wrote:
> That could be helpful.  Wouldn't it be better to use "end-of-recovery
> checkpoint" instead?  That's the common wording in the code comments.
>
> I don't see the point of patch 0002.  In the same paragraph, we
> already know that this applies to any checkpoints.

Thinking more about this..  Could it be better to just add some calls
to set_ps_display() directly in CreateCheckPoint()?  This way, both
the checkpointer as well as the startup process at the end of recovery
would benefit from the change.
--
Michael

Attachment

Re: please update ps display for recovery checkpoint

From
Justin Pryzby
Date:
On Mon, Aug 31, 2020 at 03:52:44PM +0900, Michael Paquier wrote:
> On Thu, Aug 20, 2020 at 05:09:05PM +0900, Michael Paquier wrote:
> > That could be helpful.  Wouldn't it be better to use "end-of-recovery
> > checkpoint" instead?  That's the common wording in the code comments.
> > 
> > I don't see the point of patch 0002.  In the same paragraph, we
> > already know that this applies to any checkpoints.
> 
> Thinking more about this..  Could it be better to just add some calls
> to set_ps_display() directly in CreateCheckPoint()?  This way, both
> the checkpointer as well as the startup process at the end of recovery
> would benefit from the change.

What would you want the checkpointer's ps to say ?

Normally it just says:
postgres  3468  3151  0 Aug27 ?        00:20:57 postgres: checkpointer                                

Or do you mean do the same thing as now, but one layer lower, like:

@@ -8728,6 +8725,9 @@ CreateCheckPoint(int flags)
+       if (flags & CHECKPOINT_END_OF_RECOVERY)
+               set_ps_display("recovery checkpoint");

-- 
Justin



Re: please update ps display for recovery checkpoint

From
Michael Paquier
Date:
On Wed, Sep 09, 2020 at 09:00:50PM -0500, Justin Pryzby wrote:
> What would you want the checkpointer's ps to say ?
>
> Normally it just says:
> postgres  3468  3151  0 Aug27 ?        00:20:57 postgres: checkpointer

Note that CreateCheckPoint() can also be called from the startup
process if the bgwriter has not been launched once recovery finishes.

> Or do you mean do the same thing as now, but one layer lower, like:
>
> @@ -8728,6 +8725,9 @@ CreateCheckPoint(int flags)
> +       if (flags & CHECKPOINT_END_OF_RECOVERY)
> +               set_ps_display("recovery checkpoint");

For the use-case discussed here, that would be fine.  Now the
difficult point is how much information we can actually display
without bloating ps while still have something meaningful.  Showing
all the information from LogCheckpointStart() would bloat the output a
lot for example.  So, thinking about that, my take would be to have ps
display the following at the beginning of CreateCheckpoint() and
CreateRestartPoint():
- restartpoint or checkpoint
- shutdown
- end-of-recovery

The output also needs to be cleared once the routines finish or if
there is a skip, of course.
--
Michael

Attachment

Re: please update ps display for recovery checkpoint

From
Justin Pryzby
Date:
On Thu, Sep 10, 2020 at 01:37:10PM +0900, Michael Paquier wrote:
> On Wed, Sep 09, 2020 at 09:00:50PM -0500, Justin Pryzby wrote:
> > What would you want the checkpointer's ps to say ?
> > 
> > Normally it just says:
> > postgres  3468  3151  0 Aug27 ?        00:20:57 postgres: checkpointer                                
> 
> Note that CreateCheckPoint() can also be called from the startup
> process if the bgwriter has not been launched once recovery finishes.
> 
> > Or do you mean do the same thing as now, but one layer lower, like:
> >
> > @@ -8728,6 +8725,9 @@ CreateCheckPoint(int flags)
> > +       if (flags & CHECKPOINT_END_OF_RECOVERY)
> > +               set_ps_display("recovery checkpoint");
> 
> For the use-case discussed here, that would be fine.  Now the
> difficult point is how much information we can actually display
> without bloating ps while still have something meaningful.  Showing
> all the information from LogCheckpointStart() would bloat the output a
> lot for example.  So, thinking about that, my take would be to have ps
> display the following at the beginning of CreateCheckpoint() and
> CreateRestartPoint():
> - restartpoint or checkpoint
> - shutdown
> - end-of-recovery
> 
> The output also needs to be cleared once the routines finish or if
> there is a skip, of course.

In my inital request, I *only* care about the startup process' recovery
checkpoint.  AFAIK, this exits when it's done, so there may be no need to
"revert" to the previous "ps".  However, one could argue that it's currently a
bug that the "recovering NNN" portion isn't updated after finishing the WAL
files.

StartupXLOG -> xlogreader -> XLogPageRead -> WaitForWALToBecomeAvailable -> XLogFileReadAnyTLI -> XLogFileRead
                                          -> CreateCheckPoint

Maybe it's a bad idea if the checkpointer is continuously changing its display.
I don't see the utility in it, since log_checkpoints does more than ps could
ever do.  I'm concerned that would break things for someone using something
like pgrep.
|$ ps -wwf `pgrep -f 'checkpointer *$'`
|UID        PID  PPID  C STIME TTY      STAT   TIME CMD
|postgres  9434  9418  0 Aug20 ?        Ss   214:25 postgres: checkpointer   

|pryzbyj  23010 23007  0 10:33 ?        00:00:00 postgres: checkpointer checkpoint

I think this one is by far the most common, but somewhat confusing, since it's
only one word.  This led me to put parenthesis around it:

|pryzbyj  26810 26809 82 10:53 ?        00:00:12 postgres: startup (end-of-recovery checkpoint)

Related: I have always thought that this message meant "recovery will complete
Real Soon", but I now understand it to mean "beginning the recovery checkpoint,
which is flagged CHECKPOINT_IMMEDIATE" (and may take a long time).

|2020-09-19 10:53:26.345 CDT [26810] LOG:  checkpoint starting: end-of-recovery immediate

-- 
Justin

Attachment

Re: please update ps display for recovery checkpoint

From
Michael Paquier
Date:
On Sat, Sep 19, 2020 at 11:00:31AM -0500, Justin Pryzby wrote:
> Maybe it's a bad idea if the checkpointer is continuously changing its display.
> I don't see the utility in it, since log_checkpoints does more than ps could
> ever do.  I'm concerned that would break things for someone using something
> like pgrep.

At the end of recovery, there is a code path where the startup process
triggers the checkpoint by itself if the bgwriter is not launched, but
there is also a second code path where, if the bgwriter is started and
if the cluster not promoted, the startup process would request for an
immediate checkpoint and then wait for it.  It is IMO equally
important to update the display of the checkpointer in this case to
show that the checkpointer is running an end-of-recovery checkpoint.

> Related: I have always thought that this message meant "recovery will complete
> Real Soon", but I now understand it to mean "beginning the recovery checkpoint,
> which is flagged CHECKPOINT_IMMEDIATE" (and may take a long time).

Yep.  And at the end of crash recovery seconds feel like minutes.

I agree that "checkpointer checkpoint" is not the best fit.  Using
parenthesis would also be inconsistent with the other usages of this
API in the backend code.  What about adding "running" then?  This
would give "checkpointer running end-of-recovery checkpoint".

While looking at this patch, I got tempted to use a StringInfo to fill
in the string to display as that would make the addition of any extra
information easier, giving the attached.
--
Michael

Attachment

Re: please update ps display for recovery checkpoint

From
Justin Pryzby
Date:
On Fri, Oct 02, 2020 at 04:28:14PM +0900, Michael Paquier wrote:
> > Related: I have always thought that this message meant "recovery will complete
> > Real Soon", but I now understand it to mean "beginning the recovery checkpoint,
> > which is flagged CHECKPOINT_IMMEDIATE" (and may take a long time).
> 
> Yep.  And at the end of crash recovery seconds feel like minutes.
> 
> I agree that "checkpointer checkpoint" is not the best fit.  Using
> parenthesis would also be inconsistent with the other usages of this
> API in the backend code.

I think maybe I got the idea for parenthesis from these:
src/backend/tcop/postgres.c:                            set_ps_display("idle in transaction (aborted)");
src/backend/postmaster/postmaster.c-    if (port->remote_port[0] != '\0')
src/backend/postmaster/postmaster.c-            appendStringInfo(&ps_data, "(%s)", port->remote_port);


>  What about adding "running" then?  This
> would give "checkpointer running end-of-recovery checkpoint".

What about one of these?
"checkpointer: running end-of-recovery checkpoint"
"checkpointer running: end-of-recovery checkpoint"

Thanks for looking.

-- 
Justin



Re: please update ps display for recovery checkpoint

From
"Bossart, Nathan"
Date:
Hi,

I like the idea behind this patch.  I wrote a new version with a
couple of changes:

    1. Instead of using StringInfoData, just use a small buffer along
       with snprintf().  This is similar to what the WAL receiver
       process does.
    2. If the process is the checkpointer, reset the display to "idle"
       once the checkpoint or restartpoint is finished.  It's easy
       enough to discover the backend type, and I think adding a bit
       more clarity to the checkpointer display is a nice touch.
    3. Instead of "running," use "creating."  IMO that's a bit more
       accurate.

I considered also checking that update_process_title was enabled, but
I figured that these ps display updates should happen sparsely enough
that it wouldn't make much of an impact.

What do you think?

Nathan


Attachment

Re: please update ps display for recovery checkpoint

From
Justin Pryzby
Date:
On Thu, Dec 03, 2020 at 09:18:07PM +0000, Bossart, Nathan wrote:
> I considered also checking that update_process_title was enabled, but
> I figured that these ps display updates should happen sparsely enough
> that it wouldn't make much of an impact.

Since bf68b79e5, update_ps_display is responsible for checking
update_process_title.  Its other, remaining uses are apparently just acting as
minor optimizations to guard against useless snprintf's.

See also https://www.postgresql.org/message-id/flat/1288021.1600178478%40sss.pgh.pa.us
in which (I just saw) Tom wrote:

> Seems like a good argument, but you'd have to be careful about the
> final state when you stop overriding update_process_title --- it can't
> be left looking like it's still-in-progress on some random WAL file.

I think that's a live problem, not just a concern for that patch.
It was exactly my complaint leading to this thread:

> But runs a checkpoint, which can take a long time, while the "ps" display still
> says "recovering NNNNNNNN".

-- 
Justin



Re: please update ps display for recovery checkpoint

From
"Bossart, Nathan"
Date:
On 12/3/20, 1:58 PM, "Justin Pryzby" <pryzby@telsasoft.com> wrote:
> On Thu, Dec 03, 2020 at 09:18:07PM +0000, Bossart, Nathan wrote:
>> I considered also checking that update_process_title was enabled, but
>> I figured that these ps display updates should happen sparsely enough
>> that it wouldn't make much of an impact.
>
> Since bf68b79e5, update_ps_display is responsible for checking
> update_process_title.  Its other, remaining uses are apparently just acting as
> minor optimizations to guard against useless snprintf's.
>
> See also https://www.postgresql.org/message-id/flat/1288021.1600178478%40sss.pgh.pa.us
> in which (I just saw) Tom wrote:
>
>> Seems like a good argument, but you'd have to be careful about the
>> final state when you stop overriding update_process_title --- it can't
>> be left looking like it's still-in-progress on some random WAL file.
>
> I think that's a live problem, not just a concern for that patch.
> It was exactly my complaint leading to this thread:
>
>> But runs a checkpoint, which can take a long time, while the "ps" display still
>> says "recovering NNNNNNNN".

Ah, I see.  Thanks for pointing this out.

Nathan


Re: please update ps display for recovery checkpoint

From
"Bossart, Nathan"
Date:
On 12/3/20, 1:19 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> I like the idea behind this patch.  I wrote a new version with a
> couple of changes:

I missed an #include in this patch.  Here's a new one with that fixed.

Nathan


Attachment

Re: please update ps display for recovery checkpoint

From
Fujii Masao
Date:

On 2020/12/05 2:17, Bossart, Nathan wrote:
> On 12/3/20, 1:19 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
>> I like the idea behind this patch.  I wrote a new version with a
>> couple of changes:
> 
> I missed an #include in this patch.  Here's a new one with that fixed.

I agree it might be helpful to display something like "checkpointing" or "waiting for checkpoint" in PS title for the
startupprocess.
 

But, at least for me, it seems strange to display "idle" only for checkpointer. *If* we want to monitor the current
statusof checkpointer, it should be shown as wait event in pg_stat_activity, instead?
 

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: please update ps display for recovery checkpoint

From
Michael Paquier
Date:
On Wed, Dec 09, 2020 at 02:00:44AM +0900, Fujii Masao wrote:
> I agree it might be helpful to display something like
> "checkpointing" or "waiting for checkpoint" in PS title for the
> startup process.

Thanks.

> But, at least for me, it seems strange to display "idle" only for
> checkpointer.

Yeah, I'd rather leave out the part of the patch where we have the
checkpointer say "idle".  So I still think that what v3 did upthread,
by just resetting the ps display in all cases, feels more natural.  So
we should remove the parts of v5 from checkpointer.c.

+        * Reset the ps status display.  We set the status to "idle" for the
+        * checkpointer process, and we clear it for anything else that runs this
+        * code.
+        */
+       if (MyBackendType == B_CHECKPOINTER)
+               set_ps_display("idle");
+       else
+               set_ps_display("");
Regarding this part, this just needs a reset with an empty string.
The second sentence is confusing (it partially comes fronm v3, from
me..).  Do we lose anything by removing the second sentence of this
comment?

+   snprintf(activitymsg, sizeof(activitymsg), "creating %s%scheckpoint",
[...]
+   snprintf(activitymsg, sizeof(activitymsg), "creating %srestartpoint",
FWIW, I still fing "running" to sound better than "creating" here.  An
extra term I can think of that could be adapted is "performing".

> *If* we want to monitor the current status of
> checkpointer, it should be shown as wait event in pg_stat_activity,
> instead?

That would be WAIT_EVENT_CHECKPOINTER_MAIN, now there has been also on
this thread an argument that you would not have access to
pg_stat_activity until crash recovery completes and consistency is
restored.
--
Michael

Attachment

Re: please update ps display for recovery checkpoint

From
"Bossart, Nathan"
Date:
On 12/8/20, 10:16 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> On Wed, Dec 09, 2020 at 02:00:44AM +0900, Fujii Masao wrote:
>> I agree it might be helpful to display something like
>> "checkpointing" or "waiting for checkpoint" in PS title for the
>> startup process.
>
> Thanks.
>
>> But, at least for me, it seems strange to display "idle" only for
>> checkpointer.
>
> Yeah, I'd rather leave out the part of the patch where we have the
> checkpointer say "idle".  So I still think that what v3 did upthread,
> by just resetting the ps display in all cases, feels more natural.  So
> we should remove the parts of v5 from checkpointer.c.

That seems fine to me.  I think it is most important that the end-of-
recovery and shutdown checkpoints are shown.  I'm not sure there's
much value in updating the process title for automatic checkpoints and
checkpoints triggered via the CHECKPOINT command, so IMO we could skip
those, too.  I made these changes in the new version of the patch.

> +        * Reset the ps status display.  We set the status to "idle" for the
> +        * checkpointer process, and we clear it for anything else that runs this
> +        * code.
> +        */
> +       if (MyBackendType == B_CHECKPOINTER)
> +               set_ps_display("idle");
> +       else
> +               set_ps_display("");
> Regarding this part, this just needs a reset with an empty string.
> The second sentence is confusing (it partially comes fronm v3, from
> me..).  Do we lose anything by removing the second sentence of this
> comment?

I've fixed this in the new version of the patch.

> +   snprintf(activitymsg, sizeof(activitymsg), "creating %s%scheckpoint",
> [...]
> +   snprintf(activitymsg, sizeof(activitymsg), "creating %srestartpoint",
> FWIW, I still fing "running" to sound better than "creating" here.  An
> extra term I can think of that could be adapted is "performing".

I think I prefer "performing" over "running" because that's what the
docs use [0].  I've changed it to "performing" in the new version of
the patch.

Nathan

[0] https://www.postgresql.org/docs/devel/wal-configuration.html


Attachment

Re: please update ps display for recovery checkpoint

From
Michael Paquier
Date:
On Wed, Dec 09, 2020 at 06:37:22PM +0000, Bossart, Nathan wrote:
> On 12/8/20, 10:16 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
>> Yeah, I'd rather leave out the part of the patch where we have the
>> checkpointer say "idle".  So I still think that what v3 did upthread,
>> by just resetting the ps display in all cases, feels more natural.  So
>> we should remove the parts of v5 from checkpointer.c.
>
> That seems fine to me.  I think it is most important that the end-of-
> recovery and shutdown checkpoints are shown.  I'm not sure there's
> much value in updating the process title for automatic checkpoints and
> checkpoints triggered via the CHECKPOINT command, so IMO we could skip
> those, too.  I made these changes in the new version of the patch.

It would be possible to use pg_stat_activity in most cases here, so I
agree to settle down to the minimum we can agree on for now, and maybe
discuss separately if this should be extended in some or another in
the future if there is a use-case for that.  So I'd say that what you
have here is logically fine.

> > +   snprintf(activitymsg, sizeof(activitymsg), "creating %s%scheckpoint",
> > [...]
> > +   snprintf(activitymsg, sizeof(activitymsg), "creating %srestartpoint",
> > FWIW, I still fing "running" to sound better than "creating" here.  An
> > extra term I can think of that could be adapted is "performing".
>
> I think I prefer "performing" over "running" because that's what the
> docs use [0].  I've changed it to "performing" in the new version of
> the patch.

That's also used in the code comments, FWIW.

@@ -9282,6 +9296,7 @@ CreateRestartPoint(int flags)
    XLogRecPtr  endptr;
    XLogSegNo   _logSegNo;
    TimestampTz xtime;
+   bool        shutdown = (flags & CHECKPOINT_IS_SHUTDOWN);
You are right that CHECKPOINT_END_OF_RECOVERY should not be called for
a restart point, so that's correct.

However, I think that it would be better to have all those four code
paths controlled by a single routine, where we pass down the
checkpoint flags and fill in the ps string directly depending on what
the caller wants to do.  This way, we will avoid any inconsistent
updates if this stuff gets extended in the future, and there will be
all the information at hand to extend the logic.  So I have simplified
your patch as per the attached.  Thoughts?
--
Michael

Attachment

Re: please update ps display for recovery checkpoint

From
Justin Pryzby
Date:
Isn't the sense of "reset" inverted ?

I think maybe you mean to do set_ps_display(""); in the "if reset".

On Fri, Dec 11, 2020 at 12:54:22PM +0900, Michael Paquier wrote:
> +update_checkpoint_display(int flags, bool restartpoint, bool reset)
> +{
> +    if (reset)
> +    {
> +        char activitymsg[128];
> +
> +        snprintf(activitymsg, sizeof(activitymsg), "performing %s%s%s",
> +                 (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "",
> +                 (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "",
> +                 restartpoint ? "restartpoint" : "checkpoint");
> +        set_ps_display(activitymsg);
> +    }
> +    else
> +        set_ps_display("");
> +}
> +
> +
>  /*
>   * Perform a checkpoint --- either during shutdown, or on-the-fly
>   *
> @@ -8905,6 +8937,9 @@ CreateCheckPoint(int flags)
>      if (log_checkpoints)
>          LogCheckpointStart(flags, false);
>  
> +    /* Update the process title */
> +    update_checkpoint_display(flags, false, false);
> +
>      TRACE_POSTGRESQL_CHECKPOINT_START(flags);
>  
>      /*
> @@ -9120,6 +9155,9 @@ CreateCheckPoint(int flags)
>      /* Real work is done, but log and update stats before releasing lock. */
>      LogCheckpointEnd(false);
>  
> +    /* Reset the process title */
> +    update_checkpoint_display(flags, false, true);
> +
>      TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
>                                       NBuffers,
>                                       CheckpointStats.ckpt_segs_added,
> @@ -9374,6 +9412,9 @@ CreateRestartPoint(int flags)
>      if (log_checkpoints)
>          LogCheckpointStart(flags, true);
>  
> +    /* Update the process title */
> +    update_checkpoint_display(flags, true, false);
> +
>      CheckPointGuts(lastCheckPoint.redo, flags);
>  
>      /*
> @@ -9492,6 +9533,9 @@ CreateRestartPoint(int flags)
>      /* Real work is done, but log and update before releasing lock. */
>      LogCheckpointEnd(true);
>  
> +    /* Reset the process title */
> +    update_checkpoint_display(flags, true, true);
> +
>      xtime = GetLatestXTime();
>      ereport((log_checkpoints ? LOG : DEBUG2),
>              (errmsg("recovery restart point at %X/%X",



Re: please update ps display for recovery checkpoint

From
Michael Paquier
Date:
On Thu, Dec 10, 2020 at 10:02:10PM -0600, Justin Pryzby wrote:
> Isn't the sense of "reset" inverted ?

It is ;p
--
Michael

Attachment

Re: please update ps display for recovery checkpoint

From
"Bossart, Nathan"
Date:
On 12/10/20, 7:54 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> However, I think that it would be better to have all those four code
> paths controlled by a single routine, where we pass down the
> checkpoint flags and fill in the ps string directly depending on what
> the caller wants to do.  This way, we will avoid any inconsistent
> updates if this stuff gets extended in the future, and there will be
> all the information at hand to extend the logic.  So I have simplified
> your patch as per the attached.  Thoughts?

This approach seems reasonable to me.  I've attached my take on it.

Nathan


Attachment

Re: please update ps display for recovery checkpoint

From
Michael Paquier
Date:
On Fri, Dec 11, 2020 at 06:54:42PM +0000, Bossart, Nathan wrote:
> This approach seems reasonable to me.  I've attached my take on it.

+       /* Reset the process title */
+       set_ps_display("");
I would still recommend to avoid calling set_ps_display() if there is
no need to so as we avoid useless system calls, so I think that this
stuff had better use a common path for the set and reset logic.

My counter-proposal is like the attached, with the set/reset part not
reversed this time, and the code indented :p
--
Michael

Attachment

Re: please update ps display for recovery checkpoint

From
"Bossart, Nathan"
Date:
On 12/11/20, 4:00 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> On Fri, Dec 11, 2020 at 06:54:42PM +0000, Bossart, Nathan wrote:
>> This approach seems reasonable to me.  I've attached my take on it.
>
> +       /* Reset the process title */
> +       set_ps_display("");
> I would still recommend to avoid calling set_ps_display() if there is
> no need to so as we avoid useless system calls, so I think that this
> stuff had better use a common path for the set and reset logic.
>
> My counter-proposal is like the attached, with the set/reset part not
> reversed this time, and the code indented :p

Haha.  LGTM.

Nathan


Re: please update ps display for recovery checkpoint

From
Michael Paquier
Date:
On Sat, Dec 12, 2020 at 12:41:25AM +0000, Bossart, Nathan wrote:
> On 12/11/20, 4:00 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
>> My counter-proposal is like the attached, with the set/reset part not
>> reversed this time, and the code indented :p
>
> Haha.  LGTM.

Thanks.  I have applied this one, then.
--
Michael

Attachment

Re: please update ps display for recovery checkpoint

From
Justin Pryzby
Date:
On Mon, Dec 14, 2020 at 12:01:33PM +0900, Michael Paquier wrote:
> On Sat, Dec 12, 2020 at 12:41:25AM +0000, Bossart, Nathan wrote:
> > On 12/11/20, 4:00 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> >> My counter-proposal is like the attached, with the set/reset part not
> >> reversed this time, and the code indented :p
> > 
> > Haha.  LGTM.
> 
> Thanks.  I have applied this one, then.

Thank you.

I'm not sure, but we could consider backpatching something to clear the
"recovering NNN" that's currently displayed during checkpoint, even though
recovery of NNN has already completed.  Possibly just calling
set_ps_display(""); after "Real work is done".

-- 
Justin



Re: please update ps display for recovery checkpoint

From
Michael Paquier
Date:
On Sun, Dec 13, 2020 at 09:22:24PM -0600, Justin Pryzby wrote:
> I'm not sure, but we could consider backpatching something to clear the
> "recovering NNN" that's currently displayed during checkpoint, even though
> recovery of NNN has already completed.  Possibly just calling
> set_ps_display(""); after "Real work is done".

This behavior exists for ages and there were not a lot of complaints
on this matter, so I see no reason to touch back-branches more than
necessary.
--
Michael

Attachment

Re: please update ps display for recovery checkpoint

From
Justin Pryzby
Date:
On Mon, Dec 14, 2020 at 12:01:33PM +0900, Michael Paquier wrote:
> On Sat, Dec 12, 2020 at 12:41:25AM +0000, Bossart, Nathan wrote:
> > On 12/11/20, 4:00 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> >> My counter-proposal is like the attached, with the set/reset part not
> >> reversed this time, and the code indented :p
> > 
> > Haha.  LGTM.
> 
> Thanks.  I have applied this one, then.

To refresh: commit df9274adf adds "checkpoint" info to "ps", which previously
continued to say "recovering NNNNN" even after finishing WAL replay, and
throughout the checkpoint.

Now, I wonder whether the startup process should also include some detail about
"syncing data dir".  It's possible to strace the process to see what it's
doing, but most DBA would probably not know that, and it's helpful to know the
status of recovery and what part of recovery is slow: sync, replay, or
checkpoint.  commit df9274adf improved the situation between replay and
ckpoint, but it's still not clear what "postgres: startup" means before replay
starts.

There's some interaction between Thomas' commit 61752afb2 and
recovery_init_sync_method - if we include a startup message, it should
distinguish between "syncing data dirs (fsync)" and (syncfs).

Putting this into fd.c seems to assume that we can clobber "ps", which is fine
when called by StartupXLOG(), but it's a public interface, so I'm not sure if
it's okay to assume that's the only caller.  Maybe it should check if
MyAuxProcType == B_STARTUP.

-- 
Justin

Attachment

Re: please update ps display for recovery checkpoint

From
"Bossart, Nathan"
Date:
On 6/6/21, 7:14 PM, "Justin Pryzby" <pryzby@telsasoft.com> wrote:
> Now, I wonder whether the startup process should also include some detail about
> "syncing data dir".  It's possible to strace the process to see what it's
> doing, but most DBA would probably not know that, and it's helpful to know the
> status of recovery and what part of recovery is slow: sync, replay, or
> checkpoint.  commit df9274adf improved the situation between replay and
> ckpoint, but it's still not clear what "postgres: startup" means before replay
> starts.

I've seen a few functions cause lengthy startups, including
SyncDataDirectory() (for which I was grateful to see 61752afb),
StartupReorderBuffer(), and RemovePgTempFiles().  I like the idea of
adding additional information in the ps title, but I also think it is
worth exploring additional ways to improve on these O(n) startup
tasks.

Nathan


Re: please update ps display for recovery checkpoint

From
Robert Haas
Date:
On Mon, Jun 7, 2021 at 12:02 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> On 6/6/21, 7:14 PM, "Justin Pryzby" <pryzby@telsasoft.com> wrote:
> > Now, I wonder whether the startup process should also include some detail about
> > "syncing data dir".  It's possible to strace the process to see what it's
> > doing, but most DBA would probably not know that, and it's helpful to know the
> > status of recovery and what part of recovery is slow: sync, replay, or
> > checkpoint.  commit df9274adf improved the situation between replay and
> > ckpoint, but it's still not clear what "postgres: startup" means before replay
> > starts.
>
> I've seen a few functions cause lengthy startups, including
> SyncDataDirectory() (for which I was grateful to see 61752afb),
> StartupReorderBuffer(), and RemovePgTempFiles().  I like the idea of
> adding additional information in the ps title, but I also think it is
> worth exploring additional ways to improve on these O(n) startup
> tasks.

See also the nearby thread entitled "when the startup process doesn't"
which touches on this same issue.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: please update ps display for recovery checkpoint

From
Michael Paquier
Date:
On Mon, Jun 07, 2021 at 01:28:06PM -0400, Robert Haas wrote:
> On Mon, Jun 7, 2021 at 12:02 PM Bossart, Nathan <bossartn@amazon.com> wrote:
>> I've seen a few functions cause lengthy startups, including
>> SyncDataDirectory() (for which I was grateful to see 61752afb),
>> StartupReorderBuffer(), and RemovePgTempFiles().  I like the idea of
>> adding additional information in the ps title, but I also think it is
>> worth exploring additional ways to improve on these O(n) startup
>> tasks.

+1.  I also agree with doing something for the ps output of the
startup process when these are happening in crash recovery.

> See also the nearby thread entitled "when the startup process doesn't"
> which touches on this same issue.

Here is a link to the thread:
https://www.postgresql.org/message-id/CA+TgmoaHQrgDFOBwgY16XCoMtXxsrVGFB2jNCvb7-ubuEe1MGg@mail.gmail.com
--
Michael

Attachment

Re: please update ps display for recovery checkpoint

From
Michael Paquier
Date:
On Sun, Jun 06, 2021 at 09:13:48PM -0500, Justin Pryzby wrote:
> Putting this into fd.c seems to assume that we can clobber "ps", which is fine
> when called by StartupXLOG(), but it's a public interface, so I'm not sure if
> it's okay to assume that's the only caller.  Maybe it should check if
> MyAuxProcType == B_STARTUP.

I would be tempted to just add that into StartupXLOG() rather than
implying that callers of SyncDataDirectory() are fine to get their ps
output enforced all the time.
--
Michael

Attachment