Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) - Mailing list pgsql-hackers

From Nitin Jadhav
Subject Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Msg-id CAMm1aWYNHbeM2+dxOZmgDut0HqpK2CjxQF1t5RNYf3S6xjgSUA@mail.gmail.com
In response to Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)  (Ashutosh Sharma <ashu.coek88@gmail.com>)
List pgsql-hackers
> On what basis have you classified the above into the various types of
> checkpoints? AFAIK, the first two types are based on what triggered
> the checkpoint (whether it was the checkpoint_timeout or max_wal_size
> setting), while the third type indicates a forced checkpoint, which can
> happen when the checkpoint is triggered for various reasons, e.g.
> during createdb or dropdb. It is quite possible that both the
> PROGRESS_CHECKPOINT_KIND_TIME and PROGRESS_CHECKPOINT_KIND_FORCE flags
> are set for the checkpoint because multiple checkpoint requests are
> processed in one go, so what type of checkpoint would that be?

My initial understanding was wrong. In the v2 patch I support all
values for checkpoint kinds and display a string in the
pg_stat_progress_checkpoint view that describes all the bits set in
the checkpoint flags.
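
To illustrate the idea, here is a minimal standalone sketch (not the code
from the patch) of turning the flag bits into one descriptive string, so
that a checkpoint with both the force and time bits set reports both. The
helper name checkpoint_kind_string, the label strings and the buffer
handling are assumptions made for this example; the CHECKPOINT_* values
are assumed to mirror src/include/access/xlog.h as of this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Assumed flag bits, mirroring the CHECKPOINT_* definitions in
 * src/include/access/xlog.h at the time of this thread.
 */
#define CHECKPOINT_IS_SHUTDOWN      0x0001
#define CHECKPOINT_END_OF_RECOVERY  0x0002
#define CHECKPOINT_IMMEDIATE        0x0004
#define CHECKPOINT_FORCE            0x0008
#define CHECKPOINT_FLUSH_ALL        0x0010
#define CHECKPOINT_WAIT             0x0020
#define CHECKPOINT_REQUESTED        0x0040
#define CHECKPOINT_CAUSE_XLOG       0x0080
#define CHECKPOINT_CAUSE_TIME       0x0100

/* Illustrative labels only; the strings used by the patch may differ. */
static const struct
{
    int         bit;
    const char *name;
} kind_names[] = {
    {CHECKPOINT_IS_SHUTDOWN, "shutdown"},
    {CHECKPOINT_END_OF_RECOVERY, "end-of-recovery"},
    {CHECKPOINT_IMMEDIATE, "immediate"},
    {CHECKPOINT_FORCE, "force"},
    {CHECKPOINT_FLUSH_ALL, "flush-all"},
    {CHECKPOINT_WAIT, "wait"},
    {CHECKPOINT_REQUESTED, "requested"},
    {CHECKPOINT_CAUSE_XLOG, "wal"},
    {CHECKPOINT_CAUSE_TIME, "time"},
};

/*
 * Hypothetical helper: write a space-separated label for every bit set
 * in flags into buf and return buf.  An empty string means no bits set.
 */
static char *
checkpoint_kind_string(int flags, char *buf, size_t buflen)
{
    size_t      used = 0;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(kind_names) / sizeof(kind_names[0]); i++)
    {
        int         n;

        if ((flags & kind_names[i].bit) == 0)
            continue;

        n = snprintf(buf + used, buflen - used, "%s%s",
                     used > 0 ? " " : "", kind_names[i].name);

        /* Stop once the buffer is full; buf stays NUL-terminated. */
        if (n < 0 || (size_t) n >= buflen - used)
            break;
        used += (size_t) n;
    }
    return buf;
}
```

With this sketch, calling the helper with CHECKPOINT_FORCE |
CHECKPOINT_CAUSE_TIME yields "force time", whereas an if/else chain over
the same bits would report only one of the two kinds.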

On Tue, Feb 22, 2022 at 8:10 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> +/* Kinds of checkpoint (as advertised via PROGRESS_CHECKPOINT_KIND) */
> +#define PROGRESS_CHECKPOINT_KIND_WAL                0
> +#define PROGRESS_CHECKPOINT_KIND_TIME               1
> +#define PROGRESS_CHECKPOINT_KIND_FORCE              2
> +#define PROGRESS_CHECKPOINT_KIND_UNKNOWN            3
>
> On what basis have you classified the above into the various types of
> checkpoints? AFAIK, the first two types are based on what triggered
> the checkpoint (whether it was the checkpoint_timeout or max_wal_size
> setting), while the third type indicates a forced checkpoint, which can
> happen when the checkpoint is triggered for various reasons, e.g.
> during createdb or dropdb. It is quite possible that both the
> PROGRESS_CHECKPOINT_KIND_TIME and PROGRESS_CHECKPOINT_KIND_FORCE flags
> are set for the checkpoint because multiple checkpoint requests are
> processed in one go, so what type of checkpoint would that be?
>
> +        */
> +       if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY)) == 0)
> +       {
> +               pgstat_progress_start_command(PROGRESS_COMMAND_CHECKPOINT, InvalidOid);
> +               checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_PHASE,
> +                                                PROGRESS_CHECKPOINT_PHASE_INIT);
> +               if (flags & CHECKPOINT_CAUSE_XLOG)
> +                       checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND,
> +                                                        PROGRESS_CHECKPOINT_KIND_WAL);
> +               else if (flags & CHECKPOINT_CAUSE_TIME)
> +                       checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND,
> +                                                        PROGRESS_CHECKPOINT_KIND_TIME);
> +               else if (flags & CHECKPOINT_FORCE)
> +                       checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND,
> +                                                        PROGRESS_CHECKPOINT_KIND_FORCE);
> +               else
> +                       checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND,
> +                                                        PROGRESS_CHECKPOINT_KIND_UNKNOWN);
> +       }
> +}
>
> --
> With Regards,
> Ashutosh Sharma.
>
> On Thu, Feb 10, 2022 at 12:23 PM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> >
> > > > We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of just emitting the stats at the end.
> > > >
> > > > Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substep it is running, whether it is on target for completion, checkpoint_Reason
> > > > (manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here.
> > >
> > > I agree to provide the above-mentioned information as part of showing
> > > the progress of the current checkpoint operation. I am currently looking
> > > into the code to see whether any other information can be added.
> >
> > Here is the initial patch to show the progress of checkpoint through
> > pg_stat_progress_checkpoint view. Please find the attachment.
> >
> > The information added to this view is: pid - process ID of the
> > CHECKPOINTER process; kind - kind of checkpoint, indicating the reason
> > for the checkpoint (values can be wal, time or force); phase - the
> > current phase of the checkpoint operation; total_buffer_writes - total
> > number of buffers to be written; buffers_processed - number of buffers
> > processed; buffers_written - number of buffers written;
> > total_file_syncs - total number of files to be synced; files_synced -
> > number of files synced.
> >
> > Many operations happen as part of a checkpoint. For each of these
> > operations I update the phase field of the
> > pg_stat_progress_checkpoint view. The values supported for this field
> > are initializing, checkpointing replication slots, checkpointing
> > snapshots, checkpointing logical rewrite mappings, checkpointing CLOG
> > pages, checkpointing CommitTs pages, checkpointing SUBTRANS pages,
> > checkpointing MULTIXACT pages, checkpointing SLRU pages, checkpointing
> > buffers, performing sync requests, performing two phase checkpoint,
> > recycling old XLOG files and Finalizing. In the checkpointing
> > buffers phase, the fields total_buffer_writes, buffers_processed and
> > buffers_written show the detailed progress of writing buffers. In
> > the performing sync requests phase, the fields total_file_syncs
> > and files_synced show the detailed progress of syncing files. In
> > other phases, only the phase field is updated. It is difficult to
> > show progress there because we cannot get the total number of files
> > without traversing the directory, and it is not worth calculating
> > that as it would affect the performance of the checkpoint. I also
> > considered just reporting the number of files processed, but this
> > won't give meaningful progress information (it can be treated as
> > statistics). Hence I just update the phase field in those scenarios.
> >
> > Apart from the above fields, I am planning to add a few more fields to
> > the view in the next patch: the process ID of the backend process
> > which triggered a CHECKPOINT command, the checkpoint start location, a
> > field to indicate whether it is a checkpoint or restartpoint, and the
> > elapsed time of the checkpoint operation. Please share your thoughts. I
> > would be happy to add any other information that contributes to showing
> > the progress of checkpoint.
> >
> > As per the discussion in this thread, there should be some mechanism
> > to show the progress of checkpoint during shutdown and end-of-recovery
> > cases as we cannot access pg_stat_progress_checkpoint in those cases.
> > I am working on using the log_startup_progress_interval mechanism to
> > log the progress in the server logs.
> >
> > Kindly review the patch and share your thoughts.
> >
> >
> > On Fri, Jan 28, 2022 at 12:24 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > On Fri, Jan 21, 2022 at 11:07 AM Nitin Jadhav
> > > <nitinjadhavpostgres@gmail.com> wrote:
> > > >
> > > > > I think the right choice to solve the *general* problem is the
> > > > > mentioned pg_stat_progress_checkpoints.
> > > > >
> > > > > We may want to *additionally* have the ability to log the progress
> > > > > specifically for the special cases when we're not able to use that
> > > > > > view. And in those cases, we can perhaps just use the existing
> > > > > log_startup_progress_interval parameter for this as well -- at least
> > > > > for the startup checkpoint.
> > > >
> > > > +1
> > > >
> > > > > > We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of just emitting the stats at the end.
> > > > > >
> > > > > > Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substep it is running, whether it is on target for completion, checkpoint_Reason
> > > > > > (manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here.
> > > >
> > > > I agree to provide the above-mentioned information as part of showing
> > > > the progress of the current checkpoint operation. I am currently looking
> > > > into the code to see whether any other information can be added.
> > >
> > > As suggested in the other thread by Julien, I'm changing the subject
> > > of this thread to reflect the discussion.
> > >
> > > Regards,
> > > Bharath Rupireddy.


