Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) - Mailing list pgsql-hackers
From | Nitin Jadhav |
---|---|
Subject | Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) |
Date | |
Msg-id | CAMm1aWYNHbeM2+dxOZmgDut0HqpK2CjxQF1t5RNYf3S6xjgSUA@mail.gmail.com Whole thread Raw |
In response to | Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) (Ashutosh Sharma <ashu.coek88@gmail.com>) |
List | pgsql-hackers |
> On what basis have you classified the above into the various types of > checkpoints? AFAIK, the first two types are based on what triggered > the checkpoint (whether it was the checkpoint_timeout or maz_wal_size > settings) while the third type indicates the force checkpoint that can > happen when the checkpoint is triggered for various reasons e.g. . > during createb or dropdb etc. This is quite possible that both the > PROGRESS_CHECKPOINT_KIND_TIME and PROGRESS_CHECKPOINT_KIND_FORCE flags > are set for the checkpoint because multiple checkpoint requests are > processed at one go, so what type of checkpoint would that be? My initial understanding was wrong. In the v2 patch I have supported all values for checkpoint kinds and displaying a string in the pg_stat_progress_checkpoint view which describes all the bits set in the checkpoint flags. On Tue, Feb 22, 2022 at 8:10 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > +/* Kinds of checkpoint (as advertised via PROGRESS_CHECKPOINT_KIND) */ > +#define PROGRESS_CHECKPOINT_KIND_WAL 0 > +#define PROGRESS_CHECKPOINT_KIND_TIME 1 > +#define PROGRESS_CHECKPOINT_KIND_FORCE 2 > +#define PROGRESS_CHECKPOINT_KIND_UNKNOWN 3 > > On what basis have you classified the above into the various types of > checkpoints? AFAIK, the first two types are based on what triggered > the checkpoint (whether it was the checkpoint_timeout or maz_wal_size > settings) while the third type indicates the force checkpoint that can > happen when the checkpoint is triggered for various reasons e.g. . > during createb or dropdb etc. This is quite possible that both the > PROGRESS_CHECKPOINT_KIND_TIME and PROGRESS_CHECKPOINT_KIND_FORCE flags > are set for the checkpoint because multiple checkpoint requests are > processed at one go, so what type of checkpoint would that be? > > + */ > + if ((flags & (CHECKPOINT_IS_SHUTDOWN | > CHECKPOINT_END_OF_RECOVERY)) == 0) > + { > + > pgstat_progress_start_command(PROGRESS_COMMAND_CHECKPOINT, > InvalidOid); > + checkpoint_progress_update_param(flags, > PROGRESS_CHECKPOINT_PHASE, > + > PROGRESS_CHECKPOINT_PHASE_INIT); > + if (flags & CHECKPOINT_CAUSE_XLOG) > + checkpoint_progress_update_param(flags, > PROGRESS_CHECKPOINT_KIND, > + > PROGRESS_CHECKPOINT_KIND_WAL); > + else if (flags & CHECKPOINT_CAUSE_TIME) > + checkpoint_progress_update_param(flags, > PROGRESS_CHECKPOINT_KIND, > + > PROGRESS_CHECKPOINT_KIND_TIME); > + else if (flags & CHECKPOINT_FORCE) > + checkpoint_progress_update_param(flags, > PROGRESS_CHECKPOINT_KIND, > + > PROGRESS_CHECKPOINT_KIND_FORCE); > + else > + checkpoint_progress_update_param(flags, > PROGRESS_CHECKPOINT_KIND, > + > PROGRESS_CHECKPOINT_KIND_UNKNOWN); > + } > +} > > -- > With Regards, > Ashutosh Sharma. > > On Thu, Feb 10, 2022 at 12:23 PM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > > > > > We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of justemitting the stats at the end. > > > > > > > > Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substepit is running, whether it is on target for completion, checkpoint_Reason > > > > (manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need somerefactoring here. > > > > > > I agree to provide above mentioned information as part of showing the > > > progress of current checkpoint operation. I am currently looking into > > > the code to know if any other information can be added. > > > > Here is the initial patch to show the progress of checkpoint through > > pg_stat_progress_checkpoint view. Please find the attachment. > > > > The information added to this view are pid - process ID of a > > CHECKPOINTER process, kind - kind of checkpoint indicates the reason > > for checkpoint (values can be wal, time or force), phase - indicates > > the current phase of checkpoint operation, total_buffer_writes - total > > number of buffers to be written, buffers_processed - number of buffers > > processed, buffers_written - number of buffers written, > > total_file_syncs - total number of files to be synced, files_synced - > > number of files synced. > > > > There are many operations happen as part of checkpoint. For each of > > the operation I am updating the phase field of > > pg_stat_progress_checkpoint view. The values supported for this field > > are initializing, checkpointing replication slots, checkpointing > > snapshots, checkpointing logical rewrite mappings, checkpointing CLOG > > pages, checkpointing CommitTs pages, checkpointing SUBTRANS pages, > > checkpointing MULTIXACT pages, checkpointing SLRU pages, checkpointing > > buffers, performing sync requests, performing two phase checkpoint, > > recycling old XLOG files and Finalizing. In case of checkpointing > > buffers phase, the fields total_buffer_writes, buffers_processed and > > buffers_written shows the detailed progress of writing buffers. In > > case of performing sync requests phase, the fields total_file_syncs > > and files_synced shows the detailed progress of syncing files. In > > other phases, only the phase field is getting updated and it is > > difficult to show the progress because we do not get the total number > > of files count without traversing the directory. It is not worth to > > calculate that as it affects the performance of the checkpoint. I also > > gave a thought to just mention the number of files processed, but this > > wont give a meaningful progress information (It can be treated as > > statistics). Hence just updating the phase field in those scenarios. > > > > Apart from above fields, I am planning to add few more fields to the > > view in the next patch. That is, process ID of the backend process > > which triggered a CHECKPOINT command, checkpoint start location, filed > > to indicate whether it is a checkpoint or restartpoint and elapsed > > time of the checkpoint operation. Please share your thoughts. I would > > be happy to add any other information that contributes to showing the > > progress of checkpoint. > > > > As per the discussion in this thread, there should be some mechanism > > to show the progress of checkpoint during shutdown and end-of-recovery > > cases as we cannot access pg_stat_progress_checkpoint in those cases. > > I am working on this to use log_startup_progress_interval mechanism to > > log the progress in the server logs. > > > > Kindly review the patch and share your thoughts. > > > > > > On Fri, Jan 28, 2022 at 12:24 PM Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > > On Fri, Jan 21, 2022 at 11:07 AM Nitin Jadhav > > > <nitinjadhavpostgres@gmail.com> wrote: > > > > > > > > > I think the right choice to solve the *general* problem is the > > > > > mentioned pg_stat_progress_checkpoints. > > > > > > > > > > We may want to *additionally* have the ability to log the progress > > > > > specifically for the special cases when we're not able to use that > > > > > view. And in those case, we can perhaps just use the existing > > > > > log_startup_progress_interval parameter for this as well -- at least > > > > > for the startup checkpoint. > > > > > > > > +1 > > > > > > > > > We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead ofjust emitting the stats at the end. > > > > > > > > > > Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substepit is running, whether it is on target for completion, checkpoint_Reason > > > > > (manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need somerefactoring here. > > > > > > > > I agree to provide above mentioned information as part of showing the > > > > progress of current checkpoint operation. I am currently looking into > > > > the code to know if any other information can be added. > > > > > > As suggested in the other thread by Julien, I'm changing the subject > > > of this thread to reflect the discussion. > > > > > > Regards, > > > Bharath Rupireddy.
pgsql-hackers by date: