Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) - Mailing list pgsql-hackers
From | Nitin Jadhav |
---|---|
Subject | Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) |
Date | |
Msg-id | CAMm1aWbU81dqpT6Ydz9txSw6=VUvb0XLYLw-qxty2mD5gB1gYg@mail.gmail.com Whole thread Raw |
In response to | Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
List | pgsql-hackers |
> Interesting idea, and overall a nice addition to the > pg_stat_progress_* reporting infrastructure. > > Could you add your patch to the current commitfest at > https://commitfest.postgresql.org/37/? > > See below for some comments on the patch: Thanks you for reviewing. I have added it to the commitfest - https://commitfest.postgresql.org/37/3545/ > > xlog.c @ checkpoint_progress_start, checkpoint_progress_update_param, checkpoint_progress_end > > + /* In bootstrap mode, we don't actually record anything. */ > > + if (IsBootstrapProcessingMode()) > > + return; > > Why do you check against the state of the system? > pgstat_progress_update_* already provides protections against updating > the progress tables if the progress infrastructure is not loaded; and > otherwise (in the happy path) the cost of updating the progress fields > will be quite a bit higher than normal. Updating stat_progress isn't > very expensive (quite cheap, really), so I don't quite get why you > guard against reporting stats when you expect no other client to be > listening. Nice point. I agree that the extra guards(IsBootstrapProcessingMode() and (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY)) == 0) are not needed as the progress reporting mechanism handles that internally (It only updates when there is an access to the pg_stat_progress_activity view). I am planning to add the progress of checkpoint during shutdown and end-of-recovery cases in server logs as we don't have access to the view. In this case these guards are necessary. checkpoint_progress_update_param() is a generic function to report progress to the view or server logs. Thoughts? > I think you can simplify this a lot by directly using > pgstat_progress_update_param() instead. > > > xlog.c @ checkpoint_progress_start > > + pgstat_progress_start_command(PROGRESS_COMMAND_CHECKPOINT, InvalidOid); > > + checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_PHASE, > > + PROGRESS_CHECKPOINT_PHASE_INIT); > > + if (flags & CHECKPOINT_CAUSE_XLOG) > > + checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND, > > + PROGRESS_CHECKPOINT_KIND_WAL); > > + else if (flags & CHECKPOINT_CAUSE_TIME) > > + checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND, > > + PROGRESS_CHECKPOINT_KIND_TIME); > > + [...] > > Could you assign the kind of checkpoint to a local variable, and then > update the "phase" and "kind" parameters at the same time through > pgstat_progress_update_multi_param(2, ...)? See > BuildRelationExtStatistics in extended_stats.c for an example usage. I will make use of pgstat_progress_update_multi_param() in the next patch to replace multiple calls to checkpoint_progress_update_param(). > Note that regardless of whether checkpoint_progress_update* will > remain, the checks done in that function already have been checked in > this function as well, so you can use the pgstat_* functions directly. As I mentioned before I am planning to add progress reporting in the server logs, checkpoint_progress_update_param() is required and it makes the job easier. > > monitoring.sgml > > + <structname>pg_stat_progress_checkpoint</structname> view will contain a > > + single row indicating the progress of checkpoint operation. > >... add "if a checkpoint is currently active". I feel adding extra words here to indicate "if a checkpoint is currently active" is not necessary as the view description provides that information and also it aligns with the documentation of existing progress views. > > + <structfield>total_buffer_writes</structfield> <type>bigint</type> > > + <structfield>total_file_syncs</structfield> <type>bigint</type> > > The other progress tables use [type]_total as column names for counter > targets (e.g. backup_total for backup_streamed, heap_blks_total for > heap_blks_scanned, etc.). I think that `buffers_total` and > `files_total` would be better column names. I agree and I will update this in the next patch. > > + The checkpoint operation is requested due to XLOG filling. > > + The checkpoint was started because >max_wal_size< of WAL was written. How about this "The checkpoint is started because max_wal_size is reached". > > + The checkpoint operation is requested due to timeout. > > + The checkpoint was started due to the expiration of a > >checkpoint_timeout< interval "The checkpoint is started because checkpoint_timeout expired". > > + The checkpoint operation is forced even if no XLOG activity has occurred > > + since the last one. > > + Some operation forced a checkpoint. "The checkpoint is started because some operation forced a checkpoint". > > + <entry><literal>checkpointing CommitTs pages</literal></entry> > > CommitTs -> Commit time stamp I will handle this in the next patch. Thanks & Regards, Nitin Jadhav > On Thu, 10 Feb 2022 at 07:53, Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > > > > > We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of justemitting the stats at the end. > > > > > > > > Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substepit is running, whether it is on target for completion, checkpoint_Reason > > > > (manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need somerefactoring here. > > > > > > I agree to provide above mentioned information as part of showing the > > > progress of current checkpoint operation. I am currently looking into > > > the code to know if any other information can be added. > > > > Here is the initial patch to show the progress of checkpoint through > > pg_stat_progress_checkpoint view. Please find the attachment. > > > > The information added to this view are pid - process ID of a > > CHECKPOINTER process, kind - kind of checkpoint indicates the reason > > for checkpoint (values can be wal, time or force), phase - indicates > > the current phase of checkpoint operation, total_buffer_writes - total > > number of buffers to be written, buffers_processed - number of buffers > > processed, buffers_written - number of buffers written, > > total_file_syncs - total number of files to be synced, files_synced - > > number of files synced. > > > > There are many operations happen as part of checkpoint. For each of > > the operation I am updating the phase field of > > pg_stat_progress_checkpoint view. The values supported for this field > > are initializing, checkpointing replication slots, checkpointing > > snapshots, checkpointing logical rewrite mappings, checkpointing CLOG > > pages, checkpointing CommitTs pages, checkpointing SUBTRANS pages, > > checkpointing MULTIXACT pages, checkpointing SLRU pages, checkpointing > > buffers, performing sync requests, performing two phase checkpoint, > > recycling old XLOG files and Finalizing. In case of checkpointing > > buffers phase, the fields total_buffer_writes, buffers_processed and > > buffers_written shows the detailed progress of writing buffers. In > > case of performing sync requests phase, the fields total_file_syncs > > and files_synced shows the detailed progress of syncing files. In > > other phases, only the phase field is getting updated and it is > > difficult to show the progress because we do not get the total number > > of files count without traversing the directory. It is not worth to > > calculate that as it affects the performance of the checkpoint. I also > > gave a thought to just mention the number of files processed, but this > > wont give a meaningful progress information (It can be treated as > > statistics). Hence just updating the phase field in those scenarios. > > > > Apart from above fields, I am planning to add few more fields to the > > view in the next patch. That is, process ID of the backend process > > which triggered a CHECKPOINT command, checkpoint start location, filed > > to indicate whether it is a checkpoint or restartpoint and elapsed > > time of the checkpoint operation. Please share your thoughts. I would > > be happy to add any other information that contributes to showing the > > progress of checkpoint. > > > > As per the discussion in this thread, there should be some mechanism > > to show the progress of checkpoint during shutdown and end-of-recovery > > cases as we cannot access pg_stat_progress_checkpoint in those cases. > > I am working on this to use log_startup_progress_interval mechanism to > > log the progress in the server logs. > > > > Kindly review the patch and share your thoughts. > > Interesting idea, and overall a nice addition to the > pg_stat_progress_* reporting infrastructure. > > Could you add your patch to the current commitfest at > https://commitfest.postgresql.org/37/? > > See below for some comments on the patch: > > > xlog.c @ checkpoint_progress_start, checkpoint_progress_update_param, checkpoint_progress_end > > + /* In bootstrap mode, we don't actually record anything. */ > > + if (IsBootstrapProcessingMode()) > > + return; > > Why do you check against the state of the system? > pgstat_progress_update_* already provides protections against updating > the progress tables if the progress infrastructure is not loaded; and > otherwise (in the happy path) the cost of updating the progress fields > will be quite a bit higher than normal. Updating stat_progress isn't > very expensive (quite cheap, really), so I don't quite get why you > guard against reporting stats when you expect no other client to be > listening. > > I think you can simplify this a lot by directly using > pgstat_progress_update_param() instead. > > > xlog.c @ checkpoint_progress_start > > + pgstat_progress_start_command(PROGRESS_COMMAND_CHECKPOINT, InvalidOid); > > + checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_PHASE, > > + PROGRESS_CHECKPOINT_PHASE_INIT); > > + if (flags & CHECKPOINT_CAUSE_XLOG) > > + checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND, > > + PROGRESS_CHECKPOINT_KIND_WAL); > > + else if (flags & CHECKPOINT_CAUSE_TIME) > > + checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND, > > + PROGRESS_CHECKPOINT_KIND_TIME); > > + [...] > > Could you assign the kind of checkpoint to a local variable, and then > update the "phase" and "kind" parameters at the same time through > pgstat_progress_update_multi_param(2, ...)? See > BuildRelationExtStatistics in extended_stats.c for an example usage. > Note that regardless of whether checkpoint_progress_update* will > remain, the checks done in that function already have been checked in > this function as well, so you can use the pgstat_* functions directly. > > > monitoring.sgml > > + <structname>pg_stat_progress_checkpoint</structname> view will contain a > > + single row indicating the progress of checkpoint operation. > > ... add "if a checkpoint is currently active". > > > + <structfield>total_buffer_writes</structfield> <type>bigint</type> > > + <structfield>total_file_syncs</structfield> <type>bigint</type> > > The other progress tables use [type]_total as column names for counter > targets (e.g. backup_total for backup_streamed, heap_blks_total for > heap_blks_scanned, etc.). I think that `buffers_total` and > `files_total` would be better column names. > > > + The checkpoint operation is requested due to XLOG filling. > > + The checkpoint was started because >max_wal_size< of WAL was written. > > > + The checkpoint operation is requested due to timeout. > > + The checkpoint was started due to the expiration of a > >checkpoint_timeout< interval > > > + The checkpoint operation is forced even if no XLOG activity has occurred > > + since the last one. > > + Some operation forced a checkpoint. > > > + <entry><literal>checkpointing CommitTs pages</literal></entry> > > CommitTs -> Commit time stamp > > Thanks for working on this. > > Kind regards, > > Matthias van de Meent
pgsql-hackers by date: