Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) - Mailing list pgsql-hackers

From Nitin Jadhav
Subject Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Date
Msg-id CAMm1aWY_wThAxmhC_7mkFsYHcNP2_JQLAKJEoY5EcHBi0Q42QQ@mail.gmail.com
Whole thread Raw
In response to Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)  (Nitin Jadhav <nitinjadhavpostgres@gmail.com>)
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)  (Ashutosh Sharma <ashu.coek88@gmail.com>)
List pgsql-hackers
> > We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of
justemitting the stats at the end.
 
> >
> > Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid,
substepit is running, whether it is on target for completion, checkpoint_Reason
 
> > (manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some
refactoringhere.
 
>
> I agree to provide above mentioned information as part of showing the
> progress of current checkpoint operation. I am currently looking into
> the code to know if any other information can be added.

Here is the initial patch to show the progress of checkpoint through
pg_stat_progress_checkpoint view. Please find the attachment.

The information added to this view are pid - process ID of a
CHECKPOINTER process, kind - kind of checkpoint indicates the reason
for checkpoint (values can be wal, time or force), phase - indicates
the current phase of checkpoint operation, total_buffer_writes - total
number of buffers to be written, buffers_processed - number of buffers
processed, buffers_written - number of buffers written,
total_file_syncs - total number of files to be synced, files_synced -
number of files synced.

There are many operations happen as part of checkpoint. For each of
the operation I am updating the phase field of
pg_stat_progress_checkpoint view. The values supported for this field
are initializing, checkpointing replication slots, checkpointing
snapshots, checkpointing logical rewrite mappings, checkpointing CLOG
pages, checkpointing CommitTs pages, checkpointing SUBTRANS pages,
checkpointing MULTIXACT pages, checkpointing SLRU pages, checkpointing
buffers, performing sync requests, performing two phase checkpoint,
recycling old XLOG files and Finalizing. In case of checkpointing
buffers phase, the fields total_buffer_writes, buffers_processed and
buffers_written shows the detailed progress of writing buffers. In
case of performing sync requests phase, the fields total_file_syncs
and files_synced shows the detailed progress of syncing files. In
other phases, only the phase field is getting updated and it is
difficult to show the progress because we do not get the total number
of files count without traversing the directory. It is not worth to
calculate that as it affects the performance of the checkpoint. I also
gave a thought to just mention the number of files processed, but this
wont give a meaningful progress information (It can be treated as
statistics). Hence just updating the phase field in those scenarios.

Apart from above fields, I am planning to add few more fields to the
view in the next patch. That is, process ID of the backend process
which triggered a CHECKPOINT command, checkpoint start location, filed
to indicate whether it is a checkpoint or restartpoint and elapsed
time of the checkpoint operation. Please share your thoughts. I would
be happy to add any other information that contributes to showing the
progress of checkpoint.

As per the discussion in this thread, there should be some mechanism
to show the progress of checkpoint during shutdown and end-of-recovery
cases as we cannot access pg_stat_progress_checkpoint in those cases.
I am working on this to use log_startup_progress_interval mechanism to
log the progress in the server logs.

Kindly review the patch and share your thoughts.


On Fri, Jan 28, 2022 at 12:24 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Jan 21, 2022 at 11:07 AM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> >
> > > I think the right choice to solve the *general* problem is the
> > > mentioned pg_stat_progress_checkpoints.
> > >
> > > We may want to *additionally* have the ability to log the progress
> > > specifically for the special cases when we're not able to use that
> > > view. And in those case, we can perhaps just use the existing
> > > log_startup_progress_interval parameter for this as well -- at least
> > > for the startup checkpoint.
> >
> > +1
> >
> > > We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of
justemitting the stats at the end.
 
> > >
> > > Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid,
substepit is running, whether it is on target for completion, checkpoint_Reason
 
> > > (manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some
refactoringhere.
 
> >
> > I agree to provide above mentioned information as part of showing the
> > progress of current checkpoint operation. I am currently looking into
> > the code to know if any other information can be added.
>
> As suggested in the other thread by Julien, I'm changing the subject
> of this thread to reflect the discussion.
>
> Regards,
> Bharath Rupireddy.

Attachment

pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Replacing TAP test planning with done_testing()
Next
From: Julien Rouhaud
Date:
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints