Thread: Report checkpoint progress in server logs
Hi,

At times, some of the checkpoint operations, such as removing old WAL files or dealing with replication snapshot or mapping files, may take a while, during which the server doesn't emit any logs or information; the only logs emitted are LogCheckpointStart and LogCheckpointEnd. Often this isn't a problem when the checkpoint is quick, but there can be extreme situations in which users need to know what's going on with the current checkpoint.

Given that commit 9ce346ea [1] introduced a nice mechanism to report long-running operations of the startup process in the server logs, I'm thinking we can have a similar progress mechanism for checkpoints as well. There's another idea, suggested in a couple of other threads, to have a pg_stat_progress_checkpoint view similar to pg_stat_progress_analyze/vacuum/etc. The problem with that idea is that during end-of-recovery or shutdown checkpoints, the pg_stat_progress_checkpoint view isn't accessible, as it requires a connection to the server, which isn't allowed then.

Therefore, reporting the checkpoint progress in the server logs, much like [1], seems to be the best way IMO. We can 1) make ereport_startup_progress and log_startup_progress_interval more generic (something like ereport_log_progress and log_progress_interval), move the code to elog.c, and use it for checkpoint progress and, if required, for other time-consuming operations, or 2) have an entirely different GUC and API for checkpoint progress.

IMO, option (1), i.e. ereport_log_progress and log_progress_interval (better names are welcome), seems the better idea.

Thoughts?

[1] commit 9ce346eabf350a130bba46be3f8c50ba28506969
Author: Robert Haas <rhaas@postgresql.org>
Date: Mon Oct 25 11:51:57 2021 -0400

    Report progress of startup operations that take a long time.

Regards,
Bharath Rupireddy.
On Wed, Dec 29, 2021 at 3:31 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > Hi, > > At times, some of the checkpoint operations such as removing old WAL > files, dealing with replication snapshot or mapping files etc. may > take a while during which the server doesn't emit any logs or > information, the only logs emitted are LogCheckpointStart and > LogCheckpointEnd. Many times this isn't a problem if the checkpoint is > quicker, but there can be extreme situations which require the users > to know what's going on with the current checkpoint. > > Given that the commit 9ce346ea [1] introduced a nice mechanism to > report the long running operations of the startup process in the > server logs, I'm thinking we can have a similar progress mechanism for > the checkpoint as well. There's another idea suggested in a couple of > other threads to have a pg_stat_progress_checkpoint similar to > pg_stat_progress_analyze/vacuum/etc. But the problem with this idea is > during the end-of-recovery or shutdown checkpoints, the > pg_stat_progress_checkpoint view isn't accessible as it requires a > connection to the server which isn't allowed. > > Therefore, reporting the checkpoint progress in the server logs, much > like [1], seems to be the best way IMO. We can 1) either make > ereport_startup_progress and log_startup_progress_interval more > generic (something like ereport_log_progress and > log_progress_interval), move the code to elog.c, use it for > checkpoint progress and if required for other time-consuming > operations 2) or have an entirely different GUC and API for checkpoint > progress. > > IMO, option (1) i.e. ereport_log_progress and log_progress_interval > (better names are welcome) seems a better idea. > > Thoughts? I find progress reporting in the logfile to generally be a terrible way of doing things, and the fact that we do it for the startup process is/should be only because we have no other choice, not because it's the right choice. 
I think the right choice to solve the *general* problem is the mentioned pg_stat_progress_checkpoints. We may want to *additionally* have the ability to log the progress specifically for the special cases when we're not able to use that view. And in those cases, we can perhaps just use the existing log_startup_progress_interval parameter for this as well -- at least for the startup checkpoint.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes: >> Therefore, reporting the checkpoint progress in the server logs, much >> like [1], seems to be the best way IMO. > I find progress reporting in the logfile to generally be a terrible > way of doing things, and the fact that we do it for the startup > process is/should be only because we have no other choice, not because > it's the right choice. I'm already pretty seriously unhappy about the log-spamming effects of 64da07c41 (default to log_checkpoints=on), and am willing to lay a side bet that that gets reverted after we have some field experience with it. This proposal seems far worse from that standpoint. Keep in mind that our out-of-the-box logging configuration still doesn't have any log rotation ability, which means that the noisier the server is in normal operation, the sooner you fill your disk. > I think the right choice to solve the *general* problem is the > mentioned pg_stat_progress_checkpoints. +1 regards, tom lane
Coincidentally, I was thinking about the same yesterday, after getting tired of waiting for a checkpoint to complete on a server.
On Wed, Dec 29, 2021 at 7:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> >> Therefore, reporting the checkpoint progress in the server logs, much
> >> like [1], seems to be the best way IMO.
> > I find progress reporting in the logfile to generally be a terrible
> > way of doing things, and the fact that we do it for the startup
> > process is/should be only because we have no other choice, not because
> > it's the right choice.
> I'm already pretty seriously unhappy about the log-spamming effects of
> 64da07c41 (default to log_checkpoints=on), and am willing to lay a side
> bet that that gets reverted after we have some field experience with it.
> This proposal seems far worse from that standpoint. Keep in mind that
> our out-of-the-box logging configuration still doesn't have any log
> rotation ability, which means that the noisier the server is in normal
> operation, the sooner you fill your disk.
The server is not open for queries while running the end-of-recovery checkpoint, so a catalog view may not help there, but a process title change or logging would be helpful in such cases. When the server is running recovery, anxious customers ask several times for the ETA of recovery completion, and not having visibility into these operations makes life difficult for DBAs/operations.
> I think the right choice to solve the *general* problem is the
> mentioned pg_stat_progress_checkpoints.
> +1
+1 to this. We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of just emitting the stats at the end.
Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substep it is running, whether it is on target for completion, checkpoint_Reason (manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here.
> regards, tom lane
On Wed, Dec 29, 2021 at 10:40:59AM -0500, Tom Lane wrote: > Magnus Hagander <magnus@hagander.net> writes: >> I think the right choice to solve the *general* problem is the >> mentioned pg_stat_progress_checkpoints. > > +1 Agreed. I don't see why this would not work as there are PgBackendStatus entries for each auxiliary process. -- Michael
On Wed, Dec 29, 2021 at 10:40:59AM -0500, Tom Lane wrote: > Magnus Hagander <magnus@hagander.net> writes: > >> Therefore, reporting the checkpoint progress in the server logs, much > >> like [1], seems to be the best way IMO. > > > I find progress reporting in the logfile to generally be a terrible > > way of doing things, and the fact that we do it for the startup > > process is/should be only because we have no other choice, not because > > it's the right choice. > > I'm already pretty seriously unhappy about the log-spamming effects of > 64da07c41 (default to log_checkpoints=on), and am willing to lay a side > bet that that gets reverted after we have some field experience with it. > This proposal seems far worse from that standpoint. Keep in mind that > our out-of-the-box logging configuration still doesn't have any log > rotation ability, which means that the noisier the server is in normal > operation, the sooner you fill your disk. I think we are looking at three potential observable behaviors people might care about: * the current activity/progress of checkpoints * the historical reporting of checkpoint completion, mixed in with other log messages for later analysis * the aggregate behavior of checkpoint operation I think it is clear that checkpoint progress activity isn't useful for the server logs because that information has little historical value, but does fit for a progress view. As Tom already expressed, we will have to wait to see if non-progress checkpoint information in the logs has sufficient historical value. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
> I think the right choice to solve the *general* problem is the
> mentioned pg_stat_progress_checkpoints.
>
> We may want to *additionally* have the ability to log the progress
> specifically for the special cases when we're not able to use that
> view. And in those cases, we can perhaps just use the existing
> log_startup_progress_interval parameter for this as well -- at least
> for the startup checkpoint.

+1

> We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of just emitting the stats at the end.
>
> Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substep it is running, whether it is on target for completion, checkpoint_Reason (manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here.

I agree to provide the above-mentioned information as part of showing the progress of the current checkpoint operation. I am currently looking into the code to know if any other information can be added.

Thanks & Regards,
Nitin Jadhav

On Thu, Jan 6, 2022 at 5:12 AM Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Dec 29, 2021 at 10:40:59AM -0500, Tom Lane wrote:
> > Magnus Hagander <magnus@hagander.net> writes:
> > >> Therefore, reporting the checkpoint progress in the server logs, much
> > >> like [1], seems to be the best way IMO.
> >
> > > I find progress reporting in the logfile to generally be a terrible
> > > way of doing things, and the fact that we do it for the startup
> > > process is/should be only because we have no other choice, not because
> > > it's the right choice.
> >
> > I'm already pretty seriously unhappy about the log-spamming effects of
> > 64da07c41 (default to log_checkpoints=on), and am willing to lay a side
> > bet that that gets reverted after we have some field experience with it.
> > This proposal seems far worse from that standpoint.
Keep in mind that > > our out-of-the-box logging configuration still doesn't have any log > > rotation ability, which means that the noisier the server is in normal > > operation, the sooner you fill your disk. > > I think we are looking at three potential observable behaviors people > might care about: > > * the current activity/progress of checkpoints > * the historical reporting of checkpoint completion, mixed in with other > log messages for later analysis > * the aggregate behavior of checkpoint operation > > I think it is clear that checkpoint progress activity isn't useful for > the server logs because that information has little historical value, > but does fit for a progress view. As Tom already expressed, we will > have to wait to see if non-progress checkpoint information in the logs > has sufficient historical value. > > -- > Bruce Momjian <bruce@momjian.us> https://momjian.us > EDB https://enterprisedb.com > > If only the physical world exists, free will is an illusion. > > >
Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Bharath Rupireddy
On Fri, Jan 21, 2022 at 11:07 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
> > I think the right choice to solve the *general* problem is the
> > mentioned pg_stat_progress_checkpoints.
> >
> > We may want to *additionally* have the ability to log the progress
> > specifically for the special cases when we're not able to use that
> > view. And in those case, we can perhaps just use the existing
> > log_startup_progress_interval parameter for this as well -- at least
> > for the startup checkpoint.
>
> +1
>
> > We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of just emitting the stats at the end.
> >
> > Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substep it is running, whether it is on target for completion, checkpoint_Reason (manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here.
>
> I agree to provide above mentioned information as part of showing the
> progress of current checkpoint operation. I am currently looking into
> the code to know if any other information can be added.

As suggested in the other thread by Julien, I'm changing the subject of this thread to reflect the discussion.

Regards,
Bharath Rupireddy.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Nitin Jadhav
> > We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of just emitting the stats at the end.
> >
> > Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substep it is running, whether it is on target for completion, checkpoint_Reason (manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here.
>
> I agree to provide above mentioned information as part of showing the
> progress of current checkpoint operation. I am currently looking into
> the code to know if any other information can be added.

Here is the initial patch to show the progress of a checkpoint through the pg_stat_progress_checkpoint view. Please find the attachment.

The information added to this view is: pid - process ID of the CHECKPOINTER process; kind - the reason for the checkpoint (values can be wal, time or force); phase - the current phase of the checkpoint operation; total_buffer_writes - total number of buffers to be written; buffers_processed - number of buffers processed; buffers_written - number of buffers written; total_file_syncs - total number of files to be synced; files_synced - number of files synced.

Many operations happen as part of a checkpoint, and for each of them I am updating the phase field of the pg_stat_progress_checkpoint view. The values supported for this field are: initializing, checkpointing replication slots, checkpointing snapshots, checkpointing logical rewrite mappings, checkpointing CLOG pages, checkpointing CommitTs pages, checkpointing SUBTRANS pages, checkpointing MULTIXACT pages, checkpointing SLRU pages, checkpointing buffers, performing sync requests, performing two phase checkpoint, recycling old XLOG files, and finalizing.

During the checkpointing buffers phase, the fields total_buffer_writes, buffers_processed and buffers_written show the detailed progress of writing buffers. During the performing sync requests phase, the fields total_file_syncs and files_synced show the detailed progress of syncing files. In the other phases, only the phase field is updated; it is difficult to show detailed progress there because we cannot get the total file count without traversing the directory, and it is not worth calculating that, as it would affect the performance of the checkpoint. I also considered reporting just the number of files processed, but that wouldn't give meaningful progress information (it is more of a statistic), hence only the phase field is updated in those scenarios.

Apart from the above fields, I am planning to add a few more fields to the view in the next patch: the process ID of the backend that triggered a CHECKPOINT command, the checkpoint start location, a field to indicate whether it is a checkpoint or a restartpoint, and the elapsed time of the checkpoint operation. Please share your thoughts. I would be happy to add any other information that contributes to showing the progress of a checkpoint.

As per the discussion in this thread, there should be some mechanism to show the progress of a checkpoint during shutdown and end-of-recovery, as we cannot access pg_stat_progress_checkpoint in those cases. I am working on using the log_startup_progress_interval mechanism to log the progress in the server logs.

Kindly review the patch and share your thoughts.

On Fri, Jan 28, 2022 at 12:24 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> On Fri, Jan 21, 2022 at 11:07 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
> > > I think the right choice to solve the *general* problem is the
> > > mentioned pg_stat_progress_checkpoints.
> > > We may want to *additionally* have the ability to log the progress
> > > specifically for the special cases when we're not able to use that
> > > view. And in those case, we can perhaps just use the existing
> > > log_startup_progress_interval parameter for this as well -- at least
> > > for the startup checkpoint.
> >
> > +1
> >
> > > We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of just emitting the stats at the end.
> > >
> > > Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substep it is running, whether it is on target for completion, checkpoint_Reason (manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here.
> >
> > I agree to provide above mentioned information as part of showing the
> > progress of current checkpoint operation. I am currently looking into
> > the code to know if any other information can be added.
>
> As suggested in the other thread by Julien, I'm changing the subject
> of this thread to reflect the discussion.
>
> Regards,
> Bharath Rupireddy.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Nitin Jadhav
> Apart from above fields, I am planning to add few more fields to the
> view in the next patch. That is, process ID of the backend process
> which triggered a CHECKPOINT command, checkpoint start location, filed
> to indicate whether it is a checkpoint or restartpoint and elapsed
> time of the checkpoint operation. Please share your thoughts. I would
> be happy to add any other information that contributes to showing the
> progress of checkpoint.

The progress reporting mechanism of postgres uses the 'st_progress_param' array of the 'PgBackendStatus' structure to hold the information related to the progress. There is a function 'pgstat_progress_update_param()' which takes 'index' and 'val' as arguments and updates 'val' at the corresponding 'index' in the 'st_progress_param' array. This mechanism works fine when all the progress information is of integer type, as that is the element type of 'st_progress_param'. If the progress data is of a different type than integer, there is no easy way to report it. In my understanding, we would have to define a new structure with additional fields, add it as part of the 'PgBackendStatus' structure, and add the necessary functions to update and fetch the data from this structure. This becomes very ugly, as it will not match the existing mechanism of progress reporting. Kindly let me know if there is any better way to handle this. If there are any changes to the existing mechanism to make it generic enough to support basic data types, I would like to discuss that in a new thread.

On Thu, Feb 10, 2022 at 12:22 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
> > > We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of just emitting the stats at the end.
> > >
> > > Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substep it is running, whether it is on target for completion, checkpoint_Reason (manual/times/forced).
> > > BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here.
> >
> > I agree to provide above mentioned information as part of showing the
> > progress of current checkpoint operation. I am currently looking into
> > the code to know if any other information can be added.
>
> Here is the initial patch to show the progress of checkpoint through
> pg_stat_progress_checkpoint view. Please find the attachment.
>
> The information added to this view are pid - process ID of a
> CHECKPOINTER process, kind - kind of checkpoint indicates the reason
> for checkpoint (values can be wal, time or force), phase - indicates
> the current phase of checkpoint operation, total_buffer_writes - total
> number of buffers to be written, buffers_processed - number of buffers
> processed, buffers_written - number of buffers written,
> total_file_syncs - total number of files to be synced, files_synced -
> number of files synced.
>
> There are many operations happen as part of checkpoint. For each of
> the operation I am updating the phase field of
> pg_stat_progress_checkpoint view. The values supported for this field
> are initializing, checkpointing replication slots, checkpointing
> snapshots, checkpointing logical rewrite mappings, checkpointing CLOG
> pages, checkpointing CommitTs pages, checkpointing SUBTRANS pages,
> checkpointing MULTIXACT pages, checkpointing SLRU pages, checkpointing
> buffers, performing sync requests, performing two phase checkpoint,
> recycling old XLOG files and Finalizing. In case of checkpointing
> buffers phase, the fields total_buffer_writes, buffers_processed and
> buffers_written shows the detailed progress of writing buffers. In
> case of performing sync requests phase, the fields total_file_syncs
> and files_synced shows the detailed progress of syncing files. In
> other phases, only the phase field is getting updated and it is
> difficult to show the progress because we do not get the total number
> of files count without traversing the directory. It is not worth to
> calculate that as it affects the performance of the checkpoint. I also
> gave a thought to just mention the number of files processed, but this
> wont give a meaningful progress information (It can be treated as
> statistics). Hence just updating the phase field in those scenarios.
>
> Apart from above fields, I am planning to add few more fields to the
> view in the next patch. That is, process ID of the backend process
> which triggered a CHECKPOINT command, checkpoint start location, filed
> to indicate whether it is a checkpoint or restartpoint and elapsed
> time of the checkpoint operation. Please share your thoughts. I would
> be happy to add any other information that contributes to showing the
> progress of checkpoint.
>
> As per the discussion in this thread, there should be some mechanism
> to show the progress of checkpoint during shutdown and end-of-recovery
> cases as we cannot access pg_stat_progress_checkpoint in those cases.
> I am working on this to use log_startup_progress_interval mechanism to
> log the progress in the server logs.
>
> Kindly review the patch and share your thoughts.
>
> On Fri, Jan 28, 2022 at 12:24 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > On Fri, Jan 21, 2022 at 11:07 AM Nitin Jadhav
> > <nitinjadhavpostgres@gmail.com> wrote:
> > > > I think the right choice to solve the *general* problem is the
> > > > mentioned pg_stat_progress_checkpoints.
> > > >
> > > > We may want to *additionally* have the ability to log the progress
> > > > specifically for the special cases when we're not able to use that
> > > > view. And in those case, we can perhaps just use the existing
> > > > log_startup_progress_interval parameter for this as well -- at least
> > > > for the startup checkpoint.
> > >
> > > +1
> > >
> > > > We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of just emitting the stats at the end.
> > > >
> > > > Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substep it is running, whether it is on target for completion, checkpoint_Reason (manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here.
> > >
> > > I agree to provide above mentioned information as part of showing the
> > > progress of current checkpoint operation. I am currently looking into
> > > the code to know if any other information can be added.
> >
> > As suggested in the other thread by Julien, I'm changing the subject
> > of this thread to reflect the discussion.
> >
> > Regards,
> > Bharath Rupireddy.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Matthias van de Meent
On Tue, 15 Feb 2022 at 13:16, Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > Apart from above fields, I am planning to add few more fields to the > > view in the next patch. That is, process ID of the backend process > > which triggered a CHECKPOINT command, checkpoint start location, filed > > to indicate whether it is a checkpoint or restartpoint and elapsed > > time of the checkpoint operation. Please share your thoughts. I would > > be happy to add any other information that contributes to showing the > > progress of checkpoint. > > The progress reporting mechanism of postgres uses the > 'st_progress_param' array of 'PgBackendStatus' structure to hold the > information related to the progress. There is a function > 'pgstat_progress_update_param()' which takes 'index' and 'val' as > arguments and updates the 'val' to corresponding 'index' in the > 'st_progress_param' array. This mechanism works fine when all the > progress information is of type integer as the data type of > 'st_progress_param' is of type integer. If the progress data is of > different type than integer, then there is no easy way to do so. Progress parameters are int64, so all of the new 'checkpoint start location' (lsn = uint64), 'triggering backend PID' (int), 'elapsed time' (store as start time in stat_progress, timestamp fits in 64 bits) and 'checkpoint or restartpoint?' (boolean) would each fit in a current stat_progress parameter. Some processing would be required at the view, but that's not impossible to overcome. Kind regards, Matthias van de Meent
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Matthias van de Meent
On Thu, 10 Feb 2022 at 07:53, Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
> > > We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of just emitting the stats at the end.
> > >
> > > Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substep it is running, whether it is on target for completion, checkpoint_Reason (manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here.
> >
> > I agree to provide above mentioned information as part of showing the
> > progress of current checkpoint operation. I am currently looking into
> > the code to know if any other information can be added.
>
> Here is the initial patch to show the progress of checkpoint through
> pg_stat_progress_checkpoint view. Please find the attachment.
>
> The information added to this view are pid - process ID of a
> CHECKPOINTER process, kind - kind of checkpoint indicates the reason
> for checkpoint (values can be wal, time or force), phase - indicates
> the current phase of checkpoint operation, total_buffer_writes - total
> number of buffers to be written, buffers_processed - number of buffers
> processed, buffers_written - number of buffers written,
> total_file_syncs - total number of files to be synced, files_synced -
> number of files synced.
>
> There are many operations happen as part of checkpoint. For each of
> the operation I am updating the phase field of
> pg_stat_progress_checkpoint view. The values supported for this field
> are initializing, checkpointing replication slots, checkpointing
> snapshots, checkpointing logical rewrite mappings, checkpointing CLOG
> pages, checkpointing CommitTs pages, checkpointing SUBTRANS pages,
> checkpointing MULTIXACT pages, checkpointing SLRU pages, checkpointing
> buffers, performing sync requests, performing two phase checkpoint,
> recycling old XLOG files and Finalizing. In case of checkpointing
> buffers phase, the fields total_buffer_writes, buffers_processed and
> buffers_written shows the detailed progress of writing buffers. In
> case of performing sync requests phase, the fields total_file_syncs
> and files_synced shows the detailed progress of syncing files. In
> other phases, only the phase field is getting updated and it is
> difficult to show the progress because we do not get the total number
> of files count without traversing the directory. It is not worth to
> calculate that as it affects the performance of the checkpoint. I also
> gave a thought to just mention the number of files processed, but this
> wont give a meaningful progress information (It can be treated as
> statistics). Hence just updating the phase field in those scenarios.
>
> Apart from above fields, I am planning to add few more fields to the
> view in the next patch. That is, process ID of the backend process
> which triggered a CHECKPOINT command, checkpoint start location, filed
> to indicate whether it is a checkpoint or restartpoint and elapsed
> time of the checkpoint operation. Please share your thoughts. I would
> be happy to add any other information that contributes to showing the
> progress of checkpoint.
>
> As per the discussion in this thread, there should be some mechanism
> to show the progress of checkpoint during shutdown and end-of-recovery
> cases as we cannot access pg_stat_progress_checkpoint in those cases.
> I am working on this to use log_startup_progress_interval mechanism to > log the progress in the server logs. > > Kindly review the patch and share your thoughts. Interesting idea, and overall a nice addition to the pg_stat_progress_* reporting infrastructure. Could you add your patch to the current commitfest at https://commitfest.postgresql.org/37/? See below for some comments on the patch: > xlog.c @ checkpoint_progress_start, checkpoint_progress_update_param, checkpoint_progress_end > + /* In bootstrap mode, we don't actually record anything. */ > + if (IsBootstrapProcessingMode()) > + return; Why do you check against the state of the system? pgstat_progress_update_* already provides protections against updating the progress tables if the progress infrastructure is not loaded; and otherwise (in the happy path) the cost of updating the progress fields will be quite a bit higher than normal. Updating stat_progress isn't very expensive (quite cheap, really), so I don't quite get why you guard against reporting stats when you expect no other client to be listening. I think you can simplify this a lot by directly using pgstat_progress_update_param() instead. > xlog.c @ checkpoint_progress_start > + pgstat_progress_start_command(PROGRESS_COMMAND_CHECKPOINT, InvalidOid); > + checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_PHASE, > + PROGRESS_CHECKPOINT_PHASE_INIT); > + if (flags & CHECKPOINT_CAUSE_XLOG) > + checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND, > + PROGRESS_CHECKPOINT_KIND_WAL); > + else if (flags & CHECKPOINT_CAUSE_TIME) > + checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND, > + PROGRESS_CHECKPOINT_KIND_TIME); > + [...] Could you assign the kind of checkpoint to a local variable, and then update the "phase" and "kind" parameters at the same time through pgstat_progress_update_multi_param(2, ...)? See BuildRelationExtStatistics in extended_stats.c for an example usage. 
Note that regardless of whether checkpoint_progress_update* will remain, the checks done in that function already have been checked in this function as well, so you can use the pgstat_* functions directly. > monitoring.sgml > + <structname>pg_stat_progress_checkpoint</structname> view will contain a > + single row indicating the progress of checkpoint operation. ... add "if a checkpoint is currently active". > + <structfield>total_buffer_writes</structfield> <type>bigint</type> > + <structfield>total_file_syncs</structfield> <type>bigint</type> The other progress tables use [type]_total as column names for counter targets (e.g. backup_total for backup_streamed, heap_blks_total for heap_blks_scanned, etc.). I think that `buffers_total` and `files_total` would be better column names. > + The checkpoint operation is requested due to XLOG filling. + The checkpoint was started because >max_wal_size< of WAL was written. > + The checkpoint operation is requested due to timeout. + The checkpoint was started due to the expiration of a >checkpoint_timeout< interval > + The checkpoint operation is forced even if no XLOG activity has occurred > + since the last one. + Some operation forced a checkpoint. > + <entry><literal>checkpointing CommitTs pages</literal></entry> CommitTs -> Commit time stamp Thanks for working on this. Kind regards, Matthias van de Meent
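The multi-param suggestion above can be sketched with a small stand-in for the backend's progress array. The `progress_update_multi_param` analogue and the `PROGRESS_*` indexes below are mocks for illustration only, not PostgreSQL's actual internals:

```c
#include <stdint.h>

#define PROGRESS_CHECKPOINT_PHASE 0
#define PROGRESS_CHECKPOINT_KIND  1

/* Stand-in for the st_progress_param array in PgBackendStatus. */
static int64_t st_progress_param[20];

/* Analogue of pgstat_progress_update_multi_param(): publish N slots at once. */
static void
progress_update_multi_param(int nparam, const int *index, const int64_t *val)
{
    for (int i = 0; i < nparam; i++)
        st_progress_param[index[i]] = val[i];
}

/* Set "phase" and "kind" together instead of via two separate updates. */
static void
report_checkpoint_start(int64_t phase, int64_t kind)
{
    const int     index[] = {PROGRESS_CHECKPOINT_PHASE, PROGRESS_CHECKPOINT_KIND};
    const int64_t val[]   = {phase, kind};

    progress_update_multi_param(2, index, val);
}
```

The point of the batched call in the real code is that both values become visible to readers of the view in one update rather than two.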
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
> > The progress reporting mechanism of postgres uses the > > 'st_progress_param' array of the 'PgBackendStatus' structure to hold the > > information related to the progress. There is a function > > 'pgstat_progress_update_param()' which takes 'index' and 'val' as > > arguments and updates the 'val' at the corresponding 'index' in the > > 'st_progress_param' array. This mechanism works fine when all the > > progress information is of type integer, as the data type of > > 'st_progress_param' is of type integer. If the progress data is of a > > different type than integer, then there is no easy way to do so. > > Progress parameters are int64, so all of the new 'checkpoint start > location' (lsn = uint64), 'triggering backend PID' (int), 'elapsed > time' (store as start time in stat_progress, timestamp fits in 64 > bits) and 'checkpoint or restartpoint?' (boolean) would each fit in a > current stat_progress parameter. Some processing would be required at > the view, but that's not impossible to overcome. Thank you for sharing the information. 'triggering backend PID' (int) - can be stored without any problem. 'checkpoint or restartpoint?' (boolean) - can be stored as an integer value like PROGRESS_CHECKPOINT_TYPE_CHECKPOINT(0) and PROGRESS_CHECKPOINT_TYPE_RESTARTPOINT(1). 'elapsed time' (store as start time in stat_progress, timestamp fits in 64 bits) - as Timestamptz is int64 internally, we can store the timestamp value in the progress parameter and then expose a function like 'pg_stat_get_progress_checkpoint_elapsed' which takes int64 (not Timestamptz) as an argument and returns a string representing the elapsed time. This function can be called in the view. Is it safe/advisable to use the int64 type here rather than Timestamptz for this purpose? 'checkpoint start location' (lsn = uint64) - I feel we cannot use progress parameters for this case, as assigning uint64 to an int64 type would be an issue for larger values and can lead to hidden bugs. Thoughts? 
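The type-fitting question above can be illustrated with a self-contained sketch. The slot names and helper below are hypothetical, but the bit-level behaviour they demonstrate (int widening, boolean as 0/1, TimestampTz as int64, LSN stored bitwise) is what the discussion relies on:

```c
#include <stdint.h>

/* Hypothetical slot layout -- not actual PostgreSQL code. */
enum {
    SLOT_BACKEND_PID,   /* int: widens to int64 losslessly */
    SLOT_CKPT_TYPE,     /* 0 = checkpoint, 1 = restartpoint */
    SLOT_START_TIME,    /* TimestampTz is int64 internally */
    SLOT_START_LSN,     /* XLogRecPtr (uint64), stored bitwise */
    NUM_SLOTS
};

static int64_t progress_param[NUM_SLOTS];

static void
store_checkpoint_info(int pid, int is_restartpoint,
                      int64_t start_time, uint64_t start_lsn)
{
    progress_param[SLOT_BACKEND_PID] = pid;
    progress_param[SLOT_CKPT_TYPE]   = is_restartpoint;
    progress_param[SLOT_START_TIME]  = start_time;
    /* Bitwise reinterpretation: large LSNs show up negative but round-trip. */
    progress_param[SLOT_START_LSN]   = (int64_t) start_lsn;
}

static uint64_t
read_start_lsn(void)
{
    return (uint64_t) progress_param[SLOT_START_LSN];
}
```

So the feared loss for large LSN values is avoidable as long as the view layer undoes the cast, which is what the follow-up messages discuss.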
Thanks & Regards, Nitin Jadhav On Thu, Feb 17, 2022 at 1:33 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > [...]
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Julien Rouhaud
Date:
Hi, On Thu, Feb 17, 2022 at 12:26:07PM +0530, Nitin Jadhav wrote: > > Thank you for sharing the information. 'triggering backend PID' (int) > - can be stored without any problem. There can be multiple processes triggering a checkpoint, or at least wanting it to happen or happen faster. > 'checkpoint or restartpoint?' Do you actually need to store that? Can't it be inferred from pg_is_in_recovery()?
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Matthias van de Meent
Date:
On Thu, 17 Feb 2022 at 07:56, Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > Progress parameters are int64, so all of the new 'checkpoint start > > location' (lsn = uint64), 'triggering backend PID' (int), 'elapsed > > time' (store as start time in stat_progress, timestamp fits in 64 > > bits) and 'checkpoint or restartpoint?' (boolean) would each fit in a > > current stat_progress parameter. Some processing would be required at > > the view, but that's not impossible to overcome. > > Thank you for sharing the information. 'triggering backend PID' (int) > - can be stored without any problem. 'checkpoint or restartpoint?' > (boolean) - can be stored as a integer value like > PROGRESS_CHECKPOINT_TYPE_CHECKPOINT(0) and > PROGRESS_CHECKPOINT_TYPE_RESTARTPOINT(1). 'elapsed time' (store as > start time in stat_progress, timestamp fits in 64 bits) - As > Timestamptz is of type int64 internally, so we can store the timestamp > value in the progres parameter and then expose a function like > 'pg_stat_get_progress_checkpoint_elapsed' which takes int64 (not > Timestamptz) as argument and then returns string representing the > elapsed time. No need to use a string there; I think exposing the checkpoint start time is good enough. The conversion of int64 to timestamp[tz] can be done in SQL (although I'm not sure the internal bitwise representation of Interval should be exposed to that extent) [0]. Users can then extract the duration interval using now() - start_time, which also allows the user to use their own preferred formatting. > This function can be called in the view. Is it > safe/advisable to use int64 type here rather than Timestamptz for this > purpose? Yes, this must be exposed through int64, as the sql-callable pg_stat_get_progress_info only exposes bigint columns. Any transformation function may return other types (see pg_indexam_progress_phasename for an example of that). 
> 'checkpoint start location' (lsn = uint64) - I feel we > cannot use progress parameters for this case. As assigning uint64 to > int64 type would be an issue for larger values and can lead to hidden > bugs. Not necessarily - we can (without much trouble) do a bitwise cast from uint64 to int64, and then (in SQL) cast it back to a pg_lsn [1]. Not very elegant, but it works quite well. Kind regards, Matthias van de Meent [0] Assuming we don't care about the years past 294246 CE (2942467 is when int64 overflows into negatives), the following works without any precision losses: SELECT to_timestamp((stat.my_int64::bigint/1000000)::float8) + make_interval(0, 0, 0, 0, 0, 0, MOD(stat.my_int64, 1000000)::float8 / 1000000::float8) FROM (SELECT 1::bigint) AS stat(my_int64); [1] SELECT '0/0'::pg_lsn + ((CASE WHEN stat.my_int64 < 0 THEN pow(2::numeric, 64::numeric)::numeric ELSE 0::numeric END) + stat.my_int64::numeric) FROM (SELECT -2::bigint /* 0xFFFFFFFF/FFFFFFFE */ AS my_bigint_lsn) AS stat(my_int64);
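Matthias's [1] recovers the unsigned LSN from a possibly-negative bigint; the same round trip expressed in C terms looks like this (the helper names are made up, and `XXXXXXXX/XXXXXXXX` is pg_lsn's display convention):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Recover the unsigned LSN from a progress slot, as the view's SQL would:
 * the bitwise cast is equivalent to CASE WHEN x < 0 THEN x + 2^64 ELSE x END.
 */
static uint64_t
slot_to_lsn(int64_t stored)
{
    return (uint64_t) stored;
}

/* Render in pg_lsn's XXXXXXXX/XXXXXXXX text format. */
static void
lsn_to_text(uint64_t lsn, char *buf, size_t len)
{
    snprintf(buf, len, "%" PRIX32 "/%" PRIX32,
             (uint32_t) (lsn >> 32), (uint32_t) lsn);
}
```

With `stored = -2`, `slot_to_lsn` yields the LSN Matthias annotates in [1], so nothing is lost by storing it through a signed slot.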
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
> > Thank you for sharing the information. 'triggering backend PID' (int) > > - can be stored without any problem. > > There can be multiple processes triggering a checkpoint, or at least wanting it > to happen or happen faster. Yes. There can be multiple processes, but there will be only one checkpoint operation at a time, so the backend PID corresponds to the current checkpoint operation. Let me know if I am missing something. > > 'checkpoint or restartpoint?' > > Do you actually need to store that? Can't it be inferred from > pg_is_in_recovery()? AFAIK we cannot use pg_is_in_recovery() to predict whether it is a checkpoint or restartpoint, because if the system exits from recovery mode during a restartpoint, then any query to the pg_stat_progress_checkpoint view will report it as a checkpoint, which is not correct. Please correct me if I am wrong. Thanks & Regards, Nitin Jadhav On Thu, Feb 17, 2022 at 4:35 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > [...]
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Julien Rouhaud
Date:
Hi, On Thu, Feb 17, 2022 at 10:39:02PM +0530, Nitin Jadhav wrote: > > > Thank you for sharing the information. 'triggering backend PID' (int) > > > - can be stored without any problem. > > > > There can be multiple processes triggering a checkpoint, or at least wanting it > > to happen or happen faster. > > Yes. There can be multiple processes but there will be one checkpoint > operation at a time. So the backend PID corresponds to the current > checkpoint operation. Let me know if I am missing something. If there's a timed checkpoint running and then someone calls pg_start_backup(), which then waits for the end of the current checkpoint (possibly after changing the flags), I think the view should reflect that in some way. Maybe storing an array of (pid, flags) is too much, but at least a counter with the number of processes actively waiting for the end of the checkpoint. > > > 'checkpoint or restartpoint?' > > > > Do you actually need to store that? Can't it be inferred from > > pg_is_in_recovery()? > > AFAIK we cannot use pg_is_in_recovery() to predict whether it is a > checkpoint or restartpoint because if the system exits from recovery > mode during restartpoint then any query to pg_stat_progress_checkpoint > view will return it as a checkpoint which is ideally not correct. Please > correct me if I am wrong. Recovery ends with an end-of-recovery checkpoint that has to finish before the promotion can happen, so I don't think that a restartpoint can still be in progress if pg_is_in_recovery() returns false.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
> Interesting idea, and overall a nice addition to the > pg_stat_progress_* reporting infrastructure. > > Could you add your patch to the current commitfest at > https://commitfest.postgresql.org/37/? > > See below for some comments on the patch: Thank you for reviewing. I have added it to the commitfest - https://commitfest.postgresql.org/37/3545/ > > xlog.c @ checkpoint_progress_start, checkpoint_progress_update_param, checkpoint_progress_end > > + /* In bootstrap mode, we don't actually record anything. */ > > + if (IsBootstrapProcessingMode()) > > + return; > > Why do you check against the state of the system? > pgstat_progress_update_* already provides protections against updating > the progress tables if the progress infrastructure is not loaded; and > otherwise (in the happy path) the cost of updating the progress fields > will be quite a bit higher than normal. Updating stat_progress isn't > very expensive (quite cheap, really), so I don't quite get why you > guard against reporting stats when you expect no other client to be > listening. Nice point. I agree that the extra guards (IsBootstrapProcessingMode() and (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY)) == 0) are not needed, as the progress reporting mechanism handles that internally (it only updates when the pg_stat_progress_checkpoint view is accessed). I am planning to add the progress of the checkpoint during shutdown and end-of-recovery cases to the server logs, as we don't have access to the view then. In that case these guards are necessary. checkpoint_progress_update_param() is a generic function to report progress to the view or the server logs. Thoughts? > I think you can simplify this a lot by directly using > pgstat_progress_update_param() instead. 
> > > xlog.c @ checkpoint_progress_start > > + pgstat_progress_start_command(PROGRESS_COMMAND_CHECKPOINT, InvalidOid); > > + checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_PHASE, > > + PROGRESS_CHECKPOINT_PHASE_INIT); > > + if (flags & CHECKPOINT_CAUSE_XLOG) > > + checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND, > > + PROGRESS_CHECKPOINT_KIND_WAL); > > + else if (flags & CHECKPOINT_CAUSE_TIME) > > + checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND, > > + PROGRESS_CHECKPOINT_KIND_TIME); > > + [...] > > Could you assign the kind of checkpoint to a local variable, and then > update the "phase" and "kind" parameters at the same time through > pgstat_progress_update_multi_param(2, ...)? See > BuildRelationExtStatistics in extended_stats.c for an example usage. I will make use of pgstat_progress_update_multi_param() in the next patch to replace multiple calls to checkpoint_progress_update_param(). > Note that regardless of whether checkpoint_progress_update* will > remain, the checks done in that function already have been checked in > this function as well, so you can use the pgstat_* functions directly. As I mentioned before I am planning to add progress reporting in the server logs, checkpoint_progress_update_param() is required and it makes the job easier. > > monitoring.sgml > > + <structname>pg_stat_progress_checkpoint</structname> view will contain a > > + single row indicating the progress of checkpoint operation. > >... add "if a checkpoint is currently active". I feel adding extra words here to indicate "if a checkpoint is currently active" is not necessary as the view description provides that information and also it aligns with the documentation of existing progress views. > > + <structfield>total_buffer_writes</structfield> <type>bigint</type> > > + <structfield>total_file_syncs</structfield> <type>bigint</type> > > The other progress tables use [type]_total as column names for counter > targets (e.g. 
backup_total for backup_streamed, heap_blks_total for > heap_blks_scanned, etc.). I think that `buffers_total` and > `files_total` would be better column names. I agree and I will update this in the next patch. > > + The checkpoint operation is requested due to XLOG filling. > > + The checkpoint was started because >max_wal_size< of WAL was written. How about "The checkpoint is started because max_wal_size is reached"? > > + The checkpoint operation is requested due to timeout. > > + The checkpoint was started due to the expiration of a > >checkpoint_timeout< interval How about "The checkpoint is started because checkpoint_timeout expired"? > > + The checkpoint operation is forced even if no XLOG activity has occurred > > + since the last one. > > + Some operation forced a checkpoint. How about "The checkpoint is started because some operation forced a checkpoint"? > > + <entry><literal>checkpointing CommitTs pages</literal></entry> > > CommitTs -> Commit time stamp I will handle this in the next patch. Thanks & Regards, Nitin Jadhav > [...]
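The wrapper Nitin defends above — one entry point that reports either to the view or to the server log — could be sketched as below. The flag values mirror xlog.h of that era, but the function body is purely illustrative, not the patch's actual code:

```c
#include <stdint.h>
#include <stdio.h>

/* Flag bits as in xlog.h at the time of this thread (illustrative). */
#define CHECKPOINT_IS_SHUTDOWN      0x0001
#define CHECKPOINT_END_OF_RECOVERY  0x0002

/* Stand-in for the shared progress array. */
static int64_t progress_param[20];

/*
 * Hypothetical wrapper: shutdown and end-of-recovery checkpoints cannot be
 * watched through the view (no client connections are possible then), so
 * fall back to log output; otherwise update the shared progress slot.
 */
static void
checkpoint_progress_update_param(int flags, int index, int64_t val)
{
    if (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY))
        fprintf(stderr, "checkpoint progress: param %d = %lld\n",
                index, (long long) val);
    else
        progress_param[index] = val;
}
```

This is the design trade-off under discussion: keeping the guard inside a wrapper makes the log fallback possible, at the cost of a check that the plain pgstat_* calls would not need.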
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
> > > > Thank you for sharing the information. 'triggering backend PID' (int) > > > > - can be stored without any problem. > > > > > > There can be multiple processes triggering a checkpoint, or at least wanting it > > > to happen or happen faster. > > > > Yes. There can be multiple processes but there will be one checkpoint > > operation at a time. So the backend PID corresponds to the current > > checkpoint operation. Let me know if I am missing something. > > If there's a checkpoint timed triggered and then someone calls > pg_start_backup() which then wait for the end of the current checkpoint > (possibly after changing the flags), I think the view should reflect that in > some way. Maybe storing an array of (pid, flags) is too much, but at least a > counter with the number of processes actively waiting for the end of the > checkpoint. Okay. I feel this can be added as an additional field, but it will not replace the backend_pid field, as that represents the pid of the backend which triggered the current checkpoint. Probably a new field named 'processes_waiting' or 'events_waiting' can be added for this purpose. Thoughts? > > > > 'checkpoint or restartpoint?' > > > > > > Do you actually need to store that? Can't it be inferred from > > > pg_is_in_recovery()? > > > > AFAIK we cannot use pg_is_in_recovery() to predict whether it is a > > checkpoint or restartpoint because if the system exits from recovery > > mode during restartpoint then any query to pg_stat_progress_checkpoint > > view will return it as a checkpoint which is ideally not correct. Please > > correct me if I am wrong. > > Recovery ends with an end-of-recovery checkpoint that has to finish before the > promotion can happen, so I don't think that a restart can still be in progress > if pg_is_in_recovery() returns false. The writing of buffers or syncing of files may complete before pg_is_in_recovery() returns false, but there are some cleanup operations that happen as part of the checkpoint. 
During this scenario, we may get a false value from pg_is_in_recovery(). Please refer to the following piece of code, which is present in CreateRestartpoint():

if (!RecoveryInProgress())
    replayTLI = XLogCtl->InsertTimeLineID;

Thanks & Regards, Nitin Jadhav On Thu, Feb 17, 2022 at 10:57 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > [...]
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Julien Rouhaud
Date:
Hi, On Fri, Feb 18, 2022 at 12:20:26PM +0530, Nitin Jadhav wrote: > > > > If there's a checkpoint timed triggered and then someone calls > > pg_start_backup() which then wait for the end of the current checkpoint > > (possibly after changing the flags), I think the view should reflect that in > > some way. Maybe storing an array of (pid, flags) is too much, but at least a > > counter with the number of processes actively waiting for the end of the > > checkpoint. > > Okay. I feel this can be added as additional field but it will not > replace backend_pid field as this represents the pid of the backend > which triggered the current checkpoint. I don't think that's true. Requesting a checkpoint means telling the checkpointer that it should wake up and start a checkpoint (or restore point) if it's not already doing so, so the pid will always be the checkpointer pid. The only exception is a standalone backend, but in that case you won't be able to query that view anyway. And also while looking at the patch I see there's the same problem that I mentioned in the previous thread, which is that the effective flags can be updated once the checkpoint started, and as-is the view won't reflect that. It also means that you can't simply display one of wal, time or force but a possible combination of the flags (including the one not handled in v1). > Probably a new field named 'processes_wiating' or 'events_waiting' can be > added for this purpose. Maybe num_process_waiting? > > > > > 'checkpoint or restartpoint?' > > > > > > > > Do you actually need to store that? Can't it be inferred from > > > > pg_is_in_recovery()? > > > > > > AFAIK we cannot use pg_is_in_recovery() to predict whether it is a > > > checkpoint or restartpoint because if the system exits from recovery > > > mode during restartpoint then any query to pg_stat_progress_checkpoint > > > view will return it as a checkpoint which is ideally not correct. Please > > > correct me if I am wrong. 
> > > > Recovery ends with an end-of-recovery checkpoint that has to finish before the > > promotion can happen, so I don't think that a restart can still be in progress > > if pg_is_in_recovery() returns false. > > Probably writing of buffers or syncing files may complete before > pg_is_in_recovery() returns false. But there are some cleanup > operations happen as part of the checkpoint. During this scenario, we > may get false value for pg_is_in_recovery(). Please refer following > piece of code which is present in CreateRestartpoint(). > > if (!RecoveryInProgress()) > replayTLI = XLogCtl->InsertTimeLineID; Then maybe we could store the timeline rather than the kind of checkpoint? You should still be able to compute the information while giving a bit more information for the same memory usage.
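The counter suggested above (number of processes actively waiting for the end of the checkpoint, instead of a single "triggering" pid) can be sketched as follows. This is a hedged illustration in Python with invented names, not the actual CheckpointerShmemStruct layout:

```python
# Hypothetical sketch: a shared counter of backends waiting on the current
# checkpoint, which a progress view column could then report.
import threading

class CheckpointerShmem:
    """Invented stand-in for shared checkpointer state."""
    def __init__(self):
        self._lock = threading.Lock()
        self.num_backends_waiting = 0

    def begin_wait(self):
        # Called by a backend that requested the checkpoint and will wait
        # for its completion (e.g. CHECKPOINT, pg_start_backup()).
        with self._lock:
            self.num_backends_waiting += 1

    def end_wait(self):
        with self._lock:
            self.num_backends_waiting -= 1

shmem = CheckpointerShmem()
shmem.begin_wait()   # e.g. pg_start_backup() waiting on the current checkpoint
shmem.begin_wait()   # a second backend issuing an explicit CHECKPOINT
print(shmem.num_backends_waiting)  # 2 -- what the view column would show
```

Unlike a single pid field, the counter stays meaningful when several backends are waiting concurrently.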
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
> > Okay. I feel this can be added as additional field but it will not > > replace backend_pid field as this represents the pid of the backend > > which triggered the current checkpoint. > > I don't think that's true. Requesting a checkpoint means telling the > checkpointer that it should wake up and start a checkpoint (or restore point) > if it's not already doing so, so the pid will always be the checkpointer pid. > The only exception is a standalone backend, but in that case you won't be able > to query that view anyway. Yes. I agree that the checkpoint will always be performed by the checkpointer process. So the pid in the pg_stat_progress_checkpoint view will always correspond to the checkpointer pid only. Checkpoints get triggered in many scenarios. One of the cases is the CHECKPOINT command issued explicitly by the backend. In this scenario I would like to know the backend pid which triggered the checkpoint. Hence I would like to add a backend_pid field. So the pg_stat_progress_checkpoint view contains pid fields as well as backend_pid fields. The backend_pid contains a valid value only during the CHECKPOINT command issued by the backend explicitly, otherwise the value will be 0. We may have to add an additional field to 'CheckpointerShmemStruct' to hold the backend pid. The backend requesting the checkpoint will update its pid to this structure. Kindly let me know if you still feel the backend_pid field is not necessary. > And also while looking at the patch I see there's the same problem that I > mentioned in the previous thread, which is that the effective flags can be > updated once the checkpoint started, and as-is the view won't reflect that. It > also means that you can't simply display one of wal, time or force but a > possible combination of the flags (including the one not handled in v1). If I understand the above comment properly, it has 2 points. 
First is to display the combination of flags rather than just displaying wal, time or force - The idea behind this is to just let the user know the reason for checkpointing. That is, the checkpoint is started because max_wal_size is reached or checkpoint_timeout expired or explicitly issued CHECKPOINT command. The other flags like CHECKPOINT_IMMEDIATE, CHECKPOINT_WAIT or CHECKPOINT_FLUSH_ALL indicate how the checkpoint has to be performed. Hence I have not included those in the view. If it is really required, I would like to modify the code to include other flags and display the combination. Second point is to reflect the updated flags in the view. AFAIK, there is a possibility that the flags get updated during the on-going checkpoint but the reason for checkpoint (wal, time or force) will remain same for the current checkpoint. There might be a change in how checkpoint has to be performed if CHECKPOINT_IMMEDIATE flag is set. If we go with displaying the combination of flags in the view, then probably we may have to reflect this in the view. > > Probably a new field named 'processes_wiating' or 'events_waiting' can be > > added for this purpose. > > Maybe num_process_waiting? I feel 'processes_wiating' aligns more with the naming conventions of the fields of the existing progres views. > > Probably writing of buffers or syncing files may complete before > > pg_is_in_recovery() returns false. But there are some cleanup > > operations happen as part of the checkpoint. During this scenario, we > > may get false value for pg_is_in_recovery(). Please refer following > > piece of code which is present in CreateRestartpoint(). > > > > if (!RecoveryInProgress()) > > replayTLI = XLogCtl->InsertTimeLineID; > > Then maybe we could store the timeline rather then then kind of checkpoint? > You should still be able to compute the information while giving a bit more > information for the same memory usage. 
Can you please describe more about how checkpoint/restartpoint can be confirmed using the timeline id. Thanks & Regards, Nitin Jadhav On Fri, Feb 18, 2022 at 1:13 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > Hi, > > On Fri, Feb 18, 2022 at 12:20:26PM +0530, Nitin Jadhav wrote: > > > > > > If there's a checkpoint timed triggered and then someone calls > > > pg_start_backup() which then wait for the end of the current checkpoint > > > (possibly after changing the flags), I think the view should reflect that in > > > some way. Maybe storing an array of (pid, flags) is too much, but at least a > > > counter with the number of processes actively waiting for the end of the > > > checkpoint. > > > > Okay. I feel this can be added as additional field but it will not > > replace backend_pid field as this represents the pid of the backend > > which triggered the current checkpoint. > > I don't think that's true. Requesting a checkpoint means telling the > checkpointer that it should wake up and start a checkpoint (or restore point) > if it's not already doing so, so the pid will always be the checkpointer pid. > The only exception is a standalone backend, but in that case you won't be able > to query that view anyway. > > And also while looking at the patch I see there's the same problem that I > mentioned in the previous thread, which is that the effective flags can be > updated once the checkpoint started, and as-is the view won't reflect that. It > also means that you can't simply display one of wal, time or force but a > possible combination of the flags (including the one not handled in v1). > > > Probably a new field named 'processes_wiating' or 'events_waiting' can be > > added for this purpose. > > Maybe num_process_waiting? > > > > > > > 'checkpoint or restartpoint?' > > > > > > > > > > Do you actually need to store that? Can't it be inferred from > > > > > pg_is_in_recovery()? 
> > > > > > > > AFAIK we cannot use pg_is_in_recovery() to predict whether it is a > > > > checkpoint or restartpoint because if the system exits from recovery > > > > mode during restartpoint then any query to pg_stat_progress_checkpoint > > > > view will return it as a checkpoint which is ideally not correct. Please > > > > correct me if I am wrong. > > > > > > Recovery ends with an end-of-recovery checkpoint that has to finish before the > > > promotion can happen, so I don't think that a restart can still be in progress > > > if pg_is_in_recovery() returns false. > > > > Probably writing of buffers or syncing files may complete before > > pg_is_in_recovery() returns false. But there are some cleanup > > operations happen as part of the checkpoint. During this scenario, we > > may get false value for pg_is_in_recovery(). Please refer following > > piece of code which is present in CreateRestartpoint(). > > > > if (!RecoveryInProgress()) > > replayTLI = XLogCtl->InsertTimeLineID; > > Then maybe we could store the timeline rather then then kind of checkpoint? > You should still be able to compute the information while giving a bit more > information for the same memory usage.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Julien Rouhaud
Date:
Hi, On Fri, Feb 18, 2022 at 08:07:05PM +0530, Nitin Jadhav wrote: > > The backend_pid contains a valid value only during > the CHECKPOINT command issued by the backend explicitly, otherwise the > value will be 0. We may have to add an additional field to > 'CheckpointerShmemStruct' to hold the backend pid. The backend > requesting the checkpoint will update its pid to this structure. > Kindly let me know if you still feel the backend_pid field is not > necessary. There are more scenarios where you can have a backend requesting a checkpoint and waiting for its completion, and there may be more than one backend concerned, so I don't think that storing only one / the first backend pid is ok. > > And also while looking at the patch I see there's the same problem that I > > mentioned in the previous thread, which is that the effective flags can be > > updated once the checkpoint started, and as-is the view won't reflect that. It > > also means that you can't simply display one of wal, time or force but a > > possible combination of the flags (including the one not handled in v1). > > If I understand the above comment properly, it has 2 points. First is > to display the combination of flags rather than just displaying wal, > time or force - The idea behind this is to just let the user know the > reason for checkpointing. That is, the checkpoint is started because > max_wal_size is reached or checkpoint_timeout expired or explicitly > issued CHECKPOINT command. The other flags like CHECKPOINT_IMMEDIATE, > CHECKPOINT_WAIT or CHECKPOINT_FLUSH_ALL indicate how the checkpoint > has to be performed. Hence I have not included those in the view. If > it is really required, I would like to modify the code to include > other flags and display the combination. I think all the information should be exposed. Only knowing why the current checkpoint has been triggered without any further information seems a bit useless. Think for instance for cases like [1]. 
> Second point is to reflect > the updated flags in the view. AFAIK, there is a possibility that the > flags get updated during the on-going checkpoint but the reason for > checkpoint (wal, time or force) will remain same for the current > checkpoint. There might be a change in how checkpoint has to be > performed if CHECKPOINT_IMMEDIATE flag is set. If we go with > displaying the combination of flags in the view, then probably we may > have to reflect this in the view. You can only "upgrade" a checkpoint, but not "downgrade" it. So if for instance you find both CHECKPOINT_CAUSE_TIME and CHECKPOINT_FORCE (which is possible) you can easily know which one was the one that triggered the checkpoint and which one was added later. > > > Probably a new field named 'processes_wiating' or 'events_waiting' can be > > > added for this purpose. > > > > Maybe num_process_waiting? > > I feel 'processes_wiating' aligns more with the naming conventions of > the fields of the existing progres views. There's at least pg_stat_progress_vacuum.num_dead_tuples. Anyway I don't have a strong opinion on it, just make sure to correct the typo. > > > Probably writing of buffers or syncing files may complete before > > > pg_is_in_recovery() returns false. But there are some cleanup > > > operations happen as part of the checkpoint. During this scenario, we > > > may get false value for pg_is_in_recovery(). Please refer following > > > piece of code which is present in CreateRestartpoint(). > > > > > > if (!RecoveryInProgress()) > > > replayTLI = XLogCtl->InsertTimeLineID; > > > > Then maybe we could store the timeline rather then then kind of checkpoint? > > You should still be able to compute the information while giving a bit more > > information for the same memory usage. > > Can you please describe more about how checkpoint/restartpoint can be > confirmed using the timeline id. 
If pg_is_in_recovery() is true, then it's a restartpoint, otherwise it's a restartpoint if the checkpoint's timeline is different from the current timeline? [1] https://www.postgresql.org/message-id/1486805889.24568.96.camel%40credativ.de
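The inference rule proposed above can be written out as a small predicate. This is an illustrative Python sketch with invented names, assuming the progress slot stores the timeline the checkpoint started on:

```python
# Sketch of the classification rule: a stored timeline plus
# pg_is_in_recovery() is enough to tell a restartpoint from a checkpoint.

def is_restartpoint(pg_is_in_recovery: bool,
                    checkpoint_timeline: int,
                    current_timeline: int) -> bool:
    if pg_is_in_recovery:
        return True
    # After promotion the server switches to a new timeline, so a
    # restartpoint that is still finishing carries the old timeline.
    return checkpoint_timeline != current_timeline

assert is_restartpoint(True, 1, 1)       # still in recovery
assert is_restartpoint(False, 1, 2)      # promotion happened mid-restartpoint
assert not is_restartpoint(False, 2, 2)  # ordinary checkpoint
```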
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
> > Thank you for sharing the information. 'triggering backend PID' (int) > > - can be stored without any problem. 'checkpoint or restartpoint?' > > (boolean) - can be stored as a integer value like > > PROGRESS_CHECKPOINT_TYPE_CHECKPOINT(0) and > > PROGRESS_CHECKPOINT_TYPE_RESTARTPOINT(1). 'elapsed time' (store as > > start time in stat_progress, timestamp fits in 64 bits) - As > > Timestamptz is of type int64 internally, so we can store the timestamp > > value in the progres parameter and then expose a function like > > 'pg_stat_get_progress_checkpoint_elapsed' which takes int64 (not > > Timestamptz) as argument and then returns string representing the > > elapsed time. > > No need to use a string there; I think exposing the checkpoint start > time is good enough. The conversion of int64 to timestamp[tz] can be > done in SQL (although I'm not sure that exposing the internal bitwise > representation of Interval should be exposed to that extent) [0]. > Users can then extract the duration interval using now() - start_time, > which also allows the user to use their own preferred formatting. The reason for showing the elapsed time rather than exposing the timestamp directly is in case of checkpoint during shutdown and end-of-recovery, I am planning to log a message in server logs using 'log_startup_progress_interval' infrastructure which displays elapsed time. So just to match both of the behaviour I am displaying elapsed time here. I feel that elapsed time gives a quicker feel of the progress. Kindly let me know if you still feel just exposing the timestamp is better than showing the elapsed time. > > 'checkpoint start location' (lsn = uint64) - I feel we > > cannot use progress parameters for this case. As assigning uint64 to > > int64 type would be an issue for larger values and can lead to hidden > > bugs. > > Not necessarily - we can (without much trouble) do a bitwise cast from > uint64 to int64, and then (in SQL) cast it back to a pg_lsn [1]. 
Not > very elegant, but it works quite well. > > [1] SELECT '0/0'::pg_lsn + ((CASE WHEN stat.my_int64 < 0 THEN > pow(2::numeric, 64::numeric)::numeric ELSE 0::numeric END) + > stat.my_int64::numeric) FROM (SELECT -2::bigint /* 0xFFFFFFFF/FFFFFFFE > */ AS my_bigint_lsn) AS stat(my_int64); Thanks for sharing. It works. I will include this in the next patch. On Sat, Feb 19, 2022 at 11:02 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > Hi, > > On Fri, Feb 18, 2022 at 08:07:05PM +0530, Nitin Jadhav wrote: > > > > The backend_pid contains a valid value only during > > the CHECKPOINT command issued by the backend explicitly, otherwise the > > value will be 0. We may have to add an additional field to > > 'CheckpointerShmemStruct' to hold the backend pid. The backend > > requesting the checkpoint will update its pid to this structure. > > Kindly let me know if you still feel the backend_pid field is not > > necessary. > > There are more scenarios where you can have a baackend requesting a checkpoint > and waiting for its completion, and there may be more than one backend > concerned, so I don't think that storing only one / the first backend pid is > ok. > > > > And also while looking at the patch I see there's the same problem that I > > > mentioned in the previous thread, which is that the effective flags can be > > > updated once the checkpoint started, and as-is the view won't reflect that. It > > > also means that you can't simply display one of wal, time or force but a > > > possible combination of the flags (including the one not handled in v1). > > > > If I understand the above comment properly, it has 2 points. First is > > to display the combination of flags rather than just displaying wal, > > time or force - The idea behind this is to just let the user know the > > reason for checkpointing. That is, the checkpoint is started because > > max_wal_size is reached or checkpoint_timeout expired or explicitly > > issued CHECKPOINT command. 
The other flags like CHECKPOINT_IMMEDIATE, > > CHECKPOINT_WAIT or CHECKPOINT_FLUSH_ALL indicate how the checkpoint > > has to be performed. Hence I have not included those in the view. If > > it is really required, I would like to modify the code to include > > other flags and display the combination. > > I think all the information should be exposed. Only knowing why the current > checkpoint has been triggered without any further information seems a bit > useless. Think for instance for cases like [1]. > > > Second point is to reflect > > the updated flags in the view. AFAIK, there is a possibility that the > > flags get updated during the on-going checkpoint but the reason for > > checkpoint (wal, time or force) will remain same for the current > > checkpoint. There might be a change in how checkpoint has to be > > performed if CHECKPOINT_IMMEDIATE flag is set. If we go with > > displaying the combination of flags in the view, then probably we may > > have to reflect this in the view. > > You can only "upgrade" a checkpoint, but not "downgrade" it. So if for > instance you find both CHECKPOINT_CAUSE_TIME and CHECKPOINT_FORCE (which is > possible) you can easily know which one was the one that triggered the > checkpoint and which one was added later. > > > > > Probably a new field named 'processes_wiating' or 'events_waiting' can be > > > > added for this purpose. > > > > > > Maybe num_process_waiting? > > > > I feel 'processes_wiating' aligns more with the naming conventions of > > the fields of the existing progres views. > > There's at least pg_stat_progress_vacuum.num_dead_tuples. Anyway I don't have > a strong opinion on it, just make sure to correct the typo. > > > > > Probably writing of buffers or syncing files may complete before > > > > pg_is_in_recovery() returns false. But there are some cleanup > > > > operations happen as part of the checkpoint. During this scenario, we > > > > may get false value for pg_is_in_recovery(). 
Please refer following > > > > piece of code which is present in CreateRestartpoint(). > > > > > > > > if (!RecoveryInProgress()) > > > > replayTLI = XLogCtl->InsertTimeLineID; > > > > > > Then maybe we could store the timeline rather then then kind of checkpoint? > > > You should still be able to compute the information while giving a bit more > > > information for the same memory usage. > > > > Can you please describe more about how checkpoint/restartpoint can be > > confirmed using the timeline id. > > If pg_is_in_recovery() is true, then it's a restartpoint, otherwise it's a > restartpoint if the checkpoint's timeline is different from the current > timeline? > > [1] https://www.postgresql.org/message-id/1486805889.24568.96.camel%40credativ.de
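The uint64-to-int64 bitwise cast discussed above (storing an LSN in a signed 64-bit progress slot and undoing the cast on read, as in the SQL snippet with pow(2, 64)) can be modeled in Python. The function names are illustrative:

```python
# Model of the bitwise trick: an unsigned 64-bit LSN is stored in a signed
# 64-bit progress parameter, and the reader undoes the two's-complement
# reinterpretation before formatting it as pg_lsn text.

def lsn_to_int64(lsn: int) -> int:
    """Reinterpret a uint64 LSN as a signed 64-bit progress value."""
    return lsn - (1 << 64) if lsn >= (1 << 63) else lsn

def int64_to_lsn_text(v: int) -> str:
    """Undo the cast and format as pg_lsn text (XXXXXXXX/XXXXXXXX)."""
    u = v + (1 << 64) if v < 0 else v   # same role as the CASE ... pow(2, 64) in SQL
    return f"{u >> 32:X}/{u & 0xFFFFFFFF:X}"

stored = lsn_to_int64(0xFFFFFFFFFFFFFFFE)
print(stored)                     # -2, matching the -2::bigint in the SQL example
print(int64_to_lsn_text(stored))  # FFFFFFFF/FFFFFFFE
```

Not elegant, as noted above, but the round trip is lossless for every 64-bit value.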
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Ashutosh Sharma
Date:
+/* Kinds of checkpoint (as advertised via PROGRESS_CHECKPOINT_KIND) */
+#define PROGRESS_CHECKPOINT_KIND_WAL 0
+#define PROGRESS_CHECKPOINT_KIND_TIME 1
+#define PROGRESS_CHECKPOINT_KIND_FORCE 2
+#define PROGRESS_CHECKPOINT_KIND_UNKNOWN 3

On what basis have you classified the above into the various types of checkpoints? AFAIK, the first two types are based on what triggered the checkpoint (whether it was the checkpoint_timeout or max_wal_size settings) while the third type indicates the force checkpoint that can happen when the checkpoint is triggered for various reasons, e.g. during createdb or dropdb etc. It is quite possible that both the PROGRESS_CHECKPOINT_KIND_TIME and PROGRESS_CHECKPOINT_KIND_FORCE flags are set for the checkpoint because multiple checkpoint requests are processed at one go, so what type of checkpoint would that be?

+ */
+ if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY)) == 0)
+ {
+     pgstat_progress_start_command(PROGRESS_COMMAND_CHECKPOINT, InvalidOid);
+     checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_PHASE,
+                                      PROGRESS_CHECKPOINT_PHASE_INIT);
+     if (flags & CHECKPOINT_CAUSE_XLOG)
+         checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND,
+                                          PROGRESS_CHECKPOINT_KIND_WAL);
+     else if (flags & CHECKPOINT_CAUSE_TIME)
+         checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND,
+                                          PROGRESS_CHECKPOINT_KIND_TIME);
+     else if (flags & CHECKPOINT_FORCE)
+         checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND,
+                                          PROGRESS_CHECKPOINT_KIND_FORCE);
+     else
+         checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND,
+                                          PROGRESS_CHECKPOINT_KIND_UNKNOWN);
+ }
+}

-- With Regards, Ashutosh Sharma. On Thu, Feb 10, 2022 at 12:23 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > > We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of just emitting the stats at the end. 
> > > > > > Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substep it is running, whether it is on target for completion, checkpoint_Reason > > > (manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here. > > > > I agree to provide above mentioned information as part of showing the > > progress of current checkpoint operation. I am currently looking into > > the code to know if any other information can be added. > > Here is the initial patch to show the progress of checkpoint through > pg_stat_progress_checkpoint view. Please find the attachment. > > The information added to this view are pid - process ID of a > CHECKPOINTER process, kind - kind of checkpoint indicates the reason > for checkpoint (values can be wal, time or force), phase - indicates > the current phase of checkpoint operation, total_buffer_writes - total > number of buffers to be written, buffers_processed - number of buffers > processed, buffers_written - number of buffers written, > total_file_syncs - total number of files to be synced, files_synced - > number of files synced. > > There are many operations happen as part of checkpoint. For each of > the operation I am updating the phase field of > pg_stat_progress_checkpoint view. The values supported for this field > are initializing, checkpointing replication slots, checkpointing > snapshots, checkpointing logical rewrite mappings, checkpointing CLOG > pages, checkpointing CommitTs pages, checkpointing SUBTRANS pages, > checkpointing MULTIXACT pages, checkpointing SLRU pages, checkpointing > buffers, performing sync requests, performing two phase checkpoint, > recycling old XLOG files and Finalizing. In case of checkpointing > buffers phase, the fields total_buffer_writes, buffers_processed and > buffers_written shows the detailed progress of writing buffers. 
In > case of performing sync requests phase, the fields total_file_syncs > and files_synced shows the detailed progress of syncing files. In > other phases, only the phase field is getting updated and it is > difficult to show the progress because we do not get the total number > of files count without traversing the directory. It is not worth to > calculate that as it affects the performance of the checkpoint. I also > gave a thought to just mention the number of files processed, but this > won't give a meaningful progress information (It can be treated as > statistics). Hence just updating the phase field in those scenarios. > > Apart from above fields, I am planning to add few more fields to the > view in the next patch. That is, process ID of the backend process > which triggered a CHECKPOINT command, checkpoint start location, field > to indicate whether it is a checkpoint or restartpoint and elapsed > time of the checkpoint operation. Please share your thoughts. I would > be happy to add any other information that contributes to showing the > progress of checkpoint. > > As per the discussion in this thread, there should be some mechanism > to show the progress of checkpoint during shutdown and end-of-recovery > cases as we cannot access pg_stat_progress_checkpoint in those cases. > I am working on this to use log_startup_progress_interval mechanism to > log the progress in the server logs. > > Kindly review the patch and share your thoughts. > > > On Fri, Jan 28, 2022 at 12:24 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > On Fri, Jan 21, 2022 at 11:07 AM Nitin Jadhav > > <nitinjadhavpostgres@gmail.com> wrote: > > > > > > > I think the right choice to solve the *general* problem is the > > > > mentioned pg_stat_progress_checkpoints. > > > > > > > > We may want to *additionally* have the ability to log the progress > > > > specifically for the special cases when we're not able to use that > > > > view. 
And in those cases, we can perhaps just use the existing > > > > log_startup_progress_interval parameter for this as well -- at least > > > > for the startup checkpoint. > > > > > > +1 > > > > > > > We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of just emitting the stats at the end. > > > > > > > > Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substep it is running, whether it is on target for completion, checkpoint_Reason > > > > (manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here. > > > > > > I agree to provide above mentioned information as part of showing the > > > progress of current checkpoint operation. I am currently looking into > > > the code to know if any other information can be added. > > > > As suggested in the other thread by Julien, I'm changing the subject > > of this thread to reflect the discussion. > > > > Regards, > > Bharath Rupireddy.
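The question above, about several kind flags being set at once, connects to the "upgrade only" rule discussed earlier in this thread: the cause bits identify what triggered the checkpoint, and other bits (e.g. CHECKPOINT_FORCE, CHECKPOINT_IMMEDIATE) can only be added while it runs. A hedged Python sketch of decoding such a combination (the numeric flag values mirror xlog.h at the time of writing, but treat them as illustrative):

```python
# Decoding a combined checkpoint flags value into a list of names.
# Cause bits come first, so the first entry is the original trigger and
# the rest are flags "upgraded" onto the checkpoint later.
CHECKPOINT_IMMEDIATE  = 0x0004
CHECKPOINT_FORCE      = 0x0008
CHECKPOINT_CAUSE_XLOG = 0x0080
CHECKPOINT_CAUSE_TIME = 0x0100

def describe_flags(flags: int) -> list:
    names = [
        (CHECKPOINT_CAUSE_XLOG, "wal"),
        (CHECKPOINT_CAUSE_TIME, "time"),
        (CHECKPOINT_FORCE, "force"),
        (CHECKPOINT_IMMEDIATE, "immediate"),
    ]
    return [name for bit, name in names if flags & bit]

# A time-triggered checkpoint later upgraded with FORCE shows both:
print(describe_flags(CHECKPOINT_CAUSE_TIME | CHECKPOINT_FORCE))  # ['time', 'force']
```

Since a checkpoint can only be upgraded, never downgraded, the cause bit that is set always identifies the original trigger even when force/immediate were added afterwards.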
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Matthias van de Meent
Date:
On Tue, 22 Feb 2022 at 07:39, Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > > Thank you for sharing the information. 'triggering backend PID' (int) > > > - can be stored without any problem. 'checkpoint or restartpoint?' > > > (boolean) - can be stored as a integer value like > > > PROGRESS_CHECKPOINT_TYPE_CHECKPOINT(0) and > > > PROGRESS_CHECKPOINT_TYPE_RESTARTPOINT(1). 'elapsed time' (store as > > > start time in stat_progress, timestamp fits in 64 bits) - As > > > Timestamptz is of type int64 internally, so we can store the timestamp > > > value in the progres parameter and then expose a function like > > > 'pg_stat_get_progress_checkpoint_elapsed' which takes int64 (not > > > Timestamptz) as argument and then returns string representing the > > > elapsed time. > > > > No need to use a string there; I think exposing the checkpoint start > > time is good enough. The conversion of int64 to timestamp[tz] can be > > done in SQL (although I'm not sure that exposing the internal bitwise > > representation of Interval should be exposed to that extent) [0]. > > Users can then extract the duration interval using now() - start_time, > > which also allows the user to use their own preferred formatting. > > The reason for showing the elapsed time rather than exposing the > timestamp directly is in case of checkpoint during shutdown and > end-of-recovery, I am planning to log a message in server logs using > 'log_startup_progress_interval' infrastructure which displays elapsed > time. So just to match both of the behaviour I am displaying elapsed > time here. I feel that elapsed time gives a quicker feel of the > progress. Kindly let me know if you still feel just exposing the > timestamp is better than showing the elapsed time. 
At least for pg_stat_progress_checkpoint, storing only a timestamp in the pg_stat storage (instead of repeatedly updating the field as a duration) seems to provide much more precise measures of 'time elapsed' for other sessions if one step of the checkpoint is taking a long time. I understand the want to integrate the log-based reporting in the same API, but I don't think that is necessarily the right approach: pg_stat_progress_* has low-overhead infrastructure specifically to ensure that most tasks will not run much slower while reporting, never waiting for locks. Logging, however, needs to take locks (if only to prevent concurrent writes to the output file at a kernel level) and thus has a not insignificant overhead and thus is not very useful for precise and very frequent statistics updates. So, although similar in nature, I don't think it is smart to use the exact same infrastructure between pgstat_progress*-based reporting and log-based progress reporting, especially if your logging-based progress reporting is not intended to be a debugging-only configuration option similar to log_min_messages=DEBUG[1..5]. - Matthias
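The point above, storing only a start timestamp and letting each reader derive the elapsed time, can be sketched as follows. This is an illustrative Python model with invented names, not the pgstat infrastructure itself:

```python
# Sketch: the checkpointer writes the start time once; readers compute
# "elapsed" on demand, so a long-running step needs no repeated updates
# from the checkpointer and every observer still sees a precise value.
import time

class ProgressSlot:
    """Invented stand-in for a pg_stat progress parameter."""
    def __init__(self):
        self.start_time = None  # written once, at checkpoint start

slot = ProgressSlot()
slot.start_time = time.monotonic()   # single write by the "checkpointer"
# ... a long-running checkpoint step runs here, with no further writes ...
elapsed = time.monotonic() - slot.start_time  # computed by any reader
```

Storing a precomputed duration instead would require the checkpointer to keep rewriting the field during slow steps, which is exactly the overhead the low-overhead progress infrastructure avoids.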
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
> I will make use of pgstat_progress_update_multi_param() in the next > patch to replace multiple calls to checkpoint_progress_update_param(). Fixed. --- > > The other progress tables use [type]_total as column names for counter > > targets (e.g. backup_total for backup_streamed, heap_blks_total for > > heap_blks_scanned, etc.). I think that `buffers_total` and > > `files_total` would be better column names. > > I agree and I will update this in the next patch. Fixed. --- > How about this "The checkpoint is started because max_wal_size is reached". > > "The checkpoint is started because checkpoint_timeout expired". > > "The checkpoint is started because some operation forced a checkpoint". I have used the above description. Kindly let me know if any changes are required. --- > > > + <entry><literal>checkpointing CommitTs pages</literal></entry> > > > > CommitTs -> Commit time stamp > > I will handle this in the next patch. Fixed. --- > There are more scenarios where you can have a baackend requesting a checkpoint > and waiting for its completion, and there may be more than one backend > concerned, so I don't think that storing only one / the first backend pid is > ok. Thanks for this information. I am not considering backend_pid. --- > I think all the information should be exposed. Only knowing why the current > checkpoint has been triggered without any further information seems a bit > useless. Think for instance for cases like [1]. I have supported all possible checkpoint kinds. Added pg_stat_get_progress_checkpoint_kind() to convert the flags (int) to a string representing a combination of flags and also checking for the flag update in ImmediateCheckpointRequested() which checks whether CHECKPOINT_IMMEDIATE flag is set or not. I did not find any other cases where the flags get changed (which changes the current checkpoint behaviour) during the checkpoint. Kindly let me know if I am missing something. 
--- > > I feel 'processes_wiating' aligns more with the naming conventions of > > the fields of the existing progres views. > > There's at least pg_stat_progress_vacuum.num_dead_tuples. Anyway I don't have > a strong opinion on it, just make sure to correct the typo. More analysis is required to support this. I am planning to take care in the next patch. --- > If pg_is_in_recovery() is true, then it's a restartpoint, otherwise it's a > restartpoint if the checkpoint's timeline is different from the current > timeline? Fixed. Sharing the v2 patch. Kindly have a look and share your comments. Thanks & Regards, Nitin Jadhav On Tue, Feb 22, 2022 at 12:08 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > > Thank you for sharing the information. 'triggering backend PID' (int) > > > - can be stored without any problem. 'checkpoint or restartpoint?' > > > (boolean) - can be stored as a integer value like > > > PROGRESS_CHECKPOINT_TYPE_CHECKPOINT(0) and > > > PROGRESS_CHECKPOINT_TYPE_RESTARTPOINT(1). 'elapsed time' (store as > > > start time in stat_progress, timestamp fits in 64 bits) - As > > > Timestamptz is of type int64 internally, so we can store the timestamp > > > value in the progres parameter and then expose a function like > > > 'pg_stat_get_progress_checkpoint_elapsed' which takes int64 (not > > > Timestamptz) as argument and then returns string representing the > > > elapsed time. > > > > No need to use a string there; I think exposing the checkpoint start > > time is good enough. The conversion of int64 to timestamp[tz] can be > > done in SQL (although I'm not sure that exposing the internal bitwise > > representation of Interval should be exposed to that extent) [0]. > > Users can then extract the duration interval using now() - start_time, > > which also allows the user to use their own preferred formatting. 
> > The reason for showing the elapsed time rather than exposing the > timestamp directly is in case of checkpoint during shutdown and > end-of-recovery, I am planning to log a message in server logs using > 'log_startup_progress_interval' infrastructure which displays elapsed > time. So just to match both of the behaviour I am displaying elapsed > time here. I feel that elapsed time gives a quicker feel of the > progress. Kindly let me know if you still feel just exposing the > timestamp is better than showing the elapsed time. > > > > 'checkpoint start location' (lsn = uint64) - I feel we > > > cannot use progress parameters for this case. As assigning uint64 to > > > int64 type would be an issue for larger values and can lead to hidden > > > bugs. > > > > Not necessarily - we can (without much trouble) do a bitwise cast from > > uint64 to int64, and then (in SQL) cast it back to a pg_lsn [1]. Not > > very elegant, but it works quite well. > > > > [1] SELECT '0/0'::pg_lsn + ((CASE WHEN stat.my_int64 < 0 THEN > > pow(2::numeric, 64::numeric)::numeric ELSE 0::numeric END) + > > stat.my_int64::numeric) FROM (SELECT -2::bigint /* 0xFFFFFFFF/FFFFFFFE > > */ AS my_bigint_lsn) AS stat(my_int64); > > Thanks for sharing. It works. I will include this in the next patch. > On Sat, Feb 19, 2022 at 11:02 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > Hi, > > > > On Fri, Feb 18, 2022 at 08:07:05PM +0530, Nitin Jadhav wrote: > > > > > > The backend_pid contains a valid value only during > > > the CHECKPOINT command issued by the backend explicitly, otherwise the > > > value will be 0. We may have to add an additional field to > > > 'CheckpointerShmemStruct' to hold the backend pid. The backend > > > requesting the checkpoint will update its pid to this structure. > > > Kindly let me know if you still feel the backend_pid field is not > > > necessary. 
> > > > There are more scenarios where you can have a baackend requesting a checkpoint > > and waiting for its completion, and there may be more than one backend > > concerned, so I don't think that storing only one / the first backend pid is > > ok. > > > > > > And also while looking at the patch I see there's the same problem that I > > > > mentioned in the previous thread, which is that the effective flags can be > > > > updated once the checkpoint started, and as-is the view won't reflect that. It > > > > also means that you can't simply display one of wal, time or force but a > > > > possible combination of the flags (including the one not handled in v1). > > > > > > If I understand the above comment properly, it has 2 points. First is > > > to display the combination of flags rather than just displaying wal, > > > time or force - The idea behind this is to just let the user know the > > > reason for checkpointing. That is, the checkpoint is started because > > > max_wal_size is reached or checkpoint_timeout expired or explicitly > > > issued CHECKPOINT command. The other flags like CHECKPOINT_IMMEDIATE, > > > CHECKPOINT_WAIT or CHECKPOINT_FLUSH_ALL indicate how the checkpoint > > > has to be performed. Hence I have not included those in the view. If > > > it is really required, I would like to modify the code to include > > > other flags and display the combination. > > > > I think all the information should be exposed. Only knowing why the current > > checkpoint has been triggered without any further information seems a bit > > useless. Think for instance for cases like [1]. > > > > > Second point is to reflect > > > the updated flags in the view. AFAIK, there is a possibility that the > > > flags get updated during the on-going checkpoint but the reason for > > > checkpoint (wal, time or force) will remain same for the current > > > checkpoint. There might be a change in how checkpoint has to be > > > performed if CHECKPOINT_IMMEDIATE flag is set. 
If we go with > > > displaying the combination of flags in the view, then probably we may > > > have to reflect this in the view. > > > > You can only "upgrade" a checkpoint, but not "downgrade" it. So if for > > instance you find both CHECKPOINT_CAUSE_TIME and CHECKPOINT_FORCE (which is > > possible) you can easily know which one was the one that triggered the > > checkpoint and which one was added later. > > > > > > > Probably a new field named 'processes_wiating' or 'events_waiting' can be > > > > > added for this purpose. > > > > > > > > Maybe num_process_waiting? > > > > > > I feel 'processes_wiating' aligns more with the naming conventions of > > > the fields of the existing progres views. > > > > There's at least pg_stat_progress_vacuum.num_dead_tuples. Anyway I don't have > > a strong opinion on it, just make sure to correct the typo. > > > > > > > Probably writing of buffers or syncing files may complete before > > > > > pg_is_in_recovery() returns false. But there are some cleanup > > > > > operations happen as part of the checkpoint. During this scenario, we > > > > > may get false value for pg_is_in_recovery(). Please refer following > > > > > piece of code which is present in CreateRestartpoint(). > > > > > > > > > > if (!RecoveryInProgress()) > > > > > replayTLI = XLogCtl->InsertTimeLineID; > > > > > > > > Then maybe we could store the timeline rather then then kind of checkpoint? > > > > You should still be able to compute the information while giving a bit more > > > > information for the same memory usage. > > > > > > Can you please describe more about how checkpoint/restartpoint can be > > > confirmed using the timeline id. > > > > If pg_is_in_recovery() is true, then it's a restartpoint, otherwise it's a > > restartpoint if the checkpoint's timeline is different from the current > > timeline? > > > > [1] https://www.postgresql.org/message-id/1486805889.24568.96.camel%40credativ.de
Attachment
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
> On what basis have you classified the above into the various types of > checkpoints? AFAIK, the first two types are based on what triggered > the checkpoint (whether it was the checkpoint_timeout or maz_wal_size > settings) while the third type indicates the force checkpoint that can > happen when the checkpoint is triggered for various reasons e.g. . > during createb or dropdb etc. This is quite possible that both the > PROGRESS_CHECKPOINT_KIND_TIME and PROGRESS_CHECKPOINT_KIND_FORCE flags > are set for the checkpoint because multiple checkpoint requests are > processed at one go, so what type of checkpoint would that be? My initial understanding was wrong. In the v2 patch I have supported all values for the checkpoint kinds and am displaying a string in the pg_stat_progress_checkpoint view which describes all the bits set in the checkpoint flags. On Tue, Feb 22, 2022 at 8:10 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > +/* Kinds of checkpoint (as advertised via PROGRESS_CHECKPOINT_KIND) */ > +#define PROGRESS_CHECKPOINT_KIND_WAL 0 > +#define PROGRESS_CHECKPOINT_KIND_TIME 1 > +#define PROGRESS_CHECKPOINT_KIND_FORCE 2 > +#define PROGRESS_CHECKPOINT_KIND_UNKNOWN 3 > > On what basis have you classified the above into the various types of > checkpoints? AFAIK, the first two types are based on what triggered > the checkpoint (whether it was the checkpoint_timeout or maz_wal_size > settings) while the third type indicates the force checkpoint that can > happen when the checkpoint is triggered for various reasons e.g. . > during createb or dropdb etc. This is quite possible that both the > PROGRESS_CHECKPOINT_KIND_TIME and PROGRESS_CHECKPOINT_KIND_FORCE flags > are set for the checkpoint because multiple checkpoint requests are > processed at one go, so what type of checkpoint would that be? 
> > + */ > + if ((flags & (CHECKPOINT_IS_SHUTDOWN | > CHECKPOINT_END_OF_RECOVERY)) == 0) > + { > + > pgstat_progress_start_command(PROGRESS_COMMAND_CHECKPOINT, > InvalidOid); > + checkpoint_progress_update_param(flags, > PROGRESS_CHECKPOINT_PHASE, > + > PROGRESS_CHECKPOINT_PHASE_INIT); > + if (flags & CHECKPOINT_CAUSE_XLOG) > + checkpoint_progress_update_param(flags, > PROGRESS_CHECKPOINT_KIND, > + > PROGRESS_CHECKPOINT_KIND_WAL); > + else if (flags & CHECKPOINT_CAUSE_TIME) > + checkpoint_progress_update_param(flags, > PROGRESS_CHECKPOINT_KIND, > + > PROGRESS_CHECKPOINT_KIND_TIME); > + else if (flags & CHECKPOINT_FORCE) > + checkpoint_progress_update_param(flags, > PROGRESS_CHECKPOINT_KIND, > + > PROGRESS_CHECKPOINT_KIND_FORCE); > + else > + checkpoint_progress_update_param(flags, > PROGRESS_CHECKPOINT_KIND, > + > PROGRESS_CHECKPOINT_KIND_UNKNOWN); > + } > +} > > -- > With Regards, > Ashutosh Sharma. > > On Thu, Feb 10, 2022 at 12:23 PM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > > > > > We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of justemitting the stats at the end. > > > > > > > > Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substepit is running, whether it is on target for completion, checkpoint_Reason > > > > (manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need somerefactoring here. > > > > > > I agree to provide above mentioned information as part of showing the > > > progress of current checkpoint operation. I am currently looking into > > > the code to know if any other information can be added. > > > > Here is the initial patch to show the progress of checkpoint through > > pg_stat_progress_checkpoint view. Please find the attachment. 
> > > > The information added to this view are pid - process ID of a > > CHECKPOINTER process, kind - kind of checkpoint indicates the reason > > for checkpoint (values can be wal, time or force), phase - indicates > > the current phase of checkpoint operation, total_buffer_writes - total > > number of buffers to be written, buffers_processed - number of buffers > > processed, buffers_written - number of buffers written, > > total_file_syncs - total number of files to be synced, files_synced - > > number of files synced. > > > > There are many operations happen as part of checkpoint. For each of > > the operation I am updating the phase field of > > pg_stat_progress_checkpoint view. The values supported for this field > > are initializing, checkpointing replication slots, checkpointing > > snapshots, checkpointing logical rewrite mappings, checkpointing CLOG > > pages, checkpointing CommitTs pages, checkpointing SUBTRANS pages, > > checkpointing MULTIXACT pages, checkpointing SLRU pages, checkpointing > > buffers, performing sync requests, performing two phase checkpoint, > > recycling old XLOG files and Finalizing. In case of checkpointing > > buffers phase, the fields total_buffer_writes, buffers_processed and > > buffers_written shows the detailed progress of writing buffers. In > > case of performing sync requests phase, the fields total_file_syncs > > and files_synced shows the detailed progress of syncing files. In > > other phases, only the phase field is getting updated and it is > > difficult to show the progress because we do not get the total number > > of files count without traversing the directory. It is not worth to > > calculate that as it affects the performance of the checkpoint. I also > > gave a thought to just mention the number of files processed, but this > > wont give a meaningful progress information (It can be treated as > > statistics). Hence just updating the phase field in those scenarios. 
> > > > Apart from above fields, I am planning to add few more fields to the > > view in the next patch. That is, process ID of the backend process > > which triggered a CHECKPOINT command, checkpoint start location, filed > > to indicate whether it is a checkpoint or restartpoint and elapsed > > time of the checkpoint operation. Please share your thoughts. I would > > be happy to add any other information that contributes to showing the > > progress of checkpoint. > > > > As per the discussion in this thread, there should be some mechanism > > to show the progress of checkpoint during shutdown and end-of-recovery > > cases as we cannot access pg_stat_progress_checkpoint in those cases. > > I am working on this to use log_startup_progress_interval mechanism to > > log the progress in the server logs. > > > > Kindly review the patch and share your thoughts. > > > > > > On Fri, Jan 28, 2022 at 12:24 PM Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > > On Fri, Jan 21, 2022 at 11:07 AM Nitin Jadhav > > > <nitinjadhavpostgres@gmail.com> wrote: > > > > > > > > > I think the right choice to solve the *general* problem is the > > > > > mentioned pg_stat_progress_checkpoints. > > > > > > > > > > We may want to *additionally* have the ability to log the progress > > > > > specifically for the special cases when we're not able to use that > > > > > view. And in those case, we can perhaps just use the existing > > > > > log_startup_progress_interval parameter for this as well -- at least > > > > > for the startup checkpoint. > > > > > > > > +1 > > > > > > > > > We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead ofjust emitting the stats at the end. > > > > > > > > > > Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substepit is running, whether it is on target for completion, checkpoint_Reason > > > > > (manual/times/forced). 
BufferSync has several variables tracking the sync progress locally, and we may need somerefactoring here. > > > > > > > > I agree to provide above mentioned information as part of showing the > > > > progress of current checkpoint operation. I am currently looking into > > > > the code to know if any other information can be added. > > > > > > As suggested in the other thread by Julien, I'm changing the subject > > > of this thread to reflect the discussion. > > > > > > Regards, > > > Bharath Rupireddy.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
> At least for pg_stat_progress_checkpoint, storing only a timestamp in > the pg_stat storage (instead of repeatedly updating the field as a > duration) seems to provide much more precise measures of 'time > elapsed' for other sessions if one step of the checkpoint is taking a > long time. I am storing the checkpoint start timestamp in st_progress_param[], and this gets set only once, at the start of the checkpoint. I have added the function pg_stat_get_progress_checkpoint_elapsed(), which calculates the elapsed time and returns a string; it is called whenever the pg_stat_progress_checkpoint view is queried. Kindly refer to the v2 patch and share your thoughts. > I understand the want to integrate the log-based reporting in the same > API, but I don't think that is necessarily the right approach: > pg_stat_progress_* has low-overhead infrastructure specifically to > ensure that most tasks will not run much slower while reporting, never > waiting for locks. Logging, however, needs to take locks (if only to > prevent concurrent writes to the output file at a kernel level) and > thus has a not insignificant overhead and thus is not very useful for > precise and very frequent statistics updates. I understand that log-based reporting is costly and that very frequent updates are not advisable. I am planning to use the existing 'log_startup_progress_interval' infrastructure, which lets the user configure the interval between progress updates and hence avoids frequent writes to the server logs. This approach is used only in the shutdown and end-of-recovery cases, because the pg_stat_progress_checkpoint view cannot be accessed in those scenarios. 
> So, although similar in nature, I don't think it is smart to use the > exact same infrastructure between pgstat_progress*-based reporting and > log-based progress reporting, especially if your logging-based > progress reporting is not intended to be a debugging-only > configuration option similar to log_min_messages=DEBUG[1..5]. Yes, I agree that we cannot use the same infrastructure for both. Progress views and server logs have different APIs for reporting progress information. But since both are required for the same purpose, I am planning to use a common function, which improves code readability compared to calling them separately in all the scenarios. I am planning to include log-based reporting in the next patch. Even after that, if using the same function is not recommended, I am happy to change it. Thanks & Regards, Nitin Jadhav On Wed, Feb 23, 2022 at 12:13 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > > On Tue, 22 Feb 2022 at 07:39, Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > > > > > Thank you for sharing the information. 'triggering backend PID' (int) > > > > - can be stored without any problem. 'checkpoint or restartpoint?' > > > > (boolean) - can be stored as a integer value like > > > > PROGRESS_CHECKPOINT_TYPE_CHECKPOINT(0) and > > > > PROGRESS_CHECKPOINT_TYPE_RESTARTPOINT(1). 'elapsed time' (store as > > > > start time in stat_progress, timestamp fits in 64 bits) - As > > > > Timestamptz is of type int64 internally, so we can store the timestamp > > > > value in the progres parameter and then expose a function like > > > > 'pg_stat_get_progress_checkpoint_elapsed' which takes int64 (not > > > > Timestamptz) as argument and then returns string representing the > > > > elapsed time. > > > > > > No need to use a string there; I think exposing the checkpoint start > > > time is good enough. 
The conversion of int64 to timestamp[tz] can be > > > done in SQL (although I'm not sure that exposing the internal bitwise > > > representation of Interval should be exposed to that extent) [0]. > > > Users can then extract the duration interval using now() - start_time, > > > which also allows the user to use their own preferred formatting. > > > > The reason for showing the elapsed time rather than exposing the > > timestamp directly is in case of checkpoint during shutdown and > > end-of-recovery, I am planning to log a message in server logs using > > 'log_startup_progress_interval' infrastructure which displays elapsed > > time. So just to match both of the behaviour I am displaying elapsed > > time here. I feel that elapsed time gives a quicker feel of the > > progress. Kindly let me know if you still feel just exposing the > > timestamp is better than showing the elapsed time. > > At least for pg_stat_progress_checkpoint, storing only a timestamp in > the pg_stat storage (instead of repeatedly updating the field as a > duration) seems to provide much more precise measures of 'time > elapsed' for other sessions if one step of the checkpoint is taking a > long time. > > I understand the want to integrate the log-based reporting in the same > API, but I don't think that is necessarily the right approach: > pg_stat_progress_* has low-overhead infrastructure specifically to > ensure that most tasks will not run much slower while reporting, never > waiting for locks. Logging, however, needs to take locks (if only to > prevent concurrent writes to the output file at a kernel level) and > thus has a not insignificant overhead and thus is not very useful for > precise and very frequent statistics updates. 
> > So, although similar in nature, I don't think it is smart to use the > exact same infrastructure between pgstat_progress*-based reporting and > log-based progress reporting, especially if your logging-based > progress reporting is not intended to be a debugging-only > configuration option similar to log_min_messages=DEBUG[1..5]. > > - Matthias
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Ashutosh Sharma
Date:
+ if ((ckpt_flags & + (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY)) == 0) + { This code (present at multiple places) looks a little ugly to me; what we can do instead is add a macro, probably named IsShutdownCheckpoint(), which does the above check, and use it in all the functions that have this check. See below: #define IsShutdownCheckpoint(flags) \ (((flags) & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY)) != 0) And then you may use this macro like: if (IsBootstrapProcessingMode() || IsShutdownCheckpoint(flags)) return; This change can be done in all these functions: +void +checkpoint_progress_start(int flags) -- + */ +void +checkpoint_progress_update_param(int index, int64 val) -- + * Stop reporting progress of the checkpoint. + */ +void +checkpoint_progress_end(void) == + pgstat_progress_start_command(PROGRESS_COMMAND_CHECKPOINT, InvalidOid); + + val[0] = XLogCtl->InsertTimeLineID; + val[1] = flags; + val[2] = PROGRESS_CHECKPOINT_PHASE_INIT; + val[3] = CheckpointStats.ckpt_start_t; + + pgstat_progress_update_multi_param(4, index, val); + } Any specific reason for recording the timelineID in the checkpoint stats table? Will this ever change in our case? -- With Regards, Ashutosh Sharma. On Wed, Feb 23, 2022 at 6:59 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > I will make use of pgstat_progress_update_multi_param() in the next > > patch to replace multiple calls to checkpoint_progress_update_param(). > > Fixed. > --- > > > > The other progress tables use [type]_total as column names for counter > > > targets (e.g. backup_total for backup_streamed, heap_blks_total for > > > heap_blks_scanned, etc.). I think that `buffers_total` and > > > `files_total` would be better column names. > > > > I agree and I will update this in the next patch. > > Fixed. > --- > > > How about this "The checkpoint is started because max_wal_size is reached". > > > > "The checkpoint is started because checkpoint_timeout expired". 
> > > > "The checkpoint is started because some operation forced a checkpoint". > > I have used the above description. Kindly let me know if any changes > are required. > --- > > > > > + <entry><literal>checkpointing CommitTs pages</literal></entry> > > > > > > CommitTs -> Commit time stamp > > > > I will handle this in the next patch. > > Fixed. > --- > > > There are more scenarios where you can have a baackend requesting a checkpoint > > and waiting for its completion, and there may be more than one backend > > concerned, so I don't think that storing only one / the first backend pid is > > ok. > > Thanks for this information. I am not considering backend_pid. > --- > > > I think all the information should be exposed. Only knowing why the current > > checkpoint has been triggered without any further information seems a bit > > useless. Think for instance for cases like [1]. > > I have supported all possible checkpoint kinds. Added > pg_stat_get_progress_checkpoint_kind() to convert the flags (int) to a > string representing a combination of flags and also checking for the > flag update in ImmediateCheckpointRequested() which checks whether > CHECKPOINT_IMMEDIATE flag is set or not. I did not find any other > cases where the flags get changed (which changes the current > checkpoint behaviour) during the checkpoint. Kindly let me know if I > am missing something. > --- > > > > I feel 'processes_wiating' aligns more with the naming conventions of > > > the fields of the existing progres views. > > > > There's at least pg_stat_progress_vacuum.num_dead_tuples. Anyway I don't have > > a strong opinion on it, just make sure to correct the typo. > > More analysis is required to support this. I am planning to take care > in the next patch. > --- > > > If pg_is_in_recovery() is true, then it's a restartpoint, otherwise it's a > > restartpoint if the checkpoint's timeline is different from the current > > timeline? > > Fixed. > > Sharing the v2 patch. 
Kindly have a look and share your comments. > > Thanks & Regards, > Nitin Jadhav > > > > > On Tue, Feb 22, 2022 at 12:08 PM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > > > > > Thank you for sharing the information. 'triggering backend PID' (int) > > > > - can be stored without any problem. 'checkpoint or restartpoint?' > > > > (boolean) - can be stored as a integer value like > > > > PROGRESS_CHECKPOINT_TYPE_CHECKPOINT(0) and > > > > PROGRESS_CHECKPOINT_TYPE_RESTARTPOINT(1). 'elapsed time' (store as > > > > start time in stat_progress, timestamp fits in 64 bits) - As > > > > Timestamptz is of type int64 internally, so we can store the timestamp > > > > value in the progres parameter and then expose a function like > > > > 'pg_stat_get_progress_checkpoint_elapsed' which takes int64 (not > > > > Timestamptz) as argument and then returns string representing the > > > > elapsed time. > > > > > > No need to use a string there; I think exposing the checkpoint start > > > time is good enough. The conversion of int64 to timestamp[tz] can be > > > done in SQL (although I'm not sure that exposing the internal bitwise > > > representation of Interval should be exposed to that extent) [0]. > > > Users can then extract the duration interval using now() - start_time, > > > which also allows the user to use their own preferred formatting. > > > > The reason for showing the elapsed time rather than exposing the > > timestamp directly is in case of checkpoint during shutdown and > > end-of-recovery, I am planning to log a message in server logs using > > 'log_startup_progress_interval' infrastructure which displays elapsed > > time. So just to match both of the behaviour I am displaying elapsed > > time here. I feel that elapsed time gives a quicker feel of the > > progress. Kindly let me know if you still feel just exposing the > > timestamp is better than showing the elapsed time. 
> > > > > > 'checkpoint start location' (lsn = uint64) - I feel we > > > > cannot use progress parameters for this case. As assigning uint64 to > > > > int64 type would be an issue for larger values and can lead to hidden > > > > bugs. > > > > > > Not necessarily - we can (without much trouble) do a bitwise cast from > > > uint64 to int64, and then (in SQL) cast it back to a pg_lsn [1]. Not > > > very elegant, but it works quite well. > > > > > > [1] SELECT '0/0'::pg_lsn + ((CASE WHEN stat.my_int64 < 0 THEN > > > pow(2::numeric, 64::numeric)::numeric ELSE 0::numeric END) + > > > stat.my_int64::numeric) FROM (SELECT -2::bigint /* 0xFFFFFFFF/FFFFFFFE > > > */ AS my_bigint_lsn) AS stat(my_int64); > > > > Thanks for sharing. It works. I will include this in the next patch. > > On Sat, Feb 19, 2022 at 11:02 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > Hi, > > > > > > On Fri, Feb 18, 2022 at 08:07:05PM +0530, Nitin Jadhav wrote: > > > > > > > > The backend_pid contains a valid value only during > > > > the CHECKPOINT command issued by the backend explicitly, otherwise the > > > > value will be 0. We may have to add an additional field to > > > > 'CheckpointerShmemStruct' to hold the backend pid. The backend > > > > requesting the checkpoint will update its pid to this structure. > > > > Kindly let me know if you still feel the backend_pid field is not > > > > necessary. > > > > > > There are more scenarios where you can have a baackend requesting a checkpoint > > > and waiting for its completion, and there may be more than one backend > > > concerned, so I don't think that storing only one / the first backend pid is > > > ok. > > > > > > > > And also while looking at the patch I see there's the same problem that I > > > > > mentioned in the previous thread, which is that the effective flags can be > > > > > updated once the checkpoint started, and as-is the view won't reflect that. 
It > > > > > also means that you can't simply display one of wal, time or force but a > > > > > possible combination of the flags (including the one not handled in v1). > > > > > > > > If I understand the above comment properly, it has 2 points. First is > > > > to display the combination of flags rather than just displaying wal, > > > > time or force - The idea behind this is to just let the user know the > > > > reason for checkpointing. That is, the checkpoint is started because > > > > max_wal_size is reached or checkpoint_timeout expired or explicitly > > > > issued CHECKPOINT command. The other flags like CHECKPOINT_IMMEDIATE, > > > > CHECKPOINT_WAIT or CHECKPOINT_FLUSH_ALL indicate how the checkpoint > > > > has to be performed. Hence I have not included those in the view. If > > > > it is really required, I would like to modify the code to include > > > > other flags and display the combination. > > > > > > I think all the information should be exposed. Only knowing why the current > > > checkpoint has been triggered without any further information seems a bit > > > useless. Think for instance for cases like [1]. > > > > > > > Second point is to reflect > > > > the updated flags in the view. AFAIK, there is a possibility that the > > > > flags get updated during the on-going checkpoint but the reason for > > > > checkpoint (wal, time or force) will remain same for the current > > > > checkpoint. There might be a change in how checkpoint has to be > > > > performed if CHECKPOINT_IMMEDIATE flag is set. If we go with > > > > displaying the combination of flags in the view, then probably we may > > > > have to reflect this in the view. > > > > > > You can only "upgrade" a checkpoint, but not "downgrade" it. So if for > > > instance you find both CHECKPOINT_CAUSE_TIME and CHECKPOINT_FORCE (which is > > > possible) you can easily know which one was the one that triggered the > > > checkpoint and which one was added later. 
> > > > > > > > > Probably a new field named 'processes_wiating' or 'events_waiting' can be > > > > > > added for this purpose. > > > > > > > > > > Maybe num_process_waiting? > > > > > > > > I feel 'processes_wiating' aligns more with the naming conventions of > > > > the fields of the existing progres views. > > > > > > There's at least pg_stat_progress_vacuum.num_dead_tuples. Anyway I don't have > > > a strong opinion on it, just make sure to correct the typo. > > > > > > > > > Probably writing of buffers or syncing files may complete before > > > > > > pg_is_in_recovery() returns false. But there are some cleanup > > > > > > operations happen as part of the checkpoint. During this scenario, we > > > > > > may get false value for pg_is_in_recovery(). Please refer following > > > > > > piece of code which is present in CreateRestartpoint(). > > > > > > > > > > > > if (!RecoveryInProgress()) > > > > > > replayTLI = XLogCtl->InsertTimeLineID; > > > > > > > > > > Then maybe we could store the timeline rather then then kind of checkpoint? > > > > > You should still be able to compute the information while giving a bit more > > > > > information for the same memory usage. > > > > > > > > Can you please describe more about how checkpoint/restartpoint can be > > > > confirmed using the timeline id. > > > > > > If pg_is_in_recovery() is true, then it's a restartpoint, otherwise it's a > > > restartpoint if the checkpoint's timeline is different from the current > > > timeline? > > > > > > [1] https://www.postgresql.org/message-id/1486805889.24568.96.camel%40credativ.de
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Alvaro Herrera
Date:
I think the change to ImmediateCheckpointRequested() makes no sense. Before this patch, that function merely inquires whether there's an immediate checkpoint queued. After this patch, it ... changes a progress-reporting flag? I think it would make more sense to make the progress-report flag change in whatever is the place that *requests* an immediate checkpoint rather than here. I think the use of capitals in CHECKPOINT and CHECKPOINTER in the documentation is excessive. (Same for terms such as MULTIXACT and others in those docs; we typically use those in lowercase when user-facing; and do we really use term CLOG anymore? Don't we call it "commit log" nowadays?) -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ "Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Justin Pryzby
Date:
+ Whenever the checkpoint operation is running, the + <structname>pg_stat_progress_checkpoint</structname> view will contain a + single row indicating the progress of the checkpoint. The tables below Maybe it should show a single row, unless the checkpointer isn't running at all (like in single user mode). + Process ID of a CHECKPOINTER process. It's *the* checkpointer process. pgstatfuncs.c has a whitespace issue (tab-space). I suppose the functions should set provolatile. -- Justin
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Matthias van de Meent
Date:
On Wed, 23 Feb 2022 at 15:24, Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > At least for pg_stat_progress_checkpoint, storing only a timestamp in > > the pg_stat storage (instead of repeatedly updating the field as a > > duration) seems to provide much more precise measures of 'time > > elapsed' for other sessions if one step of the checkpoint is taking a > > long time. > > I am storing the checkpoint start timestamp in the st_progress_param[] > and this gets set only once during the checkpoint (at the start of the > checkpoint). I have added function > pg_stat_get_progress_checkpoint_elapsed() which calculates the elapsed > time and returns a string. This function gets called whenever > pg_stat_progress_checkpoint view is queried. Kindly refer v2 patch and > share your thoughts. I dislike the lack of access to the actual value of the checkpoint start / checkpoint elapsed field. As a user, if I query the pg_stat_progress_* views, my terminal or application can easily interpret an `interval` value and cast it to string, but the opposite is not true: the current implementation for pg_stat_get_progress_checkpoint_elapsed loses precision. This is why we use typed numeric fields in effectively all other places instead of stringified versions of the values: oid fields, counters, etc are all rendered as bigint in the view, so that no information is lost and interpretation is trivial. > > I understand the want to integrate the log-based reporting in the same > > API, but I don't think that is necessarily the right approach: > > pg_stat_progress_* has low-overhead infrastructure specifically to > > ensure that most tasks will not run much slower while reporting, never > > waiting for locks. Logging, however, needs to take locks (if only to > > prevent concurrent writes to the output file at a kernel level) and > > thus has a not insignificant overhead and thus is not very useful for > > precise and very frequent statistics updates. 
> > I understand that the log based reporting is very costly and very > frequent updates are not advisable. I am planning to use the existing > infrastructure of 'log_startup_progress_interval' which provides an > option for the user to configure the interval between each progress > update. Hence it avoids frequent updates to server logs. This approach > is used only during shutdown and end-of-recovery cases because we > cannot access pg_stat_progress_checkpoint view during those scenarios. I see; but log_startup_progress_interval seems to be exclusively consumed through the ereport_startup_progress macro. Why put startup/shutdown logging on the same path as the happy flow of normal checkpoints? > > So, although similar in nature, I don't think it is smart to use the > > exact same infrastructure between pgstat_progress*-based reporting and > > log-based progress reporting, especially if your logging-based > > progress reporting is not intended to be a debugging-only > > configuration option similar to log_min_messages=DEBUG[1..5]. > > Yes. I agree that we cannot use the same infrastructure for both. > Progress views and servers logs have different APIs to report the > progress information. But since both of this are required for the same > purpose, I am planning to use a common function which increases the > code readability than calling it separately in all the scenarios. I am > planning to include log based reporting in the next patch. Even after > that if using the same function is not recommended, I am happy to > change. I don't think that checkpoint_progress_update_param(int, uint64) fits well with the construction of progress log messages, requiring special-casing / matching the offset numbers to actual fields inside that single function, which adds unnecessary overhead when compared against normal and direct calls to the related infrastructure. 
I think that, instead of looking to what might at some point be added, it is better to use the currently available functions instead, and move to new functions if and when the log-based reporting requires it. - Matthias
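Matthias's argument for exposing the raw start timestamp rather than a pre-formatted elapsed string can be illustrated outside of C: once the start time is available as a typed value, any client can derive the elapsed duration losslessly, just as `now() - start_time` would in SQL, and format it however it likes. A small Python sketch (the helper is hypothetical, not part of the patch):

```python
from datetime import datetime, timedelta, timezone

def elapsed(start_time, now=None):
    """Derive checkpoint elapsed time client-side from a raw start
    timestamp.  Nothing is lost to server-side string formatting;
    the caller chooses precision and rendering."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - start_time

start = datetime(2022, 2, 23, 12, 0, 0, tzinfo=timezone.utc)
later = start + timedelta(minutes=1, seconds=30)
# elapsed(start, later) is a timedelta; the client can render it at will
```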
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Matthias van de Meent
Date:
On Wed, 23 Feb 2022 at 14:28, Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > Sharing the v2 patch. Kindly have a look and share your comments. Thanks for updating. > diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml With the new pg_stat_progress_checkpoint, you should also add a backreference to this progress reporting in the CHECKPOINT sql command documentation located in checkpoint.sgml, and maybe in wal.sgml and/or backup.sgml too. See e.g. cluster.sgml around line 195 for an example. > diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c > +ImmediateCheckpointRequested(int flags) > if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE) > + { > + updated_flags |= CHECKPOINT_IMMEDIATE; I don't think that these changes are expected behaviour. In this condition, the currently running checkpoint is still not 'immediate', but it is going to hurry up for a new, actually immediate checkpoint. Those are different kinds of checkpoint handling; and I don't think you should modify the reported flags to show that we're going to do stuff faster than usual. Maybe maintain a separate 'upcoming checkpoint flags' field instead? > diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql > + ( SELECT '0/0'::pg_lsn + > + ((CASE > + WHEN stat.lsn_int64 < 0 THEN pow(2::numeric, 64::numeric)::numeric > + ELSE 0::numeric > + END) + > + stat.lsn_int64::numeric) > + FROM (SELECT s.param3::bigint) AS stat(lsn_int64) > + ) AS start_lsn, My LSN select statement was an example that could be run directly in psql, so you didn't have to embed the SELECT into the view query. 
The following should be sufficient (and save the planner a few cycles otherwise spent in inlining): + ('0/0'::pg_lsn + + ((CASE + WHEN s.param3 < 0 THEN pow(2::numeric, 64::numeric)::numeric + ELSE 0::numeric + END) + + s.param3::numeric) + ) AS start_lsn, > diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c > +checkpoint_progress_start(int flags) > [...] > +checkpoint_progress_update_param(int index, int64 val) > [...] > +checkpoint_progress_end(void) > +{ > + /* In bootstrap mode, we don't actually record anything. */ > + if (IsBootstrapProcessingMode()) > + return; Disabling pgstat progress reporting when in bootstrap processing mode / startup/end-of-recovery makes very little sense (see upthread) and should be removed, regardless of whether separate functions stay. > diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h > +#define PROGRESS_CHECKPOINT_PHASE_INIT 0 Generally, enum-like values in a stat_progress field are 1-indexed, to differentiate between empty/uninitialized (0) and states that have been set by the progress reporting infrastructure. Kind regards, Matthias van de Meent
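The CASE expression above exists because the progress slots are signed int64 while an LSN is an unsigned 64-bit quantity: any LSN at or above 2^63 reads back negative and must be shifted up by 2^64. Here is an illustrative Python sketch of the same arithmetic (function names are made up for the example, not from the patch):

```python
def lsn_from_progress_param(param):
    """Undo the signed/unsigned wrap: a pg_lsn >= 2**63 stored in a
    signed int64 progress slot reads back negative, so add 2**64.
    Mirrors the SQL CASE in the view definition above."""
    return param + 2**64 if param < 0 else param

def format_lsn(lsn):
    """Render in pg_lsn text form: high 32 bits, '/', low 32 bits, hex."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"
```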
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
> I think the change to ImmediateCheckpointRequested() makes no sense. > Before this patch, that function merely inquires whether there's an > immediate checkpoint queued. After this patch, it ... changes a > progress-reporting flag? I think it would make more sense to make the > progress-report flag change in whatever is the place that *requests* an > immediate checkpoint rather than here. > > > diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c > > +ImmediateCheckpointRequested(int flags) > > if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE) > > + { > > + updated_flags |= CHECKPOINT_IMMEDIATE; > > I don't think that these changes are expected behaviour. Under in this > condition; the currently running checkpoint is still not 'immediate', > but it is going to hurry up for a new, actually immediate checkpoint. > Those are different kinds of checkpoint handling; and I don't think > you should modify the reported flags to show that we're going to do > stuff faster than usual. Maybe maintiain a seperate 'upcoming > checkpoint flags' field instead? Thank you Alvaro and Matthias for your views. I understand your point of not updating the progress-report flag here as it just checks whether the CHECKPOINT_IMMEDIATE is set or not and takes an action based on that but it doesn't change the checkpoint flags. I will modify the code but I am a bit confused here. As per Alvaro, we need to make the progress-report flag change in whatever is the place that *requests* an immediate checkpoint. I feel this gives information about the upcoming checkpoint not the current one. So updating here provides wrong details in the view. The flags available during CreateCheckPoint() will remain same for the entire checkpoint operation and we should show the same information in the view till it completes. So just removing the above piece of code (modified in ImmediateCheckpointRequested()) in the patch will make it correct. 
My opinion about maintaining a separate field to show upcoming checkpoint flags is it makes the view complex. Please share your thoughts. Thanks & Regards, On Thu, Feb 24, 2022 at 10:45 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > > On Wed, 23 Feb 2022 at 14:28, Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > > > Sharing the v2 patch. Kindly have a look and share your comments. > > Thanks for updating. > > > diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml > > With the new pg_stat_progress_checkpoint, you should also add a > backreference to this progress reporting in the CHECKPOINT sql command > documentation located in checkpoint.sgml, and maybe in wal.sgml and/or > backup.sgml too. See e.g. cluster.sgml around line 195 for an example. > > > diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c > > +ImmediateCheckpointRequested(int flags) > > if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE) > > + { > > + updated_flags |= CHECKPOINT_IMMEDIATE; > > I don't think that these changes are expected behaviour. Under in this > condition; the currently running checkpoint is still not 'immediate', > but it is going to hurry up for a new, actually immediate checkpoint. > Those are different kinds of checkpoint handling; and I don't think > you should modify the reported flags to show that we're going to do > stuff faster than usual. Maybe maintiain a seperate 'upcoming > checkpoint flags' field instead? 
> > > diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql > > + ( SELECT '0/0'::pg_lsn + > > + ((CASE > > + WHEN stat.lsn_int64 < 0 THEN pow(2::numeric, 64::numeric)::numeric > > + ELSE 0::numeric > > + END) + > > + stat.lsn_int64::numeric) > > + FROM (SELECT s.param3::bigint) AS stat(lsn_int64) > > + ) AS start_lsn, > > My LSN select statement was an example that could be run directly in > psql; the so you didn't have to embed the SELECT into the view query. > The following should be sufficient (and save the planner a few cycles > otherwise spent in inlining): > > + ('0/0'::pg_lsn + > + ((CASE > + WHEN s.param3 < 0 THEN pow(2::numeric, > 64::numeric)::numeric > + ELSE 0::numeric > + END) + > + s.param3::numeric) > + ) AS start_lsn, > > > > diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c > > +checkpoint_progress_start(int flags) > > [...] > > +checkpoint_progress_update_param(int index, int64 val) > > [...] > > +checkpoint_progress_end(void) > > +{ > > + /* In bootstrap mode, we don't actually record anything. */ > > + if (IsBootstrapProcessingMode()) > > + return; > > Disabling pgstat progress reporting when in bootstrap processing mode > / startup/end-of-recovery makes very little sense (see upthread) and > should be removed, regardless of whether seperate functions stay. > > > diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h > > +#define PROGRESS_CHECKPOINT_PHASE_INIT 0 > > Generally, enum-like values in a stat_progress field are 1-indexed, to > differentiate between empty/uninitialized (0) and states that have > been set by the progress reporting infrastructure. > > > > Kind regards, > > Matthias van de Meent
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Julien Rouhaud
Date:
Hi, On Fri, Feb 25, 2022 at 12:23:27AM +0530, Nitin Jadhav wrote: > > I think the change to ImmediateCheckpointRequested() makes no sense. > > Before this patch, that function merely inquires whether there's an > > immediate checkpoint queued. After this patch, it ... changes a > > progress-reporting flag? I think it would make more sense to make the > > progress-report flag change in whatever is the place that *requests* an > > immediate checkpoint rather than here. > > > > > diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c > > > +ImmediateCheckpointRequested(int flags) > > > if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE) > > > + { > > > + updated_flags |= CHECKPOINT_IMMEDIATE; > > > > I don't think that these changes are expected behaviour. Under in this > > condition; the currently running checkpoint is still not 'immediate', > > but it is going to hurry up for a new, actually immediate checkpoint. > > Those are different kinds of checkpoint handling; and I don't think > > you should modify the reported flags to show that we're going to do > > stuff faster than usual. Maybe maintiain a seperate 'upcoming > > checkpoint flags' field instead? > > Thank you Alvaro and Matthias for your views. I understand your point > of not updating the progress-report flag here as it just checks > whether the CHECKPOINT_IMMEDIATE is set or not and takes an action > based on that but it doesn't change the checkpoint flags. I will > modify the code but I am a bit confused here. As per Alvaro, we need > to make the progress-report flag change in whatever is the place that > *requests* an immediate checkpoint. I feel this gives information > about the upcoming checkpoint not the current one. So updating here > provides wrong details in the view. The flags available during > CreateCheckPoint() will remain same for the entire checkpoint > operation and we should show the same information in the view till it > completes. 
I'm not sure what Matthias meant, but as far as I know there's no fundamental difference between checkpoint with and without the CHECKPOINT_IMMEDIATE flag, and there's also no scheduling for multiple checkpoints. Yes, the flags will remain the same but checkpoint.c will test both the passed flags and the shmem flags to see whether a delay should be added or not, which is the only difference in checkpoint processing for this flag. See the call to ImmediateCheckpointRequested() which will look at the value in shmem: /* * Perform the usual duties and take a nap, unless we're behind schedule, * in which case we just try to catch up as quickly as possible. */ if (!(flags & CHECKPOINT_IMMEDIATE) && !ShutdownRequestPending && !ImmediateCheckpointRequested() && IsCheckpointOnSchedule(progress)) [...]
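Julien's point — the flags passed into the running checkpoint stay fixed, while the shared-memory flags can later "upgrade" its pacing — can be sketched as follows. This is an illustrative Python rendering of the quoted C condition; the flag values are assumptions mirroring xlog.h at the time of writing, and the IsCheckpointOnSchedule() term is omitted for brevity:

```python
# Assumed flag bits (illustrative; see src/include/access/xlog.h)
CHECKPOINT_IMMEDIATE  = 0x0004
CHECKPOINT_CAUSE_XLOG = 0x0080
CHECKPOINT_CAUSE_TIME = 0x0100

def should_throttle(flags, shmem_flags, shutdown_pending=False):
    """Sketch of the quoted condition: keep napping between write
    batches only if this checkpoint was not started immediate, no
    shutdown is pending, and no immediate checkpoint has been
    requested in shared memory in the meantime (the 'upgrade')."""
    return (not (flags & CHECKPOINT_IMMEDIATE)
            and not shutdown_pending
            and not (shmem_flags & CHECKPOINT_IMMEDIATE))
```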
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
> Thank you Alvaro and Matthias for your views. I understand your point > of not updating the progress-report flag here as it just checks > whether the CHECKPOINT_IMMEDIATE is set or not and takes an action > based on that but it doesn't change the checkpoint flags. I will > modify the code but I am a bit confused here. As per Alvaro, we need > to make the progress-report flag change in whatever is the place that > *requests* an immediate checkpoint. I feel this gives information > about the upcoming checkpoint not the current one. So updating here > provides wrong details in the view. The flags available during > CreateCheckPoint() will remain same for the entire checkpoint > operation and we should show the same information in the view till it > completes. So just removing the above piece of code (modified in > ImmediateCheckpointRequested()) in the patch will make it correct. My > opinion about maintaining a separate field to show upcoming checkpoint > flags is it makes the view complex. Please share your thoughts. I have modified the code accordingly. --- > I think the use of capitals in CHECKPOINT and CHECKPOINTER in the > documentation is excessive. Fixed. Here the word CHECKPOINT represents command/checkpoint operation. If we treat it as a checkpoint operation, I agree to use lowercase but if we treat it as command, then I think uppercase is recommended (Refer https://www.postgresql.org/docs/14/sql-checkpoint.html). Is it ok to always use lowercase here? --- > (Same for terms such as MULTIXACT and > others in those docs; we typically use those in lowercase when > user-facing; and do we really use term CLOG anymore? Don't we call it > "commit log" nowadays?) I have observed the CLOG term in the existing documentation. Anyways I have changed MULTIXACT to multixact, SUBTRANS to subtransaction and CLOG to commit log. 
--- > + Whenever the checkpoint operation is running, the > + <structname>pg_stat_progress_checkpoint</structname> view will contain a > + single row indicating the progress of the checkpoint. The tables below > > Maybe it should show a single row, unless the checkpointer isn't running at > all (like in single user mode). Nice thought. Can we add an additional checkpoint phase like 'Idle'? Idle is ON whenever the checkpointer process is running and there is no ongoing checkpoint. Thoughts? --- > + Process ID of a CHECKPOINTER process. > > It's *the* checkpointer process. Fixed. --- > pgstatfuncs.c has a whitespace issue (tab-space). I have verified with 'git diff --check' and also manually. I did not find any issue. Kindly mention the specific code which has an issue. --- > I suppose the functions should set provolatile. Fixed. --- > > I am storing the checkpoint start timestamp in the st_progress_param[] > > and this gets set only once during the checkpoint (at the start of the > > checkpoint). I have added function > > pg_stat_get_progress_checkpoint_elapsed() which calculates the elapsed > > time and returns a string. This function gets called whenever > > pg_stat_progress_checkpoint view is queried. Kindly refer v2 patch and > > share your thoughts. > > I dislike the lack of access to the actual value of the checkpoint > start / checkpoint elapsed field. > > As a user, if I query the pg_stat_progress_* views, my terminal or > application can easily interpret an `interval` value and cast it to > string, but the opposite is not true: the current implementation for > pg_stat_get_progress_checkpoint_elapsed loses precision. This is why > we use typed numeric fields in effectively all other places instead of > stringified versions of the values: oid fields, counters, etc are all > rendered as bigint in the view, so that no information is lost and > interpretation is trivial. Displaying start time of the checkpoint. 
--- > > I understand that the log based reporting is very costly and very > > frequent updates are not advisable. I am planning to use the existing > > infrastructure of 'log_startup_progress_interval' which provides an > > option for the user to configure the interval between each progress > > update. Hence it avoids frequent updates to server logs. This approach > > is used only during shutdown and end-of-recovery cases because we > > cannot access pg_stat_progress_checkpoint view during those scenarios. > > I see; but log_startup_progress_interval seems to be exclusively > consumed through the ereport_startup_progress macro. Why put > startup/shutdown logging on the same path as the happy flow of normal > checkpoints? You mean to say while updating the progress of the checkpoint, call pgstat_progress_update_param() and then call ereport_startup_progress()? > I think that, instead of looking to what might at some point be added, > it is better to use the currently available functions instead, and > move to new functions if and when the log-based reporting requires it. Makes sense. Removing checkpoint_progress_update_param() and checkpoint_progress_end(). I would like to concentrate on pg_stat_progress_checkpoint view as of now and I will consider log based reporting later. > > diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml > > With the new pg_stat_progress_checkpoint, you should also add a > backreference to this progress reporting in the CHECKPOINT sql command > documentation located in checkpoint.sgml, and maybe in wal.sgml and/or > backup.sgml too. See e.g. cluster.sgml around line 195 for an example. I have updated checkpoint.sgml and wal.sgml. 
> > diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql > > + ( SELECT '0/0'::pg_lsn + > > + ((CASE > > + WHEN stat.lsn_int64 < 0 THEN pow(2::numeric, 64::numeric)::numeric > > + ELSE 0::numeric > > + END) + > > + stat.lsn_int64::numeric) > > + FROM (SELECT s.param3::bigint) AS stat(lsn_int64) > > + ) AS start_lsn, > > My LSN select statement was an example that could be run directly in > psql; the so you didn't have to embed the SELECT into the view query. > The following should be sufficient (and save the planner a few cycles > otherwise spent in inlining): > > + ('0/0'::pg_lsn + > + ((CASE > + WHEN s.param3 < 0 THEN pow(2::numeric, > 64::numeric)::numeric > + ELSE 0::numeric > + END) + > + s.param3::numeric) > + ) AS start_lsn, Thanks for the suggestion. Fixed. > > diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c > > +checkpoint_progress_start(int flags) > > [...] > > +checkpoint_progress_update_param(int index, int64 val) > > [...] > > +checkpoint_progress_end(void) > > +{ > > + /* In bootstrap mode, we don't actually record anything. */ > > + if (IsBootstrapProcessingMode()) > > + return; > > Disabling pgstat progress reporting when in bootstrap processing mode > / startup/end-of-recovery makes very little sense (see upthread) and > should be removed, regardless of whether seperate functions stay. Removed since log based reporting is not part of the current patch. > > diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h > > +#define PROGRESS_CHECKPOINT_PHASE_INIT 0 > > Generally, enum-like values in a stat_progress field are 1-indexed, to > differentiate between empty/uninitialized (0) and states that have > been set by the progress reporting infrastructure. Fixed. Please find the v3 patch attached and share your thoughts. 
Thanks & Regards, Nitin Jadhav On Fri, Feb 25, 2022 at 12:23 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > I think the change to ImmediateCheckpointRequested() makes no sense. > > Before this patch, that function merely inquires whether there's an > > immediate checkpoint queued. After this patch, it ... changes a > > progress-reporting flag? I think it would make more sense to make the > > progress-report flag change in whatever is the place that *requests* an > > immediate checkpoint rather than here. > > > > > diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c > > > +ImmediateCheckpointRequested(int flags) > > > if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE) > > > + { > > > + updated_flags |= CHECKPOINT_IMMEDIATE; > > > > I don't think that these changes are expected behaviour. Under in this > > condition; the currently running checkpoint is still not 'immediate', > > but it is going to hurry up for a new, actually immediate checkpoint. > > Those are different kinds of checkpoint handling; and I don't think > > you should modify the reported flags to show that we're going to do > > stuff faster than usual. Maybe maintiain a seperate 'upcoming > > checkpoint flags' field instead? > > Thank you Alvaro and Matthias for your views. I understand your point > of not updating the progress-report flag here as it just checks > whether the CHECKPOINT_IMMEDIATE is set or not and takes an action > based on that but it doesn't change the checkpoint flags. I will > modify the code but I am a bit confused here. As per Alvaro, we need > to make the progress-report flag change in whatever is the place that > *requests* an immediate checkpoint. I feel this gives information > about the upcoming checkpoint not the current one. So updating here > provides wrong details in the view. 
The flags available during > CreateCheckPoint() will remain same for the entire checkpoint > operation and we should show the same information in the view till it > completes. So just removing the above piece of code (modified in > ImmediateCheckpointRequested()) in the patch will make it correct. My > opinion about maintaining a separate field to show upcoming checkpoint > flags is it makes the view complex. Please share your thoughts. > > Thanks & Regards, > > On Thu, Feb 24, 2022 at 10:45 PM Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > > > On Wed, 23 Feb 2022 at 14:28, Nitin Jadhav > > <nitinjadhavpostgres@gmail.com> wrote: > > > > > > Sharing the v2 patch. Kindly have a look and share your comments. > > > > Thanks for updating. > > > > > diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml > > > > With the new pg_stat_progress_checkpoint, you should also add a > > backreference to this progress reporting in the CHECKPOINT sql command > > documentation located in checkpoint.sgml, and maybe in wal.sgml and/or > > backup.sgml too. See e.g. cluster.sgml around line 195 for an example. > > > > > diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c > > > +ImmediateCheckpointRequested(int flags) > > > if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE) > > > + { > > > + updated_flags |= CHECKPOINT_IMMEDIATE; > > > > I don't think that these changes are expected behaviour. Under in this > > condition; the currently running checkpoint is still not 'immediate', > > but it is going to hurry up for a new, actually immediate checkpoint. > > Those are different kinds of checkpoint handling; and I don't think > > you should modify the reported flags to show that we're going to do > > stuff faster than usual. Maybe maintiain a seperate 'upcoming > > checkpoint flags' field instead? 
> > > > > diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql > > > + ( SELECT '0/0'::pg_lsn + > > > + ((CASE > > > + WHEN stat.lsn_int64 < 0 THEN pow(2::numeric, 64::numeric)::numeric > > > + ELSE 0::numeric > > > + END) + > > > + stat.lsn_int64::numeric) > > > + FROM (SELECT s.param3::bigint) AS stat(lsn_int64) > > > + ) AS start_lsn, > > > > My LSN select statement was an example that could be run directly in > > psql; the so you didn't have to embed the SELECT into the view query. > > The following should be sufficient (and save the planner a few cycles > > otherwise spent in inlining): > > > > + ('0/0'::pg_lsn + > > + ((CASE > > + WHEN s.param3 < 0 THEN pow(2::numeric, > > 64::numeric)::numeric > > + ELSE 0::numeric > > + END) + > > + s.param3::numeric) > > + ) AS start_lsn, > > > > > > > diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c > > > +checkpoint_progress_start(int flags) > > > [...] > > > +checkpoint_progress_update_param(int index, int64 val) > > > [...] > > > +checkpoint_progress_end(void) > > > +{ > > > + /* In bootstrap mode, we don't actually record anything. */ > > > + if (IsBootstrapProcessingMode()) > > > + return; > > > > Disabling pgstat progress reporting when in bootstrap processing mode > > / startup/end-of-recovery makes very little sense (see upthread) and > > should be removed, regardless of whether seperate functions stay. > > > > > diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h > > > +#define PROGRESS_CHECKPOINT_PHASE_INIT 0 > > > > Generally, enum-like values in a stat_progress field are 1-indexed, to > > differentiate between empty/uninitialized (0) and states that have > > been set by the progress reporting infrastructure. > > > > > > > > Kind regards, > > > > Matthias van de Meent
Attachment
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
> + if ((ckpt_flags & > + (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY)) == 0) > + { > > This code (present at multiple places) looks a little ugly to me, what > we can do instead is add a macro probably named IsShutdownCheckpoint() > which does the above check and use it in all the functions that have > this check. See below: > > #define IsShutdownCheckpoint(flags) \ > (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY) != 0) > > And then you may use this macro like: > > if (IsBootstrapProcessingMode() || IsShutdownCheckpoint(flags)) > return; Good suggestion. In the v3 patch, I have removed the corresponding code as these checks are not required. Hence this suggestion is not applicable now. --- > pgstat_progress_start_command(PROGRESS_COMMAND_CHECKPOINT, > InvalidOid); > + > + val[0] = XLogCtl->InsertTimeLineID; > + val[1] = flags; > + val[2] = PROGRESS_CHECKPOINT_PHASE_INIT; > + val[3] = CheckpointStats.ckpt_start_t; > + > + pgstat_progress_update_multi_param(4, index, val); > + } > > Any specific reason for recording the timelineID in checkpoint stats > table? Will this ever change in our case? The timelineID is used to decide whether the current operation is checkpoint or restartpoint. There is a field in the view to display this information. Thanks & Regards, Nitin Jadhav On Wed, Feb 23, 2022 at 9:46 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > + if ((ckpt_flags & > + (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY)) == 0) > + { > > This code (present at multiple places) looks a little ugly to me, what > we can do instead is add a macro probably named IsShutdownCheckpoint() > which does the above check and use it in all the functions that have > this check. 
See below: > > #define IsShutdownCheckpoint(flags) \ > (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY) != 0) > > And then you may use this macro like: > > if (IsBootstrapProcessingMode() || IsShutdownCheckpoint(flags)) > return; > > This change can be done in all these functions: > > +void > +checkpoint_progress_start(int flags) > > -- > > + */ > +void > +checkpoint_progress_update_param(int index, int64 val) > > -- > > + * Stop reporting progress of the checkpoint. > + */ > +void > +checkpoint_progress_end(void) > > == > > + > pgstat_progress_start_command(PROGRESS_COMMAND_CHECKPOINT, > InvalidOid); > + > + val[0] = XLogCtl->InsertTimeLineID; > + val[1] = flags; > + val[2] = PROGRESS_CHECKPOINT_PHASE_INIT; > + val[3] = CheckpointStats.ckpt_start_t; > + > + pgstat_progress_update_multi_param(4, index, val); > + } > > Any specific reason for recording the timelineID in checkpoint stats > table? Will this ever change in our case? > > -- > With Regards, > Ashutosh Sharma. > > On Wed, Feb 23, 2022 at 6:59 PM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > > > > I will make use of pgstat_progress_update_multi_param() in the next > > > patch to replace multiple calls to checkpoint_progress_update_param(). > > > > Fixed. > > --- > > > > > > The other progress tables use [type]_total as column names for counter > > > > targets (e.g. backup_total for backup_streamed, heap_blks_total for > > > > heap_blks_scanned, etc.). I think that `buffers_total` and > > > > `files_total` would be better column names. > > > > > > I agree and I will update this in the next patch. > > > > Fixed. > > --- > > > > > How about this "The checkpoint is started because max_wal_size is reached". > > > > > > "The checkpoint is started because checkpoint_timeout expired". > > > > > > "The checkpoint is started because some operation forced a checkpoint". > > > > I have used the above description. Kindly let me know if any changes > > are required. 
> > --- > > > > > > > + <entry><literal>checkpointing CommitTs pages</literal></entry> > > > > > > > > CommitTs -> Commit time stamp > > > > > > I will handle this in the next patch. > > > > Fixed. > > --- > > > > > There are more scenarios where you can have a baackend requesting a checkpoint > > > and waiting for its completion, and there may be more than one backend > > > concerned, so I don't think that storing only one / the first backend pid is > > > ok. > > > > Thanks for this information. I am not considering backend_pid. > > --- > > > > > I think all the information should be exposed. Only knowing why the current > > > checkpoint has been triggered without any further information seems a bit > > > useless. Think for instance for cases like [1]. > > > > I have supported all possible checkpoint kinds. Added > > pg_stat_get_progress_checkpoint_kind() to convert the flags (int) to a > > string representing a combination of flags and also checking for the > > flag update in ImmediateCheckpointRequested() which checks whether > > CHECKPOINT_IMMEDIATE flag is set or not. I did not find any other > > cases where the flags get changed (which changes the current > > checkpoint behaviour) during the checkpoint. Kindly let me know if I > > am missing something. > > --- > > > > > > I feel 'processes_wiating' aligns more with the naming conventions of > > > > the fields of the existing progres views. > > > > > > There's at least pg_stat_progress_vacuum.num_dead_tuples. Anyway I don't have > > > a strong opinion on it, just make sure to correct the typo. > > > > More analysis is required to support this. I am planning to take care > > in the next patch. > > --- > > > > > If pg_is_in_recovery() is true, then it's a restartpoint, otherwise it's a > > > restartpoint if the checkpoint's timeline is different from the current > > > timeline? > > > > Fixed. > > > > Sharing the v2 patch. Kindly have a look and share your comments. 
> > > > Thanks & Regards, > > Nitin Jadhav > > > > > > > > > > On Tue, Feb 22, 2022 at 12:08 PM Nitin Jadhav > > <nitinjadhavpostgres@gmail.com> wrote: > > > > > > > > Thank you for sharing the information. 'triggering backend PID' (int) > > > > > - can be stored without any problem. 'checkpoint or restartpoint?' > > > > > (boolean) - can be stored as a integer value like > > > > > PROGRESS_CHECKPOINT_TYPE_CHECKPOINT(0) and > > > > > PROGRESS_CHECKPOINT_TYPE_RESTARTPOINT(1). 'elapsed time' (store as > > > > > start time in stat_progress, timestamp fits in 64 bits) - As > > > > > Timestamptz is of type int64 internally, so we can store the timestamp > > > > > value in the progres parameter and then expose a function like > > > > > 'pg_stat_get_progress_checkpoint_elapsed' which takes int64 (not > > > > > Timestamptz) as argument and then returns string representing the > > > > > elapsed time. > > > > > > > > No need to use a string there; I think exposing the checkpoint start > > > > time is good enough. The conversion of int64 to timestamp[tz] can be > > > > done in SQL (although I'm not sure that exposing the internal bitwise > > > > representation of Interval should be exposed to that extent) [0]. > > > > Users can then extract the duration interval using now() - start_time, > > > > which also allows the user to use their own preferred formatting. > > > > > > The reason for showing the elapsed time rather than exposing the > > > timestamp directly is in case of checkpoint during shutdown and > > > end-of-recovery, I am planning to log a message in server logs using > > > 'log_startup_progress_interval' infrastructure which displays elapsed > > > time. So just to match both of the behaviour I am displaying elapsed > > > time here. I feel that elapsed time gives a quicker feel of the > > > progress. Kindly let me know if you still feel just exposing the > > > timestamp is better than showing the elapsed time. 
> > > > > > > > 'checkpoint start location' (lsn = uint64) - I feel we > > > > > cannot use progress parameters for this case. As assigning uint64 to > > > > > int64 type would be an issue for larger values and can lead to hidden > > > > > bugs. > > > > > > > > Not necessarily - we can (without much trouble) do a bitwise cast from > > > > uint64 to int64, and then (in SQL) cast it back to a pg_lsn [1]. Not > > > > very elegant, but it works quite well. > > > > > > > > [1] SELECT '0/0'::pg_lsn + ((CASE WHEN stat.my_int64 < 0 THEN > > > > pow(2::numeric, 64::numeric)::numeric ELSE 0::numeric END) + > > > > stat.my_int64::numeric) FROM (SELECT -2::bigint /* 0xFFFFFFFF/FFFFFFFE > > > > */ AS my_bigint_lsn) AS stat(my_int64); > > > > > > Thanks for sharing. It works. I will include this in the next patch. > > > On Sat, Feb 19, 2022 at 11:02 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > > > Hi, > > > > > > > > On Fri, Feb 18, 2022 at 08:07:05PM +0530, Nitin Jadhav wrote: > > > > > > > > > > The backend_pid contains a valid value only during > > > > > the CHECKPOINT command issued by the backend explicitly, otherwise the > > > > > value will be 0. We may have to add an additional field to > > > > > 'CheckpointerShmemStruct' to hold the backend pid. The backend > > > > > requesting the checkpoint will update its pid to this structure. > > > > > Kindly let me know if you still feel the backend_pid field is not > > > > > necessary. > > > > > > > > There are more scenarios where you can have a baackend requesting a checkpoint > > > > and waiting for its completion, and there may be more than one backend > > > > concerned, so I don't think that storing only one / the first backend pid is > > > > ok. > > > > > > > > > > And also while looking at the patch I see there's the same problem that I > > > > > > mentioned in the previous thread, which is that the effective flags can be > > > > > > updated once the checkpoint started, and as-is the view won't reflect that. 
It > > > > > > also means that you can't simply display one of wal, time or force but a > > > > > > possible combination of the flags (including the one not handled in v1). > > > > > > > > > > If I understand the above comment properly, it has 2 points. First is > > > > > to display the combination of flags rather than just displaying wal, > > > > > time or force - The idea behind this is to just let the user know the > > > > > reason for checkpointing. That is, the checkpoint is started because > > > > > max_wal_size is reached or checkpoint_timeout expired or explicitly > > > > > issued CHECKPOINT command. The other flags like CHECKPOINT_IMMEDIATE, > > > > > CHECKPOINT_WAIT or CHECKPOINT_FLUSH_ALL indicate how the checkpoint > > > > > has to be performed. Hence I have not included those in the view. If > > > > > it is really required, I would like to modify the code to include > > > > > other flags and display the combination. > > > > > > > > I think all the information should be exposed. Only knowing why the current > > > > checkpoint has been triggered without any further information seems a bit > > > > useless. Think for instance for cases like [1]. > > > > > > > > > Second point is to reflect > > > > > the updated flags in the view. AFAIK, there is a possibility that the > > > > > flags get updated during the on-going checkpoint but the reason for > > > > > checkpoint (wal, time or force) will remain same for the current > > > > > checkpoint. There might be a change in how checkpoint has to be > > > > > performed if CHECKPOINT_IMMEDIATE flag is set. If we go with > > > > > displaying the combination of flags in the view, then probably we may > > > > > have to reflect this in the view. > > > > > > > > You can only "upgrade" a checkpoint, but not "downgrade" it. 
So if for > > > > instance you find both CHECKPOINT_CAUSE_TIME and CHECKPOINT_FORCE (which is > > > > possible) you can easily know which one was the one that triggered the > > > > checkpoint and which one was added later. > > > > > > > > > > > Probably a new field named 'processes_wiating' or 'events_waiting' can be > > > > > > > added for this purpose. > > > > > > > > > > > > Maybe num_process_waiting? > > > > > > > > > > I feel 'processes_wiating' aligns more with the naming conventions of > > > > > the fields of the existing progres views. > > > > > > > > There's at least pg_stat_progress_vacuum.num_dead_tuples. Anyway I don't have > > > > a strong opinion on it, just make sure to correct the typo. > > > > > > > > > > > Probably writing of buffers or syncing files may complete before > > > > > > > pg_is_in_recovery() returns false. But there are some cleanup > > > > > > > operations happen as part of the checkpoint. During this scenario, we > > > > > > > may get false value for pg_is_in_recovery(). Please refer following > > > > > > > piece of code which is present in CreateRestartpoint(). > > > > > > > > > > > > > > if (!RecoveryInProgress()) > > > > > > > replayTLI = XLogCtl->InsertTimeLineID; > > > > > > > > > > > > Then maybe we could store the timeline rather then then kind of checkpoint? > > > > > > You should still be able to compute the information while giving a bit more > > > > > > information for the same memory usage. > > > > > > > > > > Can you please describe more about how checkpoint/restartpoint can be > > > > > confirmed using the timeline id. > > > > > > > > If pg_is_in_recovery() is true, then it's a restartpoint, otherwise it's a > > > > restartpoint if the checkpoint's timeline is different from the current > > > > timeline? > > > > > > > > [1] https://www.postgresql.org/message-id/1486805889.24568.96.camel%40credativ.de
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Nitin Jadhav
> > Thank you Alvaro and Matthias for your views. I understand your point > > of not updating the progress-report flag here as it just checks > > whether the CHECKPOINT_IMMEDIATE is set or not and takes an action > > based on that but it doesn't change the checkpoint flags. I will > > modify the code but I am a bit confused here. As per Alvaro, we need > > to make the progress-report flag change in whatever is the place that > > *requests* an immediate checkpoint. I feel this gives information > > about the upcoming checkpoint not the current one. So updating here > > provides wrong details in the view. The flags available during > > CreateCheckPoint() will remain same for the entire checkpoint > > operation and we should show the same information in the view till it > > completes. > > I'm not sure what Matthias meant, but as far as I know there's no fundamental > difference between checkpoint with and without the CHECKPOINT_IMMEDIATE flag, > and there's also no scheduling for multiple checkpoints. > > Yes, the flags will remain the same but checkpoint.c will test both the passed > flags and the shmem flags to see whether a delay should be added or not, which > is the only difference in checkpoint processing for this flag. See the call to > ImmediateCheckpointRequested() which will look at the value in shmem: > > /* > * Perform the usual duties and take a nap, unless we're behind schedule, > * in which case we just try to catch up as quickly as possible. > */ > if (!(flags & CHECKPOINT_IMMEDIATE) && > !ShutdownRequestPending && > !ImmediateCheckpointRequested() && > IsCheckpointOnSchedule(progress)) I understand that the checkpointer considers flags as well as the shmem flags and if CHECKPOINT_IMMEDIATE flag is set, it affects the current checkpoint operation (No further delay) but does not change the current flag value. Should we display this change in the kind field of the view or not? Please share your thoughts. 
Thanks & Regards, Nitin Jadhav On Fri, Feb 25, 2022 at 12:33 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > Hi, > > On Fri, Feb 25, 2022 at 12:23:27AM +0530, Nitin Jadhav wrote: > > > I think the change to ImmediateCheckpointRequested() makes no sense. > > > Before this patch, that function merely inquires whether there's an > > > immediate checkpoint queued. After this patch, it ... changes a > > > progress-reporting flag? I think it would make more sense to make the > > > progress-report flag change in whatever is the place that *requests* an > > > immediate checkpoint rather than here. > > > > > > > diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c > > > > +ImmediateCheckpointRequested(int flags) > > > > if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE) > > > > + { > > > > + updated_flags |= CHECKPOINT_IMMEDIATE; > > > > > > I don't think that these changes are expected behaviour. Under in this > > > condition; the currently running checkpoint is still not 'immediate', > > > but it is going to hurry up for a new, actually immediate checkpoint. > > > Those are different kinds of checkpoint handling; and I don't think > > > you should modify the reported flags to show that we're going to do > > > stuff faster than usual. Maybe maintiain a seperate 'upcoming > > > checkpoint flags' field instead? > > > > Thank you Alvaro and Matthias for your views. I understand your point > > of not updating the progress-report flag here as it just checks > > whether the CHECKPOINT_IMMEDIATE is set or not and takes an action > > based on that but it doesn't change the checkpoint flags. I will > > modify the code but I am a bit confused here. As per Alvaro, we need > > to make the progress-report flag change in whatever is the place that > > *requests* an immediate checkpoint. I feel this gives information > > about the upcoming checkpoint not the current one. So updating here > > provides wrong details in the view. 
The flags available during > > CreateCheckPoint() will remain same for the entire checkpoint > > operation and we should show the same information in the view till it > > completes. > > I'm not sure what Matthias meant, but as far as I know there's no fundamental > difference between checkpoint with and without the CHECKPOINT_IMMEDIATE flag, > and there's also no scheduling for multiple checkpoints. > > Yes, the flags will remain the same but checkpoint.c will test both the passed > flags and the shmem flags to see whether a delay should be added or not, which > is the only difference in checkpoint processing for this flag. See the call to > ImmediateCheckpointRequested() which will look at the value in shmem: > > /* > * Perform the usual duties and take a nap, unless we're behind schedule, > * in which case we just try to catch up as quickly as possible. > */ > if (!(flags & CHECKPOINT_IMMEDIATE) && > !ShutdownRequestPending && > !ImmediateCheckpointRequested() && > IsCheckpointOnSchedule(progress)) > [...]
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Julien Rouhaud
On Fri, Feb 25, 2022 at 08:53:50PM +0530, Nitin Jadhav wrote: > > > > I'm not sure what Matthias meant, but as far as I know there's no fundamental > > difference between checkpoint with and without the CHECKPOINT_IMMEDIATE flag, > > and there's also no scheduling for multiple checkpoints. > > > > Yes, the flags will remain the same but checkpoint.c will test both the passed > > flags and the shmem flags to see whether a delay should be added or not, which > > is the only difference in checkpoint processing for this flag. See the call to > > ImmediateCheckpointRequested() which will look at the value in shmem: > > > > /* > > * Perform the usual duties and take a nap, unless we're behind schedule, > > * in which case we just try to catch up as quickly as possible. > > */ > > if (!(flags & CHECKPOINT_IMMEDIATE) && > > !ShutdownRequestPending && > > !ImmediateCheckpointRequested() && > > IsCheckpointOnSchedule(progress)) > > I understand that the checkpointer considers flags as well as the > shmem flags and if CHECKPOINT_IMMEDIATE flag is set, it affects the > current checkpoint operation (No further delay) but does not change > the current flag value. Should we display this change in the kind > field of the view or not? Please share your thoughts. I think the fields should be added. It's good to know that a checkpoint was triggered due to normal activity and should be spread out, and then something upgraded it to an immediate checkpoint. If you're desperately waiting for the end of a checkpoint for some reason and ask for an immediate checkpoint, you'll certainly be happy to see that the checkpointer is aware of it. But maybe I missed something in the code, so let's wait for Matthias' input about it.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Matthias van de Meent
On Fri, 25 Feb 2022 at 17:35, Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Fri, Feb 25, 2022 at 08:53:50PM +0530, Nitin Jadhav wrote: > > > > > > I'm not sure what Matthias meant, but as far as I know there's no fundamental > > > difference between checkpoint with and without the CHECKPOINT_IMMEDIATE flag, > > > and there's also no scheduling for multiple checkpoints. > > > > > > Yes, the flags will remain the same but checkpoint.c will test both the passed > > > flags and the shmem flags to see whether a delay should be added or not, which > > > is the only difference in checkpoint processing for this flag. See the call to > > > ImmediateCheckpointRequested() which will look at the value in shmem: > > > > > > /* > > > * Perform the usual duties and take a nap, unless we're behind schedule, > > > * in which case we just try to catch up as quickly as possible. > > > */ > > > if (!(flags & CHECKPOINT_IMMEDIATE) && > > > !ShutdownRequestPending && > > > !ImmediateCheckpointRequested() && > > > IsCheckpointOnSchedule(progress)) > > > > I understand that the checkpointer considers flags as well as the > > shmem flags and if CHECKPOINT_IMMEDIATE flag is set, it affects the > > current checkpoint operation (No further delay) but does not change > > the current flag value. Should we display this change in the kind > > field of the view or not? Please share your thoughts. > > I think the fields should be added. It's good to know that a checkpoint was > trigger due to normal activity and should be spreaded, and then something > upgraded it to an immediate checkpoint. If you're desperately waiting for the > end of a checkpoint for some reason and ask for an immediate checkpoint, you'll > certainly be happy to see that the checkpointer is aware of it. > > But maybe I missed something in the code, so let's wait for Matthias input > about it. 
The point I was trying to make was "If cps->ckpt_flags is CHECKPOINT_IMMEDIATE, we hurry up to start the new checkpoint that is actually immediate". That doesn't mean that this checkpoint was created with IMMEDIATE or running using IMMEDIATE, only that optional delays are now being skipped instead. To let the user detect _why_ the optional delays are now being skipped, I propose not to report this currently running checkpoint's "flags | CHECKPOINT_IMMEDIATE", but to add reporting of the next checkpoint's flags, which would allow the detection and display of the CHECKPOINT_IMMEDIATE we're actually hurrying for (plus some more interesting information flags). -Matthias PS. I just noticed that the checkpoint flags are also being parsed and stringified twice in LogCheckpointStart; and adding another duplicate in the current code would put that at 3 copies of effectively the same code. Do we maybe want to deduplicate that into macros, similar to LSN_FORMAT_ARGS?
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Julien Rouhaud
On Fri, Feb 25, 2022 at 06:49:42PM +0100, Matthias van de Meent wrote: > > The point I was trying to make was "If cps->ckpt_flags is > CHECKPOINT_IMMEDIATE, we hurry up to start the new checkpoint that is > actually immediate". That doesn't mean that this checkpoint was > created with IMMEDIATE or running using IMMEDIATE, only that optional > delays are now being skipped instead. Ah, I now see what you mean. > To let the user detect _why_ the optional delays are now being > skipped, I propose not to report this currently running checkpoint's > "flags | CHECKPOINT_IMMEDIATE", but to add reporting of the next > checkpoint's flags; which would allow the detection and display of the > CHECKPOINT_IMMEDIATE we're actually hurrying for (plus some more > interesting information flags. I'm still not convinced that's a sensible approach. The next checkpoint will be displayed in the view as CHECKPOINT_IMMEDIATE, so you will then know about it. I'm not sure that having that specific information in the view is going to help, especially if users have to understand "a slow checkpoint is actually fast even if it's displayed as slow if the next checkpoint is going to be fast". Saying "it's timed" (which implies slow) and "it's fast" is maybe still counterintuitive, but at least users have a better chance of seeing there's something going on and referring to the docs if they don't get it.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Julien Rouhaud
On Sat, Feb 26, 2022 at 02:30:36AM +0800, Julien Rouhaud wrote: > On Fri, Feb 25, 2022 at 06:49:42PM +0100, Matthias van de Meent wrote: > > > > The point I was trying to make was "If cps->ckpt_flags is > > CHECKPOINT_IMMEDIATE, we hurry up to start the new checkpoint that is > > actually immediate". That doesn't mean that this checkpoint was > > created with IMMEDIATE or running using IMMEDIATE, only that optional > > delays are now being skipped instead. > > Ah, I now see what you mean. > > > To let the user detect _why_ the optional delays are now being > > skipped, I propose not to report this currently running checkpoint's > > "flags | CHECKPOINT_IMMEDIATE", but to add reporting of the next > > checkpoint's flags; which would allow the detection and display of the > > CHECKPOINT_IMMEDIATE we're actually hurrying for (plus some more > > interesting information flags. > > I'm still not convinced that's a sensible approach. The next checkpoint will > be displayed in the view as CHECKPOINT_IMMEDIATE, so you will then know about > it. I'm not sure that having that specific information in the view is > going to help, especially if users have to understand "a slow checkpoint is > actually fast even if it's displayed as slow if the next checkpoint is going to > be fast". Saying "it's timed" (which imply slow) and "it's fast" is maybe > still counter intuitive, but at least have a better chance to see there's > something going on and refer to the doc if you don't get it. Just to be clear, I do think that it's worthwhile to add some information that some backends are waiting for that next checkpoint. As discussed before, an int for the number of backends looks like enough information to me.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Bharath Rupireddy
On Fri, Feb 25, 2022 at 8:38 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: Had a quick look over the v3 patch. I'm not sure if it's the best way to have pg_stat_get_progress_checkpoint_type, pg_stat_get_progress_checkpoint_kind and pg_stat_get_progress_checkpoint_start_time just for printing info in readable format in pg_stat_progress_checkpoint. I don't think these functions will ever be useful for the users. 1) Can't we use pg_is_in_recovery to determine if it's a restartpoint or checkpoint instead of having a new function pg_stat_get_progress_checkpoint_type? 2) Can't we just have these checks inside CASE-WHEN-THEN-ELSE blocks directly instead of new function pg_stat_get_progress_checkpoint_kind? + snprintf(ckpt_kind, MAXPGPATH, "%s%s%s%s%s%s%s%s%s", + (flags == 0) ? "unknown" : "", + (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "", + (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "", + (flags & CHECKPOINT_IMMEDIATE) ? "immediate " : "", + (flags & CHECKPOINT_FORCE) ? "force " : "", + (flags & CHECKPOINT_WAIT) ? "wait " : "", + (flags & CHECKPOINT_CAUSE_XLOG) ? "wal " : "", + (flags & CHECKPOINT_CAUSE_TIME) ? "time " : "", + (flags & CHECKPOINT_FLUSH_ALL) ? "flush-all" : ""); 3) Why do we need this extra calculation for start_lsn? Do you ever see a negative LSN or something here? + ('0/0'::pg_lsn + ( + CASE + WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric) + ELSE (0)::numeric + END + (s.param3)::numeric)) AS start_lsn, 4) Can't you use timestamptz_in(to_char(s.param4)) instead of pg_stat_get_progress_checkpoint_start_time? I don't quite understand the reasoning for having this function and it's named as *checkpoint* when it doesn't do anything specific to the checkpoint at all? Having 3 unnecessary functions that aren't useful to the users at all in proc.dat will simply eatup the function oids IMO. Hence, I suggest let's try to do without extra functions. Regards, Bharath Rupireddy.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Bharath Rupireddy
On Sun, Feb 27, 2022 at 8:44 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Fri, Feb 25, 2022 at 8:38 PM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > Had a quick look over the v3 patch. I'm not sure if it's the best way > to have pg_stat_get_progress_checkpoint_type, > pg_stat_get_progress_checkpoint_kind and > pg_stat_get_progress_checkpoint_start_time just for printing info in > readable format in pg_stat_progress_checkpoint. I don't think these > functions will ever be useful for the users. > > 1) Can't we use pg_is_in_recovery to determine if it's a restartpoint > or checkpoint instead of having a new function > pg_stat_get_progress_checkpoint_type? > > 2) Can't we just have these checks inside CASE-WHEN-THEN-ELSE blocks > directly instead of new function pg_stat_get_progress_checkpoint_kind? > + snprintf(ckpt_kind, MAXPGPATH, "%s%s%s%s%s%s%s%s%s", > + (flags == 0) ? "unknown" : "", > + (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "", > + (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "", > + (flags & CHECKPOINT_IMMEDIATE) ? "immediate " : "", > + (flags & CHECKPOINT_FORCE) ? "force " : "", > + (flags & CHECKPOINT_WAIT) ? "wait " : "", > + (flags & CHECKPOINT_CAUSE_XLOG) ? "wal " : "", > + (flags & CHECKPOINT_CAUSE_TIME) ? "time " : "", > + (flags & CHECKPOINT_FLUSH_ALL) ? "flush-all" : ""); > > 3) Why do we need this extra calculation for start_lsn? Do you ever > see a negative LSN or something here? > + ('0/0'::pg_lsn + ( > + CASE > + WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric) > + ELSE (0)::numeric > + END + (s.param3)::numeric)) AS start_lsn, > > 4) Can't you use timestamptz_in(to_char(s.param4)) instead of > pg_stat_get_progress_checkpoint_start_time? I don't quite understand > the reasoning for having this function and it's named as *checkpoint* > when it doesn't do anything specific to the checkpoint at all? 
> > Having 3 unnecessary functions that aren't useful to the users at all > in proc.dat will simply eatup the function oids IMO. Hence, I suggest > let's try to do without extra functions. Another thought for my review comment: > 1) Can't we use pg_is_in_recovery to determine if it's a restartpoint > or checkpoint instead of having a new function > pg_stat_get_progress_checkpoint_type? I don't think using pg_is_in_recovery works here as it is taken after the checkpoint has started. So, I think the right way here is to send 1 in CreateCheckPoint and 2 in CreateRestartPoint and use CASE-WHEN-ELSE-END to show "1": "checkpoint" "2":"restartpoint". Continuing my review: 5) Do we need a special phase for this checkpoint operation? I'm not sure in which cases it will take a long time, but it looks like there's a wait loop here. vxids = GetVirtualXIDsDelayingChkpt(&nvxids); if (nvxids > 0) { do { pg_usleep(10000L); /* wait for 10 msec */ } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids)); } Also, how about special phases for SyncPostCheckpoint(), SyncPreCheckpoint(), InvalidateObsoleteReplicationSlots(), PreallocXlogFiles() (it currently pre-allocates only 1 WAL file, but it might increase in the future (?)), TruncateSUBTRANS()? 6) SLRU (Simple LRU) isn't a phase here, you can just say PROGRESS_CHECKPOINT_PHASE_PREDICATE_LOCK_PAGES.
+ + pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, + PROGRESS_CHECKPOINT_PHASE_SLRU_PAGES); CheckPointPredicate(); And :s/checkpointing SLRU pages/checkpointing predicate lock pages + WHEN 9 THEN 'checkpointing SLRU pages' 7) :s/PROGRESS_CHECKPOINT_PHASE_FILE_SYNC/PROGRESS_CHECKPOINT_PHASE_PROCESS_FILE_SYNC_REQUESTS And :s/WHEN 11 THEN 'performing sync requests'/WHEN 11 THEN 'processing file sync requests' 8) :s/Finalizing/finalizing + WHEN 14 THEN 'Finalizing' 9) :s/checkpointing snapshots/checkpointing logical replication snapshot files + WHEN 3 THEN 'checkpointing snapshots' :s/checkpointing logical rewrite mappings/checkpointing logical replication rewrite mapping files + WHEN 4 THEN 'checkpointing logical rewrite mappings' 10) I'm not sure if it's discussed, how about adding the number of snapshot/mapping files so far the checkpoint has processed in file processing while loops of CheckPointSnapBuild/CheckPointLogicalRewriteHeap? Sometimes, there can be many logical snapshot or mapping files and users may be interested in knowing the so-far-processed-file-count. 11) I think it's discussed, are we going to add the pid of the checkpoint requestor? Regards, Bharath Rupireddy.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Julien Rouhaud
Hi, On Mon, Feb 28, 2022 at 10:21:23AM +0530, Bharath Rupireddy wrote: > > Another thought for my review comment: > > 1) Can't we use pg_is_in_recovery to determine if it's a restartpoint > > or checkpoint instead of having a new function > > pg_stat_get_progress_checkpoint_type? > > I don't think using pg_is_in_recovery work here as it is taken after > the checkpoint has started. So, I think the right way here is to send > 1 in CreateCheckPoint and 2 in CreateRestartPoint and use > CASE-WHEN-ELSE-END to show "1": "checkpoint" "2":"restartpoint". I suggested upthread to store the starting timeline instead. This way you can deduce whether it's a restartpoint or a checkpoint, but you can also deduce other information, like what was the starting WAL. > 11) I think it's discussed, are we going to add the pid of the > checkpoint requestor? As mentioned upthread, there can be multiple backends that request a checkpoint, so unless we want to store an array of pids we should store the number of backends that are waiting for a new checkpoint.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Bharath Rupireddy
On Mon, Feb 28, 2022 at 12:02 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > Hi, > > On Mon, Feb 28, 2022 at 10:21:23AM +0530, Bharath Rupireddy wrote: > > > > Another thought for my review comment: > > > 1) Can't we use pg_is_in_recovery to determine if it's a restartpoint > > > or checkpoint instead of having a new function > > > pg_stat_get_progress_checkpoint_type? > > > > I don't think using pg_is_in_recovery work here as it is taken after > > the checkpoint has started. So, I think the right way here is to send > > 1 in CreateCheckPoint and 2 in CreateRestartPoint and use > > CASE-WHEN-ELSE-END to show "1": "checkpoint" "2":"restartpoint". > > I suggested upthread to store the starting timeline instead. This way you can > deduce whether it's a restartpoint or a checkpoint, but you can also deduce > other information, like what was the starting WAL. I don't understand why we need the timeline here to just determine whether it's a restartpoint or checkpoint. I know that the InsertTimeLineID is 0 during recovery. IMO, emitting 1 for checkpoint and 2 for restartpoint in CreateCheckPoint and CreateRestartPoint respectively and using CASE-WHEN-ELSE-END to show it in readable format is the easiest way. Can't the checkpoint start LSN be deduced from PROGRESS_CHECKPOINT_LSN, checkPoint.redo? I'm completely against these pg_stat_get_progress_checkpoint_{type, kind, start_time} functions unless there's a strong case. IMO, we can achieve what we want without these functions as well. > > As mentioned upthread, there can be multiple backends that request a > checkpoint, so unless we want to store an array of pid we should store a number > of backend that are waiting for a new checkpoint. Yeah, you are right. Let's not go down that path of storing an array of pids. I don't see a strong use case for the pid of the process requesting checkpoint. 
If required, we can add it later once the pg_stat_progress_checkpoint view gets in. Regards, Bharath Rupireddy.
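Bharath's suggestion above — emit 1 from CreateCheckPoint and 2 from CreateRestartPoint, then translate with CASE-WHEN-ELSE-END — amounts to a small lookup. A hypothetical C sketch of that mapping (the helper name is invented for illustration; in the patch this would be expressed in the view's SQL):

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the proposed SQL:
 *   CASE s.param1 WHEN 1 THEN 'checkpoint' WHEN 2 THEN 'restartpoint' END
 * where 1 would be reported from CreateCheckPoint and 2 from
 * CreateRestartPoint.
 */
static const char *
checkpoint_kind_name(int kind)
{
    switch (kind)
    {
        case 1:
            return "checkpoint";
        case 2:
            return "restartpoint";
        default:
            return "unknown";    /* value not yet reported */
    }
}
```

The view column would then just present this mapping of the raw progress parameter.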
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Julien Rouhaud
Date:
On Mon, Feb 28, 2022 at 06:03:54PM +0530, Bharath Rupireddy wrote: > On Mon, Feb 28, 2022 at 12:02 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > I suggested upthread to store the starting timeline instead. This way you can > > deduce whether it's a restartpoint or a checkpoint, but you can also deduce > > other information, like what was the starting WAL. > > I don't understand why we need the timeline here to just determine > whether it's a restartpoint or checkpoint. I'm not saying it's necessary, I'm saying that for the same space usage we can store something a bit more useful. If no one cares about having the starting timeline available for no extra cost then sure, let's just store the kind directly. > Can't the checkpoint start LSN be deduced from > PROGRESS_CHECKPOINT_LSN, checkPoint.redo? I'm not sure I'm following, isn't checkPoint.redo the checkpoint start LSN? > > As mentioned upthread, there can be multiple backends that request a > > checkpoint, so unless we want to store an array of pid we should store a number > > of backend that are waiting for a new checkpoint. > > Yeah, you are right. Let's not go that path and store an array of > pids. I don't see a strong use-case with the pid of the process > requesting checkpoint. If required, we can add it later once the > pg_stat_progress_checkpoint view gets in. I don't think it's really necessary to give the pid list. If you requested a new checkpoint, it doesn't matter if it's only your backend that triggered it, another backend or a few other dozen, the result will be the same and you have the information that the request has been seen. We could store just a bool for that but having a number instead also gives a bit more information and may allow you to detect some broken logic in your client code if it keeps increasing.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Matthias van de Meent
Date:
On Sun, 27 Feb 2022 at 16:14, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > 3) Why do we need this extra calculation for start_lsn? > Do you ever see a negative LSN or something here? > + ('0/0'::pg_lsn + ( > + CASE > + WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric) > + ELSE (0)::numeric > + END + (s.param3)::numeric)) AS start_lsn, Yes: LSN can take up all of an uint64; whereas the pgstat column is a bigint type; thus the signed int64. This cast is OK as it wraps around, but that means we have to take care to correctly display the LSN when it is > 0x7FFF_FFFF_FFFF_FFFF; which is what we do here using the special-casing for negative values. As to whether it is reasonable: Generating 16GB of WAL every second (2^34 bytes /sec) is probably not impossible (cpu <> memory bandwidth has been > 20GB/sec for a while); and that leaves you 2^29 seconds of database runtime; or about 17 years. Seeing that a cluster can be `pg_upgrade`d (which doesn't reset cluster LSN) since PG 9.0 from at least version PG 8.4.0 (2009) (and through pg_migrator, from 8.3.0), we can assume that clusters hitting LSN=2^63 will be a reasonable possibility within the next few years. As the lifespan of a PG release is about 5 years, it doesn't seem impossible that there will be actual clusters that are going to hit this naturally in the lifespan of PG15. It is also possible that someone fat-fingers pg_resetwal; and creates a cluster with LSN >= 2^63; resulting in negative values in the s.param3 field. Not likely, but we can force such situations; and as such we should handle that gracefully. > 4) Can't you use timestamptz_in(to_char(s.param4)) instead of > pg_stat_get_progress_checkpoint_start_time? I don't quite understand > the reasoning for having this function and it's named as *checkpoint* > when it doesn't do anything specific to the checkpoint at all? 
I hadn't thought of using the types' inout functions, but it looks like timestamp IO functions use a formatted timestring, which won't work with the epoch-based timestamp stored in the view. > Having 3 unnecessary functions that aren't useful to the users at all > in proc.dat will simply eatup the function oids IMO. Hence, I suggest > let's try to do without extra functions. I agree that (1) could be simplified, or at least fully expressed in SQL without exposing too many internals. If we're fine with exposing internals like flags and type layouts, then (2), and arguably (4), can be expressed in SQL as well. -Matthias
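The int64/uint64 wraparound Matthias describes can be demonstrated with a small standalone sketch (illustrative only, not patch code; the helper names are invented, and the unsigned-to-signed cast assumes the usual two's-complement behavior):

```c
#include <assert.h>
#include <stdint.h>

/*
 * An LSN is an unsigned 64-bit value, but the progress-view slot (s.param3)
 * is a signed bigint: storing it is a bit-preserving cast, so LSNs at or
 * above 2^63 show up negative in the view.
 */
static int64_t
lsn_to_param(uint64_t lsn)
{
    return (int64_t) lsn;
}

/*
 * Mirror of the SQL fix-up:
 *   CASE WHEN s.param3 < 0 THEN pow(2, 64) ELSE 0 END + s.param3
 * i.e. add 2^64 back when the stored value went negative.  In C, the
 * modular conversion back to uint64 does exactly that.
 */
static uint64_t
param_to_lsn(int64_t stored)
{
    return (uint64_t) stored;
}
```

The round trip is lossless even for LSNs past 2^63, which is the property the view's CASE expression relies on.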
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
> > 3) Why do we need this extra calculation for start_lsn? > > Do you ever see a negative LSN or something here? > > + ('0/0'::pg_lsn + ( > > + CASE > > + WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric) > > + ELSE (0)::numeric > > + END + (s.param3)::numeric)) AS start_lsn, > > Yes: LSN can take up all of an uint64; whereas the pgstat column is a > bigint type; thus the signed int64. This cast is OK as it wraps > around, but that means we have to take care to correctly display the > LSN when it is > 0x7FFF_FFFF_FFFF_FFFF; which is what we do here using > the special-casing for negative values. Yes. The extra calculation is required here as we are storing a uint64 value in a variable of type int64. When we convert uint64 to int64 the bit pattern is preserved (so no data is lost). The high-order bit becomes the sign bit, and if the sign bit is set, both the sign and magnitude of the value change. To safely get back the actual uint64 value that was assigned, we need the above calculations. > > 4) Can't you use timestamptz_in(to_char(s.param4)) instead of > > pg_stat_get_progress_checkpoint_start_time? I don't quite understand > > the reasoning for having this function and it's named as *checkpoint* > > when it doesn't do anything specific to the checkpoint at all? > > I hadn't thought of using the types' inout functions, but it looks > like timestamp IO functions use a formatted timestring, which won't > work with the epoch-based timestamp stored in the view. There is a variation of to_timestamp() which takes a UNIX epoch (float8) as an argument and converts it to timestamptz, but we cannot directly call this function with S.param4. 
TimestampTz
GetCurrentTimestamp(void)
{
    TimestampTz result;
    struct timeval tp;

    gettimeofday(&tp, NULL);

    result = (TimestampTz) tp.tv_sec -
        ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY);
    result = (result * USECS_PER_SEC) + tp.tv_usec;

    return result;
}

S.param4 contains the output of the above function (GetCurrentTimestamp()), which returns a Postgres-epoch timestamp, but to_timestamp() expects a UNIX epoch as input, so some calculation is required here. I feel the SQL 'to_timestamp(946684800 + (S.param4::float / 1000000)) AS start_time' works fine. The value '946684800' is equal to ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY). I am not sure whether it is good practice to do it this way. Kindly share your thoughts. Thanks & Regards, Nitin Jadhav
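The epoch arithmetic above can be double-checked with a standalone sketch (the constants mirror PostgreSQL's datatype/timestamp.h; the helper names are invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Epoch constants as in PostgreSQL's src/include/datatype/timestamp.h:
 * Julian day numbers of 2000-01-01 and 1970-01-01.
 */
#define POSTGRES_EPOCH_JDATE 2451545
#define UNIX_EPOCH_JDATE     2440588
#define SECS_PER_DAY         86400
#define USECS_PER_SEC        INT64_C(1000000)

/*
 * Seconds between the Unix epoch and the PostgreSQL epoch: this is the
 * literal 946684800 used in the proposed to_timestamp() expression.
 */
static int64_t
pg_epoch_offset_secs(void)
{
    return (int64_t) (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY;
}

/*
 * Mirror of the SQL:
 *   to_timestamp(946684800 + S.param4::float / 1000000)
 * converting a TimestampTz (microseconds since 2000-01-01) to Unix seconds.
 */
static double
pg_timestamp_to_unix_secs(int64_t pg_usecs)
{
    return (double) pg_epoch_offset_secs() +
           (double) pg_usecs / (double) USECS_PER_SEC;
}
```

This confirms that (2451545 - 2440588) * 86400 = 946684800, so the magic constant in the SQL is exactly the Postgres-to-Unix epoch offset.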
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
Thanks for reviewing. > > > I suggested upthread to store the starting timeline instead. This way you can > > > deduce whether it's a restartpoint or a checkpoint, but you can also deduce > > > other information, like what was the starting WAL. > > > > I don't understand why we need the timeline here to just determine > > whether it's a restartpoint or checkpoint. > > I'm not saying it's necessary, I'm saying that for the same space usage we can > store something a bit more useful. If no one cares about having the starting > timeline available for no extra cost then sure, let's just store the kind > directly. Fixed. > 2) Can't we just have these checks inside CASE-WHEN-THEN-ELSE blocks > directly instead of new function pg_stat_get_progress_checkpoint_kind? > + snprintf(ckpt_kind, MAXPGPATH, "%s%s%s%s%s%s%s%s%s", > + (flags == 0) ? "unknown" : "", > + (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "", > + (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "", > + (flags & CHECKPOINT_IMMEDIATE) ? "immediate " : "", > + (flags & CHECKPOINT_FORCE) ? "force " : "", > + (flags & CHECKPOINT_WAIT) ? "wait " : "", > + (flags & CHECKPOINT_CAUSE_XLOG) ? "wal " : "", > + (flags & CHECKPOINT_CAUSE_TIME) ? "time " : "", > + (flags & CHECKPOINT_FLUSH_ALL) ? "flush-all" : ""); Fixed. --- > 5) Do we need a special phase for this checkpoint operation? I'm not > sure in which cases it will take a long time, but it looks like > there's a wait loop here. > vxids = GetVirtualXIDsDelayingChkpt(&nvxids); > if (nvxids > 0) > { > do > { > pg_usleep(10000L); /* wait for 10 msec */ > } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids)); > } Yes. It is better to add a separate phase here. --- > Also, how about special phases for SyncPostCheckpoint(), > SyncPreCheckpoint(), InvalidateObsoleteReplicationSlots(), > PreallocXlogFiles() (it currently pre-allocates only 1 WAL file, but > it might be increase in future (?)), TruncateSUBTRANS()? 
SyncPreCheckpoint() is just incrementing a counter and PreallocXlogFiles() currently pre-allocates only 1 WAL file. I feel there is no need to add any phases for these as of now. We can add in the future if necessary. Added phases for SyncPostCheckpoint(), InvalidateObsoleteReplicationSlots() and TruncateSUBTRANS(). --- > 6) SLRU (Simple LRU) isn't a phase here, you can just say > PROGRESS_CHECKPOINT_PHASE_PREDICATE_LOCK_PAGES. > + > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > + PROGRESS_CHECKPOINT_PHASE_SLRU_PAGES); > CheckPointPredicate(); > > And :s/checkpointing SLRU pages/checkpointing predicate lock pages >+ WHEN 9 THEN 'checkpointing SLRU pages' Fixed. --- > 7) :s/PROGRESS_CHECKPOINT_PHASE_FILE_SYNC/PROGRESS_CHECKPOINT_PHASE_PROCESS_FILE_SYNC_REQUESTS I feel PROGRESS_CHECKPOINT_PHASE_FILE_SYNC is a better option here as it describes the purpose in less words. > And :s/WHEN 11 THEN 'performing sync requests'/WHEN 11 THEN > 'processing file sync requests' Fixed. --- > 8) :s/Finalizing/finalizing > + WHEN 14 THEN 'Finalizing' Fixed. --- > 9) :s/checkpointing snapshots/checkpointing logical replication snapshot files > + WHEN 3 THEN 'checkpointing snapshots' > :s/checkpointing logical rewrite mappings/checkpointing logical > replication rewrite mapping files > + WHEN 4 THEN 'checkpointing logical rewrite mappings' Fixed. --- > 10) I'm not sure if it's discussed, how about adding the number of > snapshot/mapping files so far the checkpoint has processed in file > processing while loops of > CheckPointSnapBuild/CheckPointLogicalRewriteHeap? Sometimes, there can > be many logical snapshot or mapping files and users may be interested > in knowing the so-far-processed-file-count. I had thought about this while sharing the v1 patch and mentioned my views upthread. I feel it won't give meaningful progress information (It can be treated as statistics). Hence not included. Thoughts? 
> > > As mentioned upthread, there can be multiple backends that request a > > > checkpoint, so unless we want to store an array of pid we should store a number > > > of backend that are waiting for a new checkpoint. > > > > Yeah, you are right. Let's not go that path and store an array of > > pids. I don't see a strong use-case with the pid of the process > > requesting checkpoint. If required, we can add it later once the > > pg_stat_progress_checkpoint view gets in. > > I don't think that's really necessary to give the pid list. > > If you requested a new checkpoint, it doesn't matter if it's only your backend > that triggered it, another backend or a few other dozen, the result will be the > same and you have the information that the request has been seen. We could > store just a bool for that but having a number instead also gives a bit more > information and may allow you to detect some broken logic on your client code > if it keeps increasing. It's a good metric to show in the view but the information is not readily available. Additional code is required to calculate the number of requests. Is it worth doing that? I feel this can be added later if required. Please find the v4 patch attached and share your thoughts. Thanks & Regards, Nitin Jadhav
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Bharath Rupireddy
Date:
On Wed, Mar 2, 2022 at 4:45 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > Also, how about special phases for SyncPostCheckpoint(), > > SyncPreCheckpoint(), InvalidateObsoleteReplicationSlots(), > > PreallocXlogFiles() (it currently pre-allocates only 1 WAL file, but > > it might be increase in future (?)), TruncateSUBTRANS()? > > SyncPreCheckpoint() is just incrementing a counter and > PreallocXlogFiles() currently pre-allocates only 1 WAL file. I feel > there is no need to add any phases for these as of now. We can add in > the future if necessary. Added phases for SyncPostCheckpoint(), > InvalidateObsoleteReplicationSlots() and TruncateSUBTRANS(). Okay. > > 10) I'm not sure if it's discussed, how about adding the number of > > snapshot/mapping files so far the checkpoint has processed in file > > processing while loops of > > CheckPointSnapBuild/CheckPointLogicalRewriteHeap? Sometimes, there can > > be many logical snapshot or mapping files and users may be interested > > in knowing the so-far-processed-file-count. > > I had thought about this while sharing the v1 patch and mentioned my > views upthread. I feel it won't give meaningful progress information > (It can be treated as statistics). Hence not included. Thoughts? Okay. If there are any complaints about it we can always add them later. > > > > As mentioned upthread, there can be multiple backends that request a > > > > checkpoint, so unless we want to store an array of pid we should store a number > > > > of backend that are waiting for a new checkpoint. > > > > > > Yeah, you are right. Let's not go that path and store an array of > > > pids. I don't see a strong use-case with the pid of the process > > > requesting checkpoint. If required, we can add it later once the > > > pg_stat_progress_checkpoint view gets in. > > > > I don't think that's really necessary to give the pid list. 
> > > > If you requested a new checkpoint, it doesn't matter if it's only your backend > > that triggered it, another backend or a few other dozen, the result will be the > > same and you have the information that the request has been seen. We could > > store just a bool for that but having a number instead also gives a bit more > > information and may allow you to detect some broken logic on your client code > > if it keeps increasing. > > It's a good metric to show in the view but the information is not > readily available. Additional code is required to calculate the number > of requests. Is it worth doing that? I feel this can be added later if > required. Yes, we can always add it later if required. > Please find the v4 patch attached and share your thoughts. I reviewed the v4 patch, here are my comments: 1) Can we convert the below into pgstat_progress_update_multi_param, just to avoid repeated function calls? pgstat_progress_update_param(PROGRESS_CHECKPOINT_LSN, checkPoint.redo); pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, 2) Why are we not having a special phase for CheckPointReplicationOrigin, as it does a good bunch of work (writing to disk, XLogFlush, durable_rename), especially when max_replication_slots is large? 3) I don't think "requested" is necessary here as it doesn't add any value or it's not a checkpoint kind or such, you can remove it. 4) s:/'recycling old XLOG files'/'recycling old WAL files' + WHEN 16 THEN 'recycling old XLOG files' 5) Can we place the CREATE VIEW pg_stat_progress_checkpoint AS definition next to pg_stat_progress_copy in system_view.sql? It looks like all the progress reporting views are next to each other. 6) How about shutdown and end-of-recovery checkpoints? Are you planning to have an ereport_startup_progress mechanism as 0002? 7) I think you don't need to call checkpoint_progress_start, pgstat_progress_update_param, or any other progress reporting function for shutdown and end-of-recovery checkpoints, right? 
8) Not for all kinds of checkpoints, right? pg_stat_progress_checkpoint can't show a progress report for shutdown and end-of-recovery checkpoints; I think you need to specify that here in wal.sgml and checkpoint.sgml. + command <command>CHECKPOINT</command>. The checkpointer process running the + checkpoint will report its progress in the + <structname>pg_stat_progress_checkpoint</structname> view. See + <xref linkend="checkpoint-progress-reporting"/> for details. 9) Can you add a test case for the pg_stat_progress_checkpoint view? I think it's good to add one. See below for reference: -- Add a trigger to catch and print the contents of the catalog view -- pg_stat_progress_copy during data insertion. This allows to test -- the validation of some progress reports for COPY FROM where the trigger -- would fire. create function notice_after_tab_progress_reporting() returns trigger AS $$ declare report record; 10) Typo: it's not "is happens" + The checkpoint is happens without delays. 11) Can you be specific about which "some operations" force a checkpoint? Maybe basebackup, createdb or something? + The checkpoint is started because some operation forced a checkpoint. 12) Can you elaborate a bit here on who waits? Something like: the backend that requested the checkpoint will wait until its completion .... + Wait for completion before returning. 13) "removing unneeded or flushing needed logical rewrite mapping files" + The checkpointer process is currently removing/flushing the logical 14) "old WAL files" + The checkpointer process is currently recycling old XLOG files. Regards, Bharath Rupireddy.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Ashutosh Sharma
Date:
Here are some of my review comments on the latest patch:

+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>type</structfield> <type>text</type>
+ </para>
+ <para>
+ Type of checkpoint. See <xref linkend="checkpoint-types"/>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>kind</structfield> <type>text</type>
+ </para>
+ <para>
+ Kind of checkpoint. See <xref linkend="checkpoint-kinds"/>.
+ </para></entry>
+ </row>

This looks a bit confusing. Two columns, one with the name "checkpoint types" and another "checkpoint kinds". You can probably rename checkpoint-kinds to checkpoint-flags and let checkpoint-types be as it is.

==

+ <entry><structname>pg_stat_progress_checkpoint</structname><indexterm><primary>pg_stat_progress_checkpoint</primary></indexterm></entry>
+ <entry>One row only, showing the progress of the checkpoint.

Let's make this message consistent with the already existing message for pg_stat_wal_receiver. See the description for the pg_stat_wal_receiver view in the "Dynamic Statistics Views" table.

==

[local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint;
-[ RECORD 1 ]-----+-------------------------------------
pid        | 22043
type       | checkpoint
kind       | immediate force wait requested time

I think the output in the kind column can be displayed as {immediate, force, wait, requested, time}. By the way, these are all checkpoint flags, so it is better to display it as checkpoint flags instead of checkpoint kind, as mentioned in one of my previous comments.
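The flag decoding discussed in this thread (the snprintf quoted earlier) can be sketched as a standalone helper (illustrative; the flag values mirror PostgreSQL's xlog.h at the time of writing, but treat them here as assumptions):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Checkpoint flag bits (mirroring xlog.h; treat as illustrative). */
#define CHECKPOINT_IS_SHUTDOWN     0x0001
#define CHECKPOINT_END_OF_RECOVERY 0x0002
#define CHECKPOINT_IMMEDIATE       0x0004
#define CHECKPOINT_FORCE           0x0008
#define CHECKPOINT_FLUSH_ALL       0x0010
#define CHECKPOINT_WAIT            0x0020
#define CHECKPOINT_CAUSE_XLOG      0x0080
#define CHECKPOINT_CAUSE_TIME      0x0100

/*
 * Render the flags as a space-separated word list, in the spirit of the
 * view's "kind" column output (e.g. "immediate force wait").
 */
static void
describe_checkpoint_flags(int flags, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%s%s%s%s%s%s%s%s%s",
             (flags == 0) ? "unknown" : "",
             (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "",
             (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "",
             (flags & CHECKPOINT_IMMEDIATE) ? "immediate " : "",
             (flags & CHECKPOINT_FORCE) ? "force " : "",
             (flags & CHECKPOINT_WAIT) ? "wait " : "",
             (flags & CHECKPOINT_CAUSE_XLOG) ? "wal " : "",
             (flags & CHECKPOINT_CAUSE_TIME) ? "time " : "",
             (flags & CHECKPOINT_FLUSH_ALL) ? "flush-all" : "");
}
```

Whether the view formats this as a space-separated list or as the {immediate, force, ...} set notation suggested above is purely a presentation choice over the same bitmask.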
== [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; -[ RECORD 1 ]-----+------------------------------------- pid | 22043 type | checkpoint kind | immediate force wait requested time start_lsn | 0/14C60F8 start_time | 2022-03-03 18:59:56.018662+05:30 phase | performing two phase checkpoint This is the output I see when the checkpointer process has come out of the two phase checkpoint and is currently writing checkpoint xlog records and doing other stuff like updating control files etc. Is this okay? == The output of log_checkpoint shows the number of buffers written is 3 whereas the output of pg_stat_progress_checkpoint shows it as 0. See below: 2022-03-03 20:04:45.643 IST [22043] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=24.652 s, sync=104.256 s, total=3889.625 s; sync files=2, longest=0.011 s, average=0.008 s; distance=0 kB, estimate=0 kB -- [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; -[ RECORD 1 ]-----+------------------------------------- pid | 22043 type | checkpoint kind | immediate force wait requested time start_lsn | 0/14C60F8 start_time | 2022-03-03 18:59:56.018662+05:30 phase | finalizing buffers_total | 0 buffers_processed | 0 buffers_written | 0 Any idea why this mismatch? == I think we can add a couple of more information to this view - start_time for buffer write operation and start_time for buffer sync operation. These are two very time consuming tasks in a checkpoint and people would find it useful to know how much time is being taken by the checkpoint in I/O operation phase. thoughts? -- With Regards, Ashutosh Sharma. On Wed, Mar 2, 2022 at 4:45 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > Thanks for reviewing. > > > > > I suggested upthread to store the starting timeline instead. 
This way you can > > > > deduce whether it's a restartpoint or a checkpoint, but you can also deduce > > > > other information, like what was the starting WAL. > > > > > > I don't understand why we need the timeline here to just determine > > > whether it's a restartpoint or checkpoint. > > > > I'm not saying it's necessary, I'm saying that for the same space usage we can > > store something a bit more useful. If no one cares about having the starting > > timeline available for no extra cost then sure, let's just store the kind > > directly. > > Fixed. > > > 2) Can't we just have these checks inside CASE-WHEN-THEN-ELSE blocks > > directly instead of new function pg_stat_get_progress_checkpoint_kind? > > + snprintf(ckpt_kind, MAXPGPATH, "%s%s%s%s%s%s%s%s%s", > > + (flags == 0) ? "unknown" : "", > > + (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "", > > + (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "", > > + (flags & CHECKPOINT_IMMEDIATE) ? "immediate " : "", > > + (flags & CHECKPOINT_FORCE) ? "force " : "", > > + (flags & CHECKPOINT_WAIT) ? "wait " : "", > > + (flags & CHECKPOINT_CAUSE_XLOG) ? "wal " : "", > > + (flags & CHECKPOINT_CAUSE_TIME) ? "time " : "", > > + (flags & CHECKPOINT_FLUSH_ALL) ? "flush-all" : ""); > > Fixed. > --- > > > 5) Do we need a special phase for this checkpoint operation? I'm not > > sure in which cases it will take a long time, but it looks like > > there's a wait loop here. > > vxids = GetVirtualXIDsDelayingChkpt(&nvxids); > > if (nvxids > 0) > > { > > do > > { > > pg_usleep(10000L); /* wait for 10 msec */ > > } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids)); > > } > > Yes. It is better to add a separate phase here. > --- > > > Also, how about special phases for SyncPostCheckpoint(), > > SyncPreCheckpoint(), InvalidateObsoleteReplicationSlots(), > > PreallocXlogFiles() (it currently pre-allocates only 1 WAL file, but > > it might be increase in future (?)), TruncateSUBTRANS()? 
> > SyncPreCheckpoint() is just incrementing a counter and > PreallocXlogFiles() currently pre-allocates only 1 WAL file. I feel > there is no need to add any phases for these as of now. We can add in > the future if necessary. Added phases for SyncPostCheckpoint(), > InvalidateObsoleteReplicationSlots() and TruncateSUBTRANS(). > --- > > > 6) SLRU (Simple LRU) isn't a phase here, you can just say > > PROGRESS_CHECKPOINT_PHASE_PREDICATE_LOCK_PAGES. > > + > > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > > + PROGRESS_CHECKPOINT_PHASE_SLRU_PAGES); > > CheckPointPredicate(); > > > > And :s/checkpointing SLRU pages/checkpointing predicate lock pages > >+ WHEN 9 THEN 'checkpointing SLRU pages' > > Fixed. > --- > > > 7) :s/PROGRESS_CHECKPOINT_PHASE_FILE_SYNC/PROGRESS_CHECKPOINT_PHASE_PROCESS_FILE_SYNC_REQUESTS > > I feel PROGRESS_CHECKPOINT_PHASE_FILE_SYNC is a better option here as > it describes the purpose in less words. > > > And :s/WHEN 11 THEN 'performing sync requests'/WHEN 11 THEN > > 'processing file sync requests' > > Fixed. > --- > > > 8) :s/Finalizing/finalizing > > + WHEN 14 THEN 'Finalizing' > > Fixed. > --- > > > 9) :s/checkpointing snapshots/checkpointing logical replication snapshot files > > + WHEN 3 THEN 'checkpointing snapshots' > > :s/checkpointing logical rewrite mappings/checkpointing logical > > replication rewrite mapping files > > + WHEN 4 THEN 'checkpointing logical rewrite mappings' > > Fixed. > --- > > > 10) I'm not sure if it's discussed, how about adding the number of > > snapshot/mapping files so far the checkpoint has processed in file > > processing while loops of > > CheckPointSnapBuild/CheckPointLogicalRewriteHeap? Sometimes, there can > > be many logical snapshot or mapping files and users may be interested > > in knowing the so-far-processed-file-count. > > I had thought about this while sharing the v1 patch and mentioned my > views upthread. 
I feel it won't give meaningful progress information > (It can be treated as statistics). Hence not included. Thoughts? > > > > > As mentioned upthread, there can be multiple backends that request a > > > > checkpoint, so unless we want to store an array of pid we should store a number > > > > of backend that are waiting for a new checkpoint. > > > > > > Yeah, you are right. Let's not go that path and store an array of > > > pids. I don't see a strong use-case with the pid of the process > > > requesting checkpoint. If required, we can add it later once the > > > pg_stat_progress_checkpoint view gets in. > > > > I don't think that's really necessary to give the pid list. > > > > If you requested a new checkpoint, it doesn't matter if it's only your backend > > that triggered it, another backend or a few other dozen, the result will be the > > same and you have the information that the request has been seen. We could > > store just a bool for that but having a number instead also gives a bit more > > information and may allow you to detect some broken logic on your client code > > if it keeps increasing. > > It's a good metric to show in the view but the information is not > readily available. Additional code is required to calculate the number > of requests. Is it worth doing that? I feel this can be added later if > required. > > Please find the v4 patch attached and share your thoughts. > > Thanks & Regards, > Nitin Jadhav > > On Tue, Mar 1, 2022 at 2:27 PM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > > > > > 3) Why do we need this extra calculation for start_lsn? > > > > Do you ever see a negative LSN or something here? > > > > + ('0/0'::pg_lsn + ( > > > > + CASE > > > > + WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric) > > > > + ELSE (0)::numeric > > > > + END + (s.param3)::numeric)) AS start_lsn, > > > > > > Yes: LSN can take up all of an uint64; whereas the pgstat column is a > > > bigint type; thus the signed int64. 
This cast is OK as it wraps > > > around, but that means we have to take care to correctly display the > > > LSN when it is > 0x7FFF_FFFF_FFFF_FFFF; which is what we do here using > > > the special-casing for negative values. > > > > Yes. The extra calculation is required here as we are storing a uint64 > > value in the variable of type int64. When we convert uint64 to int64 > > then the bit pattern is preserved (so no data is lost). The high-order > > bit becomes the sign bit and if the sign bit is set, both the sign and > > magnitude of the value change. To safely get back the actual uint64 value > > that was assigned, we need the above calculations. > > > > > > 4) Can't you use timestamptz_in(to_char(s.param4)) instead of > > > > pg_stat_get_progress_checkpoint_start_time? I don't quite understand > > > > the reasoning for having this function and it's named as *checkpoint* > > > > when it doesn't do anything specific to the checkpoint at all? > > > > > > I hadn't thought of using the types' inout functions, but it looks > > > like timestamp IO functions use a formatted timestring, which won't > > > work with the epoch-based timestamp stored in the view. > > > > There is a variation of to_timestamp() which takes UNIX epoch (float8) > > as an argument and converts it to timestamptz but we cannot directly > > call this function with S.param4. > > > > TimestampTz > > GetCurrentTimestamp(void) > > { > > TimestampTz result; > > struct timeval tp; > > > > gettimeofday(&tp, NULL); > > > > result = (TimestampTz) tp.tv_sec - > > ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY); > > result = (result * USECS_PER_SEC) + tp.tv_usec; > > > > return result; > > } > > > > S.param4 contains the output of the above function > > (GetCurrentTimestamp()) which returns Postgres epoch but the > > to_timestamp() expects UNIX epoch as input. So some calculation is > > required here. 
I feel the SQL 'to_timestamp(946684800 + > > (S.param4::float / 1000000)) AS start_time' works fine. The value > > '946684800' is equal to ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * > > SECS_PER_DAY). I am not sure whether it is good practice to use this > > way. Kindly share your thoughts. > > > > Thanks & Regards, > > Nitin Jadhav > > > > On Mon, Feb 28, 2022 at 6:40 PM Matthias van de Meent > > <boekewurm+postgres@gmail.com> wrote: > > > > > > On Sun, 27 Feb 2022 at 16:14, Bharath Rupireddy > > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > 3) Why do we need this extra calculation for start_lsn? > > > > Do you ever see a negative LSN or something here? > > > > + ('0/0'::pg_lsn + ( > > > > + CASE > > > > + WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric) > > > > + ELSE (0)::numeric > > > > + END + (s.param3)::numeric)) AS start_lsn, > > > > > > Yes: LSN can take up all of an uint64; whereas the pgstat column is a > > > bigint type; thus the signed int64. This cast is OK as it wraps > > > around, but that means we have to take care to correctly display the > > > LSN when it is > 0x7FFF_FFFF_FFFF_FFFF; which is what we do here using > > > the special-casing for negative values. > > > > > > As to whether it is reasonable: Generating 16GB of wal every second > > > (2^34 bytes /sec) is probably not impossible (cpu <> memory bandwidth > > > has been > 20GB/sec for a while); and that leaves you 2^29 seconds of > > > database runtime; or about 17 years. Seeing that a cluster can be > > > `pg_upgrade`d (which doesn't reset cluster LSN) since PG 9.0 from at > > > least version PG 8.4.0 (2009) (and through pg_migrator, from 8.3.0)), > > > we can assume that clusters hitting LSN=2^63 will be a reasonable > > > possibility within the next few years. As the lifespan of a PG release > > > is about 5 years, it doesn't seem impossible that there will be actual > > > clusters that are going to hit this naturally in the lifespan of PG15. 
> > > > > > It is also possible that someone fat-fingers pg_resetwal; and creates > > > a cluster with LSN >= 2^63; resulting in negative values in the > > > s.param3 field. Not likely, but we can force such situations; and as > > > such we should handle that gracefully. > > > > > > > 4) Can't you use timestamptz_in(to_char(s.param4)) instead of > > > > pg_stat_get_progress_checkpoint_start_time? I don't quite understand > > > > the reasoning for having this function and it's named as *checkpoint* > > > > when it doesn't do anything specific to the checkpoint at all? > > > > > > I hadn't thought of using the types' inout functions, but it looks > > > like timestamp IO functions use a formatted timestring, which won't > > > work with the epoch-based timestamp stored in the view. > > > > > > > Having 3 unnecessary functions that aren't useful to the users at all > > > > in proc.dat will simply eatup the function oids IMO. Hence, I suggest > > > > let's try to do without extra functions. > > > > > > I agree that (1) could be simplified, or at least fully expressed in > > > SQL without exposing too many internals. If we're fine with exposing > > > internals like flags and type layouts, then (2), and arguably (4), can > > > be expressed in SQL as well. > > > > > > -Matthias
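The two pieces of arithmetic discussed above — undoing the signed-bigint wraparound of the LSN and converting the PostgreSQL-epoch timestamp to a Unix epoch — can be sanity-checked outside the server. A small illustrative sketch in Python; this only models the arithmetic of the proposed view definition, it is not PostgreSQL code:

```python
# Illustrative model of the pg_stat_progress_checkpoint view arithmetic.
TWO_63 = 2 ** 63
TWO_64 = 2 ** 64

def store_in_bigint(lsn):
    """Storing a uint64 LSN in pgstat's signed int64 slot keeps the bit
    pattern; values at or above 2^63 therefore come back negative."""
    assert 0 <= lsn < TWO_64
    return lsn - TWO_64 if lsn >= TWO_63 else lsn

def recover_lsn(param3):
    """Mirror of the view's CASE: add 2^64 back when the stored value is
    negative, otherwise use it as-is."""
    return param3 + TWO_64 if param3 < 0 else param3

# An LSN just past 2^63 survives the round trip.
lsn = 0x8000000000000001
assert store_in_bigint(lsn) < 0
assert recover_lsn(store_in_bigint(lsn)) == lsn

# The 946684800 offset equals (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE)
# * SECS_PER_DAY: Julian day 2451545 is 2000-01-01, 2440588 is 1970-01-01.
assert (2451545 - 2440588) * 86400 == 946684800
```

The recover step corresponds to the view's `'0/0'::pg_lsn + (CASE WHEN s.param3 < 0 THEN pow(2::numeric, 64::numeric) ELSE 0 END + s.param3::numeric)`, and the final assertion confirms the constant used in `to_timestamp(946684800 + (S.param4::float / 1000000))`.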
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Julien Rouhaud
Date:
On Wed, Mar 2, 2022 at 7:15 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > > > As mentioned upthread, there can be multiple backends that request a > > > > checkpoint, so unless we want to store an array of pid we should store a number > > > > of backend that are waiting for a new checkpoint. > > It's a good metric to show in the view but the information is not > readily available. Additional code is required to calculate the number > of requests. Is it worth doing that? I feel this can be added later if > required. Is it that hard or costly to do? Just sending a message to increment the stat counter in RequestCheckpoint() would be enough. Also, unless I'm missing something it's still only showing the initial checkpoint flags, so it's *not* showing what the checkpoint is really doing, only what the checkpoint may be doing if nothing else happens. It just feels wrong. You could even use that ckpt_flags info to know that at least one backend has requested a new checkpoint, if you don't want to have a number of backends.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
Thanks for reviewing. > 6) How about shutdown and end-of-recovery checkpoint? Are you planning > to have an ereport_startup_progress mechanism as 0002? I thought of including it earlier, but then I felt let's first make the current patch stable. Once all the fields are properly decided and the patch gets in, we can easily extend the functionality to shutdown and end-of-recovery cases. I have also observed that the timer functionality won't work properly in case of shutdown as we are doing an immediate checkpoint. So this needs a lot of discussion and I would like to handle it on a separate thread. --- > 7) I think you don't need to call checkpoint_progress_start and > pgstat_progress_update_param, any other progress reporting function > for shutdown and end-of-recovery checkpoint right? I had included the guards earlier and then removed them later based on the discussion upthread. --- > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > -[ RECORD 1 ]-----+------------------------------------- > pid | 22043 > type | checkpoint > kind | immediate force wait requested time > start_lsn | 0/14C60F8 > start_time | 2022-03-03 18:59:56.018662+05:30 > phase | performing two phase checkpoint > > > This is the output I see when the checkpointer process has come out of > the two phase checkpoint and is currently writing checkpoint xlog > records and doing other stuff like updating control files etc. Is this > okay? The idea behind choosing the phases is based on the functionality which takes a longer time to execute. Since the work between the two phase checkpoint and the post checkpoint cleanup won't take much time to execute, I have not added any additional phase for that. But I also agree that this gives wrong information to the user. How about mentioning the phase information at the end of each phase, like "Initializing", "Initialization done", ..., "two phase checkpoint done", "post checkpoint cleanup done", ..., "finalizing"? 
Except for the first phase ("initializing") and last phase ("finalizing"), all the other phases describe the end of a certain operation. I feel this gives correct information even though the phase name/description does not represent the entire code block between two phases. For example, if the current phase is "two phase checkpoint done", then the user can infer that the checkpointer has finished everything up to the two phase checkpoint and is doing the other stuff that comes after it. Thoughts? > The output of log_checkpoint shows the number of buffers written is 3 > whereas the output of pg_stat_progress_checkpoint shows it as 0. See > below: > > 2022-03-03 20:04:45.643 IST [22043] LOG: checkpoint complete: wrote 3 > buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; > write=24.652 s, sync=104.256 s, total=3889.625 s; sync files=2, > longest=0.011 s, average=0.008 s; distance=0 kB, estimate=0 kB > > -- > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > -[ RECORD 1 ]-----+------------------------------------- > pid | 22043 > type | checkpoint > kind | immediate force wait requested time > start_lsn | 0/14C60F8 > start_time | 2022-03-03 18:59:56.018662+05:30 > phase | finalizing > buffers_total | 0 > buffers_processed | 0 > buffers_written | 0 > > Any idea why this mismatch? Good catch. In BufferSync() we have 'num_to_scan' (buffers_total) which indicates the total number of buffers to be processed. Based on that, the 'buffers_processed' and 'buffers_written' counters get incremented. I meant these values may reach up to 'buffers_total'. The current pg_stat_progress_checkpoint view supports the above information. There is another place where 'ckpt_bufs_written' gets incremented (in SlruInternalWritePage()). This increment is beyond the 'buffers_total' value and it is included in the server log message (checkpoint end) but not included in the view. I am a bit confused here. 
If we include this increment in the view then we cannot calculate the exact 'buffers_total' beforehand. Can we increment 'buffers_total' also when 'ckpt_bufs_written' gets incremented so that we can match the behaviour of the checkpoint-end message? Please share your thoughts. --- > I think we can add a couple of more information to this view - > start_time for buffer write operation and start_time for buffer sync > operation. These are two very time consuming tasks in a checkpoint and > people would find it useful to know how much time is being taken by > the checkpoint in I/O operation phase. thoughts? I felt the detailed progress is already shown for these 2 phases of the checkpoint through 'buffers_processed', 'buffers_written' and 'files_synced'. Hence I did not think about adding start times; if they are really required, then I can add them. > Is it that hard or costly to do? Just sending a message to increment > the stat counter in RequestCheckpoint() would be enough. > > Also, unless I'm missing something it's still only showing the initial > checkpoint flags, so it's *not* showing what the checkpoint is really > doing, only what the checkpoint may be doing if nothing else happens. > It just feels wrong. You could even use that ckpt_flags info to know > that at least one backend has requested a new checkpoint, if you don't > want to have a number of backends. I think using ckpt_flags to display whether any new requests have been made or not is a good idea. I will include it in the next patch. Thanks & Regards, Nitin Jadhav On Thu, Mar 3, 2022 at 11:58 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Wed, Mar 2, 2022 at 7:15 PM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > > > > > > As mentioned upthread, there can be multiple backends that request a > > > > > checkpoint, so unless we want to store an array of pid we should store a number > > > > > of backend that are waiting for a new checkpoint. 
> > > > It's a good metric to show in the view but the information is not > > readily available. Additional code is required to calculate the number > > of requests. Is it worth doing that? I feel this can be added later if > > required. > > Is it that hard or costly to do? Just sending a message to increment > the stat counter in RequestCheckpoint() would be enough. > > Also, unless I'm missing something it's still only showing the initial > checkpoint flags, so it's *not* showing what the checkpoint is really > doing, only what the checkpoint may be doing if nothing else happens. > It just feels wrong. You could even use that ckpt_flags info to know > that at least one backend has requested a new checkpoint, if you don't > want to have a number of backends.
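To make the counter mismatch described above concrete: 'buffers_total' is fixed from BufferSync()'s num_to_scan when buffer writing starts, while the figure in the checkpoint-complete log line ('ckpt_bufs_written') is also bumped by SlruInternalWritePage(), so the two can legitimately disagree. A toy Python model — the class and method names are invented for illustration, this is not PostgreSQL code:

```python
# Toy model (not PostgreSQL code) of why the view's buffers_written can
# lag the "wrote N buffers" figure in the checkpoint-complete log line.
class CheckpointCounters:
    def __init__(self, num_to_scan):
        self.buffers_total = num_to_scan  # fixed when BufferSync() starts
        self.view_written = 0             # what the progress view reports
        self.log_written = 0              # CheckpointStats.ckpt_bufs_written

    def write_shared_buffer(self):
        # Ordinary shared-buffer writes advance both counters.
        self.view_written += 1
        self.log_written += 1

    def write_slru_page(self):
        # SlruInternalWritePage() bumps only ckpt_bufs_written, so the
        # log total can exceed buffers_total.
        self.log_written += 1

# Reproduce the reported observation: no dirty shared buffers to scan,
# but three SLRU pages written.
c = CheckpointCounters(num_to_scan=0)
for _ in range(3):
    c.write_slru_page()
assert c.view_written == c.buffers_total == 0  # view shows 0
assert c.log_written == 3                      # log says "wrote 3 buffers"
```

This matches the output Ashutosh saw: the log line reported 3 buffers while the view's buffers_total/buffers_written stayed at 0.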
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Ashutosh Sharma
Date:
Please don't mix comments from multiple reviewers into one thread. It's hard to understand which comments are mine or Julien's or from others. Can you please respond to the email from each of us separately with an inline response. That will be helpful to understand your thoughts on our review comments. -- With Regards, Ashutosh Sharma. On Fri, Mar 4, 2022 at 4:59 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > Thanks for reviewing. > > > 6) How about shutdown and end-of-recovery checkpoint? Are you planning > > to have an ereport_startup_progress mechanism as 0002? > > I thought of including it earlier then I felt lets first make the > current patch stable. Once all the fields are properly decided and the > patch gets in then we can easily extend the functionality to shutdown > and end-of-recovery cases. I have also observed that the timer > functionality wont work properly in case of shutdown as we are doing > an immediate checkpoint. So this needs a lot of discussion and I would > like to handle this on a separate thread. > --- > > > 7) I think you don't need to call checkpoint_progress_start and > > pgstat_progress_update_param, any other progress reporting function > > for shutdown and end-of-recovery checkpoint right? > > I had included the guards earlier and then removed later based on the > discussion upthread. > --- > > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > > -[ RECORD 1 ]-----+------------------------------------- > > pid | 22043 > > type | checkpoint > > kind | immediate force wait requested time > > start_lsn | 0/14C60F8 > > start_time | 2022-03-03 18:59:56.018662+05:30 > > phase | performing two phase checkpoint > > > > > > This is the output I see when the checkpointer process has come out of > > the two phase checkpoint and is currently writing checkpoint xlog > > records and doing other stuff like updating control files etc. Is this > > okay? 
> > The idea behind choosing the phases is based on the functionality > which takes longer time to execute. Since after two phase checkpoint > till post checkpoint cleanup won't take much time to execute, I have > not added any additional phase for that. But I also agree that this > gives wrong information to the user. How about mentioning the phase > information at the end of each phase like "Initializing", > "Initialization done", ..., "two phase checkpoint done", "post > checkpoint cleanup done", .., "finalizing". Except for the first phase > ("initializing") and last phase ("finalizing"), all the other phases > describe the end of a certain operation. I feel this gives correct > information even though the phase name/description does not represent > the entire code block between two phases. For example if the current > phase is ''two phase checkpoint done". Then the user can infer that > the checkpointer has done till two phase checkpoint and it is doing > other stuff that are after that. Thoughts? > > > The output of log_checkpoint shows the number of buffers written is 3 > > whereas the output of pg_stat_progress_checkpoint shows it as 0. See > > below: > > > > 2022-03-03 20:04:45.643 IST [22043] LOG: checkpoint complete: wrote 3 > > buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; > > write=24.652 s, sync=104.256 s, total=3889.625 s; sync files=2, > > longest=0.011 s, average=0.008 s; distance=0 kB, estimate=0 kB > > > > -- > > > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > > -[ RECORD 1 ]-----+------------------------------------- > > pid | 22043 > > type | checkpoint > > kind | immediate force wait requested time > > start_lsn | 0/14C60F8 > > start_time | 2022-03-03 18:59:56.018662+05:30 > > phase | finalizing > > buffers_total | 0 > > buffers_processed | 0 > > buffers_written | 0 > > > > Any idea why this mismatch? > > Good catch. 
In BufferSync() we have 'num_to_scan' (buffers_total) > which indicates the total number of buffers to be processed. Based on > that, the 'buffers_processed' and 'buffers_written' counter gets > incremented. I meant these values may reach upto 'buffers_total'. The > current pg_stat_progress_view support above information. There is > another place when 'ckpt_bufs_written' gets incremented (In > SlruInternalWritePage()). This increment is above the 'buffers_total' > value and it is included in the server log message (checkpoint end) > and not included in the view. I am a bit confused here. If we include > this increment in the view then we cannot calculate the exact > 'buffers_total' beforehand. Can we increment the 'buffers_toal' also > when 'ckpt_bufs_written' gets incremented so that we can match the > behaviour with checkpoint end message? Please share your thoughts. > --- > > > I think we can add a couple of more information to this view - > > start_time for buffer write operation and start_time for buffer sync > > operation. These are two very time consuming tasks in a checkpoint and > > people would find it useful to know how much time is being taken by > > the checkpoint in I/O operation phase. thoughts? > > I felt the detailed progress is getting shown for these 2 phases of > the checkpoint like 'buffers_processed', 'buffers_written' and > 'files_synced'. Hence I did not think about adding start time and If > it is really required, then I can add. > > > Is it that hard or costly to do? Just sending a message to increment > > the stat counter in RequestCheckpoint() would be enough. > > > > Also, unless I'm missing something it's still only showing the initial > > checkpoint flags, so it's *not* showing what the checkpoint is really > > doing, only what the checkpoint may be doing if nothing else happens. > > It just feels wrong. 
You could even use that ckpt_flags info to know > > that at least one backend has requested a new checkpoint, if you don't > > want to have a number of backends. > > I think using ckpt_flags to display whether any new requests have been > made or not is a good idea. I will include it in the next patch. > > Thanks & Regards, > Nitin Jadhav > On Thu, Mar 3, 2022 at 11:58 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > On Wed, Mar 2, 2022 at 7:15 PM Nitin Jadhav > > <nitinjadhavpostgres@gmail.com> wrote: > > > > > > > > > As mentioned upthread, there can be multiple backends that request a > > > > > > checkpoint, so unless we want to store an array of pid we should store a number > > > > > > of backend that are waiting for a new checkpoint. > > > > > > It's a good metric to show in the view but the information is not > > > readily available. Additional code is required to calculate the number > > > of requests. Is it worth doing that? I feel this can be added later if > > > required. > > > > Is it that hard or costly to do? Just sending a message to increment > > the stat counter in RequestCheckpoint() would be enough. > > > > Also, unless I'm missing something it's still only showing the initial > > checkpoint flags, so it's *not* showing what the checkpoint is really > > doing, only what the checkpoint may be doing if nothing else happens. > > It just feels wrong. You could even use that ckpt_flags info to know > > that at least one backend has requested a new checkpoint, if you don't > > want to have a number of backends.
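For reference, the flag-to-text decoding debated earlier in the thread — the snprintf chain in pg_stat_get_progress_checkpoint_kind versus CASE WHEN arms in the view SQL — boils down to testing each bit of the flags word. A standalone Python sketch; the bit positions below are illustrative placeholders, not the actual CHECKPOINT_* constants from xlog.h:

```python
# Decode a checkpoint-flags bitmask into the space-separated text shown
# in the view's kind/flags column. Bit values are illustrative only.
FLAG_NAMES = [
    (1 << 0, "shutdown"),
    (1 << 1, "end-of-recovery"),
    (1 << 2, "immediate"),
    (1 << 3, "force"),
    (1 << 4, "flush-all"),
    (1 << 5, "wait"),
    (1 << 6, "requested"),
    (1 << 7, "wal"),
    (1 << 8, "time"),
]

def decode_flags(flags):
    # Mirrors the snprintf approach: emit the name of every set bit,
    # falling back to "unknown" when no flag is set.
    if flags == 0:
        return "unknown"
    return " ".join(name for bit, name in FLAG_NAMES if flags & bit)

# e.g. an immediate, forced checkpoint that the requester waits on:
assert decode_flags((1 << 2) | (1 << 3) | (1 << 5)) == "immediate force wait"
```

Whether this lives in C (one function call) or as a CASE chain in the view definition is purely a matter of where the string assembly happens; the bit tests are identical.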
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
> 1) Can we convert below into pgstat_progress_update_multi_param, just > to avoid function calls? > pgstat_progress_update_param(PROGRESS_CHECKPOINT_LSN, checkPoint.redo); > pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > > 2) Why are we not having special phase for CheckPointReplicationOrigin > as it does good bunch of work (writing to disk, XLogFlush, > durable_rename) especially when max_replication_slots is large? > > 3) I don't think "requested" is necessary here as it doesn't add any > value or it's not a checkpoint kind or such, you can remove it. > > 4) s:/'recycling old XLOG files'/'recycling old WAL files' > + WHEN 16 THEN 'recycling old XLOG files' > > 5) Can we place CREATE VIEW pg_stat_progress_checkpoint AS definition > next to pg_stat_progress_copy in system_view.sql? It looks like all > the progress reporting views are next to each other. I will take care in the next patch. --- > 6) How about shutdown and end-of-recovery checkpoint? Are you planning > to have an ereport_startup_progress mechanism as 0002? I thought of including it earlier then I felt lets first make the current patch stable. Once all the fields are properly decided and the patch gets in then we can easily extend the functionality to shutdown and end-of-recovery cases. I have also observed that the timer functionality wont work properly in case of shutdown as we are doing an immediate checkpoint. So this needs a lot of discussion and I would like to handle this on a separate thread. --- > 7) I think you don't need to call checkpoint_progress_start and > pgstat_progress_update_param, any other progress reporting function > for shutdown and end-of-recovery checkpoint right? I had included the guards earlier and then removed later based on the discussion upthread. --- > 8) Not for all kinds of checkpoints right? 
pg_stat_progress_checkpoint > can't show progress report for shutdown and end-of-recovery > checkpoint, I think you need to specify that here in wal.sgml and > checkpoint.sgml. > + command <command>CHECKPOINT</command>. The checkpointer process running the > + checkpoint will report its progress in the > + <structname>pg_stat_progress_checkpoint</structname> view. See > + <xref linkend="checkpoint-progress-reporting"/> for details. > > 9) Can you add a test case for pg_stat_progress_checkpoint view? I > think it's good to add one. See below for reference: > -- Add a trigger to catch and print the contents of the catalog view > -- pg_stat_progress_copy during data insertion. This allows to test > -- the validation of some progress reports for COPY FROM where the trigger > -- would fire. > create function notice_after_tab_progress_reporting() returns trigger AS > $$ > declare report record; > > 10) Typo: it's not "is happens" > + The checkpoint is happens without delays. > > 11) Can you be specific what are those "some operations" that forced a > checkpoint? Maybe like, basebackup, createdb or something? > + The checkpoint is started because some operation forced a checkpoint. > > 12) Can you be a bit more elaborate here about who waits? Something like the > backend that requested checkpoint will wait until it's completion .... > + Wait for completion before returning. > > 13) "removing unneeded or flushing needed logical rewrite mapping files" > + The checkpointer process is currently removing/flushing the logical > > 14) "old WAL files" > + The checkpointer process is currently recycling old XLOG files. I will take care in the next patch. 
Thanks & Regards, Nitin Jadhav On Wed, Mar 2, 2022 at 11:52 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Wed, Mar 2, 2022 at 4:45 PM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > > Also, how about special phases for SyncPostCheckpoint(), > > > SyncPreCheckpoint(), InvalidateObsoleteReplicationSlots(), > > > PreallocXlogFiles() (it currently pre-allocates only 1 WAL file, but > > > it might be increase in future (?)), TruncateSUBTRANS()? > > > > SyncPreCheckpoint() is just incrementing a counter and > > PreallocXlogFiles() currently pre-allocates only 1 WAL file. I feel > > there is no need to add any phases for these as of now. We can add in > > the future if necessary. Added phases for SyncPostCheckpoint(), > > InvalidateObsoleteReplicationSlots() and TruncateSUBTRANS(). > > Okay. > > > > 10) I'm not sure if it's discussed, how about adding the number of > > > snapshot/mapping files so far the checkpoint has processed in file > > > processing while loops of > > > CheckPointSnapBuild/CheckPointLogicalRewriteHeap? Sometimes, there can > > > be many logical snapshot or mapping files and users may be interested > > > in knowing the so-far-processed-file-count. > > > > I had thought about this while sharing the v1 patch and mentioned my > > views upthread. I feel it won't give meaningful progress information > > (It can be treated as statistics). Hence not included. Thoughts? > > Okay. If there are any complaints about it we can always add them later. > > > > > > As mentioned upthread, there can be multiple backends that request a > > > > > checkpoint, so unless we want to store an array of pid we should store a number > > > > > of backend that are waiting for a new checkpoint. > > > > > > > > Yeah, you are right. Let's not go that path and store an array of > > > > pids. I don't see a strong use-case with the pid of the process > > > > requesting checkpoint. 
If required, we can add it later once the > > > > pg_stat_progress_checkpoint view gets in. > > > > > > I don't think that's really necessary to give the pid list. > > > > > > If you requested a new checkpoint, it doesn't matter if it's only your backend > > > that triggered it, another backend or a few other dozen, the result will be the > > > same and you have the information that the request has been seen. We could > > > store just a bool for that but having a number instead also gives a bit more > > > information and may allow you to detect some broken logic on your client code > > > if it keeps increasing. > > > > It's a good metric to show in the view but the information is not > > readily available. Additional code is required to calculate the number > > of requests. Is it worth doing that? I feel this can be added later if > > required. > > Yes, we can always add it later if required. > > > Please find the v4 patch attached and share your thoughts. > > I reviewed v4 patch, here are my comments: > > 1) Can we convert below into pgstat_progress_update_multi_param, just > to avoid function calls? > pgstat_progress_update_param(PROGRESS_CHECKPOINT_LSN, checkPoint.redo); > pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > > 2) Why are we not having special phase for CheckPointReplicationOrigin > as it does good bunch of work (writing to disk, XLogFlush, > durable_rename) especially when max_replication_slots is large? > > 3) I don't think "requested" is necessary here as it doesn't add any > value or it's not a checkpoint kind or such, you can remove it. > > 4) s:/'recycling old XLOG files'/'recycling old WAL files' > + WHEN 16 THEN 'recycling old XLOG files' > > 5) Can we place CREATE VIEW pg_stat_progress_checkpoint AS definition > next to pg_stat_progress_copy in system_view.sql? It looks like all > the progress reporting views are next to each other. > > 6) How about shutdown and end-of-recovery checkpoint? 
Are you planning > to have an ereport_startup_progress mechanism as 0002? > > 7) I think you don't need to call checkpoint_progress_start and > pgstat_progress_update_param, any other progress reporting function > for shutdown and end-of-recovery checkpoint right? > > 8) Not for all kinds of checkpoints right? pg_stat_progress_checkpoint > can't show progress report for shutdown and end-of-recovery > checkpoint, I think you need to specify that here in wal.sgml and > checkpoint.sgml. > + command <command>CHECKPOINT</command>. The checkpointer process running the > + checkpoint will report its progress in the > + <structname>pg_stat_progress_checkpoint</structname> view. See > + <xref linkend="checkpoint-progress-reporting"/> for details. > > 9) Can you add a test case for pg_stat_progress_checkpoint view? I > think it's good to add one. See below for reference: > -- Add a trigger to catch and print the contents of the catalog view > -- pg_stat_progress_copy during data insertion. This allows to test > -- the validation of some progress reports for COPY FROM where the trigger > -- would fire. > create function notice_after_tab_progress_reporting() returns trigger AS > $$ > declare report record; > > 10) Typo: it's not "is happens" > + The checkpoint is happens without delays. > > 11) Can you be specific what are those "some operations" that forced a > checkpoint? Maybe like, basebackup, createdb or something? > + The checkpoint is started because some operation forced a checkpoint. > > 12) Can you be a bit more elaborate here about who waits? Something like the > backend that requested checkpoint will wait until it's completion .... > + Wait for completion before returning. > > 13) "removing unneeded or flushing needed logical rewrite mapping files" > + The checkpointer process is currently removing/flushing the logical > > 14) "old WAL files" > + The checkpointer process is currently recycling old XLOG files. > > Regards, > Bharath Rupireddy.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Nitin Jadhav
> + <row> > + <entry role="catalog_table_entry"><para role="column_definition"> > + <structfield>type</structfield> <type>text</type> > + </para> > + <para> > + Type of checkpoint. See <xref linkend="checkpoint-types"/>. > + </para></entry> > + </row> > + > + <row> > + <entry role="catalog_table_entry"><para role="column_definition"> > + <structfield>kind</structfield> <type>text</type> > + </para> > + <para> > + Kind of checkpoint. See <xref linkend="checkpoint-kinds"/>. > + </para></entry> > + </row> > > This looks a bit confusing. Two columns, one with the name "checkpoint > types" and another "checkpoint kinds". You can probably rename > checkpoint-kinds to checkpoint-flags and let the checkpoint-types be > as-it-is. Makes sense. I will change in the next patch. --- > + <entry><structname>pg_stat_progress_checkpoint</structname><indexterm><primary>pg_stat_progress_checkpoint</primary></indexterm></entry> > + <entry>One row only, showing the progress of the checkpoint. > > Let's make this message consistent with the already existing message > for pg_stat_wal_receiver. See description for pg_stat_wal_receiver > view in "Dynamic Statistics Views" table. You want me to change "One row only" to "Only one row" ? If that is the case then for other views in the "Collected Statistics Views" table, it is referred as "One row only". Let me know if you are pointing out something else. --- > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > -[ RECORD 1 ]-----+------------------------------------- > pid | 22043 > type | checkpoint > kind | immediate force wait requested time > > I think the output in the kind column can be displayed as {immediate, > force, wait, requested, time}. By the way these are all checkpoint > flags so it is better to display it as checkpoint flags instead of > checkpoint kind as mentioned in one of my previous comments. I will update in the next patch. 
--- > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > -[ RECORD 1 ]-----+------------------------------------- > pid | 22043 > type | checkpoint > kind | immediate force wait requested time > start_lsn | 0/14C60F8 > start_time | 2022-03-03 18:59:56.018662+05:30 > phase | performing two phase checkpoint > > This is the output I see when the checkpointer process has come out of > the two phase checkpoint and is currently writing checkpoint xlog > records and doing other stuff like updating control files etc. Is this > okay? The idea behind choosing the phases is based on the operations that take longer to execute. Since the work between the two-phase checkpoint and the post-checkpoint cleanup doesn't take much time, I have not added any additional phase for that. But I also agree that this gives wrong information to the user. How about mentioning the phase information at the end of each phase, like "initializing", "initialization done", ..., "two phase checkpoint done", "post checkpoint cleanup done", ..., "finalizing"? Except for the first phase ("initializing") and the last phase ("finalizing"), all the other phases describe the end of a certain operation. I feel this gives correct information even though the phase name/description does not represent the entire code block between two phases. For example, if the current phase is "two phase checkpoint done", the user can infer that the checkpointer has finished the two-phase checkpoint and is now doing the work that comes after it. Thoughts? --- > The output of log_checkpoint shows the number of buffers written is 3 > whereas the output of pg_stat_progress_checkpoint shows it as 0.
See > below: > > 2022-03-03 20:04:45.643 IST [22043] LOG: checkpoint complete: wrote 3 > buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; > write=24.652 s, sync=104.256 s, total=3889.625 s; sync files=2, > longest=0.011 s, average=0.008 s; distance=0 kB, estimate=0 kB > > -- > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > -[ RECORD 1 ]-----+------------------------------------- > pid | 22043 > type | checkpoint > kind | immediate force wait requested time > start_lsn | 0/14C60F8 > start_time | 2022-03-03 18:59:56.018662+05:30 > phase | finalizing > buffers_total | 0 > buffers_processed | 0 > buffers_written | 0 > > Any idea why this mismatch? Good catch. In BufferSync() we have 'num_to_scan' (buffers_total), which indicates the total number of buffers to be processed. Based on that, the 'buffers_processed' and 'buffers_written' counters get incremented; these values may reach up to 'buffers_total'. The current pg_stat_progress_checkpoint view supports the above information. There is another place where 'ckpt_bufs_written' gets incremented (in SlruInternalWritePage()). This increment is on top of the 'buffers_total' value, and it is included in the server log message (checkpoint end) but not in the view. I am a bit confused here. If we include this increment in the view then we cannot calculate the exact 'buffers_total' beforehand. Can we increment 'buffers_total' also when 'ckpt_bufs_written' gets incremented, so that the behaviour matches the checkpoint-end message? Please share your thoughts. --- > I think we can add a couple of more information to this view - > start_time for buffer write operation and start_time for buffer sync > operation. These are two very time consuming tasks in a checkpoint and > people would find it useful to know how much time is being taken by > the checkpoint in I/O operation phase. thoughts?
Detailed progress is already shown for these two phases of the checkpoint via 'buffers_processed', 'buffers_written' and 'files_synced'. Hence I did not add a start time; if it is really required, then I can add it. Thanks & Regards, Nitin Jadhav On Thu, Mar 3, 2022 at 8:30 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > Here are some of my review comments on the latest patch: > > + <row> > + <entry role="catalog_table_entry"><para role="column_definition"> > + <structfield>type</structfield> <type>text</type> > + </para> > + <para> > + Type of checkpoint. See <xref linkend="checkpoint-types"/>. > + </para></entry> > + </row> > + > + <row> > + <entry role="catalog_table_entry"><para role="column_definition"> > + <structfield>kind</structfield> <type>text</type> > + </para> > + <para> > + Kind of checkpoint. See <xref linkend="checkpoint-kinds"/>. > + </para></entry> > + </row> > > This looks a bit confusing. Two columns, one with the name "checkpoint > types" and another "checkpoint kinds". You can probably rename > checkpoint-kinds to checkpoint-flags and let the checkpoint-types be > as-it-is. > > == > > + <entry><structname>pg_stat_progress_checkpoint</structname><indexterm><primary>pg_stat_progress_checkpoint</primary></indexterm></entry> > + <entry>One row only, showing the progress of the checkpoint. > > Let's make this message consistent with the already existing message > for pg_stat_wal_receiver. See description for pg_stat_wal_receiver > view in "Dynamic Statistics Views" table. > > == > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > -[ RECORD 1 ]-----+------------------------------------- > pid | 22043 > type | checkpoint > kind | immediate force wait requested time > > I think the output in the kind column can be displayed as {immediate, > force, wait, requested, time}.
By the way these are all checkpoint > flags so it is better to display it as checkpoint flags instead of > checkpoint kind as mentioned in one of my previous comments. > > == > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > -[ RECORD 1 ]-----+------------------------------------- > pid | 22043 > type | checkpoint > kind | immediate force wait requested time > start_lsn | 0/14C60F8 > start_time | 2022-03-03 18:59:56.018662+05:30 > phase | performing two phase checkpoint > > > This is the output I see when the checkpointer process has come out of > the two phase checkpoint and is currently writing checkpoint xlog > records and doing other stuff like updating control files etc. Is this > okay? > > == > > The output of log_checkpoint shows the number of buffers written is 3 > whereas the output of pg_stat_progress_checkpoint shows it as 0. See > below: > > 2022-03-03 20:04:45.643 IST [22043] LOG: checkpoint complete: wrote 3 > buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; > write=24.652 s, sync=104.256 s, total=3889.625 s; sync files=2, > longest=0.011 s, average=0.008 s; distance=0 kB, estimate=0 kB > > -- > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > -[ RECORD 1 ]-----+------------------------------------- > pid | 22043 > type | checkpoint > kind | immediate force wait requested time > start_lsn | 0/14C60F8 > start_time | 2022-03-03 18:59:56.018662+05:30 > phase | finalizing > buffers_total | 0 > buffers_processed | 0 > buffers_written | 0 > > Any idea why this mismatch? > > == > > I think we can add a couple of more information to this view - > start_time for buffer write operation and start_time for buffer sync > operation. These are two very time consuming tasks in a checkpoint and > people would find it useful to know how much time is being taken by > the checkpoint in I/O operation phase. thoughts? > > -- > With Regards, > Ashutosh Sharma. 
> > On Wed, Mar 2, 2022 at 4:45 PM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > > > Thanks for reviewing. > > > > > > > I suggested upthread to store the starting timeline instead. This way you can > > > > > deduce whether it's a restartpoint or a checkpoint, but you can also deduce > > > > > other information, like what was the starting WAL. > > > > > > > > I don't understand why we need the timeline here to just determine > > > > whether it's a restartpoint or checkpoint. > > > > > > I'm not saying it's necessary, I'm saying that for the same space usage we can > > > store something a bit more useful. If no one cares about having the starting > > > timeline available for no extra cost then sure, let's just store the kind > > > directly. > > > > Fixed. > > > > > 2) Can't we just have these checks inside CASE-WHEN-THEN-ELSE blocks > > > directly instead of new function pg_stat_get_progress_checkpoint_kind? > > > + snprintf(ckpt_kind, MAXPGPATH, "%s%s%s%s%s%s%s%s%s", > > > + (flags == 0) ? "unknown" : "", > > > + (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "", > > > + (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "", > > > + (flags & CHECKPOINT_IMMEDIATE) ? "immediate " : "", > > > + (flags & CHECKPOINT_FORCE) ? "force " : "", > > > + (flags & CHECKPOINT_WAIT) ? "wait " : "", > > > + (flags & CHECKPOINT_CAUSE_XLOG) ? "wal " : "", > > > + (flags & CHECKPOINT_CAUSE_TIME) ? "time " : "", > > > + (flags & CHECKPOINT_FLUSH_ALL) ? "flush-all" : ""); > > > > Fixed. > > --- > > > > > 5) Do we need a special phase for this checkpoint operation? I'm not > > > sure in which cases it will take a long time, but it looks like > > > there's a wait loop here. > > > vxids = GetVirtualXIDsDelayingChkpt(&nvxids); > > > if (nvxids > 0) > > > { > > > do > > > { > > > pg_usleep(10000L); /* wait for 10 msec */ > > > } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids)); > > > } > > > > Yes. It is better to add a separate phase here. 
> > --- > > > > > Also, how about special phases for SyncPostCheckpoint(), > > > SyncPreCheckpoint(), InvalidateObsoleteReplicationSlots(), > > > PreallocXlogFiles() (it currently pre-allocates only 1 WAL file, but > > > it might be increase in future (?)), TruncateSUBTRANS()? > > > > SyncPreCheckpoint() is just incrementing a counter and > > PreallocXlogFiles() currently pre-allocates only 1 WAL file. I feel > > there is no need to add any phases for these as of now. We can add in > > the future if necessary. Added phases for SyncPostCheckpoint(), > > InvalidateObsoleteReplicationSlots() and TruncateSUBTRANS(). > > --- > > > > > 6) SLRU (Simple LRU) isn't a phase here, you can just say > > > PROGRESS_CHECKPOINT_PHASE_PREDICATE_LOCK_PAGES. > > > + > > > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > > > + PROGRESS_CHECKPOINT_PHASE_SLRU_PAGES); > > > CheckPointPredicate(); > > > > > > And :s/checkpointing SLRU pages/checkpointing predicate lock pages > > >+ WHEN 9 THEN 'checkpointing SLRU pages' > > > > Fixed. > > --- > > > > > 7) :s/PROGRESS_CHECKPOINT_PHASE_FILE_SYNC/PROGRESS_CHECKPOINT_PHASE_PROCESS_FILE_SYNC_REQUESTS > > > > I feel PROGRESS_CHECKPOINT_PHASE_FILE_SYNC is a better option here as > > it describes the purpose in less words. > > > > > And :s/WHEN 11 THEN 'performing sync requests'/WHEN 11 THEN > > > 'processing file sync requests' > > > > Fixed. > > --- > > > > > 8) :s/Finalizing/finalizing > > > + WHEN 14 THEN 'Finalizing' > > > > Fixed. > > --- > > > > > 9) :s/checkpointing snapshots/checkpointing logical replication snapshot files > > > + WHEN 3 THEN 'checkpointing snapshots' > > > :s/checkpointing logical rewrite mappings/checkpointing logical > > > replication rewrite mapping files > > > + WHEN 4 THEN 'checkpointing logical rewrite mappings' > > > > Fixed. 
> > --- > > > > > 10) I'm not sure if it's discussed, how about adding the number of > > > snapshot/mapping files so far the checkpoint has processed in file > > > processing while loops of > > > CheckPointSnapBuild/CheckPointLogicalRewriteHeap? Sometimes, there can > > > be many logical snapshot or mapping files and users may be interested > > > in knowing the so-far-processed-file-count. > > > > I had thought about this while sharing the v1 patch and mentioned my > > views upthread. I feel it won't give meaningful progress information > > (It can be treated as statistics). Hence not included. Thoughts? > > > > > > > As mentioned upthread, there can be multiple backends that request a > > > > > checkpoint, so unless we want to store an array of pid we should store a number > > > > > of backend that are waiting for a new checkpoint. > > > > > > > > Yeah, you are right. Let's not go that path and store an array of > > > > pids. I don't see a strong use-case with the pid of the process > > > > requesting checkpoint. If required, we can add it later once the > > > > pg_stat_progress_checkpoint view gets in. > > > > > > I don't think that's really necessary to give the pid list. > > > > > > If you requested a new checkpoint, it doesn't matter if it's only your backend > > > that triggered it, another backend or a few other dozen, the result will be the > > > same and you have the information that the request has been seen. We could > > > store just a bool for that but having a number instead also gives a bit more > > > information and may allow you to detect some broken logic on your client code > > > if it keeps increasing. > > > > It's a good metric to show in the view but the information is not > > readily available. Additional code is required to calculate the number > > of requests. Is it worth doing that? I feel this can be added later if > > required. > > > > Please find the v4 patch attached and share your thoughts. 
> > > > Thanks & Regards, > > Nitin Jadhav > > > > On Tue, Mar 1, 2022 at 2:27 PM Nitin Jadhav > > <nitinjadhavpostgres@gmail.com> wrote: > > > > > > > > 3) Why do we need this extra calculation for start_lsn? > > > > > Do you ever see a negative LSN or something here? > > > > > + ('0/0'::pg_lsn + ( > > > > > + CASE > > > > > + WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric) > > > > > + ELSE (0)::numeric > > > > > + END + (s.param3)::numeric)) AS start_lsn, > > > > > > > > Yes: LSN can take up all of an uint64; whereas the pgstat column is a > > > > bigint type; thus the signed int64. This cast is OK as it wraps > > > > around, but that means we have to take care to correctly display the > > > > LSN when it is > 0x7FFF_FFFF_FFFF_FFFF; which is what we do here using > > > > the special-casing for negative values. > > > > > > Yes. The extra calculation is required here as we are storing unit64 > > > value in the variable of type int64. When we convert uint64 to int64 > > > then the bit pattern is preserved (so no data is lost). The high-order > > > bit becomes the sign bit and if the sign bit is set, both the sign and > > > magnitude of the value changes. To safely get the actual uint64 value > > > whatever was assigned, we need the above calculations. > > > > > > > > 4) Can't you use timestamptz_in(to_char(s.param4)) instead of > > > > > pg_stat_get_progress_checkpoint_start_time? I don't quite understand > > > > > the reasoning for having this function and it's named as *checkpoint* > > > > > when it doesn't do anything specific to the checkpoint at all? > > > > > > > > I hadn't thought of using the types' inout functions, but it looks > > > > like timestamp IO functions use a formatted timestring, which won't > > > > work with the epoch-based timestamp stored in the view. 
> > > > > > There is a variation of to_timestamp() which takes UNIX epoch (float8) > > > as an argument and converts it to timestamptz but we cannot directly > > > call this function with S.param4. > > > > > > TimestampTz > > > GetCurrentTimestamp(void) > > > { > > > TimestampTz result; > > > struct timeval tp; > > > > > > gettimeofday(&tp, NULL); > > > > > > result = (TimestampTz) tp.tv_sec - > > > ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY); > > > result = (result * USECS_PER_SEC) + tp.tv_usec; > > > > > > return result; > > > } > > > > > > S.param4 contains the output of the above function > > > (GetCurrentTimestamp()) which returns Postgres epoch but the > > > to_timestamp() expects UNIX epoch as input. So some calculation is > > > required here. I feel the SQL 'to_timestamp(946684800 + > > > (S.param4::float / 1000000)) AS start_time' works fine. The value > > > '946684800' is equal to ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * > > > SECS_PER_DAY). I am not sure whether it is good practice to use this > > > way. Kindly share your thoughts. > > > > > > Thanks & Regards, > > > Nitin Jadhav > > > > > > On Mon, Feb 28, 2022 at 6:40 PM Matthias van de Meent > > > <boekewurm+postgres@gmail.com> wrote: > > > > > > > > On Sun, 27 Feb 2022 at 16:14, Bharath Rupireddy > > > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > 3) Why do we need this extra calculation for start_lsn? > > > > > Do you ever see a negative LSN or something here? > > > > > + ('0/0'::pg_lsn + ( > > > > > + CASE > > > > > + WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric) > > > > > + ELSE (0)::numeric > > > > > + END + (s.param3)::numeric)) AS start_lsn, > > > > > > > > Yes: LSN can take up all of an uint64; whereas the pgstat column is a > > > > bigint type; thus the signed int64. 
This cast is OK as it wraps > > > > around, but that means we have to take care to correctly display the > > > > LSN when it is > 0x7FFF_FFFF_FFFF_FFFF; which is what we do here using > > > > the special-casing for negative values. > > > > > > > > As to whether it is reasonable: Generating 16GB of wal every second > > > > (2^34 bytes /sec) is probably not impossible (cpu <> memory bandwidth > > > > has been > 20GB/sec for a while); and that leaves you 2^29 seconds of > > > > database runtime; or about 17 years. Seeing that a cluster can be > > > > `pg_upgrade`d (which doesn't reset cluster LSN) since PG 9.0 from at > > > > least version PG 8.4.0 (2009) (and through pg_migrator, from 8.3.0)), > > > > we can assume that clusters hitting LSN=2^63 will be a reasonable > > > > possibility within the next few years. As the lifespan of a PG release > > > > is about 5 years, it doesn't seem impossible that there will be actual > > > > clusters that are going to hit this naturally in the lifespan of PG15. > > > > > > > > It is also possible that someone fat-fingers pg_resetwal; and creates > > > > a cluster with LSN >= 2^63; resulting in negative values in the > > > > s.param3 field. Not likely, but we can force such situations; and as > > > > such we should handle that gracefully. > > > > > > > > > 4) Can't you use timestamptz_in(to_char(s.param4)) instead of > > > > > pg_stat_get_progress_checkpoint_start_time? I don't quite understand > > > > > the reasoning for having this function and it's named as *checkpoint* > > > > > when it doesn't do anything specific to the checkpoint at all? > > > > > > > > I hadn't thought of using the types' inout functions, but it looks > > > > like timestamp IO functions use a formatted timestring, which won't > > > > work with the epoch-based timestamp stored in the view. > > > > > > > > > Having 3 unnecessary functions that aren't useful to the users at all > > > > > in proc.dat will simply eatup the function oids IMO. 
Hence, I suggest > > > > > let's try to do without extra functions. > > > > > > > > I agree that (1) could be simplified, or at least fully expressed in > > > > SQL without exposing too many internals. If we're fine with exposing > > > > internals like flags and type layouts, then (2), and arguably (4), can > > > > be expressed in SQL as well. > > > > > > > > -Matthias
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Nitin Jadhav
> > 11) Can you be specific what are those "some operations" that forced a > > checkpoint? May be like, basebackup, createdb or something? > > + The checkpoint is started because some operation forced a checkpoint. > > > I will take care of it in the next patch. I feel mentioning/listing the specific operations makes the document difficult to maintain. If we add any new functionality in the future that needs a forced checkpoint, there is a high chance that we will miss updating it here. Hence I modified it to "The checkpoint is started because some operation (for which the checkpoint is necessary) forced the checkpoint". Fixed the other comments as per the discussion above. Please find the v5 patch attached and share your thoughts. Thanks & Regards, Nitin Jadhav On Mon, Mar 7, 2022 at 7:45 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > 1) Can we convert below into pgstat_progress_update_multi_param, just > > to avoid function calls? > > pgstat_progress_update_param(PROGRESS_CHECKPOINT_LSN, checkPoint.redo); > > pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > > > > 2) Why are we not having special phase for CheckPointReplicationOrigin > > as it does good bunch of work (writing to disk, XLogFlush, > > durable_rename) especially when max_replication_slots is large? > > > > 3) I don't think "requested" is necessary here as it doesn't add any > > value or it's not a checkpoint kind or such, you can remove it. > > > > 4) s:/'recycling old XLOG files'/'recycling old WAL files' > > + WHEN 16 THEN 'recycling old XLOG files' > > > > 5) Can we place CREATE VIEW pg_stat_progress_checkpoint AS definition > > next to pg_stat_progress_copy in system_view.sql? It looks like all > > the progress reporting views are next to each other. > > I will take care in the next patch. > --- > > > 6) How about shutdown and end-of-recovery checkpoint? Are you planning > > to have an ereport_startup_progress mechanism as 0002?
> > I thought of including it earlier then I felt lets first make the > current patch stable. Once all the fields are properly decided and the > patch gets in then we can easily extend the functionality to shutdown > and end-of-recovery cases. I have also observed that the timer > functionality wont work properly in case of shutdown as we are doing > an immediate checkpoint. So this needs a lot of discussion and I would > like to handle this on a separate thread. > --- > > > 7) I think you don't need to call checkpoint_progress_start and > > pgstat_progress_update_param, any other progress reporting function > > for shutdown and end-of-recovery checkpoint right? > > I had included the guards earlier and then removed later based on the > discussion upthread. > --- > > > 8) Not for all kinds of checkpoints right? pg_stat_progress_checkpoint > > can't show progress report for shutdown and end-of-recovery > > checkpoint, I think you need to specify that here in wal.sgml and > > checkpoint.sgml. > > + command <command>CHECKPOINT</command>. The checkpointer process running the > > + checkpoint will report its progress in the > > + <structname>pg_stat_progress_checkpoint</structname> view. See > > + <xref linkend="checkpoint-progress-reporting"/> for details. > > > > 9) Can you add a test case for pg_stat_progress_checkpoint view? I > > think it's good to add one. See, below for reference: > > -- Add a trigger to catch and print the contents of the catalog view > > -- pg_stat_progress_copy during data insertion. This allows to test > > -- the validation of some progress reports for COPY FROM where the trigger > > -- would fire. > > create function notice_after_tab_progress_reporting() returns trigger AS > > $$ > > declare report record; > > > > 10) Typo: it's not "is happens" > > + The checkpoint is happens without delays. > > > > 11) Can you be specific what are those "some operations" that forced a > > checkpoint? May be like, basebackup, createdb or something? 
> > + The checkpoint is started because some operation forced a checkpoint. > > > > 12) Can you be a bit elobartive here who waits? Something like the > > backend that requested checkpoint will wait until it's completion .... > > + Wait for completion before returning. > > > > 13) "removing unneeded or flushing needed logical rewrite mapping files" > > + The checkpointer process is currently removing/flushing the logical > > > > 14) "old WAL files" > > + The checkpointer process is currently recycling old XLOG files. > > I will take care in the next patch. > > Thanks & Regards, > Nitin Jadhav > > On Wed, Mar 2, 2022 at 11:52 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > On Wed, Mar 2, 2022 at 4:45 PM Nitin Jadhav > > <nitinjadhavpostgres@gmail.com> wrote: > > > > Also, how about special phases for SyncPostCheckpoint(), > > > > SyncPreCheckpoint(), InvalidateObsoleteReplicationSlots(), > > > > PreallocXlogFiles() (it currently pre-allocates only 1 WAL file, but > > > > it might be increase in future (?)), TruncateSUBTRANS()? > > > > > > SyncPreCheckpoint() is just incrementing a counter and > > > PreallocXlogFiles() currently pre-allocates only 1 WAL file. I feel > > > there is no need to add any phases for these as of now. We can add in > > > the future if necessary. Added phases for SyncPostCheckpoint(), > > > InvalidateObsoleteReplicationSlots() and TruncateSUBTRANS(). > > > > Okay. > > > > > > 10) I'm not sure if it's discussed, how about adding the number of > > > > snapshot/mapping files so far the checkpoint has processed in file > > > > processing while loops of > > > > CheckPointSnapBuild/CheckPointLogicalRewriteHeap? Sometimes, there can > > > > be many logical snapshot or mapping files and users may be interested > > > > in knowing the so-far-processed-file-count. > > > > > > I had thought about this while sharing the v1 patch and mentioned my > > > views upthread. 
I feel it won't give meaningful progress information > > > (It can be treated as statistics). Hence not included. Thoughts? > > > > Okay. If there are any complaints about it we can always add them later. > > > > > > > > As mentioned upthread, there can be multiple backends that request a > > > > > > checkpoint, so unless we want to store an array of pid we should store a number > > > > > > of backend that are waiting for a new checkpoint. > > > > > > > > > > Yeah, you are right. Let's not go that path and store an array of > > > > > pids. I don't see a strong use-case with the pid of the process > > > > > requesting checkpoint. If required, we can add it later once the > > > > > pg_stat_progress_checkpoint view gets in. > > > > > > > > I don't think that's really necessary to give the pid list. > > > > > > > > If you requested a new checkpoint, it doesn't matter if it's only your backend > > > > that triggered it, another backend or a few other dozen, the result will be the > > > > same and you have the information that the request has been seen. We could > > > > store just a bool for that but having a number instead also gives a bit more > > > > information and may allow you to detect some broken logic on your client code > > > > if it keeps increasing. > > > > > > It's a good metric to show in the view but the information is not > > > readily available. Additional code is required to calculate the number > > > of requests. Is it worth doing that? I feel this can be added later if > > > required. > > > > Yes, we can always add it later if required. > > > > > Please find the v4 patch attached and share your thoughts. > > > > I reviewed v4 patch, here are my comments: > > > > 1) Can we convert below into pgstat_progress_update_multi_param, just > > to avoid function calls? 
> > pgstat_progress_update_param(PROGRESS_CHECKPOINT_LSN, checkPoint.redo); > > pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > > > > 2) Why are we not having special phase for CheckPointReplicationOrigin > > as it does good bunch of work (writing to disk, XLogFlush, > > durable_rename) especially when max_replication_slots is large? > > > > 3) I don't think "requested" is necessary here as it doesn't add any > > value or it's not a checkpoint kind or such, you can remove it. > > > > 4) s:/'recycling old XLOG files'/'recycling old WAL files' > > + WHEN 16 THEN 'recycling old XLOG files' > > > > 5) Can we place CREATE VIEW pg_stat_progress_checkpoint AS definition > > next to pg_stat_progress_copy in system_view.sql? It looks like all > > the progress reporting views are next to each other. > > > > 6) How about shutdown and end-of-recovery checkpoint? Are you planning > > to have an ereport_startup_progress mechanism as 0002? > > > > 7) I think you don't need to call checkpoint_progress_start and > > pgstat_progress_update_param, any other progress reporting function > > for shutdown and end-of-recovery checkpoint right? > > > > 8) Not for all kinds of checkpoints right? pg_stat_progress_checkpoint > > can't show progress report for shutdown and end-of-recovery > > checkpoint, I think you need to specify that here in wal.sgml and > > checkpoint.sgml. > > + command <command>CHECKPOINT</command>. The checkpointer process running the > > + checkpoint will report its progress in the > > + <structname>pg_stat_progress_checkpoint</structname> view. See > > + <xref linkend="checkpoint-progress-reporting"/> for details. > > > > 9) Can you add a test case for pg_stat_progress_checkpoint view? I > > think it's good to add one. See, below for reference: > > -- Add a trigger to catch and print the contents of the catalog view > > -- pg_stat_progress_copy during data insertion. 
This allows to test > > -- the validation of some progress reports for COPY FROM where the trigger > > -- would fire. > > create function notice_after_tab_progress_reporting() returns trigger AS > > $$ > > declare report record; > > > > 10) Typo: it's not "is happens" > > + The checkpoint is happens without delays. > > > > 11) Can you be specific what are those "some operations" that forced a > > checkpoint? May be like, basebackup, createdb or something? > > + The checkpoint is started because some operation forced a checkpoint. > > > > 12) Can you be a bit elobartive here who waits? Something like the > > backend that requested checkpoint will wait until it's completion .... > > + Wait for completion before returning. > > > > 13) "removing unneeded or flushing needed logical rewrite mapping files" > > + The checkpointer process is currently removing/flushing the logical > > > > 14) "old WAL files" > > + The checkpointer process is currently recycling old XLOG files. > > > > Regards, > > Bharath Rupireddy.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Nitin Jadhav
> > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > > -[ RECORD 1 ]-----+------------------------------------- > > pid | 22043 > > type | checkpoint > > kind | immediate force wait requested time > > > > I think the output in the kind column can be displayed as {immediate, > > force, wait, requested, time}. By the way these are all checkpoint > > flags so it is better to display it as checkpoint flags instead of > > checkpoint kind as mentioned in one of my previous comments. > > I will update in the next patch. The current format matches the server log message for checkpoint start emitted by LogCheckpointStart(). Just to be consistent, I have not changed the code. I have taken care, in the v5 patch, of the rest of the comments for which there was clarity. Thanks & Regards, Nitin Jadhav On Mon, Mar 7, 2022 at 8:15 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > + <row> > > + <entry role="catalog_table_entry"><para role="column_definition"> > > + <structfield>type</structfield> <type>text</type> > > + </para> > > + <para> > > + Type of checkpoint. See <xref linkend="checkpoint-types"/>. > > + </para></entry> > > + </row> > > + > > + <row> > > + <entry role="catalog_table_entry"><para role="column_definition"> > > + <structfield>kind</structfield> <type>text</type> > > + </para> > > + <para> > > + Kind of checkpoint. See <xref linkend="checkpoint-kinds"/>. > > + </para></entry> > > + </row> > > > > This looks a bit confusing. Two columns, one with the name "checkpoint > > types" and another "checkpoint kinds". You can probably rename > > checkpoint-kinds to checkpoint-flags and let the checkpoint-types be > > as-it-is. > > Makes sense. I will change in the next patch. > --- > > > + <entry><structname>pg_stat_progress_checkpoint</structname><indexterm><primary>pg_stat_progress_checkpoint</primary></indexterm></entry> > > + <entry>One row only, showing the progress of the checkpoint.
> > > > Let's make this message consistent with the already existing message > > for pg_stat_wal_receiver. See description for pg_stat_wal_receiver > > view in "Dynamic Statistics Views" table. > > You want me to change "One row only" to "Only one row" ? If that is > the case then for other views in the "Collected Statistics Views" > table, it is referred as "One row only". Let me know if you are > pointing out something else. > --- > > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > > -[ RECORD 1 ]-----+------------------------------------- > > pid | 22043 > > type | checkpoint > > kind | immediate force wait requested time > > > > I think the output in the kind column can be displayed as {immediate, > > force, wait, requested, time}. By the way these are all checkpoint > > flags so it is better to display it as checkpoint flags instead of > > checkpoint kind as mentioned in one of my previous comments. > > I will update in the next patch. > --- > > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > > -[ RECORD 1 ]-----+------------------------------------- > > pid | 22043 > > type | checkpoint > > kind | immediate force wait requested time > > start_lsn | 0/14C60F8 > > start_time | 2022-03-03 18:59:56.018662+05:30 > > phase | performing two phase checkpoint > > > > This is the output I see when the checkpointer process has come out of > > the two phase checkpoint and is currently writing checkpoint xlog > > records and doing other stuff like updating control files etc. Is this > > okay? > > The idea behind choosing the phases is based on the functionality > which takes longer time to execute. Since after two phase checkpoint > till post checkpoint cleanup won't take much time to execute, I have > not added any additional phase for that. But I also agree that this > gives wrong information to the user. 
How about mentioning the phase > information at the end of each phase like "Initializing", > "Initialization done", ..., "two phase checkpoint done", "post > checkpoint cleanup done", .., "finalizing". Except for the first phase > ("initializing") and last phase ("finalizing"), all the other phases > describe the end of a certain operation. I feel this gives correct > information even though the phase name/description does not represent > the entire code block between two phases. For example if the current > phase is ''two phase checkpoint done". Then the user can infer that > the checkpointer has done till two phase checkpoint and it is doing > other stuff that are after that. Thoughts? > --- > > > The output of log_checkpoint shows the number of buffers written is 3 > > whereas the output of pg_stat_progress_checkpoint shows it as 0. See > > below: > > > > 2022-03-03 20:04:45.643 IST [22043] LOG: checkpoint complete: wrote 3 > > buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; > > write=24.652 s, sync=104.256 s, total=3889.625 s; sync files=2, > > longest=0.011 s, average=0.008 s; distance=0 kB, estimate=0 kB > > > > -- > > > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > > -[ RECORD 1 ]-----+------------------------------------- > > pid | 22043 > > type | checkpoint > > kind | immediate force wait requested time > > start_lsn | 0/14C60F8 > > start_time | 2022-03-03 18:59:56.018662+05:30 > > phase | finalizing > > buffers_total | 0 > > buffers_processed | 0 > > buffers_written | 0 > > > > Any idea why this mismatch? > > Good catch. In BufferSync() we have 'num_to_scan' (buffers_total) > which indicates the total number of buffers to be processed. Based on > that, the 'buffers_processed' and 'buffers_written' counter gets > incremented. I meant these values may reach upto 'buffers_total'. The > current pg_stat_progress_view support above information. 
There is > another place when 'ckpt_bufs_written' gets incremented (In > SlruInternalWritePage()). This increment is above the 'buffers_total' > value and it is included in the server log message (checkpoint end) > and not included in the view. I am a bit confused here. If we include > this increment in the view then we cannot calculate the exact > 'buffers_total' beforehand. Can we increment the 'buffers_total' also > when 'ckpt_bufs_written' gets incremented so that we can match the > behaviour with the checkpoint end message? Please share your thoughts. > --- > > > I think we can add a couple of more information to this view - > > start_time for buffer write operation and start_time for buffer sync > > operation. These are two very time consuming tasks in a checkpoint and > > people would find it useful to know how much time is being taken by > > the checkpoint in I/O operation phase. thoughts? > > The detailed progress is getting shown for these 2 phases of the > checkpoint like 'buffers_processed', 'buffers_written' and > 'files_synced'. Hence I did not think about adding start time and if > it is really required, then I can add. > > Thanks & Regards, > Nitin Jadhav > > On Thu, Mar 3, 2022 at 8:30 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > > > Here are some of my review comments on the latest patch: > > > > + <row> > > + <entry role="catalog_table_entry"><para role="column_definition"> > > + <structfield>type</structfield> <type>text</type> > > + </para> > > + <para> > > + Type of checkpoint. See <xref linkend="checkpoint-types"/>. > > + </para></entry> > > + </row> > > + > > + <row> > > + <entry role="catalog_table_entry"><para role="column_definition"> > > + <structfield>kind</structfield> <type>text</type> > > + </para> > > + <para> > > + Kind of checkpoint. See <xref linkend="checkpoint-kinds"/>. > > + </para></entry> > > + </row> > > > > This looks a bit confusing. Two columns, one with the name "checkpoint > > types" and another "checkpoint kinds".
You can probably rename > > checkpoint-kinds to checkpoint-flags and let the checkpoint-types be > > as-it-is. > > > > == > > > > + <entry><structname>pg_stat_progress_checkpoint</structname><indexterm><primary>pg_stat_progress_checkpoint</primary></indexterm></entry> > > + <entry>One row only, showing the progress of the checkpoint. > > > > Let's make this message consistent with the already existing message > > for pg_stat_wal_receiver. See description for pg_stat_wal_receiver > > view in "Dynamic Statistics Views" table. > > > > == > > > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > > -[ RECORD 1 ]-----+------------------------------------- > > pid | 22043 > > type | checkpoint > > kind | immediate force wait requested time > > > > I think the output in the kind column can be displayed as {immediate, > > force, wait, requested, time}. By the way these are all checkpoint > > flags so it is better to display it as checkpoint flags instead of > > checkpoint kind as mentioned in one of my previous comments. > > > > == > > > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > > -[ RECORD 1 ]-----+------------------------------------- > > pid | 22043 > > type | checkpoint > > kind | immediate force wait requested time > > start_lsn | 0/14C60F8 > > start_time | 2022-03-03 18:59:56.018662+05:30 > > phase | performing two phase checkpoint > > > > > > This is the output I see when the checkpointer process has come out of > > the two phase checkpoint and is currently writing checkpoint xlog > > records and doing other stuff like updating control files etc. Is this > > okay? > > > > == > > > > The output of log_checkpoint shows the number of buffers written is 3 > > whereas the output of pg_stat_progress_checkpoint shows it as 0. 
See > > below: > > > > 2022-03-03 20:04:45.643 IST [22043] LOG: checkpoint complete: wrote 3 > > buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; > > write=24.652 s, sync=104.256 s, total=3889.625 s; sync files=2, > > longest=0.011 s, average=0.008 s; distance=0 kB, estimate=0 kB > > > > -- > > > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > > -[ RECORD 1 ]-----+------------------------------------- > > pid | 22043 > > type | checkpoint > > kind | immediate force wait requested time > > start_lsn | 0/14C60F8 > > start_time | 2022-03-03 18:59:56.018662+05:30 > > phase | finalizing > > buffers_total | 0 > > buffers_processed | 0 > > buffers_written | 0 > > > > Any idea why this mismatch? > > > > == > > > > I think we can add a couple of more information to this view - > > start_time for buffer write operation and start_time for buffer sync > > operation. These are two very time consuming tasks in a checkpoint and > > people would find it useful to know how much time is being taken by > > the checkpoint in I/O operation phase. thoughts? > > > > -- > > With Regards, > > Ashutosh Sharma. > > > > On Wed, Mar 2, 2022 at 4:45 PM Nitin Jadhav > > <nitinjadhavpostgres@gmail.com> wrote: > > > > > > Thanks for reviewing. > > > > > > > > > I suggested upthread to store the starting timeline instead. This way you can > > > > > > deduce whether it's a restartpoint or a checkpoint, but you can also deduce > > > > > > other information, like what was the starting WAL. > > > > > > > > > > I don't understand why we need the timeline here to just determine > > > > > whether it's a restartpoint or checkpoint. > > > > > > > > I'm not saying it's necessary, I'm saying that for the same space usage we can > > > > store something a bit more useful. If no one cares about having the starting > > > > timeline available for no extra cost then sure, let's just store the kind > > > > directly. > > > > > > Fixed. 
> > > > > > > 2) Can't we just have these checks inside CASE-WHEN-THEN-ELSE blocks > > > > directly instead of new function pg_stat_get_progress_checkpoint_kind? > > > > + snprintf(ckpt_kind, MAXPGPATH, "%s%s%s%s%s%s%s%s%s", > > > > + (flags == 0) ? "unknown" : "", > > > > + (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "", > > > > + (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "", > > > > + (flags & CHECKPOINT_IMMEDIATE) ? "immediate " : "", > > > > + (flags & CHECKPOINT_FORCE) ? "force " : "", > > > > + (flags & CHECKPOINT_WAIT) ? "wait " : "", > > > > + (flags & CHECKPOINT_CAUSE_XLOG) ? "wal " : "", > > > > + (flags & CHECKPOINT_CAUSE_TIME) ? "time " : "", > > > > + (flags & CHECKPOINT_FLUSH_ALL) ? "flush-all" : ""); > > > > > > Fixed. > > > --- > > > > > > > 5) Do we need a special phase for this checkpoint operation? I'm not > > > > sure in which cases it will take a long time, but it looks like > > > > there's a wait loop here. > > > > vxids = GetVirtualXIDsDelayingChkpt(&nvxids); > > > > if (nvxids > 0) > > > > { > > > > do > > > > { > > > > pg_usleep(10000L); /* wait for 10 msec */ > > > > } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids)); > > > > } > > > > > > Yes. It is better to add a separate phase here. > > > --- > > > > > > > Also, how about special phases for SyncPostCheckpoint(), > > > > SyncPreCheckpoint(), InvalidateObsoleteReplicationSlots(), > > > > PreallocXlogFiles() (it currently pre-allocates only 1 WAL file, but > > > > it might be increase in future (?)), TruncateSUBTRANS()? > > > > > > SyncPreCheckpoint() is just incrementing a counter and > > > PreallocXlogFiles() currently pre-allocates only 1 WAL file. I feel > > > there is no need to add any phases for these as of now. We can add in > > > the future if necessary. Added phases for SyncPostCheckpoint(), > > > InvalidateObsoleteReplicationSlots() and TruncateSUBTRANS(). 
> > > --- > > > > > > > 6) SLRU (Simple LRU) isn't a phase here, you can just say > > > > PROGRESS_CHECKPOINT_PHASE_PREDICATE_LOCK_PAGES. > > > > + > > > > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > > > > + PROGRESS_CHECKPOINT_PHASE_SLRU_PAGES); > > > > CheckPointPredicate(); > > > > > > > > And :s/checkpointing SLRU pages/checkpointing predicate lock pages > > > >+ WHEN 9 THEN 'checkpointing SLRU pages' > > > > > > Fixed. > > > --- > > > > > > > 7) :s/PROGRESS_CHECKPOINT_PHASE_FILE_SYNC/PROGRESS_CHECKPOINT_PHASE_PROCESS_FILE_SYNC_REQUESTS > > > > > > I feel PROGRESS_CHECKPOINT_PHASE_FILE_SYNC is a better option here as > > > it describes the purpose in less words. > > > > > > > And :s/WHEN 11 THEN 'performing sync requests'/WHEN 11 THEN > > > > 'processing file sync requests' > > > > > > Fixed. > > > --- > > > > > > > 8) :s/Finalizing/finalizing > > > > + WHEN 14 THEN 'Finalizing' > > > > > > Fixed. > > > --- > > > > > > > 9) :s/checkpointing snapshots/checkpointing logical replication snapshot files > > > > + WHEN 3 THEN 'checkpointing snapshots' > > > > :s/checkpointing logical rewrite mappings/checkpointing logical > > > > replication rewrite mapping files > > > > + WHEN 4 THEN 'checkpointing logical rewrite mappings' > > > > > > Fixed. > > > --- > > > > > > > 10) I'm not sure if it's discussed, how about adding the number of > > > > snapshot/mapping files so far the checkpoint has processed in file > > > > processing while loops of > > > > CheckPointSnapBuild/CheckPointLogicalRewriteHeap? Sometimes, there can > > > > be many logical snapshot or mapping files and users may be interested > > > > in knowing the so-far-processed-file-count. > > > > > > I had thought about this while sharing the v1 patch and mentioned my > > > views upthread. I feel it won't give meaningful progress information > > > (It can be treated as statistics). Hence not included. Thoughts? 
> > > > > > > > > As mentioned upthread, there can be multiple backends that request a > > > > > > checkpoint, so unless we want to store an array of pid we should store a number > > > > > > of backend that are waiting for a new checkpoint. > > > > > > > > > > Yeah, you are right. Let's not go that path and store an array of > > > > > pids. I don't see a strong use-case with the pid of the process > > > > > requesting checkpoint. If required, we can add it later once the > > > > > pg_stat_progress_checkpoint view gets in. > > > > > > > > I don't think that's really necessary to give the pid list. > > > > > > > > If you requested a new checkpoint, it doesn't matter if it's only your backend > > > > that triggered it, another backend or a few other dozen, the result will be the > > > > same and you have the information that the request has been seen. We could > > > > store just a bool for that but having a number instead also gives a bit more > > > > information and may allow you to detect some broken logic on your client code > > > > if it keeps increasing. > > > > > > It's a good metric to show in the view but the information is not > > > readily available. Additional code is required to calculate the number > > > of requests. Is it worth doing that? I feel this can be added later if > > > required. > > > > > > Please find the v4 patch attached and share your thoughts. > > > > > > Thanks & Regards, > > > Nitin Jadhav > > > > > > On Tue, Mar 1, 2022 at 2:27 PM Nitin Jadhav > > > <nitinjadhavpostgres@gmail.com> wrote: > > > > > > > > > > 3) Why do we need this extra calculation for start_lsn? > > > > > > Do you ever see a negative LSN or something here? 
> > > > > > + ('0/0'::pg_lsn + ( > > > > > > + CASE > > > > > > + WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric) > > > > > > + ELSE (0)::numeric > > > > > > + END + (s.param3)::numeric)) AS start_lsn, > > > > > > > > > > Yes: LSN can take up all of an uint64; whereas the pgstat column is a > > > > > bigint type; thus the signed int64. This cast is OK as it wraps > > > > > around, but that means we have to take care to correctly display the > > > > > LSN when it is > 0x7FFF_FFFF_FFFF_FFFF; which is what we do here using > > > > > the special-casing for negative values. > > > > > > > > Yes. The extra calculation is required here as we are storing unit64 > > > > value in the variable of type int64. When we convert uint64 to int64 > > > > then the bit pattern is preserved (so no data is lost). The high-order > > > > bit becomes the sign bit and if the sign bit is set, both the sign and > > > > magnitude of the value changes. To safely get the actual uint64 value > > > > whatever was assigned, we need the above calculations. > > > > > > > > > > 4) Can't you use timestamptz_in(to_char(s.param4)) instead of > > > > > > pg_stat_get_progress_checkpoint_start_time? I don't quite understand > > > > > > the reasoning for having this function and it's named as *checkpoint* > > > > > > when it doesn't do anything specific to the checkpoint at all? > > > > > > > > > > I hadn't thought of using the types' inout functions, but it looks > > > > > like timestamp IO functions use a formatted timestring, which won't > > > > > work with the epoch-based timestamp stored in the view. > > > > > > > > There is a variation of to_timestamp() which takes UNIX epoch (float8) > > > > as an argument and converts it to timestamptz but we cannot directly > > > > call this function with S.param4. 
> > > > > > > > TimestampTz > > > > GetCurrentTimestamp(void) > > > > { > > > > TimestampTz result; > > > > struct timeval tp; > > > > > > > > gettimeofday(&tp, NULL); > > > > > > > > result = (TimestampTz) tp.tv_sec - > > > > ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY); > > > > result = (result * USECS_PER_SEC) + tp.tv_usec; > > > > > > > > return result; > > > > } > > > > > > > > S.param4 contains the output of the above function > > > > (GetCurrentTimestamp()) which returns Postgres epoch but the > > > > to_timestamp() expects UNIX epoch as input. So some calculation is > > > > required here. I feel the SQL 'to_timestamp(946684800 + > > > > (S.param4::float / 1000000)) AS start_time' works fine. The value > > > > '946684800' is equal to ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * > > > > SECS_PER_DAY). I am not sure whether it is good practice to use this > > > > way. Kindly share your thoughts. > > > > > > > > Thanks & Regards, > > > > Nitin Jadhav > > > > > > > > On Mon, Feb 28, 2022 at 6:40 PM Matthias van de Meent > > > > <boekewurm+postgres@gmail.com> wrote: > > > > > > > > > > On Sun, 27 Feb 2022 at 16:14, Bharath Rupireddy > > > > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > > 3) Why do we need this extra calculation for start_lsn? > > > > > > Do you ever see a negative LSN or something here? > > > > > > + ('0/0'::pg_lsn + ( > > > > > > + CASE > > > > > > + WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric) > > > > > > + ELSE (0)::numeric > > > > > > + END + (s.param3)::numeric)) AS start_lsn, > > > > > > > > > > Yes: LSN can take up all of an uint64; whereas the pgstat column is a > > > > > bigint type; thus the signed int64. This cast is OK as it wraps > > > > > around, but that means we have to take care to correctly display the > > > > > LSN when it is > 0x7FFF_FFFF_FFFF_FFFF; which is what we do here using > > > > > the special-casing for negative values. 
> > > > > > > > > > As to whether it is reasonable: Generating 16GB of wal every second > > > > > (2^34 bytes /sec) is probably not impossible (cpu <> memory bandwidth > > > > > has been > 20GB/sec for a while); and that leaves you 2^29 seconds of > > > > > database runtime; or about 17 years. Seeing that a cluster can be > > > > > `pg_upgrade`d (which doesn't reset cluster LSN) since PG 9.0 from at > > > > > least version PG 8.4.0 (2009) (and through pg_migrator, from 8.3.0)), > > > > > we can assume that clusters hitting LSN=2^63 will be a reasonable > > > > > possibility within the next few years. As the lifespan of a PG release > > > > > is about 5 years, it doesn't seem impossible that there will be actual > > > > > clusters that are going to hit this naturally in the lifespan of PG15. > > > > > > > > > > It is also possible that someone fat-fingers pg_resetwal; and creates > > > > > a cluster with LSN >= 2^63; resulting in negative values in the > > > > > s.param3 field. Not likely, but we can force such situations; and as > > > > > such we should handle that gracefully. > > > > > > > > > > > 4) Can't you use timestamptz_in(to_char(s.param4)) instead of > > > > > > pg_stat_get_progress_checkpoint_start_time? I don't quite understand > > > > > > the reasoning for having this function and it's named as *checkpoint* > > > > > > when it doesn't do anything specific to the checkpoint at all? > > > > > > > > > > I hadn't thought of using the types' inout functions, but it looks > > > > > like timestamp IO functions use a formatted timestring, which won't > > > > > work with the epoch-based timestamp stored in the view. > > > > > > > > > > > Having 3 unnecessary functions that aren't useful to the users at all > > > > > > in proc.dat will simply eatup the function oids IMO. Hence, I suggest > > > > > > let's try to do without extra functions. 
> > > > > > > > > > I agree that (1) could be simplified, or at least fully expressed in > > > > > SQL without exposing too many internals. If we're fine with exposing > > > > > internals like flags and type layouts, then (2), and arguably (4), can > > > > > be expressed in SQL as well. > > > > > > > > > > -Matthias
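For reference, the two view-column conversions discussed above (recovering the unsigned LSN from the signed bigint progress parameter, and turning the PostgreSQL-epoch timestamp into a Unix epoch) can be sketched outside of SQL as well. This is purely an illustrative sketch, not patch code; the function names are made up, only the arithmetic mirrors the thread:

```python
# Illustrative sketch of the two conversions discussed above.
# Function names are hypothetical; the constants mirror the thread.

# (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY:
# seconds between 1970-01-01 and 2000-01-01.
POSTGRES_EPOCH_OFFSET_SECS = 946_684_800


def lsn_from_param(param3: int) -> int:
    """Recover the unsigned 64-bit LSN from the signed bigint progress param.

    The int64 -> uint64 round trip preserves the bit pattern, so a negative
    value only means the high bit was set: add 2^64 to undo the sign.
    """
    return param3 + 2**64 if param3 < 0 else param3


def unix_epoch_from_param(param4: int) -> float:
    """Convert a TimestampTz (microseconds since 2000-01-01) to a Unix epoch."""
    return POSTGRES_EPOCH_OFFSET_SECS + param4 / 1_000_000
```

This matches the `CASE WHEN (s.param3 < 0) THEN pow(2::numeric, 64::numeric) ...` expression and the `to_timestamp(946684800 + S.param4::float / 1000000)` idea from the quoted messages.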
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
> > > > > As mentioned upthread, there can be multiple backends that request a > checkpoint, so unless we want to store an array of pid we should store a number > of backend that are waiting for a new checkpoint. > > > > It's a good metric to show in the view but the information is not > > readily available. Additional code is required to calculate the number > > of requests. Is it worth doing that? I feel this can be added later if > > required. > > Is it that hard or costly to do? Just sending a message to increment > the stat counter in RequestCheckpoint() would be enough. > > Also, unless I'm missing something it's still only showing the initial > checkpoint flags, so it's *not* showing what the checkpoint is really > doing, only what the checkpoint may be doing if nothing else happens. > It just feels wrong. You could even use that ckpt_flags info to know > that at least one backend has requested a new checkpoint, if you don't > want to have a number of backends.

I just wanted to avoid extra calculations merely to show the progress in the view. Since it's a good metric, I have added an additional field named 'next_flags' to the view which holds all possible flag values of the next checkpoint. This gives more information than just saying whether a new checkpoint is requested or not, with the same memory. I am updating the progress of 'next_flags' in ImmediateCheckpointRequested(), which gets called during the buffer write phase. I considered updating the progress in other places as well, but I feel updating it in ImmediateCheckpointRequested() is enough, as the current checkpoint behaviour is affected only by the CHECKPOINT_IMMEDIATE flag, and all other checkpoint requests made in case of createdb(), dropdb(), etc. are issued with the CHECKPOINT_IMMEDIATE flag. I have updated this in the v5 patch. Please share your thoughts.
Thanks & Regards, Nitin Jadhav On Thu, Mar 3, 2022 at 11:58 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Wed, Mar 2, 2022 at 7:15 PM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > > > > > > As mentioned upthread, there can be multiple backends that request a > > > > > checkpoint, so unless we want to store an array of pid we should store a number > > > > > of backend that are waiting for a new checkpoint. > > > > It's a good metric to show in the view but the information is not > > readily available. Additional code is required to calculate the number > > of requests. Is it worth doing that? I feel this can be added later if > > required. > > Is it that hard or costly to do? Just sending a message to increment > the stat counter in RequestCheckpoint() would be enough. > > Also, unless I'm missing something it's still only showing the initial > checkpoint flags, so it's *not* showing what the checkpoint is really > doing, only what the checkpoint may be doing if nothing else happens. > It just feels wrong. You could even use that ckpt_flags info to know > that at least one backend has requested a new checkpoint, if you don't > want to have a number of backends.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Ashutosh Sharma
Date:
On Tue, Mar 8, 2022 at 8:31 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > > > -[ RECORD 1 ]-----+------------------------------------- > > > pid | 22043 > > > type | checkpoint > > > kind | immediate force wait requested time > > > > > > I think the output in the kind column can be displayed as {immediate, > > > force, wait, requested, time}. By the way these are all checkpoint > > > flags so it is better to display it as checkpoint flags instead of > > > checkpoint kind as mentioned in one of my previous comments. > > > > I will update in the next patch. > > The current format matches with the server log message for the > checkpoint start in LogCheckpointStart(). Just to be consistent, I > have not changed the code. >

See below how flags are shown in other SQL functions:

ashu@postgres=# select * from heap_tuple_infomask_flags(2304, 1);
                raw_flags                | combined_flags
-----------------------------------------+----------------
 {HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID} | {}
(1 row)

This looks more readable and is easy to understand for the end-users. Further, comparing the way log messages are displayed with the way SQL functions display their output doesn't look like the right comparison to me. Obviously both should show matching data, but the way it is shown doesn't need to be the same. In fact it is not in most of the cases.

> I have taken care of the rest of the comments in v5 patch for which > there was clarity. >

Thank you very much. Will take a look at it later.

-- With Regards, Ashutosh Sharma.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Julien Rouhaud
Date:
On Tue, Mar 08, 2022 at 08:57:23PM +0530, Nitin Jadhav wrote: > > I just wanted to avoid extra calculations just to show the progress in > the view. Since it's a good metric, I have added an additional field > named 'next_flags' to the view which holds all possible flag values of > the next checkpoint.

I still don't think that's ok. IIUC the only way to know whether the current checkpoint is throttled or not is to be aware that the "next_flags" can apply to the current checkpoint too, look for it, and see if that changes the semantics of what the view says the current checkpoint is. Most users will get it wrong.

> This gives more information than just saying > whether the new checkpoint is requested or not with the same memory.

So that next_flags will be empty most of the time? It seems confusing. Again, I would just display a bool flag saying whether a new checkpoint has been explicitly requested or not; it seems enough. If you're interested in that next checkpoint, you probably want a quick completion of the current checkpoint first (and thus need to know if it's throttled or not). And then you will have to keep monitoring that view for the next checkpoint anyway, and at that point the view will show the relevant information.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
> > The current format matches with the server log message for the > > checkpoint start in LogCheckpointStart(). Just to be consistent, I > > have not changed the code. > > > > See below, how flags are shown in other sql functions like: > > ashu@postgres=# select * from heap_tuple_infomask_flags(2304, 1); > raw_flags | combined_flags > -----------------------------------------+---------------- > {HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID} | {} > (1 row) > > This looks more readable and it's easy to understand for the > end-users.. Further comparing the way log messages are displayed with > the way sql functions display its output doesn't look like a right > comparison to me. Obviously both should show matching data but the way > it is shown doesn't need to be the same. In fact it is not in most of > the cases.

ok. I will take care in the next patch. I would like to handle this at the SQL level in system_views.sql. The following can be used to display in the format described above.

( '{' ||
  CASE WHEN (S.param2 & 4) > 0 THEN 'immediate' ELSE '' END ||
  CASE WHEN (S.param2 & 4) > 0 AND (S.param2 & -8) > 0 THEN ', ' ELSE '' END ||
  CASE WHEN (S.param2 & 8) > 0 THEN 'force' ELSE '' END ||
  CASE WHEN (S.param2 & 8) > 0 AND (S.param2 & -16) > 0 THEN ', ' ELSE '' END ||
  CASE WHEN (S.param2 & 16) > 0 THEN 'flush-all' ELSE '' END ||
  CASE WHEN (S.param2 & 16) > 0 AND (S.param2 & -32) > 0 THEN ', ' ELSE '' END ||
  CASE WHEN (S.param2 & 32) > 0 THEN 'wait' ELSE '' END ||
  CASE WHEN (S.param2 & 32) > 0 AND (S.param2 & -128) > 0 THEN ', ' ELSE '' END ||
  CASE WHEN (S.param2 & 128) > 0 THEN 'wal' ELSE '' END ||
  CASE WHEN (S.param2 & 128) > 0 AND (S.param2 & -256) > 0 THEN ', ' ELSE '' END ||
  CASE WHEN (S.param2 & 256) > 0 THEN 'time' ELSE '' END ||
  '}' )

Basically, a separate CASE statement is used to decide whether a comma has to be printed or not, which is done by checking whether the previous flag bit is enabled (so that the appropriate flag has to be displayed) and if any next bits are
enabled (So there are some more flags to be displayed). Kindly let me know if you know any other better approach. Thanks & Regards, Nitin Jadhav On Wed, Mar 9, 2022 at 7:07 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > On Tue, Mar 8, 2022 at 8:31 PM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > > > > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > > > > -[ RECORD 1 ]-----+------------------------------------- > > > > pid | 22043 > > > > type | checkpoint > > > > kind | immediate force wait requested time > > > > > > > > I think the output in the kind column can be displayed as {immediate, > > > > force, wait, requested, time}. By the way these are all checkpoint > > > > flags so it is better to display it as checkpoint flags instead of > > > > checkpoint kind as mentioned in one of my previous comments. > > > > > > I will update in the next patch. > > > > The current format matches with the server log message for the > > checkpoint start in LogCheckpointStart(). Just to be consistent, I > > have not changed the code. > > > > See below, how flags are shown in other sql functions like: > > ashu@postgres=# select * from heap_tuple_infomask_flags(2304, 1); > raw_flags | combined_flags > -----------------------------------------+---------------- > {HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID} | {} > (1 row) > > This looks more readable and it's easy to understand for the > end-users.. Further comparing the way log messages are displayed with > the way sql functions display its output doesn't look like a right > comparison to me. Obviously both should show matching data but the way > it is shown doesn't need to be the same. In fact it is not in most of > the cases. > > > I have taken care of the rest of the comments in v5 patch for which > > there was clarity. > > > > Thank you very much. Will take a look at it later. > > -- > With Regards, > Ashutosh Sharma.
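As a cross-check of the comma logic, here is a small sketch (illustrative Python, not part of any patch; the helper name is made up) of what the SQL expression computes. The bit values 4, 8, 16, 32, 128, 256 are taken directly from the S.param2 tests in the SQL fragment:

```python
# Hypothetical helper mirroring the SQL CASE chain; the bit values match
# the S.param2 tests in the thread (names are illustrative only).
CHECKPOINT_FLAG_NAMES = [
    (4, "immediate"),
    (8, "force"),
    (16, "flush-all"),
    (32, "wait"),
    (128, "wal"),
    (256, "time"),
]


def decode_checkpoint_flags(param2: int) -> str:
    """Render the set flags as '{immediate, force, ...}', like the SQL above."""
    names = [name for bit, name in CHECKPOINT_FLAG_NAMES if param2 & bit]
    return "{" + ", ".join(names) + "}"
```

Joining with ', ' sidesteps the per-flag comma CASEs entirely; that join is exactly what the extra CASE statements emulate in SQL.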
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
> > I just wanted to avoid extra calculations just to show the progress in > > the view. Since it's a good metric, I have added an additional field > > named 'next_flags' to the view which holds all possible flag values of > > the next checkpoint. > > I still don't think that's ok. IIUC the only way to know if the current > checkpoint is throttled or not is to be aware that the "next_flags" can apply > to the current checkpoint too, look for it and see if that changes the > semantics of what the view say the current checkpoint is. Most users will get > it wrong. > > Again I would just display a bool flag saying whether a new checkpoint has been > explicitly requested or not, it seems enough.

Ok. I agree that it is difficult to interpret it correctly. So even if we say that a new checkpoint has been explicitly requested, the user may not understand that it affects the current checkpoint's behaviour unless the user knows the internals of the checkpoint. How about naming the field 'throttled' (Yes/No), since our objective is to show whether the current checkpoint is throttled or not?

Thanks & Regards, Nitin Jadhav

On Wed, Mar 9, 2022 at 7:48 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Tue, Mar 08, 2022 at 08:57:23PM +0530, Nitin Jadhav wrote: > > > > I just wanted to avoid extra calculations just to show the progress in > > the view. Since it's a good metric, I have added an additional field > > named 'next_flags' to the view which holds all possible flag values of > > the next checkpoint. > > I still don't think that's ok. IIUC the only way to know if the current > checkpoint is throttled or not is to be aware that the "next_flags" can apply > to the current checkpoint too, look for it and see if that changes the > semantics of what the view say the current checkpoint is. Most users will get > it wrong. > > > This gives more information than just saying > > whether the new checkpoint is requested or not with the same memory.
> > So that next_flags will be empty most of the time? It seems confusing. > > Again I would just display a bool flag saying whether a new checkpoint has been > explicitly requested or not, it seems enough. > > If you're interested in that next checkpoint, you probably want a quick > completion of the current checkpoint first (and thus need to know if it's > throttled or not). And then you will have to keep monitoring that view for the > next checkpoint anyway, and at that point the view will show the relevant > information.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Julien Rouhaud
On Fri, Mar 11, 2022 at 02:41:23PM +0530, Nitin Jadhav wrote:
> Ok, I agree that it is difficult to interpret correctly: even if we say that
> a new checkpoint has been explicitly requested, the user may not understand
> that it affects the current checkpoint's behaviour unless they know the
> internals of the checkpoint. How about naming the field 'throttled'
> (Yes/No), since our objective is to show whether the current checkpoint is
> throttled or not?

-1

That "throttled" flag should be the same as having or not a "force" in the
flags. We should be consistent and report information the same way: either a
lot of flags (is_throttled, is_force...) or, as now, a single field containing
the set flags, so the current approach seems better. Also, it wouldn't be much
better to show the checkpoint as not having the force flag and still not being
throttled.

Why not just report (ckpt_flags & (CHECKPOINT_REQUESTED |
CHECKPOINT_IMMEDIATE)) in the path(s) that can update the new flags for the
view?

CHECKPOINT_REQUESTED will always be set by RequestCheckpoint(), and can be
used to detect that someone wants a new checkpoint afterwards, whatever it is
and whether or not they want the current checkpoint to finish quickly. For
this flag I think it's better not to report it in the view flags but in a new
field, as discussed before, as that's really what it means.

CHECKPOINT_IMMEDIATE is the only new flag that can affect an
already-in-progress checkpoint, so it can simply be added to the view flags.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Nitin Jadhav
> > Ok, I agree that it is difficult to interpret correctly: even if we say
> > that a new checkpoint has been explicitly requested, the user may not
> > understand that it affects the current checkpoint's behaviour unless they
> > know the internals of the checkpoint. How about naming the field
> > 'throttled' (Yes/No), since our objective is to show whether the current
> > checkpoint is throttled or not?
>
> -1
>
> That "throttled" flag should be the same as having or not a "force" in the
> flags. We should be consistent and report information the same way: either a
> lot of flags (is_throttled, is_force...) or, as now, a single field
> containing the set flags, so the current approach seems better. Also, it
> wouldn't be much better to show the checkpoint as not having the force flag
> and still not being throttled.

I think your understanding is wrong here. The flag which affects throttling
behaviour is CHECKPOINT_IMMEDIATE. I am not suggesting removing the existing
'flags' field of the pg_stat_progress_checkpoint view and adding a new field
'throttled'. The content of the 'flags' field remains the same. I was
suggesting replacing the 'next_flags' field with a 'throttled' field, since a
new request with the CHECKPOINT_IMMEDIATE flag enabled will affect the current
checkpoint.

> CHECKPOINT_REQUESTED will always be set by RequestCheckpoint(), and can be
> used to detect that someone wants a new checkpoint afterwards, whatever it
> is and whether or not they want the current checkpoint to finish quickly.
> For this flag I think it's better not to report it in the view flags but in
> a new field, as discussed before, as that's really what it means.

I understand your suggestion of adding a new field to indicate whether any new
requests have been made. Do you want this field to represent only a new
request, or should it also indicate that the current checkpoint will finish
quickly?

> CHECKPOINT_IMMEDIATE is the only new flag that can affect an
> already-in-progress checkpoint, so it can simply be added to the view flags.

As discussed upthread, it is not advisable to do so. The content of 'flags'
remains the same throughout the checkpoint. We cannot add a new checkpoint's
flag (CHECKPOINT_IMMEDIATE) to the current one even though it affects the
current checkpoint's behaviour. The only thing we can do is add a new field to
show that the current checkpoint is affected by new requests.

> Why not just report (ckpt_flags & (CHECKPOINT_REQUESTED |
> CHECKPOINT_IMMEDIATE)) in the path(s) that can update the new flags for the
> view?

Where in the path do you want to add this? I feel the new field's name is the
confusing part here:

'next_flags' - Shows all the flag values of the next checkpoint. From this the
user can tell that a new request has been made, and if CHECKPOINT_IMMEDIATE is
set there, that the current checkpoint is also affected. You are not ok with
this name as it confuses the user.

'throttled' - Would be set to Yes/No based on the CHECKPOINT_IMMEDIATE bit of
the new checkpoint request's flags. This says that the current checkpoint is
affected, and I also thought it would indicate that new requests have been
made. But there is a confusion here too: if the current checkpoint starts with
CHECKPOINT_IMMEDIATE (which is described by the 'flags' field) and there is no
new request, then the value of this field is 'Yes' (not throttling), which
again confuses the user.

'new request' - Would be set to Yes/No based on whether any new checkpoint
request has been made. This just indicates whether new requests have been made
or not; it cannot be used to infer other information.

Thoughts?

Thanks & Regards,
Nitin Jadhav

On Fri, Mar 11, 2022 at 3:34 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> On Fri, Mar 11, 2022 at 02:41:23PM +0530, Nitin Jadhav wrote:
> > Ok, I agree that it is difficult to interpret correctly: even if we say
> > that a new checkpoint has been explicitly requested, the user may not
> > understand that it affects the current checkpoint's behaviour unless they
> > know the internals of the checkpoint. How about naming the field
> > 'throttled' (Yes/No), since our objective is to show whether the current
> > checkpoint is throttled or not?
>
> -1
>
> That "throttled" flag should be the same as having or not a "force" in the
> flags. We should be consistent and report information the same way: either a
> lot of flags (is_throttled, is_force...) or, as now, a single field
> containing the set flags, so the current approach seems better. Also, it
> wouldn't be much better to show the checkpoint as not having the force flag
> and still not being throttled.
>
> Why not just report (ckpt_flags & (CHECKPOINT_REQUESTED |
> CHECKPOINT_IMMEDIATE)) in the path(s) that can update the new flags for the
> view?
>
> CHECKPOINT_REQUESTED will always be set by RequestCheckpoint(), and can be
> used to detect that someone wants a new checkpoint afterwards, whatever it
> is and whether or not they want the current checkpoint to finish quickly.
> For this flag I think it's better not to report it in the view flags but in
> a new field, as discussed before, as that's really what it means.
>
> CHECKPOINT_IMMEDIATE is the only new flag that can affect an
> already-in-progress checkpoint, so it can simply be added to the view flags.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Julien Rouhaud
On Fri, Mar 11, 2022 at 04:59:11PM +0530, Nitin Jadhav wrote:
> > That "throttled" flag should be the same as having or not a "force" in the
> > flags. We should be consistent and report information the same way: either
> > a lot of flags (is_throttled, is_force...) or, as now, a single field
> > containing the set flags, so the current approach seems better. Also, it
> > wouldn't be much better to show the checkpoint as not having the force
> > flag and still not being throttled.
>
> I think your understanding is wrong here. The flag which affects throttling
> behaviour is CHECKPOINT_IMMEDIATE.

Yes, sorry, that's what I meant and later used in the flags.

> I am not suggesting removing the existing 'flags' field of the
> pg_stat_progress_checkpoint view and adding a new field 'throttled'. The
> content of the 'flags' field remains the same. I was suggesting replacing
> the 'next_flags' field with a 'throttled' field, since a new request with
> the CHECKPOINT_IMMEDIATE flag enabled will affect the current checkpoint.

Are you saying that this new throttled flag will only be set by the overloaded
flags in ckpt_flags? So you can have a checkpoint with the CHECKPOINT_IMMEDIATE
flag that's throttled, and a checkpoint without the CHECKPOINT_IMMEDIATE flag
that's not throttled?

> > CHECKPOINT_REQUESTED will always be set by RequestCheckpoint(), and can be
> > used to detect that someone wants a new checkpoint afterwards, whatever it
> > is and whether or not they want the current checkpoint to finish quickly.
> > For this flag I think it's better not to report it in the view flags but
> > in a new field, as discussed before, as that's really what it means.
>
> I understand your suggestion of adding a new field to indicate whether any
> new requests have been made. Do you want this field to represent only a new
> request, or should it also indicate that the current checkpoint will finish
> quickly?

Only represent what it means: a new checkpoint is requested. An additional
CHECKPOINT_IMMEDIATE flag is orthogonal to this flag and this information.

> > CHECKPOINT_IMMEDIATE is the only new flag that can affect an
> > already-in-progress checkpoint, so it can simply be added to the view
> > flags.
>
> As discussed upthread, it is not advisable to do so. The content of 'flags'
> remains the same throughout the checkpoint. We cannot add a new checkpoint's
> flag (CHECKPOINT_IMMEDIATE) to the current one even though it affects the
> current checkpoint's behaviour. The only thing we can do is add a new field
> to show that the current checkpoint is affected by new requests.

I don't get it. The checkpoint flags and the view flags (set by
pgstat_progress_update*) are different, so why can't we add this flag to the
view flags? The fact that checkpointer.c doesn't update the passed flags and
instead looks in shmem to see if CHECKPOINT_IMMEDIATE has been set since is an
implementation detail, and the view shouldn't focus on which flags were
initially passed to the checkpointer but instead on which flags the
checkpointer is actually enforcing, as that's what the user should be
interested in. If you want to store it in another field internally but display
it in the view with the rest of the flags, I'm fine with it.

> > Why not just report (ckpt_flags & (CHECKPOINT_REQUESTED |
> > CHECKPOINT_IMMEDIATE)) in the path(s) that can update the new flags for
> > the view?
>
> Where in the path do you want to add this?

Same as in your current patch, I guess.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Nitin Jadhav
> > I am not suggesting removing the existing 'flags' field of the
> > pg_stat_progress_checkpoint view and adding a new field 'throttled'. The
> > content of the 'flags' field remains the same. I was suggesting replacing
> > the 'next_flags' field with a 'throttled' field, since a new request with
> > the CHECKPOINT_IMMEDIATE flag enabled will affect the current checkpoint.
>
> Are you saying that this new throttled flag will only be set by the
> overloaded flags in ckpt_flags?

Yes, you are right.

> So you can have a checkpoint with the CHECKPOINT_IMMEDIATE flag that's
> throttled, and a checkpoint without the CHECKPOINT_IMMEDIATE flag that's not
> throttled?

I think it's the reverse: a checkpoint with the CHECKPOINT_IMMEDIATE flag is
not throttled (delays between writes are disabled), and a checkpoint without
the CHECKPOINT_IMMEDIATE flag is throttled (delays between writes are enabled).

> > > CHECKPOINT_REQUESTED will always be set by RequestCheckpoint(), and can
> > > be used to detect that someone wants a new checkpoint afterwards,
> > > whatever it is and whether or not they want the current checkpoint to
> > > finish quickly. For this flag I think it's better not to report it in
> > > the view flags but in a new field, as discussed before, as that's
> > > really what it means.
> >
> > I understand your suggestion of adding a new field to indicate whether
> > any new requests have been made. Do you want this field to represent only
> > a new request, or should it also indicate that the current checkpoint
> > will finish quickly?
>
> Only represent what it means: a new checkpoint is requested. An additional
> CHECKPOINT_IMMEDIATE flag is orthogonal to this flag and this information.

Thanks for the confirmation.

> > > CHECKPOINT_IMMEDIATE is the only new flag that can affect an
> > > already-in-progress checkpoint, so it can simply be added to the view
> > > flags.
> >
> > As discussed upthread, it is not advisable to do so. The content of
> > 'flags' remains the same throughout the checkpoint. We cannot add a new
> > checkpoint's flag (CHECKPOINT_IMMEDIATE) to the current one even though
> > it affects the current checkpoint's behaviour. The only thing we can do
> > is add a new field to show that the current checkpoint is affected by new
> > requests.
>
> I don't get it. The checkpoint flags and the view flags (set by
> pgstat_progress_update*) are different, so why can't we add this flag to the
> view flags? The fact that checkpointer.c doesn't update the passed flags and
> instead looks in shmem to see if CHECKPOINT_IMMEDIATE has been set since is
> an implementation detail, and the view shouldn't focus on which flags were
> initially passed to the checkpointer but instead on which flags the
> checkpointer is actually enforcing, as that's what the user should be
> interested in. If you want to store it in another field internally but
> display it in the view with the rest of the flags, I'm fine with it.

Just to be in sync with the way the code behaves, it is better not to fold the
next checkpoint request's CHECKPOINT_IMMEDIATE into the current checkpoint's
'flags' field. The current checkpoint starts with a different set of flags,
and when there is a new request (with CHECKPOINT_IMMEDIATE) it just processes
the pending operations quickly so it can take up the next request. If we put
this information in the view's 'flags' field, it says that the current
checkpoint was started with CHECKPOINT_IMMEDIATE, which is not true. Hence I
had thought of adding a new field ('next flags' or 'upcoming flags') which
contains all the flag values of new checkpoint requests. This field indicates
whether the current checkpoint is throttled or not, and it also indicates that
there are new requests.

Please share your thoughts. More thoughts are welcome.
Thanks & Regards, Nitin Jadhav On Fri, Mar 11, 2022 at 5:43 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Fri, Mar 11, 2022 at 04:59:11PM +0530, Nitin Jadhav wrote: > > > That "throttled" flag should be the same as having or not a "force" in the > > > flags. We should be consistent and report information the same way, so either > > > a lot of flags (is_throttled, is_force...) or as now a single field containing > > > the set flags, so the current approach seems better. Also, it wouldn't be much > > > better to show the checkpoint as not having the force flags and still not being > > > throttled. > > > > I think your understanding is wrong here. The flag which affects > > throttling behaviour is CHECKPOINT_IMMEDIATE. > > Yes sorry, that's what I meant and later used in the flags. > > > I am not suggesting > > removing the existing 'flags' field of pg_stat_progress_checkpoint > > view and adding a new field 'throttled'. The content of the 'flags' > > field remains the same. I was suggesting replacing the 'next_flags' > > field with 'throttled' field since the new request with > > CHECKPOINT_IMMEDIATE flag enabled will affect the current checkpoint. > > Are you saying that this new throttled flag will only be set by the overloaded > flags in ckpt_flags? So you can have a checkpoint with a CHECKPOINT_IMMEDIATE > flags that's throttled, and a checkpoint without the CHECKPOINT_IMMEDIATE flag > that's not throttled? > > > > CHECKPOINT_REQUESTED will always be set by RequestCheckpoint(), and can be used > > > to detect that someone wants a new checkpoint afterwards, whatever it's and > > > whether or not the current checkpoint to be finished quickly. For this flag I > > > think it's better to not report it in the view flags but with a new field, as > > > discussed before, as it's really what it means. > > > > I understand your suggestion of adding a new field to indicate whether > > any of the new requests have been made or not. 
You just want this > > field to represent only a new request or does it also represent the > > current checkpoint to finish quickly. > > Only represent what it means: a new checkpoint is requested. An additional > CHECKPOINT_IMMEDIATE flag is orthogonal to this flag and this information. > > > > CHECKPOINT_IMMEDIATE is the only new flag that can be used in an already in > > > progress checkpoint, so it can be simply added to the view flags. > > > > As discussed upthread this is not advisable to do so. The content of > > 'flags' remains the same through the checkpoint. We cannot add a new > > checkpoint's flag (CHECKPOINT_IMMEDIATE ) to the current one even > > though it affects current checkpoint behaviour. Only thing we can do > > is to add a new field to show that the current checkpoint is affected > > with new requests. > > I don't get it. The checkpoint flags and the view flags (set by > pgstat_progrss_update*) are different, so why can't we add this flag to the > view flags? The fact that checkpointer.c doesn't update the passed flag and > instead look in the shmem to see if CHECKPOINT_IMMEDIATE has been set since is > an implementation detail, and the view shouldn't focus on which flags were > initially passed to the checkpointer but instead which flags the checkpointer > is actually enforcing, as that's what the user should be interested in. If you > want to store it in another field internally but display it in the view with > the rest of the flags, I'm fine with it. > > > > Why not just reporting (ckpt_flags & (CHECKPOINT_REQUESTED | > > > CHECKPOINT_IMMEDIATE)) in the path(s) that can update the new flags for the > > > view? > > > > Where do you want to add this in the path? > > Same as in your current patch I guess.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Julien Rouhaud
On Mon, Mar 14, 2022 at 03:16:50PM +0530, Nitin Jadhav wrote:
> > > I am not suggesting removing the existing 'flags' field of the
> > > pg_stat_progress_checkpoint view and adding a new field 'throttled'.
> > > The content of the 'flags' field remains the same. I was suggesting
> > > replacing the 'next_flags' field with a 'throttled' field, since a new
> > > request with the CHECKPOINT_IMMEDIATE flag enabled will affect the
> > > current checkpoint.
> >
> > Are you saying that this new throttled flag will only be set by the
> > overloaded flags in ckpt_flags?
>
> Yes, you are right.
>
> > So you can have a checkpoint with the CHECKPOINT_IMMEDIATE flag that's
> > throttled, and a checkpoint without the CHECKPOINT_IMMEDIATE flag that's
> > not throttled?
>
> I think it's the reverse: a checkpoint with the CHECKPOINT_IMMEDIATE flag is
> not throttled (delays between writes are disabled), and a checkpoint without
> the CHECKPOINT_IMMEDIATE flag is throttled (delays between writes are
> enabled).

Yes, that's how it's supposed to work, but my point was that your suggested
'throttled' flag could say the opposite, which is bad.

> > I don't get it. The checkpoint flags and the view flags (set by
> > pgstat_progress_update*) are different, so why can't we add this flag to
> > the view flags? The fact that checkpointer.c doesn't update the passed
> > flags and instead looks in shmem to see if CHECKPOINT_IMMEDIATE has been
> > set since is an implementation detail, and the view shouldn't focus on
> > which flags were initially passed to the checkpointer but instead on
> > which flags the checkpointer is actually enforcing, as that's what the
> > user should be interested in. If you want to store it in another field
> > internally but display it in the view with the rest of the flags, I'm
> > fine with it.
>
> Just to be in sync with the way the code behaves, it is better not to fold
> the next checkpoint request's CHECKPOINT_IMMEDIATE into the current
> checkpoint's 'flags' field. The current checkpoint starts with a different
> set of flags, and when there is a new request (with CHECKPOINT_IMMEDIATE) it
> just processes the pending operations quickly so it can take up the next
> request. If we put this information in the view's 'flags' field, it says
> that the current checkpoint was started with CHECKPOINT_IMMEDIATE, which is
> not true.

Which is why I suggested taking into account only CHECKPOINT_REQUESTED (to be
able to display that a new checkpoint was requested) and CHECKPOINT_IMMEDIATE,
to be able to display that the current checkpoint isn't throttled anymore if
it previously was.

I still don't understand why you want so much to display "how the checkpoint
was initially started" rather than "how the checkpoint is really behaving
right now". The whole point of having a progress view is to have something
dynamic that reflects the current activity.

> Hence I had thought of adding a new field ('next flags' or 'upcoming flags')
> which contains all the flag values of new checkpoint requests. This field
> indicates whether the current checkpoint is throttled or not, and it also
> indicates that there are new requests.

I'm not opposed to having such a field; I'm opposed to having a view with "the
current checkpoint is throttled, but if there are some flags in the next
checkpoint flags and those flags contain checkpoint immediate then the current
checkpoint isn't actually throttled anymore" behavior.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Nitin Jadhav
> > > I don't get it. The checkpoint flags and the view flags (set by
> > > pgstat_progress_update*) are different, so why can't we add this flag
> > > to the view flags? The fact that checkpointer.c doesn't update the
> > > passed flags and instead looks in shmem to see if CHECKPOINT_IMMEDIATE
> > > has been set since is an implementation detail, and the view shouldn't
> > > focus on which flags were initially passed to the checkpointer but
> > > instead on which flags the checkpointer is actually enforcing, as
> > > that's what the user should be interested in. If you want to store it
> > > in another field internally but display it in the view with the rest of
> > > the flags, I'm fine with it.
> >
> > Just to be in sync with the way the code behaves, it is better not to
> > fold the next checkpoint request's CHECKPOINT_IMMEDIATE into the current
> > checkpoint's 'flags' field. The current checkpoint starts with a
> > different set of flags, and when there is a new request (with
> > CHECKPOINT_IMMEDIATE) it just processes the pending operations quickly so
> > it can take up the next request. If we put this information in the view's
> > 'flags' field, it says that the current checkpoint was started with
> > CHECKPOINT_IMMEDIATE, which is not true.
>
> Which is why I suggested taking into account only CHECKPOINT_REQUESTED (to
> be able to display that a new checkpoint was requested)

I will take care of this in the next patch.

> > Hence I had thought of adding a new field ('next flags' or 'upcoming
> > flags') which contains all the flag values of new checkpoint requests.
> > This field indicates whether the current checkpoint is throttled or not,
> > and it also indicates that there are new requests.
>
> I'm not opposed to having such a field; I'm opposed to having a view with
> "the current checkpoint is throttled, but if there are some flags in the
> next checkpoint flags and those flags contain checkpoint immediate then the
> current checkpoint isn't actually throttled anymore" behavior.

I understand your point, and I also agree that it becomes difficult for the
user to understand the context.

> and CHECKPOINT_IMMEDIATE, to be able to display that the current checkpoint
> isn't throttled anymore if it previously was.
>
> I still don't understand why you want so much to display "how the checkpoint
> was initially started" rather than "how the checkpoint is really behaving
> right now". The whole point of having a progress view is to have something
> dynamic that reflects the current activity.

As of now I will not consider adding this information to the view. If it is
required and nobody opposes having it included in the 'flags' field of the
view, then I will consider adding it.

Thanks & Regards, Nitin Jadhav On Mon, Mar 14, 2022 at 5:16 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Mon, Mar 14, 2022 at 03:16:50PM +0530, Nitin Jadhav wrote: > > > > I am not suggesting > > > > removing the existing 'flags' field of pg_stat_progress_checkpoint > > > > view and adding a new field 'throttled'. The content of the 'flags' > > > > field remains the same. I was suggesting replacing the 'next_flags' > > > > field with 'throttled' field since the new request with > > > > CHECKPOINT_IMMEDIATE flag enabled will affect the current checkpoint. > > > > > > Are you saying that this new throttled flag will only be set by the overloaded > > > flags in ckpt_flags? > > > > Yes. you are right. > > > > > So you can have a checkpoint with a CHECKPOINT_IMMEDIATE > > > flags that's throttled, and a checkpoint without the CHECKPOINT_IMMEDIATE flag > > > that's not throttled? > > > > I think it's the reverse. 
A checkpoint with a CHECKPOINT_IMMEDIATE > > flags that's not throttled (disables delays between writes) and a > > checkpoint without the CHECKPOINT_IMMEDIATE flag that's throttled > > (enables delays between writes) > > Yes that's how it's supposed to work, but my point was that your suggested > 'throttled' flag could say the opposite, which is bad. > > > > I don't get it. The checkpoint flags and the view flags (set by > > > pgstat_progrss_update*) are different, so why can't we add this flag to the > > > view flags? The fact that checkpointer.c doesn't update the passed flag and > > > instead look in the shmem to see if CHECKPOINT_IMMEDIATE has been set since is > > > an implementation detail, and the view shouldn't focus on which flags were > > > initially passed to the checkpointer but instead which flags the checkpointer > > > is actually enforcing, as that's what the user should be interested in. If you > > > want to store it in another field internally but display it in the view with > > > the rest of the flags, I'm fine with it. > > > > Just to be in sync with the way code behaves, it is better not to > > update the next checkpoint request's CHECKPOINT_IMMEDIATE with the > > current checkpoint 'flags' field. Because the current checkpoint > > starts with a different set of flags and when there is a new request > > (with CHECKPOINT_IMMEDIATE), it just processes the pending operations > > quickly to take up next requests. If we update this information in the > > 'flags' field of the view, it says that the current checkpoint is > > started with CHECKPOINT_IMMEDIATE which is not true. > > Which is why I suggested to only take into account CHECKPOINT_REQUESTED (to > be able to display that a new checkpoint was requested) and > CHECKPOINT_IMMEDIATE, to be able to display that the current checkpoint isn't > throttled anymore if it were. 
> > I still don't understand why you want so much to display "how the checkpoint > was initially started" rather than "how the checkpoint is really behaving right > now". The whole point of having a progress view is to have something dynamic > that reflects the current activity. > > > Hence I had > > thought of adding a new field ('next flags' or 'upcoming flags') which > > contain all the flag values of new checkpoint requests. This field > > indicates whether the current checkpoint is throttled or not and also > > it indicates there are new requests. > > I'm not opposed to having such a field, I'm opposed to having a view with "the > current checkpoint is throttled but if there are some flags in the next > checkpoint flags and those flags contain checkpoint immediate then the current > checkpoint isn't actually throttled anymore" behavior.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Andres Freund
Hi, This is a long thread, sorry for asking if this has been asked before. On 2022-03-08 20:25:28 +0530, Nitin Jadhav wrote: > * Sort buffers that need to be written to reduce the likelihood of random > @@ -2129,6 +2132,8 @@ BufferSync(int flags) > bufHdr = GetBufferDescriptor(buf_id); > > num_processed++; > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_BUFFERS_PROCESSED, > + num_processed); > > /* > * We don't need to acquire the lock here, because we're only looking > @@ -2149,6 +2154,8 @@ BufferSync(int flags) > TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id); > PendingCheckpointerStats.m_buf_written_checkpoints++; > num_written++; > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_BUFFERS_WRITTEN, > + num_written); > } > } Have you measured the performance effects of this? On fast storage with large shared_buffers I've seen these loops in profiles. It's probably fine, but it'd be good to verify that. > @@ -1897,6 +1897,112 @@ pg_stat_progress_basebackup| SELECT s.pid, > s.param4 AS tablespaces_total, > s.param5 AS tablespaces_streamed > FROM pg_stat_get_progress_info('BASEBACKUP'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6,param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19,param20); > +pg_stat_progress_checkpoint| SELECT s.pid, > + CASE s.param1 > + WHEN 1 THEN 'checkpoint'::text > + WHEN 2 THEN 'restartpoint'::text > + ELSE NULL::text > + END AS type, > + ((((((( > + CASE > + WHEN ((s.param2 & (1)::bigint) > 0) THEN 'shutdown '::text > + ELSE ''::text > + END || > + CASE > + WHEN ((s.param2 & (2)::bigint) > 0) THEN 'end-of-recovery '::text > + ELSE ''::text > + END) || > + CASE > + WHEN ((s.param2 & (4)::bigint) > 0) THEN 'immediate '::text > + ELSE ''::text > + END) || > + CASE > + WHEN ((s.param2 & (8)::bigint) > 0) THEN 'force '::text > + ELSE ''::text > + END) || > + CASE > + WHEN ((s.param2 & (16)::bigint) > 0) THEN 'flush-all '::text > + ELSE ''::text > + 
END) || > + CASE > + WHEN ((s.param2 & (32)::bigint) > 0) THEN 'wait '::text > + ELSE ''::text > + END) || > + CASE > + WHEN ((s.param2 & (128)::bigint) > 0) THEN 'wal '::text > + ELSE ''::text > + END) || > + CASE > + WHEN ((s.param2 & (256)::bigint) > 0) THEN 'time '::text > + ELSE ''::text > + END) AS flags, > + ((((((( > + CASE > + WHEN ((s.param3 & (1)::bigint) > 0) THEN 'shutdown '::text > + ELSE ''::text > + END || > + CASE > + WHEN ((s.param3 & (2)::bigint) > 0) THEN 'end-of-recovery '::text > + ELSE ''::text > + END) || > + CASE > + WHEN ((s.param3 & (4)::bigint) > 0) THEN 'immediate '::text > + ELSE ''::text > + END) || > + CASE > + WHEN ((s.param3 & (8)::bigint) > 0) THEN 'force '::text > + ELSE ''::text > + END) || > + CASE > + WHEN ((s.param3 & (16)::bigint) > 0) THEN 'flush-all '::text > + ELSE ''::text > + END) || > + CASE > + WHEN ((s.param3 & (32)::bigint) > 0) THEN 'wait '::text > + ELSE ''::text > + END) || > + CASE > + WHEN ((s.param3 & (128)::bigint) > 0) THEN 'wal '::text > + ELSE ''::text > + END) || > + CASE > + WHEN ((s.param3 & (256)::bigint) > 0) THEN 'time '::text > + ELSE ''::text > + END) AS next_flags, > + ('0/0'::pg_lsn + ( > + CASE > + WHEN (s.param4 < 0) THEN pow((2)::numeric, (64)::numeric) > + ELSE (0)::numeric > + END + (s.param4)::numeric)) AS start_lsn, > + to_timestamp(((946684800)::double precision + ((s.param5)::double precision / (1000000)::double precision))) AS start_time, > + CASE s.param6 > + WHEN 1 THEN 'initializing'::text > + WHEN 2 THEN 'getting virtual transaction IDs'::text > + WHEN 3 THEN 'checkpointing replication slots'::text > + WHEN 4 THEN 'checkpointing logical replication snapshot files'::text > + WHEN 5 THEN 'checkpointing logical rewrite mapping files'::text > + WHEN 6 THEN 'checkpointing replication origin'::text > + WHEN 7 THEN 'checkpointing commit log pages'::text > + WHEN 8 THEN 'checkpointing commit time stamp pages'::text > + WHEN 9 THEN 'checkpointing subtransaction pages'::text > + WHEN 10 THEN 
'checkpointing multixact pages'::text > + WHEN 11 THEN 'checkpointing predicate lock pages'::text > + WHEN 12 THEN 'checkpointing buffers'::text > + WHEN 13 THEN 'processing file sync requests'::text > + WHEN 14 THEN 'performing two phase checkpoint'::text > + WHEN 15 THEN 'performing post checkpoint cleanup'::text > + WHEN 16 THEN 'invalidating replication slots'::text > + WHEN 17 THEN 'recycling old WAL files'::text > + WHEN 18 THEN 'truncating subtransactions'::text > + WHEN 19 THEN 'finalizing'::text > + ELSE NULL::text > + END AS phase, > + s.param7 AS buffers_total, > + s.param8 AS buffers_processed, > + s.param9 AS buffers_written, > + s.param10 AS files_total, > + s.param11 AS files_synced > + FROM pg_stat_get_progress_info('CHECKPOINT'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20); > pg_stat_progress_cluster| SELECT s.pid, > s.datid, > d.datname,

This view is depressingly complicated. Added up, the view definitions for the already existing pg_stat_progress* views amount to a measurable part of the size of an empty database:

postgres[1160866][1]=# SELECT sum(octet_length(ev_action)), SUM(pg_column_size(ev_action)) FROM pg_rewrite WHERE ev_class::regclass::text LIKE '%progress%';
┌───────┬───────┐
│ sum │ sum │
├───────┼───────┤
│ 97410 │ 19786 │
└───────┴───────┘
(1 row)

and this view looks to be a good bit more complicated than the existing pg_stat_progress* views.
Indeed:

template1[1165473][1]=# SELECT ev_class::regclass, length(ev_action), pg_column_size(ev_action) FROM pg_rewrite WHERE ev_class::regclass::text LIKE '%progress%' ORDER BY length(ev_action) DESC;
┌───────────────────────────────┬────────┬────────────────┐
│ ev_class │ length │ pg_column_size │
├───────────────────────────────┼────────┼────────────────┤
│ pg_stat_progress_checkpoint │ 43290 │ 5409 │
│ pg_stat_progress_create_index │ 23293 │ 4177 │
│ pg_stat_progress_cluster │ 18390 │ 3704 │
│ pg_stat_progress_analyze │ 16121 │ 3339 │
│ pg_stat_progress_vacuum │ 16076 │ 3392 │
│ pg_stat_progress_copy │ 15124 │ 3080 │
│ pg_stat_progress_basebackup │ 8406 │ 2094 │
└───────────────────────────────┴────────┴────────────────┘
(7 rows)

pg_rewrite without pg_stat_progress_checkpoint: 745472, with: 753664

pg_rewrite is the second biggest relation in an empty database already...

template1[1164827][1]=# SELECT relname, pg_total_relation_size(oid) FROM pg_class WHERE relkind = 'r' ORDER BY 2 DESC LIMIT 5;
┌────────────────┬────────────────────────┐
│ relname │ pg_total_relation_size │
├────────────────┼────────────────────────┤
│ pg_proc │ 1212416 │
│ pg_rewrite │ 745472 │
│ pg_attribute │ 704512 │
│ pg_description │ 630784 │
│ pg_collation │ 409600 │
└────────────────┴────────────────────────┘
(5 rows)

Greetings,

Andres Freund
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Michael Paquier
Date:
On Fri, Mar 18, 2022 at 05:15:56PM -0700, Andres Freund wrote:
> Have you measured the performance effects of this? On fast storage with large
> shared_buffers I've seen these loops in profiles. It's probably fine, but it'd
> be good to verify that.

I am wondering if we could make the function inlined at some point. We could also play it safe and only update the counters every N loops instead.

> This view is depressingly complicated. Added up the view definitions for
> the already existing pg_stat_progress* views add up to a measurable part of
> the size of an empty database:

Yeah. I think that what's proposed could be simplified, and we had better remove the fields that are not that useful. First, do we have any need for next_flags? Second, is the start LSN really necessary for monitoring purposes? Not all the information in the first parameter is useful, as well. For example "shutdown" will never be seen as it is not possible to use a session at this stage, no? There is also no gain in having "immediate", "flush-all", "force" and "wait" (for this one, if the checkpoint is requested, the session doing the work knows this information already).

A last thing is that we may gain in visibility by having more attributes as an effect of splitting param2. One thing that would make sense is to track the reason why the checkpoint was triggered separately (aka wal and time). Should we use a text[] to list all the parameters instead? Using a space-separated list of items is not intuitive IMO, and callers of this routine will likely parse that.

Shouldn't we also track the number of files flushed in each sub-step? In some deployments you could have a large number of 2PC files and such. We may want more information on such matters.
+ WHEN 3 THEN 'checkpointing replication slots' + WHEN 4 THEN 'checkpointing logical replication snapshot files' + WHEN 5 THEN 'checkpointing logical rewrite mapping files' + WHEN 6 THEN 'checkpointing replication origin' + WHEN 7 THEN 'checkpointing commit log pages' + WHEN 8 THEN 'checkpointing commit time stamp pages' There is a lot of "checkpointing" here. All those terms could be shorter without losing their meaning. This patch still needs some work, so I am marking it as RwF for now. -- Michael
Attachment
Size of pg_rewrite (Was: Report checkpoint progress with pg_stat_progress_checkpoint)
From
Matthias van de Meent
Date:
On Sat, 19 Mar 2022 at 01:15, Andres Freund <andres@anarazel.de> wrote:
> pg_rewrite without pg_stat_progress_checkpoint: 745472, with: 753664
>
> pg_rewrite is the second biggest relation in an empty database already...

Yeah, that's not great. Thanks for nerd-sniping me into looking into how views and pg_rewrite rules work, that was very interesting and I learned quite a lot.

# Immediate potential, limited to progress views

I noticed that the CASE-WHEN (used in translating progress stage index to stage names) in those progress reporting views can be more efficiently described (although with slightly worse behaviour around undefined values) using text array lookups (as attached). That resulted in somewhat smaller rewrite entries for the progress views (toast compression was good old pglz):

template1=# SELECT sum(octet_length(ev_action)), SUM(pg_column_size(ev_action)) FROM pg_rewrite WHERE ev_class::regclass::text LIKE '%progress%';

master:
 sum | sum
-------+-------
 97277 | 19956
patched:
 sum | sum
-------+-------
 77069 | 18417

So this seems like a nice improvement of 20% uncompressed / 7% compressed.

I tested various cases of phase number to text translations: `CASE .. WHEN`; `(ARRAY[]::text[])[index]` and `('{}'::text[])[index]`.
See results below:

postgres=# create or replace view arrayliteral_view as select (ARRAY['a','b','c','d','e','f']::text[])[index] as name from tst s(index);
CREATE VIEW
postgres=# create or replace view stringcast_view as select ('{a,b,c,d,e,f}'::text[])[index] as name from tst s(index);
CREATE VIEW
postgres=# create or replace view split_stringcast_view as select (('{a,b,' || 'c,d,e,f}')::text[])[index] as name from tst s(index);
CREATE VIEW
postgres=# create or replace view case_view as select case index when 0 then 'a' when 1 then 'b' when 2 then 'c' when 3 then 'd' when 4 then 'e' when 5 then 'f' end as name from tst s(index);
CREATE VIEW
postgres=# select ev_class::regclass::text, octet_length(ev_action), pg_column_size(ev_action) from pg_rewrite where ev_class in ('arrayliteral_view'::regclass::oid, 'case_view'::regclass::oid, 'split_stringcast_view'::regclass::oid, 'stringcast_view'::regclass::oid);
 ev_class | octet_length | pg_column_size
-----------------------+--------------+----------------
 arrayliteral_view | 3311 | 1322
 stringcast_view | 2610 | 1257
 case_view | 5170 | 1412
 split_stringcast_view | 2847 | 1350

It seems to me that we could consider replacing the CASE statements with array literals and lookups if we really value our template database size. But, as text literal concatenations don't seem to get constant folded before storing them in the rules table, this rewrite of the views would result in long lines in the system_views.sql file, or we'd have to deal with the additional overhead of the append operator and cast nodes.

# Future work; nodeToString / readNode, all rewrite rules

Additionally, we might want to consider other changes like default (or empty value) elision in nodeToString, if that is considered a reasonable option and if we really want to reduce the size of the pg_rewrite table.
I think a lot of space can be recovered from that: A manual removal of what seemed to be fields with default values (and the removal of all query location related fields) in the current definition of pg_stat_progress_create_index reduces its uncompressed size from 23226B raw and 4204B compressed to 13821B raw and 2784B compressed, for an on-disk space saving of 33% for this view's ev_action. Do note, however, that that would add significant branching in the nodeToString and readNode code, which might slow down that code significantly. I'm not planning on working on that; but in my opinion that is a viable path to reducing the size of new database catalogs. -Matthias PS. attached patch is not to be considered complete - it is a minimal example of the array literal form. It fails regression tests because I didn't bother updating or including the regression tests on system views.
Attachment
Re: Size of pg_rewrite (Was: Report checkpoint progress with pg_stat_progress_checkpoint)
From
Andres Freund
Date:
Hi,

On April 8, 2022 7:52:07 AM PDT, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>On Sat, 19 Mar 2022 at 01:15, Andres Freund <andres@anarazel.de> wrote:
>> pg_rewrite without pg_stat_progress_checkpoint: 745472, with: 753664
>>
>> pg_rewrite is the second biggest relation in an empty database already...
>
>Yeah, that's not great. Thanks for nerd-sniping me into looking into
>how views and pg_rewrite rules work, that was very interesting and I
>learned quite a lot.

Thanks for looking!

># Immediately potential, limited to progress views
>
>I noticed that the CASE-WHEN (used in translating progress stage index
>to stage names) in those progress reporting views can be more
>efficiently described (althoug with slightly worse behaviour around
>undefined values) using text array lookups (as attached). That
>resulted in somewhat smaller rewrite entries for the progress views
>(toast compression was good old pglz):
>
>template1=# SELECT sum(octet_length(ev_action)),
>SUM(pg_column_size(ev_action)) FROM pg_rewrite WHERE
>ev_class::regclass::text LIKE '%progress%';
>
>master:
> sum | sum
>-------+-------
> 97277 | 19956
>patched:
> sum | sum
>-------+-------
> 77069 | 18417
>
>So this seems like a nice improvement of 20% uncompressed / 7% compressed.
>
>I tested various cases of phase number to text translations: `CASE ..
>WHEN`; `(ARRAY[]::text[])[index]` and `('{}'::text[])[index]`.
>See results below:
>
>postgres=# create or replace view arrayliteral_view as select
>(ARRAY['a','b','c','d','e','f']::text[])[index] as name from tst
>s(index);
>CREATE VIEW
>postgres=# create or replace view stringcast_view as select
>('{a,b,c,d,e,f}'::text[])[index] as name from tst s(index);
>CREATE VIEW
>postgres=# create or replace view split_stringcast_view as select
>(('{a,b,' || 'c,d,e,f}')::text[])[index] as name from tst s(index);
>CREATE VIEW
>postgres=# create or replace view case_view as select case index when
>0 then 'a' when 1 then 'b' when 2 then 'c' when 3 then 'd' when 4 then
>'e' when 5 then 'f' end as name from tst s(index);
>CREATE VIEW
>
>postgres=# select ev_class::regclass::text,
>octet_length(ev_action), pg_column_size(ev_action) from pg_rewrite
>where ev_class in ('arrayliteral_view'::regclass::oid,
>'case_view'::regclass::oid, 'split_stringcast_view'::regclass::oid,
>'stringcast_view'::regclass::oid);
> ev_class | octet_length | pg_column_size
>-----------------------+--------------+----------------
> arrayliteral_view | 3311 | 1322
> stringcast_view | 2610 | 1257
> case_view | 5170 | 1412
> split_stringcast_view | 2847 | 1350
>
>It seems to me that we could consider replacing the CASE statements
>with array literals and lookups if we really value our template
>database size. But, as text literal concatenations don't seem to get
>constant folded before storing them in the rules table, this rewrite
>of the views would result in long lines in the system_views.sql file,
>or we'd have to deal with the additional overhead of the append
>operator and cast nodes.

My inclination is that the mapping functions should be C functions. There's really no point in doing it in SQL and it comes at a noticeable price. And, if done in C, we can fix mistakes in minor releases, which we can't in SQL.
># Future work; nodeToString / readNode, all rewrite rules
>
>Additionally, we might want to consider other changes like default (or
>empty value) elision in nodeToString, if that is considered a
>reasonable option and if we really want to reduce the size of the
>pg_rewrite table.
>
>I think a lot of space can be recovered from that: A manual removal of
>what seemed to be fields with default values (and the removal of all
>query location related fields) in the current definition of
>pg_stat_progress_create_index reduces its uncompressed size from
>23226B raw and 4204B compressed to 13821B raw and 2784B compressed,
>for an on-disk space saving of 33% for this view's ev_action.
>
>Do note, however, that that would add significant branching in the
>nodeToString and readNode code, which might slow down that code
>significantly. I'm not planning on working on that; but in my opinion
>that is a viable path to reducing the size of new database catalogs.

We should definitely be careful about that. I do agree that there's a lot of efficiency to be gained in the serialization format. Once we have the automatic node func generation in place, we could have one representation for human consumption, and one for density...

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
Hi, Here is the update patch which fixes the previous comments discussed in this thread. I am sorry for the long gap in the discussion. Kindly let me know if I have missed any of the comments or anything new. Thanks & Regards, Nitin Jadhav On Fri, Mar 18, 2022 at 4:52 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > > > I don't get it. The checkpoint flags and the view flags (set by > > > > pgstat_progrss_update*) are different, so why can't we add this flag to the > > > > view flags? The fact that checkpointer.c doesn't update the passed flag and > > > > instead look in the shmem to see if CHECKPOINT_IMMEDIATE has been set since is > > > > an implementation detail, and the view shouldn't focus on which flags were > > > > initially passed to the checkpointer but instead which flags the checkpointer > > > > is actually enforcing, as that's what the user should be interested in. If you > > > > want to store it in another field internally but display it in the view with > > > > the rest of the flags, I'm fine with it. > > > > > > Just to be in sync with the way code behaves, it is better not to > > > update the next checkpoint request's CHECKPOINT_IMMEDIATE with the > > > current checkpoint 'flags' field. Because the current checkpoint > > > starts with a different set of flags and when there is a new request > > > (with CHECKPOINT_IMMEDIATE), it just processes the pending operations > > > quickly to take up next requests. If we update this information in the > > > 'flags' field of the view, it says that the current checkpoint is > > > started with CHECKPOINT_IMMEDIATE which is not true. > > > > Which is why I suggested to only take into account CHECKPOINT_REQUESTED (to > > be able to display that a new checkpoint was requested) > > I will take care in the next patch. > > > > Hence I had > > > thought of adding a new field ('next flags' or 'upcoming flags') which > > > contain all the flag values of new checkpoint requests. 
This field > > > indicates whether the current checkpoint is throttled or not and also > > > it indicates there are new requests. > > > > I'm not opposed to having such a field, I'm opposed to having a view with "the > > current checkpoint is throttled but if there are some flags in the next > > checkpoint flags and those flags contain checkpoint immediate then the current > > checkpoint isn't actually throttled anymore" behavior. > > I understand your point and I also agree that it becomes difficult for > the user to understand the context. > > > and > > CHECKPOINT_IMMEDIATE, to be able to display that the current checkpoint isn't > > throttled anymore if it were. > > > > I still don't understand why you want so much to display "how the checkpoint > > was initially started" rather than "how the checkpoint is really behaving right > > now". The whole point of having a progress view is to have something dynamic > > that reflects the current activity. > > As of now I will not consider adding this information to the view. If > required and nobody opposes having this included in the 'flags' field > of the view, then I will consider adding. > > Thanks & Regards, > Nitin Jadhav > > On Mon, Mar 14, 2022 at 5:16 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > On Mon, Mar 14, 2022 at 03:16:50PM +0530, Nitin Jadhav wrote: > > > > > I am not suggesting > > > > > removing the existing 'flags' field of pg_stat_progress_checkpoint > > > > > view and adding a new field 'throttled'. The content of the 'flags' > > > > > field remains the same. I was suggesting replacing the 'next_flags' > > > > > field with 'throttled' field since the new request with > > > > > CHECKPOINT_IMMEDIATE flag enabled will affect the current checkpoint. > > > > > > > > Are you saying that this new throttled flag will only be set by the overloaded > > > > flags in ckpt_flags? > > > > > > Yes. you are right. 
> > > > > > > So you can have a checkpoint with a CHECKPOINT_IMMEDIATE > > > > flags that's throttled, and a checkpoint without the CHECKPOINT_IMMEDIATE flag > > > > that's not throttled? > > > > > > I think it's the reverse. A checkpoint with a CHECKPOINT_IMMEDIATE > > > flags that's not throttled (disables delays between writes) and a > > > checkpoint without the CHECKPOINT_IMMEDIATE flag that's throttled > > > (enables delays between writes) > > > > Yes that's how it's supposed to work, but my point was that your suggested > > 'throttled' flag could say the opposite, which is bad. > > > > > > I don't get it. The checkpoint flags and the view flags (set by > > > > pgstat_progrss_update*) are different, so why can't we add this flag to the > > > > view flags? The fact that checkpointer.c doesn't update the passed flag and > > > > instead look in the shmem to see if CHECKPOINT_IMMEDIATE has been set since is > > > > an implementation detail, and the view shouldn't focus on which flags were > > > > initially passed to the checkpointer but instead which flags the checkpointer > > > > is actually enforcing, as that's what the user should be interested in. If you > > > > want to store it in another field internally but display it in the view with > > > > the rest of the flags, I'm fine with it. > > > > > > Just to be in sync with the way code behaves, it is better not to > > > update the next checkpoint request's CHECKPOINT_IMMEDIATE with the > > > current checkpoint 'flags' field. Because the current checkpoint > > > starts with a different set of flags and when there is a new request > > > (with CHECKPOINT_IMMEDIATE), it just processes the pending operations > > > quickly to take up next requests. If we update this information in the > > > 'flags' field of the view, it says that the current checkpoint is > > > started with CHECKPOINT_IMMEDIATE which is not true. 
> > > > Which is why I suggested to only take into account CHECKPOINT_REQUESTED (to > > be able to display that a new checkpoint was requested) and > > CHECKPOINT_IMMEDIATE, to be able to display that the current checkpoint isn't > > throttled anymore if it were. > > > > I still don't understand why you want so much to display "how the checkpoint > > was initially started" rather than "how the checkpoint is really behaving right > > now". The whole point of having a progress view is to have something dynamic > > that reflects the current activity. > > > > > Hence I had > > > thought of adding a new field ('next flags' or 'upcoming flags') which > > > contain all the flag values of new checkpoint requests. This field > > > indicates whether the current checkpoint is throttled or not and also > > > it indicates there are new requests. > > > > I'm not opposed to having such a field, I'm opposed to having a view with "the > > current checkpoint is throttled but if there are some flags in the next > > checkpoint flags and those flags contain checkpoint immediate then the current > > checkpoint isn't actually throttled anymore" behavior.
Attachment
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
> Have you measured the performance effects of this? On fast storage with large > shared_buffers I've seen these loops in profiles. It's probably fine, but it'd > be good to verify that. To understand the performance effects of the above, I have taken the average of five checkpoints with the patch and without the patch in my environment. Here are the results. With patch: 269.65 s Without patch: 269.60 s It looks fine. Please share your views. > This view is depressingly complicated. Added up the view definitions for > the already existing pg_stat_progress* views add up to a measurable part of > the size of an empty database: Thank you so much for sharing the detailed analysis. We can remove a few fields which are not so important to make it simple. Thanks & Regards, Nitin Jadhav On Sat, Mar 19, 2022 at 5:45 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > This is a long thread, sorry for asking if this has been asked before. > > On 2022-03-08 20:25:28 +0530, Nitin Jadhav wrote: > > * Sort buffers that need to be written to reduce the likelihood of random > > @@ -2129,6 +2132,8 @@ BufferSync(int flags) > > bufHdr = GetBufferDescriptor(buf_id); > > > > num_processed++; > > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_BUFFERS_PROCESSED, > > + num_processed); > > > > /* > > * We don't need to acquire the lock here, because we're only looking > > @@ -2149,6 +2154,8 @@ BufferSync(int flags) > > TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id); > > PendingCheckpointerStats.m_buf_written_checkpoints++; > > num_written++; > > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_BUFFERS_WRITTEN, > > + num_written); > > } > > } > > Have you measured the performance effects of this? On fast storage with large > shared_buffers I've seen these loops in profiles. It's probably fine, but it'd > be good to verify that. 
> > > > @@ -1897,6 +1897,112 @@ pg_stat_progress_basebackup| SELECT s.pid, > > s.param4 AS tablespaces_total, > > s.param5 AS tablespaces_streamed > > FROM pg_stat_get_progress_info('BASEBACKUP'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6,param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19,param20); > > +pg_stat_progress_checkpoint| SELECT s.pid, > > + CASE s.param1 > > + WHEN 1 THEN 'checkpoint'::text > > + WHEN 2 THEN 'restartpoint'::text > > + ELSE NULL::text > > + END AS type, > > + ((((((( > > + CASE > > + WHEN ((s.param2 & (1)::bigint) > 0) THEN 'shutdown '::text > > + ELSE ''::text > > + END || > > + CASE > > + WHEN ((s.param2 & (2)::bigint) > 0) THEN 'end-of-recovery '::text > > + ELSE ''::text > > + END) || > > + CASE > > + WHEN ((s.param2 & (4)::bigint) > 0) THEN 'immediate '::text > > + ELSE ''::text > > + END) || > > + CASE > > + WHEN ((s.param2 & (8)::bigint) > 0) THEN 'force '::text > > + ELSE ''::text > > + END) || > > + CASE > > + WHEN ((s.param2 & (16)::bigint) > 0) THEN 'flush-all '::text > > + ELSE ''::text > > + END) || > > + CASE > > + WHEN ((s.param2 & (32)::bigint) > 0) THEN 'wait '::text > > + ELSE ''::text > > + END) || > > + CASE > > + WHEN ((s.param2 & (128)::bigint) > 0) THEN 'wal '::text > > + ELSE ''::text > > + END) || > > + CASE > > + WHEN ((s.param2 & (256)::bigint) > 0) THEN 'time '::text > > + ELSE ''::text > > + END) AS flags, > > + ((((((( > > + CASE > > + WHEN ((s.param3 & (1)::bigint) > 0) THEN 'shutdown '::text > > + ELSE ''::text > > + END || > > + CASE > > + WHEN ((s.param3 & (2)::bigint) > 0) THEN 'end-of-recovery '::text > > + ELSE ''::text > > + END) || > > + CASE > > + WHEN ((s.param3 & (4)::bigint) > 0) THEN 'immediate '::text > > + ELSE ''::text > > + END) || > > + CASE > > + WHEN ((s.param3 & (8)::bigint) > 0) THEN 'force '::text > > + ELSE ''::text > > + END) || > > + CASE > > + WHEN ((s.param3 & (16)::bigint) > 0) THEN 
'flush-all '::text > > + ELSE ''::text > > + END) || > > + CASE > > + WHEN ((s.param3 & (32)::bigint) > 0) THEN 'wait '::text > > + ELSE ''::text > > + END) || > > + CASE > > + WHEN ((s.param3 & (128)::bigint) > 0) THEN 'wal '::text > > + ELSE ''::text > > + END) || > > + CASE > > + WHEN ((s.param3 & (256)::bigint) > 0) THEN 'time '::text > > + ELSE ''::text > > + END) AS next_flags, > > + ('0/0'::pg_lsn + ( > > + CASE > > + WHEN (s.param4 < 0) THEN pow((2)::numeric, (64)::numeric) > > + ELSE (0)::numeric > > + END + (s.param4)::numeric)) AS start_lsn, > > + to_timestamp(((946684800)::double precision + ((s.param5)::double precision / (1000000)::double precision))) ASstart_time, > > + CASE s.param6 > > + WHEN 1 THEN 'initializing'::text > > + WHEN 2 THEN 'getting virtual transaction IDs'::text > > + WHEN 3 THEN 'checkpointing replication slots'::text > > + WHEN 4 THEN 'checkpointing logical replication snapshot files'::text > > + WHEN 5 THEN 'checkpointing logical rewrite mapping files'::text > > + WHEN 6 THEN 'checkpointing replication origin'::text > > + WHEN 7 THEN 'checkpointing commit log pages'::text > > + WHEN 8 THEN 'checkpointing commit time stamp pages'::text > > + WHEN 9 THEN 'checkpointing subtransaction pages'::text > > + WHEN 10 THEN 'checkpointing multixact pages'::text > > + WHEN 11 THEN 'checkpointing predicate lock pages'::text > > + WHEN 12 THEN 'checkpointing buffers'::text > > + WHEN 13 THEN 'processing file sync requests'::text > > + WHEN 14 THEN 'performing two phase checkpoint'::text > > + WHEN 15 THEN 'performing post checkpoint cleanup'::text > > + WHEN 16 THEN 'invalidating replication slots'::text > > + WHEN 17 THEN 'recycling old WAL files'::text > > + WHEN 18 THEN 'truncating subtransactions'::text > > + WHEN 19 THEN 'finalizing'::text > > + ELSE NULL::text > > + END AS phase, > > + s.param7 AS buffers_total, > > + s.param8 AS buffers_processed, > > + s.param9 AS buffers_written, > > + s.param10 AS files_total, > > + s.param11 AS 
files_synced > > + FROM pg_stat_get_progress_info('CHECKPOINT'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6,param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19,param20); > > pg_stat_progress_cluster| SELECT s.pid, > > s.datid, > > d.datname, > > This view is depressingly complicated. Added up the view definitions for > the already existing pg_stat_progress* views add up to a measurable part of > the size of an empty database: > > postgres[1160866][1]=# SELECT sum(octet_length(ev_action)), SUM(pg_column_size(ev_action)) FROM pg_rewrite WHERE ev_class::regclass::textLIKE '%progress%'; > ┌───────┬───────┐ > │ sum │ sum │ > ├───────┼───────┤ > │ 97410 │ 19786 │ > └───────┴───────┘ > (1 row) > > and this view looks to be a good bit more complicated than the existing > pg_stat_progress* views. > > Indeed: > template1[1165473][1]=# SELECT ev_class::regclass, length(ev_action), pg_column_size(ev_action) FROM pg_rewrite WHERE ev_class::regclass::textLIKE '%progress%' ORDER BY length(ev_action) DESC; > ┌───────────────────────────────┬────────┬────────────────┐ > │ ev_class │ length │ pg_column_size │ > ├───────────────────────────────┼────────┼────────────────┤ > │ pg_stat_progress_checkpoint │ 43290 │ 5409 │ > │ pg_stat_progress_create_index │ 23293 │ 4177 │ > │ pg_stat_progress_cluster │ 18390 │ 3704 │ > │ pg_stat_progress_analyze │ 16121 │ 3339 │ > │ pg_stat_progress_vacuum │ 16076 │ 3392 │ > │ pg_stat_progress_copy │ 15124 │ 3080 │ > │ pg_stat_progress_basebackup │ 8406 │ 2094 │ > └───────────────────────────────┴────────┴────────────────┘ > (7 rows) > > pg_rewrite without pg_stat_progress_checkpoint: 745472, with: 753664 > > > pg_rewrite is the second biggest relation in an empty database already... 
> > template1[1164827][1]=# SELECT relname, pg_total_relation_size(oid) FROM pg_class WHERE relkind = 'r' ORDER BY 2 DESC LIMIT5; > ┌────────────────┬────────────────────────┐ > │ relname │ pg_total_relation_size │ > ├────────────────┼────────────────────────┤ > │ pg_proc │ 1212416 │ > │ pg_rewrite │ 745472 │ > │ pg_attribute │ 704512 │ > │ pg_description │ 630784 │ > │ pg_collation │ 409600 │ > └────────────────┴────────────────────────┘ > (5 rows) > > Greetings, > > Andres Freund
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
> > Have you measured the performance effects of this? On fast storage with large > > shared_buffers I've seen these loops in profiles. It's probably fine, but it'd > > be good to verify that. > > I am wondering if we could make the function inlined at some point. > We could also play it safe and only update the counters every N loops > instead. The idea looks good but based on the performance numbers shared above, it is not affecting the performance. So we can use the current approach as it gives more accurate progress. --- > > This view is depressingly complicated. Added up the view definitions for > > the already existing pg_stat_progress* views add up to a measurable part of > > the size of an empty database: > > Yeah. I think that what's proposed could be simplified, and we had > better remove the fields that are not that useful. First, do we have > any need for next_flags? "next_flags" is removed in the v6 patch. Added a "new_requests" field to get to know whether the current checkpoint is being throttled or not. Please share your views on this. --- > Second, is the start LSN really necessary > for monitoring purposes? IMO, start LSN is necessary to debug if the checkpoint is taking longer. --- > Not all the information in the first > parameter is useful, as well. For example "shutdown" will never be > seen as it is not possible to use a session at this stage, no? I understand that "shutdown" and "end-of-recovery" will never be seen and I have removed it in the v6 patch. --- > There > is also no gain in having "immediate", "flush-all", "force" and "wait" > (for this one if the checkpoint is requested the session doing the > work knows this information already). "immediate" is required to understand whether the current checkpoint is throttled or not. I am not sure about other flags "flush-all", "force" and "wait". I have just supported all the flags to match the 'checkpoint start' log message. Please share your views. 
If it is not really required, I will remove it in the next patch. --- > A last thing is that we may gain in visibility by having more > attributes as an effect of splitting param2. On thing that would make > sense is to track the reason why the checkpoint was triggered > separately (aka wal and time). Should we use a text[] instead to list > all the parameters instead? Using a space-separated list of items is > not intuitive IMO, and callers of this routine will likely parse > that. If I understand the above comment correctly, you are saying to introduce a new field, say "reason" ( possible values are either wal or time) and the "flags" field will continue to represent the other flags like "immediate", etc. The idea looks good here. We can introduce new field "reason" and "flags" field can be renamed to "throttled" (true/false) if we decide to not support other flags "flush-all", "force" and "wait". --- > + WHEN 3 THEN 'checkpointing replication slots' > + WHEN 4 THEN 'checkpointing logical replication snapshot files' > + WHEN 5 THEN 'checkpointing logical rewrite mapping files' > + WHEN 6 THEN 'checkpointing replication origin' > + WHEN 7 THEN 'checkpointing commit log pages' > + WHEN 8 THEN 'checkpointing commit time stamp pages' > There is a lot of "checkpointing" here. All those terms could be > shorter without losing their meaning. I will try to make it short in the next patch. --- Please share your thoughts. Thanks & Regards, Nitin Jadhav On Tue, Apr 5, 2022 at 3:15 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Mar 18, 2022 at 05:15:56PM -0700, Andres Freund wrote: > > Have you measured the performance effects of this? On fast storage with large > > shared_buffers I've seen these loops in profiles. It's probably fine, but it'd > > be good to verify that. > > I am wondering if we could make the function inlined at some point. > We could also play it safe and only update the counters every N loops > instead. 
> > > This view is depressingly complicated. Added up the view definitions for > > the already existing pg_stat_progress* views add up to a measurable part of > > the size of an empty database: > > Yeah. I think that what's proposed could be simplified, and we had > better remove the fields that are not that useful. First, do we have > any need for next_flags? Second, is the start LSN really necessary > for monitoring purposes? Not all the information in the first > parameter is useful, as well. For example "shutdown" will never be > seen as it is not possible to use a session at this stage, no? There > is also no gain in having "immediate", "flush-all", "force" and "wait" > (for this one if the checkpoint is requested the session doing the > work knows this information already). > > A last thing is that we may gain in visibility by having more > attributes as an effect of splitting param2. On thing that would make > sense is to track the reason why the checkpoint was triggered > separately (aka wal and time). Should we use a text[] instead to list > all the parameters instead? Using a space-separated list of items is > not intuitive IMO, and callers of this routine will likely parse > that. > > Shouldn't we also track the number of files flushed in each sub-step? > In some deployments you could have a large number of 2PC files and > such. We may want more information on such matters. > > + WHEN 3 THEN 'checkpointing replication slots' > + WHEN 4 THEN 'checkpointing logical replication snapshot files' > + WHEN 5 THEN 'checkpointing logical rewrite mapping files' > + WHEN 6 THEN 'checkpointing replication origin' > + WHEN 7 THEN 'checkpointing commit log pages' > + WHEN 8 THEN 'checkpointing commit time stamp pages' > There is a lot of "checkpointing" here. All those terms could be > shorter without losing their meaning. > > This patch still needs some work, so I am marking it as RwF for now. > -- > Michael
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From: Andres Freund
Hi, On 2022-06-13 19:08:35 +0530, Nitin Jadhav wrote: > > Have you measured the performance effects of this? On fast storage with large > > shared_buffers I've seen these loops in profiles. It's probably fine, but it'd > > be good to verify that. > > To understand the performance effects of the above, I have taken the > average of five checkpoints with the patch and without the patch in my > environment. Here are the results. > With patch: 269.65 s > Without patch: 269.60 s Those look like timed checkpoints - if the checkpoints are sleeping a part of the time, you're not going to see any potential overhead. To see whether this has an effect you'd have to make sure there's a certain number of dirty buffers (e.g. by doing CREATE TABLE AS some_query) and then do a manual checkpoint and time how long that takes. Greetings, Andres Freund
From: Nitin Jadhav
> > To understand the performance effects of the above, I have taken the > > average of five checkpoints with the patch and without the patch in my > > environment. Here are the results. > > With patch: 269.65 s > > Without patch: 269.60 s > > Those look like timed checkpoints - if the checkpoints are sleeping a > part of the time, you're not going to see any potential overhead. Yes. The above data is collected from timed checkpoints. create table t1(a int); insert into t1 select * from generate_series(1,10000000); I generated a lot of data using the above queries, which in turn triggered the checkpoint (wal). --- > To see whether this has an effect you'd have to make sure there's a > certain number of dirty buffers (e.g. by doing CREATE TABLE AS > some_query) and then do a manual checkpoint and time how long that > times. For this case I generated data using the queries below. create table t1(a int); insert into t1 select * from generate_series(1,8000000); This does not trigger a checkpoint automatically. I issued CHECKPOINT manually and measured the performance as an average of 5 checkpoints. Here are the details. With patch: 2.457 s Without patch: 2.334 s Please share your thoughts. Thanks & Regards, Nitin Jadhav On Thu, Jul 7, 2022 at 5:34 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2022-06-13 19:08:35 +0530, Nitin Jadhav wrote: > > > Have you measured the performance effects of this? On fast storage with large > > > shared_buffers I've seen these loops in profiles. It's probably fine, but it'd > > > be good to verify that. > > > > To understand the performance effects of the above, I have taken the > > average of five checkpoints with the patch and without the patch in my > > environment. Here are the results. > > With patch: 269.65 s > > Without patch: 269.60 s > > Those look like timed checkpoints - if the checkpoints are sleeping a > part of the time, you're not going to see any potential overhead. 
> > To see whether this has an effect you'd have to make sure there's a > certain number of dirty buffers (e.g. by doing CREATE TABLE AS > some_query) and then do a manual checkpoint and time how long that > times. > > Greetings, > > Andres Freund
From: "Drouvot, Bertrand"
Hi, On 7/28/22 11:38 AM, Nitin Jadhav wrote: >>> To understand the performance effects of the above, I have taken the >>> average of five checkpoints with the patch and without the patch in my >>> environment. Here are the results. >>> With patch: 269.65 s >>> Without patch: 269.60 s >> >> Those look like timed checkpoints - if the checkpoints are sleeping a >> part of the time, you're not going to see any potential overhead. > > Yes. The above data is collected from timed checkpoints. > > create table t1(a int); > insert into t1 select * from generate_series(1,10000000); > > I generated a lot of data by using the above queries which would in > turn trigger the checkpoint (wal). > --- > >> To see whether this has an effect you'd have to make sure there's a >> certain number of dirty buffers (e.g. by doing CREATE TABLE AS >> some_query) and then do a manual checkpoint and time how long that >> times. > > For this case I have generated data by using below queries. > > create table t1(a int); > insert into t1 select * from generate_series(1,8000000); > > This does not trigger the checkpoint automatically. I have issued the > CHECKPOINT manually and measured the performance by considering an > average of 5 checkpoints. Here are the details. > > With patch: 2.457 s > Without patch: 2.334 s > > Please share your thoughts. > v6 was not applying anymore, due to a change in doc/src/sgml/ref/checkpoint.sgml done by b9eb0ff09e (Rename pg_checkpointer predefined role to pg_checkpoint). Please find attached a rebase in v7. 
While working on this rebase, I also noticed that "pg_checkpointer" is still mentioned in some translation files: " $ git grep pg_checkpointer src/backend/po/de.po:msgid "must be superuser or have privileges of pg_checkpointer to do CHECKPOINT" src/backend/po/ja.po:msgid "must be superuser or have privileges of pg_checkpointer to do CHECKPOINT" src/backend/po/ja.po:msgstr "CHECKPOINTを実行するにはスーパーユーザーであるか、またはpg_checkpointerの権限を持つ必要があります" src/backend/po/sv.po:msgid "must be superuser or have privileges of pg_checkpointer to do CHECKPOINT" " I'm not familiar with how the translation files are handled (they appear to have their own set of commits; see 3c0bcdbc66 for example), but I wanted to mention it. That may be expected, though, since the most recent translation-file commit (3c0bcdbc66) predates the one that renamed pg_checkpointer to pg_checkpoint (b9eb0ff09e). That said, back to this patch: I did not look closely but noticed that the buffers_total reported by pg_stat_progress_checkpoint: postgres=# select type,flags,start_lsn,phase,buffers_total,new_requests from pg_stat_progress_checkpoint; type | flags | start_lsn | phase | buffers_total | new_requests ------------+-----------------------+------------+-----------------------+---------------+-------------- checkpoint | immediate force wait | 1/E6C523A8 | checkpointing buffers | 1024275 | false (1 row) is a little bit different from what is logged once completed: 2022-11-04 08:18:50.806 UTC [3488442] LOG: checkpoint complete: wrote 1024278 buffers (97.7%); Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
From: Nitin Jadhav
> v6 was not applying anymore, due to a change in > doc/src/sgml/ref/checkpoint.sgml done by b9eb0ff09e (Rename > pg_checkpointer predefined role to pg_checkpoint). > > Please find attached a rebase in v7. > > While working on this rebase, I also noticed that "pg_checkpointer" is > still mentioned in some translation files: Thanks for rebasing the patch and sharing the information. --- > That said, back to this patch: I did not look closely but noticed that > the buffers_total reported by pg_stat_progress_checkpoint: > > postgres=# select type,flags,start_lsn,phase,buffers_total,new_requests > from pg_stat_progress_checkpoint; > type | flags | start_lsn | phase > | buffers_total | new_requests > ------------+-----------------------+------------+-----------------------+---------------+-------------- > checkpoint | immediate force wait | 1/E6C523A8 | checkpointing > buffers | 1024275 | false > (1 row) > > is a little bit different from what is logged once completed: > > 2022-11-04 08:18:50.806 UTC [3488442] LOG: checkpoint complete: wrote > 1024278 buffers (97.7%); This is because the count shown in the checkpoint complete message includes the additional increments done during SlruInternalWritePage(). Those increments cannot be known in advance, which is why they were not included in the patch. To make the view compatible with the checkpoint complete message, we should increment all three counters there: buffers_total, buffers_processed and buffers_written. As a result, the buffers_total value calculated at the start of the checkpoint may not always stay the same. If this looks good, I will update this in the next patch. Thanks & Regards, Nitin Jadhav On Fri, Nov 4, 2022 at 1:57 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 7/28/22 11:38 AM, Nitin Jadhav wrote: > >>> To understand the performance effects of the above, I have taken the > >>> average of five checkpoints with the patch and without the patch in my > >>> environment. Here are the results. 
> >>> With patch: 269.65 s > >>> Without patch: 269.60 s > >> > >> Those look like timed checkpoints - if the checkpoints are sleeping a > >> part of the time, you're not going to see any potential overhead. > > > > Yes. The above data is collected from timed checkpoints. > > > > create table t1(a int); > > insert into t1 select * from generate_series(1,10000000); > > > > I generated a lot of data by using the above queries which would in > > turn trigger the checkpoint (wal). > > --- > > > >> To see whether this has an effect you'd have to make sure there's a > >> certain number of dirty buffers (e.g. by doing CREATE TABLE AS > >> some_query) and then do a manual checkpoint and time how long that > >> times. > > > > For this case I have generated data by using below queries. > > > > create table t1(a int); > > insert into t1 select * from generate_series(1,8000000); > > > > This does not trigger the checkpoint automatically. I have issued the > > CHECKPOINT manually and measured the performance by considering an > > average of 5 checkpoints. Here are the details. > > > > With patch: 2.457 s > > Without patch: 2.334 s > > > > Please share your thoughts. > > > > v6 was not applying anymore, due to a change in > doc/src/sgml/ref/checkpoint.sgml done by b9eb0ff09e (Rename > pg_checkpointer predefined role to pg_checkpoint). > > Please find attached a rebase in v7. 
> > While working on this rebase, I also noticed that "pg_checkpointer" is > still mentioned in some translation files: > " > $ git grep pg_checkpointer > src/backend/po/de.po:msgid "must be superuser or have privileges of > pg_checkpointer to do CHECKPOINT" > src/backend/po/ja.po:msgid "must be superuser or have privileges of > pg_checkpointer to do CHECKPOINT" > src/backend/po/ja.po:msgstr > "CHECKPOINTを実行するにはスーパーユーザーであるか、またはpg_checkpointerの権限を持つ必要があります" > src/backend/po/sv.po:msgid "must be superuser or have privileges of > pg_checkpointer to do CHECKPOINT" > " > > I'm not familiar with how the translation files are handled (looks like > they have their own set of commits, see 3c0bcdbc66 for example) but > wanted to mention that "pg_checkpointer" is still mentioned (even if > that may be expected as the last commit related to translation files > (aka 3c0bcdbc66) is older than the one that renamed pg_checkpointer to > pg_checkpoint (aka b9eb0ff09e)). > > That said, back to this patch: I did not look closely but noticed that > the buffers_total reported by pg_stat_progress_checkpoint: > > postgres=# select type,flags,start_lsn,phase,buffers_total,new_requests > from pg_stat_progress_checkpoint; > type | flags | start_lsn | phase > | buffers_total | new_requests > ------------+-----------------------+------------+-----------------------+---------------+-------------- > checkpoint | immediate force wait | 1/E6C523A8 | checkpointing > buffers | 1024275 | false > (1 row) > > is a little bit different from what is logged once completed: > > 2022-11-04 08:18:50.806 UTC [3488442] LOG: checkpoint complete: wrote > 1024278 buffers (97.7%); > > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com
From: Robert Haas
On Fri, Nov 4, 2022 at 4:27 AM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > Please find attached a rebase in v7. I don't think it's a good thing that this patch is using the progress-reporting machinery. The point of that machinery is that we want any backend to be able to report progress for any command it happens to be running, and we don't know which command that will be at any given point in time, or how many backends will be running any given command at once. So we need some generic set of counters that can be repurposed for whatever any particular backend happens to be doing right at the moment. But none of that applies to the checkpointer. Any information about the checkpointer that we want to expose can just be advertised in a dedicated chunk of shared memory, perhaps even by simply adding it to CheckpointerShmemStruct. Then you can give the fields whatever names, types, and sizes you like, and you don't have to do all of this stuff with mapping down to integers and back. The only real disadvantage that I can see is then you have to think a bit harder about what the concurrency model is here, and maybe you end up reimplementing something similar to what the progress-reporting stuff does for you, and *maybe* that is a sufficient reason to do it this way. But I'm doubtful. This feels like a square-peg-round-hole situation. -- Robert Haas EDB: http://www.enterprisedb.com
From: Andres Freund
Hi, On 2022-11-04 09:25:52 +0100, Drouvot, Bertrand wrote: > > @@ -7023,29 +7048,63 @@ static void > CheckPointGuts(XLogRecPtr checkPointRedo, int flags) > { > CheckPointRelationMap(); > + > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > + PROGRESS_CHECKPOINT_PHASE_REPLI_SLOTS); > CheckPointReplicationSlots(); > + > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > + PROGRESS_CHECKPOINT_PHASE_SNAPSHOTS); > CheckPointSnapBuild(); > + > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > + PROGRESS_CHECKPOINT_PHASE_LOGICAL_REWRITE_MAPPINGS); > CheckPointLogicalRewriteHeap(); > + > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > + PROGRESS_CHECKPOINT_PHASE_REPLI_ORIGIN); > CheckPointReplicationOrigin(); > > /* Write out all dirty data in SLRUs and the main buffer pool */ > TRACE_POSTGRESQL_BUFFER_CHECKPOINT_START(flags); > CheckpointStats.ckpt_write_t = GetCurrentTimestamp(); > + > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > + PROGRESS_CHECKPOINT_PHASE_CLOG_PAGES); > CheckPointCLOG(); > + > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > + PROGRESS_CHECKPOINT_PHASE_COMMITTS_PAGES); > CheckPointCommitTs(); > + > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > + PROGRESS_CHECKPOINT_PHASE_SUBTRANS_PAGES); > CheckPointSUBTRANS(); > + > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > + PROGRESS_CHECKPOINT_PHASE_MULTIXACT_PAGES); > CheckPointMultiXact(); > + > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > + PROGRESS_CHECKPOINT_PHASE_PREDICATE_LOCK_PAGES); > CheckPointPredicate(); > + > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > + PROGRESS_CHECKPOINT_PHASE_BUFFERS); > CheckPointBuffers(flags); > > /* Perform all queued up fsyncs */ > TRACE_POSTGRESQL_BUFFER_CHECKPOINT_SYNC_START(); > CheckpointStats.ckpt_sync_t = GetCurrentTimestamp(); > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > + PROGRESS_CHECKPOINT_PHASE_SYNC_FILES); > 
ProcessSyncRequests(); > CheckpointStats.ckpt_sync_end_t = GetCurrentTimestamp(); > TRACE_POSTGRESQL_BUFFER_CHECKPOINT_DONE(); > > /* We deliberately delay 2PC checkpointing as long as possible */ > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > + PROGRESS_CHECKPOINT_PHASE_TWO_PHASE); > CheckPointTwoPhase(checkPointRedo); > } This is quite the code bloat. Can we make this less duplicative? > +CREATE VIEW pg_stat_progress_checkpoint AS > + SELECT > + S.pid AS pid, > + CASE S.param1 WHEN 1 THEN 'checkpoint' > + WHEN 2 THEN 'restartpoint' > + END AS type, > + ( CASE WHEN (S.param2 & 4) > 0 THEN 'immediate ' ELSE '' END || > + CASE WHEN (S.param2 & 8) > 0 THEN 'force ' ELSE '' END || > + CASE WHEN (S.param2 & 16) > 0 THEN 'flush-all ' ELSE '' END || > + CASE WHEN (S.param2 & 32) > 0 THEN 'wait ' ELSE '' END || > + CASE WHEN (S.param2 & 128) > 0 THEN 'wal ' ELSE '' END || > + CASE WHEN (S.param2 & 256) > 0 THEN 'time ' ELSE '' END > + ) AS flags, > + ( '0/0'::pg_lsn + > + ((CASE > + WHEN S.param3 < 0 THEN pow(2::numeric, 64::numeric)::numeric > + ELSE 0::numeric > + END) + > + S.param3::numeric) > + ) AS start_lsn, I don't think we should embed this much complexity in the view definitions. It's hard to read, bloats the catalog, and we can't fix them once released. This stuff seems like it should be in a helper function. I don't have any idea what that pow stuff is supposed to be doing. > + to_timestamp(946684800 + (S.param4::float8 / 1000000)) AS start_time, I don't think this is a reasonable path - embedding way too many low-level details about the timestamp format in the view definition. Why do we need to do this? Greetings, Andres Freund
From: Bharath Rupireddy
On Wed, Nov 16, 2022 at 1:35 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Fri, Nov 4, 2022 at 4:27 AM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > Please find attached a rebase in v7. > > I don't think it's a good thing that this patch is using the > progress-reporting machinery. The point of that machinery is that we > want any backend to be able to report progress for any command it > happens to be running, and we don't know which command that will be at > any given point in time, or how many backends will be running any > given command at once. So we need some generic set of counters that > can be repurposed for whatever any particular backend happens to be > doing right at the moment. Hm. > But none of that applies to the checkpointer. Any information about > the checkpointer that we want to expose can just be advertised in a > dedicated chunk of shared memory, perhaps even by simply adding it to > CheckpointerShmemStruct. Then you can give the fields whatever names, > types, and sizes you like, and you don't have to do all of this stuff > with mapping down to integers and back. The only real disadvantage > that I can see is then you have to think a bit harder about what the > concurrency model is here, and maybe you end up reimplementing > something similar to what the progress-reporting stuff does for you, > and *maybe* that is a sufficient reason to do it this way. -1 for CheckpointerShmemStruct as it is being used for running checkpoints and I don't think adding stats to it is a great idea. Instead, extending PgStat_CheckpointerStats and using shared memory stats for reporting progress/last checkpoint related stats is a good idea IMO. I also think that a new pg_stat_checkpoint view is needed because, right now, the PgStat_CheckpointerStats stats are exposed via the pg_stat_bgwriter view, having a separate view for checkpoint stats is good here. 
Also, removing CheckpointStatsData and moving all of its members to PgStat_CheckpointerStats (while being careful about the amount of shared memory required) is also a good idea IMO. Going forward, PgStat_CheckpointerStats and the pg_stat_checkpoint view can be a single location for all checkpoint-related stats. Thoughts? In fact, I was recently having an off-list chat with Bertrand Drouvot about this idea. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
From: Andres Freund
Hi, On 2022-11-16 16:01:55 +0530, Bharath Rupireddy wrote: > -1 for CheckpointerShmemStruct as it is being used for running > checkpoints and I don't think adding stats to it is a great idea. Why? Imo the data needed for progress reporting aren't really "stats". We'd not accumulate counters over time, just for the current checkpoint. I think it might even be useful for other parts of the system to know what the checkpointer is doing, e.g. bgwriter or autovacuum could adapt their behaviour if the checkpointer can't keep up. Somehow it'd feel wrong to use the stats system as the source of such adjustments - but perhaps my gut feeling on that isn't right. The best argument for combining progress reporting with accumulating stats is that we could likely share some of the code. Having accumulated stats for all the checkpoint phases would e.g. be quite valuable. > Instead, extending PgStat_CheckpointerStats and using shared memory > stats for reporting progress/last checkpoint related stats is a good > idea IMO There's certainly some potential for deduplicating state and for making stats update more frequently. But that doesn't necessarily mean that putting the checkpoint progress into PgStat_CheckpointerStats is a good idea (nor the opposite). > I also think that a new pg_stat_checkpoint view is needed > because, right now, the PgStat_CheckpointerStats stats are exposed via > the pg_stat_bgwriter view, having a separate view for checkpoint stats > is good here. I agree that we should do that, but that's largely independent of the architectural question at hand. Greetings, Andres Freund
From: Robert Haas
On Wed, Nov 16, 2022 at 5:32 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > -1 for CheckpointerShmemStruct as it is being used for running > checkpoints and I don't think adding stats to it is a great idea. > Instead, extending PgStat_CheckpointerStats and using shared memory > stats for reporting progress/last checkpoint related stats is a good > idea IMO. I agree with Andres: progress reporting isn't really quite the same thing as stats, and either place seems like it could be reasonable. I don't presently have an opinion on which is a better fit, but I don't think the fact that CheckpointerShmemStruct is used for running checkpoints rules anything out. Progress reporting is *also* about running checkpoints. Any historical data you want to expose might not be about running checkpoints, but, uh, so what? I don't really see that as a strong argument against it fitting into this struct. > I also think that a new pg_stat_checkpoint view is needed > because, right now, the PgStat_CheckpointerStats stats are exposed via > the pg_stat_bgwriter view, having a separate view for checkpoint stats > is good here. Yep. > Also, removing CheckpointStatsData and moving all of > those members to PgStat_CheckpointerStats, of course, by being careful > about the amount of shared memory required, is also a good idea IMO. > Going forward, PgStat_CheckpointerStats and pg_stat_checkpoint view > can be a single point of location for all the checkpoint related > stats. I'm not sure that I completely follow this part, or that I agree with it. I have never really understood why we drive background writer or checkpointer statistics through the statistics collector. Here again, for things like table statistics, there is no choice, because we could have an unbounded number of tables and need to keep statistics about all of them. The statistics collector can handle that by allocating more memory as required. 
But there is only one background writer and only one checkpointer, so that is not needed in those cases. Why not just have them expose anything they want to expose through shared memory directly? If the statistics collector provides services that we care about, like persisting data across restarts or making snapshots for transactional behavior, then those might be reasons to go through it even for the background writer or checkpointer. But if so, we should be explicit about what the reasons are, both in the mailing list discussion and in code comments. Otherwise I fear that we'll just end up doing something in a more complicated way than is really necessary. -- Robert Haas EDB: http://www.enterprisedb.com
From: Andres Freund
Hi, On 2022-11-16 14:19:32 -0500, Robert Haas wrote: > I have never really understood why we drive background writer or > checkpointer statistics through the statistics collector. To some degree it is required for durability - the stats system needs to know how to write out those stats. But that wasn't ever a good reason to send messages to the stats collector - it could just read the stats from shared memory after all. There's also integration with snapshots of the stats, resetting them, etc. There's also the complexity that some of the stats e.g. for checkpointer aren't about work the checkpointer did, but just have ended up there for historical raisins. E.g. the number of fsyncs and writes done by backends. See below: > Here again, for things like table statistics, there is no choice, because we > could have an unbounded number of tables and need to keep statistics about > all of them. The statistics collector can handle that by allocating more > memory as required. But there is only one background writer and only one > checkpointer, so that is not needed in those cases. Why not just have them > expose anything they want to expose through shared memory directly? That's how it is in 15+. The memory for "fixed-numbered" or "global" statistics is maintained by the stats system, but in plain shared memory, allocated at server start. Not via the hash table. Right now, stats updates for the checkpointer use the "changecount" approach. For now that makes sense, because we update the stats only occasionally (after a checkpoint or when writing in CheckpointWriteDelay()) - a stats viewer seeing the checkpoint count go up, without yet seeing the corresponding buffers written, would be misleading. I don't think we'd want every buffer write or whatnot to go through the changecount mechanism; on some non-x86 platforms that could be noticeable. But if we didn't stage the stats updates locally I think we could make most of the stats changes without that overhead. 
For updates that just increment a single counter there's simply no benefit in the changecount mechanism afaict. I didn't want to do that change during the initial shared memory stats work, it already was bigger than I could handle... It's not quite clear to me what the best path forward is for buf_written_backend / buf_fsync_backend, which currently are reported via the checkpointer stats. I think the best path might be to stop counting them via the CheckpointerShmem->num_backend_writes etc and just populate the fields in the view (for backward compat) via the proposed [1] pg_stat_io patch. Doing that accounting with CheckpointerCommLock held exclusively isn't free. > If the statistics collector provides services that we care about, like > persisting data across restarts or making snapshots for transactional > behavior, then those might be reasons to go through it even for the > background writer or checkpointer. But if so, we should be explicit > about what the reasons are, both in the mailing list discussion and in > code comments. Otherwise I fear that we'll just end up doing something > in a more complicated way than is really necessary. I tried to provide at least some of that in the comments at the start of pgstat.c in 15+. There's very likely more that should be added, but I think it's a decent start. Greetings, Andres Freund [1] https://www.postgresql.org/message-id/CAOtHd0ApHna7_p6mvHoO%2BgLZdxjaQPRemg3_o0a4ytCPijLytQ%40mail.gmail.com
From: Bharath Rupireddy
On Thu, Nov 17, 2022 at 12:49 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > I also think that a new pg_stat_checkpoint view is needed > > because, right now, the PgStat_CheckpointerStats stats are exposed via > > the pg_stat_bgwriter view, having a separate view for checkpoint stats > > is good here. > > Yep. On Wed, Nov 16, 2022 at 11:44 PM Andres Freund <andres@anarazel.de> wrote: > > > I also think that a new pg_stat_checkpoint view is needed > > because, right now, the PgStat_CheckpointerStats stats are exposed via > > the pg_stat_bgwriter view, having a separate view for checkpoint stats > > is good here. > > I agree that we should do that, but largely independent of the architectural > question at hand. Thanks. I quickly prepared a patch introducing pg_stat_checkpointer view and posted it here - https://www.postgresql.org/message-id/CALj2ACVxX2ii%3D66RypXRweZe2EsBRiPMj0aHfRfHUeXJcC7kHg%40mail.gmail.com. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
From: Robert Haas
On Wed, Nov 16, 2022 at 2:52 PM Andres Freund <andres@anarazel.de> wrote: > I don't think we'd want every buffer write or whatnot go through the > changecount mechanism, on some non-x86 platforms that could be noticable. But > if we didn't stage the stats updates locally I think we could make most of the > stats changes without that overhead. For updates that just increment a single > counter there's simply no benefit in the changecount mechanism afaict. You might be right, but I'm not sure whether it's worth stressing about. The progress reporting mechanism uses the st_changecount mechanism, too, and as far as I know nobody's complained about that having too much overhead. Maybe they have, though, and I've just missed it. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Andres Freund
Date:
Hi,

On 2022-11-17 09:03:32 -0500, Robert Haas wrote:
> On Wed, Nov 16, 2022 at 2:52 PM Andres Freund <andres@anarazel.de> wrote:
> > I don't think we'd want every buffer write or whatnot go through the
> > changecount mechanism, on some non-x86 platforms that could be noticable. But
> > if we didn't stage the stats updates locally I think we could make most of the
> > stats changes without that overhead. For updates that just increment a single
> > counter there's simply no benefit in the changecount mechanism afaict.
>
> You might be right, but I'm not sure whether it's worth stressing
> about. The progress reporting mechanism uses the st_changecount
> mechanism, too, and as far as I know nobody's complained about that
> having too much overhead. Maybe they have, though, and I've just
> missed it.

I've seen it in profiles, although not as the major contributor. Most places do a reasonable amount of work between calls though.

As an experiment, I added a progress report to BufferSync()'s first loop (i.e. where it checks all buffers). On a 128GB shared_buffers cluster that increases the time for a do-nothing checkpoint from ~235ms to ~280ms. If I remove the changecount stuff and use a single write + write barrier, it ends up as 250ms. Inlining brings it down a bit further, to 247ms.

Obviously this is a very extreme case - we only do very little work between the progress report calls. But it does seem to show that the overhead is not entirely negligible.

I think pgstat_progress_start_command() needs the changecount stuff, as does pgstat_progress_update_multi_param(). But for anything updating a single parameter at a time it really doesn't do anything useful on a platform that doesn't tear 64bit writes (so it could be #ifdef PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY).

Out of further curiosity I wanted to test the impact when the loop doesn't even do a LockBufHdr(), with an unlocked pre-check added: 109ms without progress reporting, 138ms with it, 114ms with the simplified pgstat_progress_update_param(), and 108ms after inlining the simplified pgstat_progress_update_param().

Greetings,

Andres Freund
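The trade-off Andres measures above - a changecount-guarded multi-parameter update versus a plain 64-bit store for a single parameter - can be sketched in self-contained C. Everything below (the struct, field and function names) is an illustrative stand-in, not PostgreSQL's actual pgstat code, and it uses default sequentially consistent atomics for simplicity where the real code uses cheaper barriers:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical progress slot; layout and names are illustrative only. */
typedef struct ProgressSlot
{
    atomic_uint_fast32_t changecount;   /* odd while an update is in flight */
    atomic_uint_fast64_t param[2];
} ProgressSlot;

/*
 * Multi-parameter update: bump the changecount around the stores so a
 * reader can detect, and retry on, an inconsistent snapshot.
 */
void
update_multi(ProgressSlot *s, uint64_t a, uint64_t b)
{
    atomic_fetch_add(&s->changecount, 1);   /* now odd: update in flight */
    atomic_store(&s->param[0], a);
    atomic_store(&s->param[1], b);
    atomic_fetch_add(&s->changecount, 1);   /* even again: update complete */
}

/*
 * Single-parameter update: on platforms where an aligned 64-bit store
 * cannot be torn, one store suffices -- this is the cheaper path the
 * thread proposes gating on PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY.
 */
void
update_single(ProgressSlot *s, int idx, uint64_t v)
{
    atomic_store(&s->param[idx], v);
}

/* Reader: retry until the changecount is even and unchanged across the reads. */
void
read_snapshot(ProgressSlot *s, uint64_t out[2])
{
    uint_fast32_t before, after;

    do
    {
        before = atomic_load(&s->changecount);
        out[0] = atomic_load(&s->param[0]);
        out[1] = atomic_load(&s->param[1]);
        after = atomic_load(&s->changecount);
    } while ((before & 1) != 0 || before != after);
}
```

The single-parameter path does no read-modify-write on the shared changecount at all, which is where the savings in the 250ms/247ms numbers above would come from.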
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes: > I think pgstat_progress_start_command() needs the changecount stuff, as does > pgstat_progress_update_multi_param(). But for anything updating a single > parameter at a time it really doesn't do anything useful on a platform that > doesn't tear 64bit writes (so it could be #ifdef > PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY). Seems safe to restrict it to that case. regards, tom lane
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Robert Haas
Date:
On Thu, Nov 17, 2022 at 11:24 AM Andres Freund <andres@anarazel.de> wrote: > As an experiment, I added a progress report to BufferSync()'s first loop > (i.e. where it checks all buffers). On a 128GB shared_buffers cluster that > increases the time for a do-nothing checkpoint from ~235ms to ~280ms. If I > remove the changecount stuff and use a single write + write barrier, it ends > up as 250ms. Inlining brings it down a bit further, to 247ms. OK, I'd say that's pretty good evidence that we can't totally disregard the issue. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Andres Freund
Date:
Hi,

On 2022-11-04 09:25:52 +0100, Drouvot, Bertrand wrote:
> Please find attached a rebase in v7.

cfbot complains that the docs don't build:
https://cirrus-ci.com/task/6694349031866368?logs=docs_build#L296

[03:24:27.317] ref/checkpoint.sgml:66: element para: validity error : Element para is not declared in para list of possiblechildren

I've marked the patch as waiting-on-author for now.

There's been a bunch of architectural feedback too, but tbh, I don't know if we came to any conclusion on that front...

Greetings,

Andres Freund
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
vignesh C
Date:
On Thu, 8 Dec 2022 at 00:33, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-11-04 09:25:52 +0100, Drouvot, Bertrand wrote:
> > Please find attached a rebase in v7.
>
> cfbot complains that the docs don't build:
> https://cirrus-ci.com/task/6694349031866368?logs=docs_build#L296
>
> [03:24:27.317] ref/checkpoint.sgml:66: element para: validity error : Element para is not declared in para list of possiblechildren
>
> I've marked the patch as waitin-on-author for now.
>
> There's been a bunch of architectural feedback too, but tbh, I don't know if
> we came to any conclusion on that front...

There have been no updates on this thread for some time, so it has been marked as Returned with Feedback. Feel free to open it in the next commitfest if you plan to continue working on this.

Regards,
Vignesh
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
From
Nitin Jadhav
Date:
Hi,

It’s been a long gap in the activity of this thread, and I apologize for the delay. I have now returned and reviewed the other threads [1],[2] that have made changes in this area. I would like to share a summary of the discussion among Robert, Andres, Bharath, and Tom on this thread, to make it easier to move forward.

Robert was dissatisfied with the approach used in the patch to report progress for the checkpointer process, as he felt the current mechanism is not suitable. He proposed allocating a dedicated chunk of shared memory in CheckpointerShmemStruct. Bharath opposed this, suggesting instead to use PgStat_CheckpointerStats. Andres somewhat supported Robert’s idea, but noted that using PgStat_CheckpointerStats would allow for more code reuse.

The discussion then shifted to the statistics handling for the checkpointer process. Robert expressed dissatisfaction with the current statistics handling mechanism. Andres explained the rationale behind the existing setup and the improvements made in pg_stat_io. He also mentioned the overhead of the changecount mechanism when updating for every buffer write; however, for updates involving a single parameter at a time, the overhead can be avoided on platforms that support atomic 64-bit writes (gated by #ifdef PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY). He shared performance numbers demonstrating good results with this approach, and Tom agreed, provided the simplification is restricted to that specific case.

I am not quite clear on the direction ahead, so let me summarise the approaches based on the above discussion.

Approach-1: Use the existing progress-reporting mechanism, as the current patch does. The advantage is that the machinery is already in place and ready to use. However, it does not fit the checkpointer: only the checkpointer process runs the checkpoint, even if the command is issued from a different backend, whereas the current mechanism is designed for any backend to report progress for whatever command it happens to be running, without knowing in advance which command that will be or how many backends will be running it simultaneously. Additionally, there is complexity involved in mapping values down to integers and back.

Approach-2: Allocate a dedicated chunk of shared memory in CheckpointerShmemStruct with an appropriate name and size. This eliminates the integer-mapping complexity of Approach-1, but requires building new machinery suited to checkpointer progress reporting, which might end up similar to the current progress-reporting mechanism.

Approach-3: Use PgStat_CheckpointerStats to store the progress information. Have we completely ruled out this approach?

Additionally, all three approaches require the changecount-mechanism improvements on platforms that support atomic 64-bit writes.

I’m inclined to favor Approach-2 because it provides a clearer method for reporting progress for the checkpointer process, at the cost of implementing the necessary machinery. However, I’m still uncertain about the best path forward. Please share your thoughts.

[1]: https://www.postgresql.org/message-id/flat/CAOtHd0ApHna7_p6mvHoO%2BgLZdxjaQPRemg3_o0a4ytCPijLytQ%40mail.gmail.com#74ae447064932198495aa6d722fdc092
[2]: https://www.postgresql.org/message-id/CALj2ACVxX2ii=66RypXRweZe2EsBRiPMj0aHfRfHUeXJcC7kHg@mail.gmail.com

Best Regards,
Nitin Jadhav
Azure Database for PostgreSQL
Microsoft

On Tue, Jan 31, 2023 at 11:16 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Thu, 8 Dec 2022 at 00:33, Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2022-11-04 09:25:52 +0100, Drouvot, Bertrand wrote:
> > > Please find attached a rebase in v7.
> >
> > cfbot complains that the docs don't build:
> > https://cirrus-ci.com/task/6694349031866368?logs=docs_build#L296
> >
> > [03:24:27.317] ref/checkpoint.sgml:66: element para: validity error : Element para is not declared in para list of possiblechildren
> >
> > I've marked the patch as waitin-on-author for now.
> >
> > There's been a bunch of architectural feedback too, but tbh, I don't know if
> > we came to any conclusion on that front...
>
> There has been no updates on this thread for some time, so this has
> been switched as Returned with Feedback. Feel free to open it in the
> next commitfest if you plan to continue on this.
>
> Regards,
> Vignesh
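For concreteness, Approach-2 in the summary above could look roughly like the following self-contained C sketch. All names here (CheckpointProgress, the phase enum, the accessor functions) are hypothetical illustrations, not actual PostgreSQL code: the point is that because only the checkpointer ever writes these fields, single-counter updates need no changecount protocol and can be plain atomic stores or increments that any observer can read cheaply:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical sketch of Approach-2: a fixed set of progress fields
 * carved out of checkpointer shared memory.  Names are illustrative. */
typedef enum CheckpointProgressPhase
{
    CKPT_PHASE_IDLE,
    CKPT_PHASE_BUFFER_SYNC,
    CKPT_PHASE_WAL_FILE_REMOVAL,
    CKPT_PHASE_FINISHED
} CheckpointProgressPhase;

typedef struct CheckpointProgress
{
    atomic_int           phase;            /* a CheckpointProgressPhase */
    atomic_uint_fast64_t buffers_total;    /* set once per checkpoint */
    atomic_uint_fast64_t buffers_written;  /* bumped only by the checkpointer */
} CheckpointProgress;

/* In PostgreSQL this would live inside CheckpointerShmemStruct; a
 * file-scope variable stands in for shared memory in this sketch. */
CheckpointProgress ckpt_progress;

/* Writer side: called only by the checkpointer, so plain atomic stores
 * suffice -- no changecount needed for single fields. */
void
ckpt_progress_start(uint64_t total_buffers)
{
    atomic_store(&ckpt_progress.buffers_total, total_buffers);
    atomic_store(&ckpt_progress.buffers_written, 0);
    atomic_store(&ckpt_progress.phase, CKPT_PHASE_BUFFER_SYNC);
}

void
ckpt_progress_buffer_written(void)
{
    atomic_fetch_add(&ckpt_progress.buffers_written, 1);
}

/* Reader side: any process can take a cheap, possibly slightly stale
 * look, e.g. to feed a future pg_stat_progress_checkpoint view. */
uint64_t
ckpt_progress_written(void)
{
    return atomic_load(&ckpt_progress.buffers_written);
}
```

A reader that needs several fields to be mutually consistent (say, phase plus buffers_total) would still want the changecount-style protocol discussed earlier in the thread; the simplification applies only to independently meaningful single counters.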