Re: Report checkpoint progress in server logs - Mailing list pgsql-hackers

From SATYANARAYANA NARLAPURAM
Subject Re: Report checkpoint progress in server logs
Msg-id CAHg+QDerBgyBMrDFAHAqBa0QaNo6J59xjxekkWSEa6oTvb_jvw@mail.gmail.com
In response to Re: Report checkpoint progress in server logs  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Coincidentally, I was thinking about the same thing yesterday, after getting tired of waiting for a checkpoint to complete on a server.

On Wed, Dec 29, 2021 at 7:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
>> Therefore, reporting the checkpoint progress in the server logs, much
>> like [1], seems to be the best way IMO.

> I find progress reporting in the logfile to generally be a terrible
> way of doing things, and the fact that we do it for the startup
> process is/should be only because we have no other choice, not because
> it's the right choice.

I'm already pretty seriously unhappy about the log-spamming effects of
64da07c41 (default to log_checkpoints=on), and am willing to lay a side
bet that that gets reverted after we have some field experience with it.
This proposal seems far worse from that standpoint.  Keep in mind that
our out-of-the-box logging configuration still doesn't have any log
rotation ability, which means that the noisier the server is in normal
operation, the sooner you fill your disk.

The server does not accept queries while it is running the end-of-recovery checkpoint, so a catalog view may not help here, but a process title change or logging would be helpful in such cases. While the server is running recovery, anxious customers repeatedly ask for an ETA for recovery completion, and the lack of visibility into these operations makes life difficult for DBAs and operations staff.
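To make "what is the ETA?" answerable at all, the checkpointer has to expose how much work is done and how much remains. A minimal sketch, assuming hypothetical helpers `estimate_remaining_secs()` and `format_recovery_status()` and strictly linear progress (a steady buffer-write rate, which an I/O-bound or throttled checkpoint may not have). The formatted string only imitates what a process-title update might carry; in PostgreSQL the process title is set through `set_ps_display()`, which is not used here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: linearly extrapolate the time left from the work
 * done so far. Returns -1.0 when no rate has been observed yet.
 */
static double
estimate_remaining_secs(long done, long total, double elapsed_secs)
{
    assert(done >= 0 && done <= total);
    if (done == 0 || elapsed_secs <= 0.0)
        return -1.0;

    double rate = (double) done / elapsed_secs;    /* buffers per second */

    return (double) (total - done) / rate;
}

/*
 * Hypothetical formatter for a process-title-style status line such as
 * "end-of-recovery checkpoint: 250/1000 buffers, ETA 30 s".
 * Returns snprintf's result (the length the full string needs).
 */
static int
format_recovery_status(char *buf, size_t len, long done, long total,
                       double elapsed_secs)
{
    double eta = estimate_remaining_secs(done, total, elapsed_secs);

    if (eta < 0.0)
        return snprintf(buf, len,
                        "end-of-recovery checkpoint: %ld/%ld buffers, ETA unknown",
                        done, total);
    return snprintf(buf, len,
                    "end-of-recovery checkpoint: %ld/%ld buffers, ETA %.0f s",
                    done, total, eta);
}
```

With 250 of 1000 buffers written after 10 seconds, the estimate is 30 seconds; a real implementation would likely want smoothing rather than a single average rate.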
 

> I think the right choice to solve the *general* problem is the
> mentioned pg_stat_progress_checkpoints.

+1
 
+1 to this. We need at least a trace of the number of buffers to sync (num_to_scan) at the start of the checkpoint, instead of just emitting the stats at the end.
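The two-pass shape behind this suggestion can be sketched outside the server: count the buffers that need writing, report that total before any writes begin, then report the final tally. A minimal sketch, assuming a toy buffer array and hypothetical names (`ToyBuffer`, `count_to_scan()`, `sync_buffers()`) and made-up message text. In the server, num_to_scan is computed locally inside BufferSync() and messages would go through ereport(), neither of which this standalone code touches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a shared buffer header. */
typedef struct ToyBuffer
{
    bool dirty;
} ToyBuffer;

/* First pass: count the buffers that need writing (cf. num_to_scan). */
static int
count_to_scan(const ToyBuffer *bufs, int nbufs)
{
    int n = 0;

    for (int i = 0; i < nbufs; i++)
        if (bufs[i].dirty)
            n++;
    return n;
}

/*
 * Second pass: "write" the dirty buffers. The total is logged before the
 * first write, so an operator knows the size of the job up front.
 * Returns the number of buffers written.
 */
static int
sync_buffers(ToyBuffer *bufs, int nbufs, FILE *log)
{
    int num_to_scan = count_to_scan(bufs, nbufs);
    int num_written = 0;

    fprintf(log, "checkpoint write phase: %d buffers to write\n", num_to_scan);
    for (int i = 0; i < nbufs; i++)
    {
        if (!bufs[i].dirty)
            continue;
        bufs[i].dirty = false;  /* pretend the page was flushed */
        num_written++;
    }
    assert(num_written == num_to_scan);
    fprintf(log, "checkpoint write phase done: wrote %d buffers\n", num_written);
    return num_written;
}
```

In a live server, backends can dirty or write buffers concurrently, so the real counts need not match as neatly as the assertion in this toy version does.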


Bharat, it would be good to show the number of buffers synced so far and the total number of buffers to sync, the checkpointer PID, the substep it is running, whether it is on target for completion, and the checkpoint reason (manual/timed/forced). BufferSync has several variables that track the sync progress locally, so some refactoring may be needed here.
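The fields listed above map naturally onto a small struct plus an on-target check. A minimal sketch, assuming hypothetical names (`CkptProgress`, `ckpt_on_schedule()`, `ckpt_format()`, and the enums) and a time-only schedule rule in the spirit of checkpoint_completion_target: the write phase should finish within completion_target × checkpoint_timeout. The checkpointer's own IsCheckpointOnSchedule() also weighs WAL consumption, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical types for illustration, not PostgreSQL's own. */
typedef enum CkptReason { CKPT_MANUAL, CKPT_TIMED, CKPT_FORCED } CkptReason;

typedef enum CkptPhase
{
    CKPT_PHASE_WRITE_BUFFERS,
    CKPT_PHASE_SYNC_FILES,
    CKPT_PHASE_FINISH
} CkptPhase;

typedef struct CkptProgress
{
    int         pid;             /* checkpointer PID */
    CkptReason  reason;          /* manual / timed / forced */
    CkptPhase   phase;           /* substep currently running */
    long        buffers_total;   /* fixed when the write phase starts */
    long        buffers_written; /* advances as buffers are synced */
    double      elapsed_secs;    /* time since the checkpoint started */
} CkptProgress;

static const char *
ckpt_reason_name(CkptReason r)
{
    switch (r)
    {
        case CKPT_MANUAL: return "manual";
        case CKPT_TIMED:  return "timed";
        case CKPT_FORCED: return "forced";
    }
    return "unknown";
}

/*
 * Time-only schedule check: we are on target when the written fraction,
 * scaled by completion_target, keeps up with the elapsed fraction of
 * timeout_secs.
 */
static bool
ckpt_on_schedule(const CkptProgress *p, double timeout_secs,
                 double completion_target)
{
    assert(p->buffers_written >= 0 && p->buffers_written <= p->buffers_total);
    if (p->buffers_total == 0)
        return true;            /* nothing to write */

    double written = (double) p->buffers_written / (double) p->buffers_total;

    return written * completion_target >= p->elapsed_secs / timeout_secs;
}

/* One-line summary, e.g. for a log message or a progress-view row. */
static int
ckpt_format(char *buf, size_t len, const CkptProgress *p,
            double timeout_secs, double completion_target)
{
    return snprintf(buf, len, "pid=%d reason=%s written=%ld/%ld on_target=%s",
                    p->pid, ckpt_reason_name(p->reason),
                    p->buffers_written, p->buffers_total,
                    ckpt_on_schedule(p, timeout_secs, completion_target)
                        ? "yes" : "no");
}
```

For example, with a 300 s timeout and a 0.9 target, 500 of 1000 buffers written after 100 s is on target (0.45 ≥ 0.33), but the same count after 200 s is behind (0.45 < 0.67).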
 

                        regards, tom lane

