Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) - Mailing list pgsql-hackers
From | Nitin Jadhav |
---|---|
Subject | Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) |
Date | |
Msg-id | CAMm1aWYhnsoOgOJ7FPDa_hjriu=0YNz2QFaR6H508MOxTQN4bw@mail.gmail.com Whole thread Raw |
In response to | Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) (Michael Paquier <michael@paquier.xyz>) |
List | pgsql-hackers |
> > Have you measured the performance effects of this? On fast storage with large > > shared_buffers I've seen these loops in profiles. It's probably fine, but it'd > > be good to verify that. > > I am wondering if we could make the function inlined at some point. > We could also play it safe and only update the counters every N loops > instead. The idea looks good but based on the performance numbers shared above, it is not affecting the performance. So we can use the current approach as it gives more accurate progress. --- > > This view is depressingly complicated. Added up the view definitions for > > the already existing pg_stat_progress* views add up to a measurable part of > > the size of an empty database: > > Yeah. I think that what's proposed could be simplified, and we had > better remove the fields that are not that useful. First, do we have > any need for next_flags? "next_flags" is removed in the v6 patch. Added a "new_requests" field to get to know whether the current checkpoint is being throttled or not. Please share your views on this. --- > Second, is the start LSN really necessary > for monitoring purposes? IMO, start LSN is necessary to debug if the checkpoint is taking longer. --- > Not all the information in the first > parameter is useful, as well. For example "shutdown" will never be > seen as it is not possible to use a session at this stage, no? I understand that "shutdown" and "end-of-recovery" will never be seen and I have removed it in the v6 patch. --- > There > is also no gain in having "immediate", "flush-all", "force" and "wait" > (for this one if the checkpoint is requested the session doing the > work knows this information already). "immediate" is required to understand whether the current checkpoint is throttled or not. I am not sure about other flags "flush-all", "force" and "wait". I have just supported all the flags to match the 'checkpoint start' log message. Please share your views. If it is not really required, I will remove it in the next patch. --- > A last thing is that we may gain in visibility by having more > attributes as an effect of splitting param2. On thing that would make > sense is to track the reason why the checkpoint was triggered > separately (aka wal and time). Should we use a text[] instead to list > all the parameters instead? Using a space-separated list of items is > not intuitive IMO, and callers of this routine will likely parse > that. If I understand the above comment correctly, you are saying to introduce a new field, say "reason" ( possible values are either wal or time) and the "flags" field will continue to represent the other flags like "immediate", etc. The idea looks good here. We can introduce new field "reason" and "flags" field can be renamed to "throttled" (true/false) if we decide to not support other flags "flush-all", "force" and "wait". --- > + WHEN 3 THEN 'checkpointing replication slots' > + WHEN 4 THEN 'checkpointing logical replication snapshot files' > + WHEN 5 THEN 'checkpointing logical rewrite mapping files' > + WHEN 6 THEN 'checkpointing replication origin' > + WHEN 7 THEN 'checkpointing commit log pages' > + WHEN 8 THEN 'checkpointing commit time stamp pages' > There is a lot of "checkpointing" here. All those terms could be > shorter without losing their meaning. I will try to make it short in the next patch. --- Please share your thoughts. Thanks & Regards, Nitin Jadhav On Tue, Apr 5, 2022 at 3:15 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Mar 18, 2022 at 05:15:56PM -0700, Andres Freund wrote: > > Have you measured the performance effects of this? On fast storage with large > > shared_buffers I've seen these loops in profiles. It's probably fine, but it'd > > be good to verify that. > > I am wondering if we could make the function inlined at some point. > We could also play it safe and only update the counters every N loops > instead. > > > This view is depressingly complicated. Added up the view definitions for > > the already existing pg_stat_progress* views add up to a measurable part of > > the size of an empty database: > > Yeah. I think that what's proposed could be simplified, and we had > better remove the fields that are not that useful. First, do we have > any need for next_flags? Second, is the start LSN really necessary > for monitoring purposes? Not all the information in the first > parameter is useful, as well. For example "shutdown" will never be > seen as it is not possible to use a session at this stage, no? There > is also no gain in having "immediate", "flush-all", "force" and "wait" > (for this one if the checkpoint is requested the session doing the > work knows this information already). > > A last thing is that we may gain in visibility by having more > attributes as an effect of splitting param2. On thing that would make > sense is to track the reason why the checkpoint was triggered > separately (aka wal and time). Should we use a text[] instead to list > all the parameters instead? Using a space-separated list of items is > not intuitive IMO, and callers of this routine will likely parse > that. > > Shouldn't we also track the number of files flushed in each sub-step? > In some deployments you could have a large number of 2PC files and > such. We may want more information on such matters. > > + WHEN 3 THEN 'checkpointing replication slots' > + WHEN 4 THEN 'checkpointing logical replication snapshot files' > + WHEN 5 THEN 'checkpointing logical rewrite mapping files' > + WHEN 6 THEN 'checkpointing replication origin' > + WHEN 7 THEN 'checkpointing commit log pages' > + WHEN 8 THEN 'checkpointing commit time stamp pages' > There is a lot of "checkpointing" here. All those terms could be > shorter without losing their meaning. > > This patch still needs some work, so I am marking it as RwF for now. > -- > Michael
pgsql-hackers by date: