Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) - Mailing list pgsql-hackers
From | Julien Rouhaud |
---|---|
Subject | Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) |
Date | |
Msg-id | 20220219053215.r4wqatln5awklecf@jrouhaud Whole thread Raw |
In response to | Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) (Nitin Jadhav <nitinjadhavpostgres@gmail.com>) |
Responses |
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
|
List | pgsql-hackers |
Hi, On Fri, Feb 18, 2022 at 08:07:05PM +0530, Nitin Jadhav wrote: > > The backend_pid contains a valid value only during > the CHECKPOINT command issued by the backend explicitly, otherwise the > value will be 0. We may have to add an additional field to > 'CheckpointerShmemStruct' to hold the backend pid. The backend > requesting the checkpoint will update its pid to this structure. > Kindly let me know if you still feel the backend_pid field is not > necessary. There are more scenarios where you can have a baackend requesting a checkpoint and waiting for its completion, and there may be more than one backend concerned, so I don't think that storing only one / the first backend pid is ok. > > And also while looking at the patch I see there's the same problem that I > > mentioned in the previous thread, which is that the effective flags can be > > updated once the checkpoint started, and as-is the view won't reflect that. It > > also means that you can't simply display one of wal, time or force but a > > possible combination of the flags (including the one not handled in v1). > > If I understand the above comment properly, it has 2 points. First is > to display the combination of flags rather than just displaying wal, > time or force - The idea behind this is to just let the user know the > reason for checkpointing. That is, the checkpoint is started because > max_wal_size is reached or checkpoint_timeout expired or explicitly > issued CHECKPOINT command. The other flags like CHECKPOINT_IMMEDIATE, > CHECKPOINT_WAIT or CHECKPOINT_FLUSH_ALL indicate how the checkpoint > has to be performed. Hence I have not included those in the view. If > it is really required, I would like to modify the code to include > other flags and display the combination. I think all the information should be exposed. Only knowing why the current checkpoint has been triggered without any further information seems a bit useless. Think for instance for cases like [1]. > Second point is to reflect > the updated flags in the view. AFAIK, there is a possibility that the > flags get updated during the on-going checkpoint but the reason for > checkpoint (wal, time or force) will remain same for the current > checkpoint. There might be a change in how checkpoint has to be > performed if CHECKPOINT_IMMEDIATE flag is set. If we go with > displaying the combination of flags in the view, then probably we may > have to reflect this in the view. You can only "upgrade" a checkpoint, but not "downgrade" it. So if for instance you find both CHECKPOINT_CAUSE_TIME and CHECKPOINT_FORCE (which is possible) you can easily know which one was the one that triggered the checkpoint and which one was added later. > > > Probably a new field named 'processes_wiating' or 'events_waiting' can be > > > added for this purpose. > > > > Maybe num_process_waiting? > > I feel 'processes_wiating' aligns more with the naming conventions of > the fields of the existing progres views. There's at least pg_stat_progress_vacuum.num_dead_tuples. Anyway I don't have a strong opinion on it, just make sure to correct the typo. > > > Probably writing of buffers or syncing files may complete before > > > pg_is_in_recovery() returns false. But there are some cleanup > > > operations happen as part of the checkpoint. During this scenario, we > > > may get false value for pg_is_in_recovery(). Please refer following > > > piece of code which is present in CreateRestartpoint(). > > > > > > if (!RecoveryInProgress()) > > > replayTLI = XLogCtl->InsertTimeLineID; > > > > Then maybe we could store the timeline rather then then kind of checkpoint? > > You should still be able to compute the information while giving a bit more > > information for the same memory usage. > > Can you please describe more about how checkpoint/restartpoint can be > confirmed using the timeline id. If pg_is_in_recovery() is true, then it's a restartpoint, otherwise it's a restartpoint if the checkpoint's timeline is different from the current timeline? [1] https://www.postgresql.org/message-id/1486805889.24568.96.camel%40credativ.de
pgsql-hackers by date: