Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) - Mailing list pgsql-hackers

From Ashutosh Sharma
Subject Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Date
Msg-id CAE9k0Png9tjW5M7wjMKzEoOqkHS-VToP9yC2pr=HX7aQ0yQRNA@mail.gmail.com
Whole thread Raw
In response to Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)  (Nitin Jadhav <nitinjadhavpostgres@gmail.com>)
Responses Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)  (Nitin Jadhav <nitinjadhavpostgres@gmail.com>)
List pgsql-hackers
+       if ((ckpt_flags &
+                (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY)) == 0)
+       {

This code (present at multiple places) looks a little ugly to me, what
we can do instead is add a macro probably named IsShutdownCheckpoint()
which does the above check and use it in all the functions that have
this check. See below:

#define IsShutdownCheckpoint(flags) \
  (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY) != 0)

And then you may use this macro like:

if (IsBootstrapProcessingMode() || IsShutdownCheckpoint(flags))
    return;

This change can be done in all these functions:

+void
+checkpoint_progress_start(int flags)

--

+ */
+void
+checkpoint_progress_update_param(int index, int64 val)

--

+ * Stop reporting progress of the checkpoint.
+ */
+void
+checkpoint_progress_end(void)

==

+
pgstat_progress_start_command(PROGRESS_COMMAND_CHECKPOINT,
InvalidOid);
+
+               val[0] = XLogCtl->InsertTimeLineID;
+               val[1] = flags;
+               val[2] = PROGRESS_CHECKPOINT_PHASE_INIT;
+               val[3] = CheckpointStats.ckpt_start_t;
+
+               pgstat_progress_update_multi_param(4, index, val);
+       }

Any specific reason for recording the timelineID in checkpoint stats
table? Will this ever change in our case?

--
With Regards,
Ashutosh Sharma.

On Wed, Feb 23, 2022 at 6:59 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
>
> > I will make use of pgstat_progress_update_multi_param() in the next
> > patch to replace multiple calls to checkpoint_progress_update_param().
>
> Fixed.
> ---
>
> > > The other progress tables use [type]_total as column names for counter
> > > targets (e.g. backup_total for backup_streamed, heap_blks_total for
> > > heap_blks_scanned, etc.). I think that `buffers_total` and
> > > `files_total` would be better column names.
> >
> > I agree and I will update this in the next patch.
>
> Fixed.
> ---
>
> > How about this "The checkpoint is started because max_wal_size is reached".
> >
> > "The checkpoint is started because checkpoint_timeout expired".
> >
> > "The checkpoint is started because some operation forced a checkpoint".
>
> I have used the above description. Kindly let me know if any changes
> are required.
> ---
>
> > > > +      <entry><literal>checkpointing CommitTs pages</literal></entry>
> > >
> > > CommitTs -> Commit time stamp
> >
> > I will handle this in the next patch.
>
> Fixed.
> ---
>
> > There are more scenarios where you can have a baackend requesting a checkpoint
> > and waiting for its completion, and there may be more than one backend
> > concerned, so I don't think that storing only one / the first backend pid is
> > ok.
>
> Thanks for this information. I am not considering backend_pid.
> ---
>
> > I think all the information should be exposed.  Only knowing why the current
> > checkpoint has been triggered without any further information seems a bit
> > useless.  Think for instance for cases like [1].
>
> I have supported all possible checkpoint kinds. Added
> pg_stat_get_progress_checkpoint_kind() to convert the flags (int) to a
> string representing a combination of flags and also checking for the
> flag update in ImmediateCheckpointRequested() which checks whether
> CHECKPOINT_IMMEDIATE flag is set or not. I did not find any other
> cases where the flags get changed (which changes the current
> checkpoint behaviour) during the checkpoint. Kindly let me know if I
> am missing something.
> ---
>
> > > I feel 'processes_wiating' aligns more with the naming conventions of
> > > the fields of the existing progres views.
> >
> > There's at least pg_stat_progress_vacuum.num_dead_tuples.  Anyway I don't have
> > a strong opinion on it, just make sure to correct the typo.
>
> More analysis is required to support this. I am planning to take care
> in the next patch.
> ---
>
> > If pg_is_in_recovery() is true, then it's a restartpoint, otherwise it's a
> > restartpoint if the checkpoint's timeline is different from the current
> > timeline?
>
> Fixed.
>
> Sharing the v2 patch. Kindly have a look and share your comments.
>
> Thanks & Regards,
> Nitin Jadhav
>
>
>
>
> On Tue, Feb 22, 2022 at 12:08 PM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> >
> > > > Thank you for sharing the information.  'triggering backend PID' (int)
> > > > - can be stored without any problem. 'checkpoint or restartpoint?'
> > > > (boolean) - can be stored as a integer value like
> > > > PROGRESS_CHECKPOINT_TYPE_CHECKPOINT(0) and
> > > > PROGRESS_CHECKPOINT_TYPE_RESTARTPOINT(1). 'elapsed time' (store as
> > > > start time in stat_progress, timestamp fits in 64 bits) - As
> > > > Timestamptz is of type int64 internally, so we can store the timestamp
> > > > value in the progres parameter and then expose a function like
> > > > 'pg_stat_get_progress_checkpoint_elapsed' which takes int64 (not
> > > > Timestamptz) as argument and then returns string representing the
> > > > elapsed time.
> > >
> > > No need to use a string there; I think exposing the checkpoint start
> > > time is good enough. The conversion of int64 to timestamp[tz] can be
> > > done in SQL (although I'm not sure that exposing the internal bitwise
> > > representation of Interval should be exposed to that extent) [0].
> > > Users can then extract the duration interval using now() - start_time,
> > > which also allows the user to use their own preferred formatting.
> >
> > The reason for showing the elapsed time rather than exposing the
> > timestamp directly is in case of checkpoint during shutdown and
> > end-of-recovery, I am planning to log a message in server logs using
> > 'log_startup_progress_interval' infrastructure which displays elapsed
> > time. So just to match both of the behaviour I am displaying elapsed
> > time here. I feel that elapsed time gives a quicker feel of the
> > progress. Kindly let me know if you still feel just exposing the
> > timestamp is better than showing the elapsed time.
> >
> > > >  'checkpoint start location' (lsn = uint64) - I feel we
> > > > cannot use progress parameters for this case. As assigning uint64 to
> > > > int64 type would be an issue for larger values and can lead to hidden
> > > > bugs.
> > >
> > > Not necessarily - we can (without much trouble) do a bitwise cast from
> > > uint64 to int64, and then (in SQL) cast it back to a pg_lsn [1]. Not
> > > very elegant, but it works quite well.
> > >
> > > [1] SELECT '0/0'::pg_lsn + ((CASE WHEN stat.my_int64 < 0 THEN
> > > pow(2::numeric, 64::numeric)::numeric ELSE 0::numeric END) +
> > > stat.my_int64::numeric) FROM (SELECT -2::bigint /* 0xFFFFFFFF/FFFFFFFE
> > > */ AS my_bigint_lsn) AS stat(my_int64);
> >
> > Thanks for sharing. It works. I will include this in the next patch.
> > On Sat, Feb 19, 2022 at 11:02 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > On Fri, Feb 18, 2022 at 08:07:05PM +0530, Nitin Jadhav wrote:
> > > >
> > > > The backend_pid contains a valid value only during
> > > > the CHECKPOINT command issued by the backend explicitly, otherwise the
> > > > value will be 0.  We may have to add an additional field to
> > > > 'CheckpointerShmemStruct' to hold the backend pid. The backend
> > > > requesting the checkpoint will update its pid to this structure.
> > > > Kindly let me know if you still feel the backend_pid field is not
> > > > necessary.
> > >
> > > There are more scenarios where you can have a baackend requesting a checkpoint
> > > and waiting for its completion, and there may be more than one backend
> > > concerned, so I don't think that storing only one / the first backend pid is
> > > ok.
> > >
> > > > > And also while looking at the patch I see there's the same problem that I
> > > > > mentioned in the previous thread, which is that the effective flags can be
> > > > > updated once the checkpoint started, and as-is the view won't reflect that.  It
> > > > > also means that you can't simply display one of wal, time or force but a
> > > > > possible combination of the flags (including the one not handled in v1).
> > > >
> > > > If I understand the above comment properly, it has 2 points. First is
> > > > to display the combination of flags rather than just displaying wal,
> > > > time or force - The idea behind this is to just let the user know the
> > > > reason for checkpointing. That is, the checkpoint is started because
> > > > max_wal_size is reached or checkpoint_timeout expired or explicitly
> > > > issued CHECKPOINT command. The other flags like CHECKPOINT_IMMEDIATE,
> > > > CHECKPOINT_WAIT or CHECKPOINT_FLUSH_ALL indicate how the checkpoint
> > > > has to be performed. Hence I have not included those in the view.  If
> > > > it is really required, I would like to modify the code to include
> > > > other flags and display the combination.
> > >
> > > I think all the information should be exposed.  Only knowing why the current
> > > checkpoint has been triggered without any further information seems a bit
> > > useless.  Think for instance for cases like [1].
> > >
> > > > Second point is to reflect
> > > > the updated flags in the view. AFAIK, there is a possibility that the
> > > > flags get updated during the on-going checkpoint but the reason for
> > > > checkpoint (wal, time or force) will remain same for the current
> > > > checkpoint. There might be a change in how checkpoint has to be
> > > > performed if CHECKPOINT_IMMEDIATE flag is set. If we go with
> > > > displaying the combination of flags in the view, then probably we may
> > > > have to reflect this in the view.
> > >
> > > You can only "upgrade" a checkpoint, but not "downgrade" it.  So if for
> > > instance you find both CHECKPOINT_CAUSE_TIME and CHECKPOINT_FORCE (which is
> > > possible) you can easily know which one was the one that triggered the
> > > checkpoint and which one was added later.
> > >
> > > > > > Probably a new field named 'processes_wiating' or 'events_waiting' can be
> > > > > > added for this purpose.
> > > > >
> > > > > Maybe num_process_waiting?
> > > >
> > > > I feel 'processes_wiating' aligns more with the naming conventions of
> > > > the fields of the existing progres views.
> > >
> > > There's at least pg_stat_progress_vacuum.num_dead_tuples.  Anyway I don't have
> > > a strong opinion on it, just make sure to correct the typo.
> > >
> > > > > > Probably writing of buffers or syncing files may complete before
> > > > > > pg_is_in_recovery() returns false. But there are some cleanup
> > > > > > operations happen as part of the checkpoint. During this scenario, we
> > > > > > may get false value for pg_is_in_recovery(). Please refer following
> > > > > > piece of code which is present in CreateRestartpoint().
> > > > > >
> > > > > > if (!RecoveryInProgress())
> > > > > >         replayTLI = XLogCtl->InsertTimeLineID;
> > > > >
> > > > > Then maybe we could store the timeline rather then then kind of checkpoint?
> > > > > You should still be able to compute the information while giving a bit more
> > > > > information for the same memory usage.
> > > >
> > > > Can you please describe more about how checkpoint/restartpoint can be
> > > > confirmed using the timeline id.
> > >
> > > If pg_is_in_recovery() is true, then it's a restartpoint, otherwise it's a
> > > restartpoint if the checkpoint's timeline is different from the current
> > > timeline?
> > >
> > > [1] https://www.postgresql.org/message-id/1486805889.24568.96.camel%40credativ.de



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: logical decoding and replication of sequences
Next
From: Andres Freund
Date:
Subject: Re: bailing out in tap tests nearly always a bad idea