Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Date
Msg-id CALj2ACXDpgbVBt48D+2X4Hb7XvROMAg7EnbM-tHB+gPoHvEQew@mail.gmail.com
Whole thread Raw
In response to Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)  (Julien Rouhaud <rjuju123@gmail.com>)
List pgsql-hackers
On Sun, Feb 27, 2022 at 8:44 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Feb 25, 2022 at 8:38 PM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
>
> Had a quick look over the v3 patch. I'm not sure if it's the best way
> to have pg_stat_get_progress_checkpoint_type,
> pg_stat_get_progress_checkpoint_kind and
> pg_stat_get_progress_checkpoint_start_time just for printing info in
> readable format in pg_stat_progress_checkpoint. I don't think these
> functions will ever be useful for the users.
>
> 1) Can't we use pg_is_in_recovery to determine if it's a restartpoint
> or checkpoint instead of having a new function
> pg_stat_get_progress_checkpoint_type?
>
> 2) Can't we just have these checks inside CASE-WHEN-THEN-ELSE blocks
> directly instead of new function pg_stat_get_progress_checkpoint_kind?
> + snprintf(ckpt_kind, MAXPGPATH, "%s%s%s%s%s%s%s%s%s",
> + (flags == 0) ? "unknown" : "",
> + (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "",
> + (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "",
> + (flags & CHECKPOINT_IMMEDIATE) ? "immediate " : "",
> + (flags & CHECKPOINT_FORCE) ? "force " : "",
> + (flags & CHECKPOINT_WAIT) ? "wait " : "",
> + (flags & CHECKPOINT_CAUSE_XLOG) ? "wal " : "",
> + (flags & CHECKPOINT_CAUSE_TIME) ? "time " : "",
> + (flags & CHECKPOINT_FLUSH_ALL) ? "flush-all" : "");
>
> 3) Why do we need this extra calculation for start_lsn? Do you ever
> see a negative LSN or something here?
> +    ('0/0'::pg_lsn + (
> +        CASE
> +            WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric)
> +            ELSE (0)::numeric
> +        END + (s.param3)::numeric)) AS start_lsn,
>
> 4) Can't you use timestamptz_in(to_char(s.param4))  instead of
> pg_stat_get_progress_checkpoint_start_time? I don't quite understand
> the reasoning for having this function and it's named as *checkpoint*
> when it doesn't do anything specific to the checkpoint at all?
>
> Having 3 unnecessary functions that aren't useful to the users at all
> in proc.dat will simply eatup the function oids IMO. Hence, I suggest
> let's try to do without extra functions.

Another thought for my review comment:
> 1) Can't we use pg_is_in_recovery to determine if it's a restartpoint
> or checkpoint instead of having a new function
> pg_stat_get_progress_checkpoint_type?

I don't think using pg_is_in_recovery work here as it is taken after
the checkpoint has started. So, I think the right way here is to send
1 in CreateCheckPoint  and 2 in CreateRestartPoint and use
CASE-WHEN-ELSE-END to show "1": "checkpoint" "2":"restartpoint".

Continuing my review:

5) Do we need a special phase for this checkpoint operation? I'm not
sure in which cases it will take a long time, but it looks like
there's a wait loop here.
vxids = GetVirtualXIDsDelayingChkpt(&nvxids);
if (nvxids > 0)
{
do
{
pg_usleep(10000L); /* wait for 10 msec */
} while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids));
}

Also, how about special phases for SyncPostCheckpoint(),
SyncPreCheckpoint(), InvalidateObsoleteReplicationSlots(),
PreallocXlogFiles() (it currently pre-allocates only 1 WAL file, but
it might be increase in future (?)), TruncateSUBTRANS()?

6) SLRU (Simple LRU) isn't a phase here, you can just say
PROGRESS_CHECKPOINT_PHASE_PREDICATE_LOCK_PAGES.
+
+ pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE,
+ PROGRESS_CHECKPOINT_PHASE_SLRU_PAGES);
  CheckPointPredicate();

And :s/checkpointing SLRU pages/checkpointing predicate lock pages
+                      WHEN 9 THEN 'checkpointing SLRU pages'


7) :s/PROGRESS_CHECKPOINT_PHASE_FILE_SYNC/PROGRESS_CHECKPOINT_PHASE_PROCESS_FILE_SYNC_REQUESTS

And :s/WHEN 11 THEN 'performing sync requests'/WHEN 11 THEN
'processing file sync requests'

8) :s/Finalizing/finalizing
+                      WHEN 14 THEN 'Finalizing'

9) :s/checkpointing snapshots/checkpointing logical replication snapshot files
+                      WHEN 3 THEN 'checkpointing snapshots'
:s/checkpointing logical rewrite mappings/checkpointing logical
replication rewrite mapping files
+                      WHEN 4 THEN 'checkpointing logical rewrite mappings'

10) I'm not sure if it's discussed, how about adding the number of
snapshot/mapping files so far the checkpoint has processed in file
processing while loops of
CheckPointSnapBuild/CheckPointLogicalRewriteHeap? Sometimes, there can
be many logical snapshot or mapping files and users may be interested
in knowing the so-far-processed-file-count.

11) I think it's discussed, are we going to add the pid of the
checkpoint requestor?

Regards,
Bharath Rupireddy.



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Design of pg_stat_subscription_workers vs pgstats
Next
From: Bharath Rupireddy
Date:
Subject: Re: Synchronizing slots from primary to standby