Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) - Mailing list pgsql-hackers

From Nitin Jadhav
Subject Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Date
Msg-id CAMm1aWbAuRyJn9v9py=jU7BSfYz6Zb4EaNFda6YQ6519f0305w@mail.gmail.com
Whole thread Raw
In response to Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)  (Ashutosh Sharma <ashu.coek88@gmail.com>)
Responses Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)  (Nitin Jadhav <nitinjadhavpostgres@gmail.com>)
List pgsql-hackers
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>type</structfield> <type>text</type>
> +      </para>
> +      <para>
> +       Type of checkpoint. See <xref linkend="checkpoint-types"/>.
> +      </para></entry>
> +     </row>
> +
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>kind</structfield> <type>text</type>
> +      </para>
> +      <para>
> +       Kind of checkpoint. See <xref linkend="checkpoint-kinds"/>.
> +      </para></entry>
> +     </row>
>
> This looks a bit confusing. Two columns, one with the name "checkpoint
> types" and another "checkpoint kinds". You can probably rename
> checkpoint-kinds to checkpoint-flags and let the checkpoint-types be
> as-it-is.

Makes sense. I will change in the next patch.
---

> +
<entry><structname>pg_stat_progress_checkpoint</structname><indexterm><primary>pg_stat_progress_checkpoint</primary></indexterm></entry>
> +      <entry>One row only, showing the progress of the checkpoint.
>
> Let's make this message consistent with the already existing message
> for pg_stat_wal_receiver. See description for pg_stat_wal_receiver
> view in "Dynamic Statistics Views" table.

You want me to change "One row only" to "Only one row" ? If that is
the case then for other views in the "Collected Statistics Views"
table, it is referred as "One row only". Let me know if you are
pointing out something else.
---

> [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint;
> -[ RECORD 1 ]-----+-------------------------------------
> pid               | 22043
> type              | checkpoint
> kind              | immediate force wait requested time
>
> I think the output in the kind column can be displayed as {immediate,
> force, wait, requested, time}. By the way these are all checkpoint
> flags so it is better to display it as checkpoint flags instead of
> checkpoint kind as mentioned in one of my previous comments.

I will update in the next patch.
---

> [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint;
> -[ RECORD 1 ]-----+-------------------------------------
> pid               | 22043
> type              | checkpoint
> kind              | immediate force wait requested time
> start_lsn         | 0/14C60F8
> start_time        | 2022-03-03 18:59:56.018662+05:30
> phase             | performing two phase checkpoint
>
> This is the output I see when the checkpointer process has come out of
> the two phase checkpoint and is currently writing checkpoint xlog
> records and doing other stuff like updating control files etc. Is this
> okay?

The idea behind choosing the phases is based on the functionality
which takes longer time to execute. Since after two phase checkpoint
till post checkpoint cleanup won't take much time to execute, I have
not added any additional phase for that. But I also agree that this
gives wrong information to the user. How about mentioning the phase
information at the end of each phase like "Initializing",
"Initialization done", ..., "two phase checkpoint done", "post
checkpoint cleanup done", .., "finalizing". Except for the first phase
("initializing") and last phase ("finalizing"), all the other phases
describe the end of a certain operation. I feel this gives correct
information even though the phase name/description does not represent
the entire code block between two phases. For example if the current
phase is ''two phase checkpoint done". Then the user can infer that
the checkpointer has done till two phase checkpoint and it is doing
other stuff that are after that. Thoughts?
---

> The output of log_checkpoint shows the number of buffers written is 3
> whereas the output of pg_stat_progress_checkpoint shows it as 0. See
> below:
>
> 2022-03-03 20:04:45.643 IST [22043] LOG:  checkpoint complete: wrote 3
> buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled;
> write=24.652 s, sync=104.256 s, total=3889.625 s; sync files=2,
> longest=0.011 s, average=0.008 s; distance=0 kB, estimate=0 kB
>
> --
>
> [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint;
> -[ RECORD 1 ]-----+-------------------------------------
> pid               | 22043
> type              | checkpoint
> kind              | immediate force wait requested time
> start_lsn         | 0/14C60F8
> start_time        | 2022-03-03 18:59:56.018662+05:30
> phase             | finalizing
> buffers_total     | 0
> buffers_processed | 0
> buffers_written   | 0
>
> Any idea why this mismatch?

Good catch. In BufferSync() we have 'num_to_scan' (buffers_total)
which indicates the total number of buffers to be processed. Based on
that, the 'buffers_processed' and 'buffers_written' counter gets
incremented. I meant these values may reach upto 'buffers_total'. The
current pg_stat_progress_view support above information. There is
another place when 'ckpt_bufs_written' gets incremented (In
SlruInternalWritePage()). This increment is above the 'buffers_total'
value and it is included in the server log message (checkpoint end)
and not included in the view. I am a bit confused here. If we include
this increment in the view then we cannot calculate the exact
'buffers_total' beforehand. Can we increment the 'buffers_toal' also
when 'ckpt_bufs_written' gets incremented so that we can match the
behaviour with checkpoint end message?  Please share your thoughts.
---

> I think we can add a couple of more information to this view -
> start_time for buffer write operation and start_time for buffer sync
> operation. These are two very time consuming tasks in a checkpoint and
> people would find it useful to know how much time is being taken by
> the checkpoint in I/O operation phase. thoughts?

The detailed progress is getting shown for these 2 phases of the
checkpoint like 'buffers_processed', 'buffers_written' and
'files_synced'. Hence I did not think about adding start time and If
it is really required, then I can add.

Thanks & Regards,
Nitin Jadhav

On Thu, Mar 3, 2022 at 8:30 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> Here are some of my review comments on the latest patch:
>
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>type</structfield> <type>text</type>
> +      </para>
> +      <para>
> +       Type of checkpoint. See <xref linkend="checkpoint-types"/>.
> +      </para></entry>
> +     </row>
> +
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>kind</structfield> <type>text</type>
> +      </para>
> +      <para>
> +       Kind of checkpoint. See <xref linkend="checkpoint-kinds"/>.
> +      </para></entry>
> +     </row>
>
> This looks a bit confusing. Two columns, one with the name "checkpoint
> types" and another "checkpoint kinds". You can probably rename
> checkpoint-kinds to checkpoint-flags and let the checkpoint-types be
> as-it-is.
>
> ==
>
> +
<entry><structname>pg_stat_progress_checkpoint</structname><indexterm><primary>pg_stat_progress_checkpoint</primary></indexterm></entry>
> +      <entry>One row only, showing the progress of the checkpoint.
>
> Let's make this message consistent with the already existing message
> for pg_stat_wal_receiver. See description for pg_stat_wal_receiver
> view in "Dynamic Statistics Views" table.
>
> ==
>
> [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint;
> -[ RECORD 1 ]-----+-------------------------------------
> pid               | 22043
> type              | checkpoint
> kind              | immediate force wait requested time
>
> I think the output in the kind column can be displayed as {immediate,
> force, wait, requested, time}. By the way these are all checkpoint
> flags so it is better to display it as checkpoint flags instead of
> checkpoint kind as mentioned in one of my previous comments.
>
> ==
>
> [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint;
> -[ RECORD 1 ]-----+-------------------------------------
> pid               | 22043
> type              | checkpoint
> kind              | immediate force wait requested time
> start_lsn         | 0/14C60F8
> start_time        | 2022-03-03 18:59:56.018662+05:30
> phase             | performing two phase checkpoint
>
>
> This is the output I see when the checkpointer process has come out of
> the two phase checkpoint and is currently writing checkpoint xlog
> records and doing other stuff like updating control files etc. Is this
> okay?
>
> ==
>
> The output of log_checkpoint shows the number of buffers written is 3
> whereas the output of pg_stat_progress_checkpoint shows it as 0. See
> below:
>
> 2022-03-03 20:04:45.643 IST [22043] LOG:  checkpoint complete: wrote 3
> buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled;
> write=24.652 s, sync=104.256 s, total=3889.625 s; sync files=2,
> longest=0.011 s, average=0.008 s; distance=0 kB, estimate=0 kB
>
> --
>
> [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint;
> -[ RECORD 1 ]-----+-------------------------------------
> pid               | 22043
> type              | checkpoint
> kind              | immediate force wait requested time
> start_lsn         | 0/14C60F8
> start_time        | 2022-03-03 18:59:56.018662+05:30
> phase             | finalizing
> buffers_total     | 0
> buffers_processed | 0
> buffers_written   | 0
>
> Any idea why this mismatch?
>
> ==
>
> I think we can add a couple of more information to this view -
> start_time for buffer write operation and start_time for buffer sync
> operation. These are two very time consuming tasks in a checkpoint and
> people would find it useful to know how much time is being taken by
> the checkpoint in I/O operation phase. thoughts?
>
> --
> With Regards,
> Ashutosh Sharma.
>
> On Wed, Mar 2, 2022 at 4:45 PM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> >
> > Thanks for reviewing.
> >
> > > > > I suggested upthread to store the starting timeline instead.  This way you can
> > > > > deduce whether it's a restartpoint or a checkpoint, but you can also deduce
> > > > > other information, like what was the starting WAL.
> > > >
> > > > I don't understand why we need the timeline here to just determine
> > > > whether it's a restartpoint or checkpoint.
> > >
> > > I'm not saying it's necessary, I'm saying that for the same space usage we can
> > > store something a bit more useful.  If no one cares about having the starting
> > > timeline available for no extra cost then sure, let's just store the kind
> > > directly.
> >
> > Fixed.
> >
> > > 2) Can't we just have these checks inside CASE-WHEN-THEN-ELSE blocks
> > > directly instead of new function pg_stat_get_progress_checkpoint_kind?
> > > + snprintf(ckpt_kind, MAXPGPATH, "%s%s%s%s%s%s%s%s%s",
> > > + (flags == 0) ? "unknown" : "",
> > > + (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "",
> > > + (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "",
> > > + (flags & CHECKPOINT_IMMEDIATE) ? "immediate " : "",
> > > + (flags & CHECKPOINT_FORCE) ? "force " : "",
> > > + (flags & CHECKPOINT_WAIT) ? "wait " : "",
> > > + (flags & CHECKPOINT_CAUSE_XLOG) ? "wal " : "",
> > > + (flags & CHECKPOINT_CAUSE_TIME) ? "time " : "",
> > > + (flags & CHECKPOINT_FLUSH_ALL) ? "flush-all" : "");
> >
> > Fixed.
> > ---
> >
> > > 5) Do we need a special phase for this checkpoint operation? I'm not
> > > sure in which cases it will take a long time, but it looks like
> > > there's a wait loop here.
> > > vxids = GetVirtualXIDsDelayingChkpt(&nvxids);
> > > if (nvxids > 0)
> > > {
> > > do
> > > {
> > > pg_usleep(10000L); /* wait for 10 msec */
> > > } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids));
> > > }
> >
> > Yes. It is better to add a separate phase here.
> > ---
> >
> > > Also, how about special phases for SyncPostCheckpoint(),
> > > SyncPreCheckpoint(), InvalidateObsoleteReplicationSlots(),
> > > PreallocXlogFiles() (it currently pre-allocates only 1 WAL file, but
> > > it might be increase in future (?)), TruncateSUBTRANS()?
> >
> > SyncPreCheckpoint() is just incrementing a counter and
> > PreallocXlogFiles() currently pre-allocates only 1 WAL file. I feel
> > there is no need to add any phases for these as of now. We can add in
> > the future if necessary. Added phases for SyncPostCheckpoint(),
> > InvalidateObsoleteReplicationSlots() and TruncateSUBTRANS().
> > ---
> >
> > > 6) SLRU (Simple LRU) isn't a phase here, you can just say
> > > PROGRESS_CHECKPOINT_PHASE_PREDICATE_LOCK_PAGES.
> > > +
> > > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE,
> > > + PROGRESS_CHECKPOINT_PHASE_SLRU_PAGES);
> > >  CheckPointPredicate();
> > >
> > > And :s/checkpointing SLRU pages/checkpointing predicate lock pages
> > >+                      WHEN 9 THEN 'checkpointing SLRU pages'
> >
> > Fixed.
> > ---
> >
> > > 7) :s/PROGRESS_CHECKPOINT_PHASE_FILE_SYNC/PROGRESS_CHECKPOINT_PHASE_PROCESS_FILE_SYNC_REQUESTS
> >
> > I feel PROGRESS_CHECKPOINT_PHASE_FILE_SYNC is a better option here as
> > it describes the purpose in less words.
> >
> > > And :s/WHEN 11 THEN 'performing sync requests'/WHEN 11 THEN
> > > 'processing file sync requests'
> >
> > Fixed.
> > ---
> >
> > > 8) :s/Finalizing/finalizing
> > > +                      WHEN 14 THEN 'Finalizing'
> >
> > Fixed.
> > ---
> >
> > > 9) :s/checkpointing snapshots/checkpointing logical replication snapshot files
> > > +                      WHEN 3 THEN 'checkpointing snapshots'
> > > :s/checkpointing logical rewrite mappings/checkpointing logical
> > > replication rewrite mapping files
> > > +                      WHEN 4 THEN 'checkpointing logical rewrite mappings'
> >
> > Fixed.
> > ---
> >
> > > 10) I'm not sure if it's discussed, how about adding the number of
> > > snapshot/mapping files so far the checkpoint has processed in file
> > > processing while loops of
> > > CheckPointSnapBuild/CheckPointLogicalRewriteHeap? Sometimes, there can
> > > be many logical snapshot or mapping files and users may be interested
> > > in knowing the so-far-processed-file-count.
> >
> > I had thought about this while sharing the v1 patch and mentioned my
> > views upthread. I feel it won't give meaningful progress information
> > (It can be treated as statistics). Hence not included. Thoughts?
> >
> > > > > As mentioned upthread, there can be multiple backends that request a
> > > > > checkpoint, so unless we want to store an array of pid we should store a number
> > > > > of backend that are waiting for a new checkpoint.
> > > >
> > > > Yeah, you are right. Let's not go that path and store an array of
> > > > pids. I don't see a strong use-case with the pid of the process
> > > > requesting checkpoint. If required, we can add it later once the
> > > > pg_stat_progress_checkpoint view gets in.
> > >
> > > I don't think that's really necessary to give the pid list.
> > >
> > > If you requested a new checkpoint, it doesn't matter if it's only your backend
> > > that triggered it, another backend or a few other dozen, the result will be the
> > > same and you have the information that the request has been seen.  We could
> > > store just a bool for that but having a number instead also gives a bit more
> > > information and may allow you to detect some broken logic on your client code
> > > if it keeps increasing.
> >
> > It's a good metric to show in the view but the information is not
> > readily available. Additional code is required to calculate the number
> > of requests. Is it worth doing that? I feel this can be added later if
> > required.
> >
> > Please find the v4 patch attached and share your thoughts.
> >
> > Thanks & Regards,
> > Nitin Jadhav
> >
> > On Tue, Mar 1, 2022 at 2:27 PM Nitin Jadhav
> > <nitinjadhavpostgres@gmail.com> wrote:
> > >
> > > > > 3) Why do we need this extra calculation for start_lsn?
> > > > > Do you ever see a negative LSN or something here?
> > > > > +    ('0/0'::pg_lsn + (
> > > > > +        CASE
> > > > > +            WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric)
> > > > > +            ELSE (0)::numeric
> > > > > +        END + (s.param3)::numeric)) AS start_lsn,
> > > >
> > > > Yes: LSN can take up all of an uint64; whereas the pgstat column is a
> > > > bigint type; thus the signed int64. This cast is OK as it wraps
> > > > around, but that means we have to take care to correctly display the
> > > > LSN when it is > 0x7FFF_FFFF_FFFF_FFFF; which is what we do here using
> > > > the special-casing for negative values.
> > >
> > > Yes. The extra calculation is required here as we are storing unit64
> > > value in the variable of type int64. When we convert uint64 to int64
> > > then the bit pattern is preserved (so no data is lost). The high-order
> > > bit becomes the sign bit and if the sign bit is set, both the sign and
> > > magnitude of the value changes. To safely get the actual uint64 value
> > > whatever was assigned, we need the above calculations.
> > >
> > > > > 4) Can't you use timestamptz_in(to_char(s.param4))  instead of
> > > > > pg_stat_get_progress_checkpoint_start_time? I don't quite understand
> > > > > the reasoning for having this function and it's named as *checkpoint*
> > > > > when it doesn't do anything specific to the checkpoint at all?
> > > >
> > > > I hadn't thought of using the types' inout functions, but it looks
> > > > like timestamp IO functions use a formatted timestring, which won't
> > > > work with the epoch-based timestamp stored in the view.
> > >
> > > There is a variation of to_timestamp() which takes UNIX epoch (float8)
> > > as an argument and converts it to timestamptz but we cannot directly
> > > call this function with S.param4.
> > >
> > > TimestampTz
> > > GetCurrentTimestamp(void)
> > > {
> > >     TimestampTz result;
> > >     struct timeval tp;
> > >
> > >     gettimeofday(&tp, NULL);
> > >
> > >     result = (TimestampTz) tp.tv_sec -
> > >         ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY);
> > >     result = (result * USECS_PER_SEC) + tp.tv_usec;
> > >
> > >     return result;
> > > }
> > >
> > > S.param4 contains the output of the above function
> > > (GetCurrentTimestamp()) which returns Postgres epoch but the
> > > to_timestamp() expects UNIX epoch as input. So some calculation is
> > > required here. I feel the SQL 'to_timestamp(946684800 +
> > > (S.param4::float / 1000000)) AS start_time' works fine. The value
> > > '946684800' is equal to ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
> > > SECS_PER_DAY). I am not sure whether it is good practice to use this
> > > way. Kindly share your thoughts.
> > >
> > > Thanks & Regards,
> > > Nitin Jadhav
> > >
> > > On Mon, Feb 28, 2022 at 6:40 PM Matthias van de Meent
> > > <boekewurm+postgres@gmail.com> wrote:
> > > >
> > > > On Sun, 27 Feb 2022 at 16:14, Bharath Rupireddy
> > > > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > > > 3) Why do we need this extra calculation for start_lsn?
> > > > > Do you ever see a negative LSN or something here?
> > > > > +    ('0/0'::pg_lsn + (
> > > > > +        CASE
> > > > > +            WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric)
> > > > > +            ELSE (0)::numeric
> > > > > +        END + (s.param3)::numeric)) AS start_lsn,
> > > >
> > > > Yes: LSN can take up all of an uint64; whereas the pgstat column is a
> > > > bigint type; thus the signed int64. This cast is OK as it wraps
> > > > around, but that means we have to take care to correctly display the
> > > > LSN when it is > 0x7FFF_FFFF_FFFF_FFFF; which is what we do here using
> > > > the special-casing for negative values.
> > > >
> > > > As to whether it is reasonable: Generating 16GB of wal every second
> > > > (2^34 bytes /sec) is probably not impossible (cpu <> memory bandwidth
> > > > has been > 20GB/sec for a while); and that leaves you 2^29 seconds of
> > > > database runtime; or about 17 years. Seeing that a cluster can be
> > > > `pg_upgrade`d (which doesn't reset cluster LSN) since PG 9.0 from at
> > > > least version PG 8.4.0 (2009) (and through pg_migrator, from 8.3.0)),
> > > > we can assume that clusters hitting LSN=2^63 will be a reasonable
> > > > possibility within the next few years. As the lifespan of a PG release
> > > > is about 5 years, it doesn't seem impossible that there will be actual
> > > > clusters that are going to hit this naturally in the lifespan of PG15.
> > > >
> > > > It is also possible that someone fat-fingers pg_resetwal; and creates
> > > > a cluster with LSN >= 2^63; resulting in negative values in the
> > > > s.param3 field. Not likely, but we can force such situations; and as
> > > > such we should handle that gracefully.
> > > >
> > > > > 4) Can't you use timestamptz_in(to_char(s.param4))  instead of
> > > > > pg_stat_get_progress_checkpoint_start_time? I don't quite understand
> > > > > the reasoning for having this function and it's named as *checkpoint*
> > > > > when it doesn't do anything specific to the checkpoint at all?
> > > >
> > > > I hadn't thought of using the types' inout functions, but it looks
> > > > like timestamp IO functions use a formatted timestring, which won't
> > > > work with the epoch-based timestamp stored in the view.
> > > >
> > > > > Having 3 unnecessary functions that aren't useful to the users at all
> > > > > in proc.dat will simply eatup the function oids IMO. Hence, I suggest
> > > > > let's try to do without extra functions.
> > > >
> > > > I agree that (1) could be simplified, or at least fully expressed in
> > > > SQL without exposing too many internals. If we're fine with exposing
> > > > internals like flags and type layouts, then (2), and arguably (4), can
> > > > be expressed in SQL as well.
> > > >
> > > > -Matthias



pgsql-hackers by date:

Previous
From: Joshua Brindle
Date:
Subject: Re: Proposal: Support custom authentication methods using hooks,Re: Proposal: Support custom authentication methods using hooks
Next
From: Magnus Hagander
Date:
Subject: Re: Add parameter jit_warn_above_fraction