Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) - Mailing list pgsql-hackers
From | Nitin Jadhav |
---|---|
Subject | Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) |
Date | |
Msg-id | CAMm1aWYWmqys2O7UekQ0=ZSWSpG9h2qoEW7w46yiWM0eO1Lu7g@mail.gmail.com Whole thread Raw |
In response to | Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) (Nitin Jadhav <nitinjadhavpostgres@gmail.com>) |
Responses |
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) |
List | pgsql-hackers |
Thanks for reviewing. > > > I suggested upthread to store the starting timeline instead. This way you can > > > deduce whether it's a restartpoint or a checkpoint, but you can also deduce > > > other information, like what was the starting WAL. > > > > I don't understand why we need the timeline here to just determine > > whether it's a restartpoint or checkpoint. > > I'm not saying it's necessary, I'm saying that for the same space usage we can > store something a bit more useful. If no one cares about having the starting > timeline available for no extra cost then sure, let's just store the kind > directly. Fixed. > 2) Can't we just have these checks inside CASE-WHEN-THEN-ELSE blocks > directly instead of new function pg_stat_get_progress_checkpoint_kind? > + snprintf(ckpt_kind, MAXPGPATH, "%s%s%s%s%s%s%s%s%s", > + (flags == 0) ? "unknown" : "", > + (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "", > + (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "", > + (flags & CHECKPOINT_IMMEDIATE) ? "immediate " : "", > + (flags & CHECKPOINT_FORCE) ? "force " : "", > + (flags & CHECKPOINT_WAIT) ? "wait " : "", > + (flags & CHECKPOINT_CAUSE_XLOG) ? "wal " : "", > + (flags & CHECKPOINT_CAUSE_TIME) ? "time " : "", > + (flags & CHECKPOINT_FLUSH_ALL) ? "flush-all" : ""); Fixed. --- > 5) Do we need a special phase for this checkpoint operation? I'm not > sure in which cases it will take a long time, but it looks like > there's a wait loop here. > vxids = GetVirtualXIDsDelayingChkpt(&nvxids); > if (nvxids > 0) > { > do > { > pg_usleep(10000L); /* wait for 10 msec */ > } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids)); > } Yes. It is better to add a separate phase here. --- > Also, how about special phases for SyncPostCheckpoint(), > SyncPreCheckpoint(), InvalidateObsoleteReplicationSlots(), > PreallocXlogFiles() (it currently pre-allocates only 1 WAL file, but > it might be increase in future (?)), TruncateSUBTRANS()? SyncPreCheckpoint() is just incrementing a counter and PreallocXlogFiles() currently pre-allocates only 1 WAL file. I feel there is no need to add any phases for these as of now. We can add in the future if necessary. Added phases for SyncPostCheckpoint(), InvalidateObsoleteReplicationSlots() and TruncateSUBTRANS(). --- > 6) SLRU (Simple LRU) isn't a phase here, you can just say > PROGRESS_CHECKPOINT_PHASE_PREDICATE_LOCK_PAGES. > + > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > + PROGRESS_CHECKPOINT_PHASE_SLRU_PAGES); > CheckPointPredicate(); > > And :s/checkpointing SLRU pages/checkpointing predicate lock pages >+ WHEN 9 THEN 'checkpointing SLRU pages' Fixed. --- > 7) :s/PROGRESS_CHECKPOINT_PHASE_FILE_SYNC/PROGRESS_CHECKPOINT_PHASE_PROCESS_FILE_SYNC_REQUESTS I feel PROGRESS_CHECKPOINT_PHASE_FILE_SYNC is a better option here as it describes the purpose in less words. > And :s/WHEN 11 THEN 'performing sync requests'/WHEN 11 THEN > 'processing file sync requests' Fixed. --- > 8) :s/Finalizing/finalizing > + WHEN 14 THEN 'Finalizing' Fixed. --- > 9) :s/checkpointing snapshots/checkpointing logical replication snapshot files > + WHEN 3 THEN 'checkpointing snapshots' > :s/checkpointing logical rewrite mappings/checkpointing logical > replication rewrite mapping files > + WHEN 4 THEN 'checkpointing logical rewrite mappings' Fixed. --- > 10) I'm not sure if it's discussed, how about adding the number of > snapshot/mapping files so far the checkpoint has processed in file > processing while loops of > CheckPointSnapBuild/CheckPointLogicalRewriteHeap? Sometimes, there can > be many logical snapshot or mapping files and users may be interested > in knowing the so-far-processed-file-count. I had thought about this while sharing the v1 patch and mentioned my views upthread. I feel it won't give meaningful progress information (It can be treated as statistics). Hence not included. Thoughts? > > > As mentioned upthread, there can be multiple backends that request a > > > checkpoint, so unless we want to store an array of pid we should store a number > > > of backend that are waiting for a new checkpoint. > > > > Yeah, you are right. Let's not go that path and store an array of > > pids. I don't see a strong use-case with the pid of the process > > requesting checkpoint. If required, we can add it later once the > > pg_stat_progress_checkpoint view gets in. > > I don't think that's really necessary to give the pid list. > > If you requested a new checkpoint, it doesn't matter if it's only your backend > that triggered it, another backend or a few other dozen, the result will be the > same and you have the information that the request has been seen. We could > store just a bool for that but having a number instead also gives a bit more > information and may allow you to detect some broken logic on your client code > if it keeps increasing. It's a good metric to show in the view but the information is not readily available. Additional code is required to calculate the number of requests. Is it worth doing that? I feel this can be added later if required. Please find the v4 patch attached and share your thoughts. Thanks & Regards, Nitin Jadhav On Tue, Mar 1, 2022 at 2:27 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > > 3) Why do we need this extra calculation for start_lsn? > > > Do you ever see a negative LSN or something here? > > > + ('0/0'::pg_lsn + ( > > > + CASE > > > + WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric) > > > + ELSE (0)::numeric > > > + END + (s.param3)::numeric)) AS start_lsn, > > > > Yes: LSN can take up all of an uint64; whereas the pgstat column is a > > bigint type; thus the signed int64. This cast is OK as it wraps > > around, but that means we have to take care to correctly display the > > LSN when it is > 0x7FFF_FFFF_FFFF_FFFF; which is what we do here using > > the special-casing for negative values. > > Yes. The extra calculation is required here as we are storing unit64 > value in the variable of type int64. When we convert uint64 to int64 > then the bit pattern is preserved (so no data is lost). The high-order > bit becomes the sign bit and if the sign bit is set, both the sign and > magnitude of the value changes. To safely get the actual uint64 value > whatever was assigned, we need the above calculations. > > > > 4) Can't you use timestamptz_in(to_char(s.param4)) instead of > > > pg_stat_get_progress_checkpoint_start_time? I don't quite understand > > > the reasoning for having this function and it's named as *checkpoint* > > > when it doesn't do anything specific to the checkpoint at all? > > > > I hadn't thought of using the types' inout functions, but it looks > > like timestamp IO functions use a formatted timestring, which won't > > work with the epoch-based timestamp stored in the view. > > There is a variation of to_timestamp() which takes UNIX epoch (float8) > as an argument and converts it to timestamptz but we cannot directly > call this function with S.param4. > > TimestampTz > GetCurrentTimestamp(void) > { > TimestampTz result; > struct timeval tp; > > gettimeofday(&tp, NULL); > > result = (TimestampTz) tp.tv_sec - > ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY); > result = (result * USECS_PER_SEC) + tp.tv_usec; > > return result; > } > > S.param4 contains the output of the above function > (GetCurrentTimestamp()) which returns Postgres epoch but the > to_timestamp() expects UNIX epoch as input. So some calculation is > required here. I feel the SQL 'to_timestamp(946684800 + > (S.param4::float / 1000000)) AS start_time' works fine. The value > '946684800' is equal to ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * > SECS_PER_DAY). I am not sure whether it is good practice to use this > way. Kindly share your thoughts. > > Thanks & Regards, > Nitin Jadhav > > On Mon, Feb 28, 2022 at 6:40 PM Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > > > On Sun, 27 Feb 2022 at 16:14, Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > 3) Why do we need this extra calculation for start_lsn? > > > Do you ever see a negative LSN or something here? > > > + ('0/0'::pg_lsn + ( > > > + CASE > > > + WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric) > > > + ELSE (0)::numeric > > > + END + (s.param3)::numeric)) AS start_lsn, > > > > Yes: LSN can take up all of an uint64; whereas the pgstat column is a > > bigint type; thus the signed int64. This cast is OK as it wraps > > around, but that means we have to take care to correctly display the > > LSN when it is > 0x7FFF_FFFF_FFFF_FFFF; which is what we do here using > > the special-casing for negative values. > > > > As to whether it is reasonable: Generating 16GB of wal every second > > (2^34 bytes /sec) is probably not impossible (cpu <> memory bandwidth > > has been > 20GB/sec for a while); and that leaves you 2^29 seconds of > > database runtime; or about 17 years. Seeing that a cluster can be > > `pg_upgrade`d (which doesn't reset cluster LSN) since PG 9.0 from at > > least version PG 8.4.0 (2009) (and through pg_migrator, from 8.3.0)), > > we can assume that clusters hitting LSN=2^63 will be a reasonable > > possibility within the next few years. As the lifespan of a PG release > > is about 5 years, it doesn't seem impossible that there will be actual > > clusters that are going to hit this naturally in the lifespan of PG15. > > > > It is also possible that someone fat-fingers pg_resetwal; and creates > > a cluster with LSN >= 2^63; resulting in negative values in the > > s.param3 field. Not likely, but we can force such situations; and as > > such we should handle that gracefully. > > > > > 4) Can't you use timestamptz_in(to_char(s.param4)) instead of > > > pg_stat_get_progress_checkpoint_start_time? I don't quite understand > > > the reasoning for having this function and it's named as *checkpoint* > > > when it doesn't do anything specific to the checkpoint at all? > > > > I hadn't thought of using the types' inout functions, but it looks > > like timestamp IO functions use a formatted timestring, which won't > > work with the epoch-based timestamp stored in the view. > > > > > Having 3 unnecessary functions that aren't useful to the users at all > > > in proc.dat will simply eatup the function oids IMO. Hence, I suggest > > > let's try to do without extra functions. > > > > I agree that (1) could be simplified, or at least fully expressed in > > SQL without exposing too many internals. If we're fine with exposing > > internals like flags and type layouts, then (2), and arguably (4), can > > be expressed in SQL as well. > > > > -Matthias
Attachment
pgsql-hackers by date: