Re: add checkpoint stats of snapshot and mapping files of pg_logical dir - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: add checkpoint stats of snapshot and mapping files of pg_logical dir |
Date | |
Msg-id | CALj2ACWB8to-Y950t+tWE0FKSmvVKq2iOj=TgR7c0So2Fyi2jQ@mail.gmail.com Whole thread Raw |
In response to | Re: add checkpoint stats of snapshot and mapping files of pg_logical dir (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: add checkpoint stats of snapshot and mapping files of pg_logical dir
|
List | pgsql-hackers |
On Mon, Mar 14, 2022 at 10:45 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Sun, Mar 13, 2022 at 02:58:58PM -0700, Nathan Bossart wrote: > > On Sun, Mar 13, 2022 at 01:54:10PM +0530, Bharath Rupireddy wrote: > >> Another thing I added in v2 is to not emit snapshot and mapping files > >> stats in case of restartpoint as logical decoding isn't supported on > >> standbys, so it doesn't make sense to emit the stats there and cause > >> server log to grow unnecessarily. Having said that, I added a note > >> there to change it whenever logical decoding on standbys is supported. > > > > I think we actually do want to include this information for restartpoints > > since some files might be left over from the snapshot that was used to > > create the standby. Also, presumably these functions could do some work > > during recovery on a primary. > > Yes, I would agree that consistency makes sense here, and it is not > complicated to add the code to support this code path anyway. There > is a risk that folks working on logical decoding on standbys overse > this code path, instead. I agree to be consistent and emit the message even in case of restartpoint. > > Another problem I see is that this patch depends on the return value of the > > lstat() calls that we are trying to remove in 0001 from another thread [0]. > > I think the size of the removed/sync'd files is somewhat useful for > > understanding disk space usage, but I suspect the time spent performing > > these tasks is more closely related to the number of files. Do you think > > reporting the sizes is worth the extra system call? > > We are not talking about files that are large either, are we? > > Another thing I am a bit annoyed with in this patch is the fact that > the size of the ereport() call is doubled. The LOG currently > generated is already bloated, and this does not arrange things. Yes, this is a concern. Also, when there were no logical replication slots on a plain server or the server removed or cleaned up all the snapshot/mappings files, why would anyone want to have these messages with all 0s in the server log? Here's what I'm thinking: Leave the existing "checkpoint/restartpoint complete" messages intact, add the following in LogCheckpointEnd: if (CheckpointStats.repl_map_files_rmvd_cnt || CheckpointStats.repl_map_files_syncd_cnt || CheckpointStats.repl_snap_files_rmvd_cnt) { ereport(LOG, (errmsg("snapbuild snapshot file(s) removed=" UINT64_FORMAT ", size=%zu bytes, time=%ld.%03d s, cutoff LSN=%X/%X; " "logical rewrite mapping file(s) removed=" UINT64_FORMAT ", size=%zu bytes, synced=" UINT64_FORMAT ", size=%zu bytes, time=%ld.%03d s, cutoff LSN=%X/%X", CheckpointStats.repl_snap_files_rmvd_cnt, CheckpointStats.repl_snap_files_rmvd_sz, repl_snap_msecs / 1000, (int) (repl_snap_msecs % 1000), LSN_FORMAT_ARGS(CheckpointStats.repl_snap_cutoff_lsn), CheckpointStats.repl_map_files_rmvd_cnt, CheckpointStats.repl_map_files_rmvd_sz, CheckpointStats.repl_map_files_syncd_cnt, CheckpointStats.repl_map_files_syncd_sz, repl_map_msecs / 1000, (int) (repl_map_msecs % 1000), LSN_FORMAT_ARGS(CheckpointStats.repl_map_cutoff_lsn)))); } Thoughts? Regards, Bharath Rupireddy.
pgsql-hackers by date: