Re: add checkpoint stats of snapshot and mapping files of pg_logical dir - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: add checkpoint stats of snapshot and mapping files of pg_logical dir
Msg-id CALj2ACWB8to-Y950t+tWE0FKSmvVKq2iOj=TgR7c0So2Fyi2jQ@mail.gmail.com
In response to Re: add checkpoint stats of snapshot and mapping files of pg_logical dir  (Michael Paquier <michael@paquier.xyz>)
Responses Re: add checkpoint stats of snapshot and mapping files of pg_logical dir  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Mon, Mar 14, 2022 at 10:45 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sun, Mar 13, 2022 at 02:58:58PM -0700, Nathan Bossart wrote:
> > On Sun, Mar 13, 2022 at 01:54:10PM +0530, Bharath Rupireddy wrote:
> >> Another thing I added in v2 is to not emit snapshot and mapping files
> >> stats in case of restartpoint as logical decoding isn't supported on
> >> standbys, so it doesn't make sense to emit the stats there and cause
> >> server log to grow unnecessarily. Having said that, I added a note
> >> there to change it whenever logical decoding on standbys is supported.
> >
> > I think we actually do want to include this information for restartpoints
> > since some files might be left over from the snapshot that was used to
> > create the standby.  Also, presumably these functions could do some work
> > during recovery on a primary.
>
> Yes, I would agree that consistency makes sense here, and it is not
> complicated to add the code to support this code path anyway.  There
> is a risk that folks working on logical decoding on standbys overlook
> this code path, instead.

I agree. Let's be consistent and emit the message for restartpoints as well.

> > Another problem I see is that this patch depends on the return value of the
> > lstat() calls that we are trying to remove in 0001 from another thread [0].
> > I think the size of the removed/sync'd files is somewhat useful for
> > understanding disk space usage, but I suspect the time spent performing
> > these tasks is more closely related to the number of files.  Do you think
> > reporting the sizes is worth the extra system call?
>
> We are not talking about files that are large either, are we?
>
> Another thing I am a bit annoyed with in this patch is the fact that
> the size of the ereport() call is doubled.  The LOG currently
> generated is already bloated, and this does not arrange things.

Yes, this is a concern. Also, when there are no logical replication
slots on a plain server, or the server has already removed all the
snapshot/mapping files, why would anyone want these messages with
all 0s in the server log?

Here's what I'm thinking:

Leave the existing "checkpoint/restartpoint complete" messages intact,
and add the following to LogCheckpointEnd, so that the extra message is
emitted only when at least one file was removed or synced:

if (CheckpointStats.repl_map_files_rmvd_cnt ||
    CheckpointStats.repl_map_files_syncd_cnt ||
    CheckpointStats.repl_snap_files_rmvd_cnt)
{
    ereport(LOG,
            (errmsg("snapbuild snapshot file(s) removed=" UINT64_FORMAT
                    ", size=%zu bytes, time=%ld.%03d s, cutoff LSN=%X/%X; "
                    "logical rewrite mapping file(s) removed=" UINT64_FORMAT
                    ", size=%zu bytes, synced=" UINT64_FORMAT
                    ", size=%zu bytes, time=%ld.%03d s, cutoff LSN=%X/%X",
                    CheckpointStats.repl_snap_files_rmvd_cnt,
                    CheckpointStats.repl_snap_files_rmvd_sz,
                    repl_snap_msecs / 1000, (int) (repl_snap_msecs % 1000),
                    LSN_FORMAT_ARGS(CheckpointStats.repl_snap_cutoff_lsn),
                    CheckpointStats.repl_map_files_rmvd_cnt,
                    CheckpointStats.repl_map_files_rmvd_sz,
                    CheckpointStats.repl_map_files_syncd_cnt,
                    CheckpointStats.repl_map_files_syncd_sz,
                    repl_map_msecs / 1000, (int) (repl_map_msecs % 1000),
                    LSN_FORMAT_ARGS(CheckpointStats.repl_map_cutoff_lsn))));
}
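
To show the gating and the seconds.milliseconds formatting outside the
server, here is a minimal standalone sketch. It is not PostgreSQL code:
the ReplFileStats struct and both function names are made up for
illustration, it writes into a buffer with snprintf() instead of calling
ereport(), and it leaves out the cutoff LSNs to stay short.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the proposed CheckpointStats fields. */
typedef struct ReplFileStats
{
    uint64_t    snap_files_rmvd_cnt;
    size_t      snap_files_rmvd_sz;
    uint64_t    map_files_rmvd_cnt;
    size_t      map_files_rmvd_sz;
    uint64_t    map_files_syncd_cnt;
    size_t      map_files_syncd_sz;
} ReplFileStats;

/* True only when at least one file was removed or synced. */
static int
repl_stats_worth_logging(const ReplFileStats *s)
{
    return s->snap_files_rmvd_cnt != 0 ||
           s->map_files_rmvd_cnt != 0 ||
           s->map_files_syncd_cnt != 0;
}

/*
 * Format the stats line into buf.  Returns 0 and leaves buf empty when
 * there is nothing to report; otherwise returns the snprintf() result.
 * Elapsed times are passed in milliseconds and printed as seconds with
 * three decimal places.
 */
static int
format_repl_stats(char *buf, size_t buflen, const ReplFileStats *s,
                  long snap_msecs, long map_msecs)
{
    if (buflen > 0)
        buf[0] = '\0';

    if (!repl_stats_worth_logging(s))
        return 0;

    return snprintf(buf, buflen,
                    "snapbuild snapshot file(s) removed=%" PRIu64
                    ", size=%zu bytes, time=%ld.%03d s; "
                    "logical rewrite mapping file(s) removed=%" PRIu64
                    ", size=%zu bytes, synced=%" PRIu64
                    ", size=%zu bytes, time=%ld.%03d s",
                    s->snap_files_rmvd_cnt, s->snap_files_rmvd_sz,
                    snap_msecs / 1000, (int) (snap_msecs % 1000),
                    s->map_files_rmvd_cnt, s->map_files_rmvd_sz,
                    s->map_files_syncd_cnt, s->map_files_syncd_sz,
                    map_msecs / 1000, (int) (map_msecs % 1000));
}
```

With all counters at zero the function returns 0 and nothing would be
logged, which is the behaviour I'm after for servers without any
snapshot or mapping files.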

Thoughts?

Regards,
Bharath Rupireddy.


