Re: add checkpoint stats of snapshot and mapping files of pg_logical dir - Mailing list pgsql-hackers

From Andres Freund
Subject Re: add checkpoint stats of snapshot and mapping files of pg_logical dir
Date
Msg-id 20220318230944.dzrr25jdvcvgp4fx@alap3.anarazel.de
Whole thread Raw
In response to Re: add checkpoint stats of snapshot and mapping files of pg_logical dir  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: add checkpoint stats of snapshot and mapping files of pg_logical dir  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
Hi,

On 2022-03-18 13:32:52 +0530, Bharath Rupireddy wrote:
> @@ -6115,46 +6116,81 @@ LogCheckpointEnd(bool restartpoint)
>              CheckpointStats.ckpt_sync_rels;
>      average_msecs = (long) ((average_sync_time + 999) / 1000);
>  
> +    initStringInfo(&logmsg);
> +
>      if (restartpoint)
> -        ereport(LOG,
> -                (errmsg("restartpoint complete: wrote %d buffers (%.1f%%); "
> -                        "%d WAL file(s) added, %d removed, %d recycled; "
> -                        "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
> -                        "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
> -                        "distance=%d kB, estimate=%d kB",
> -                        CheckpointStats.ckpt_bufs_written,
> -                        (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
> -                        CheckpointStats.ckpt_segs_added,
> -                        CheckpointStats.ckpt_segs_removed,
> -                        CheckpointStats.ckpt_segs_recycled,
> -                        write_msecs / 1000, (int) (write_msecs % 1000),
> -                        sync_msecs / 1000, (int) (sync_msecs % 1000),
> -                        total_msecs / 1000, (int) (total_msecs % 1000),
> -                        CheckpointStats.ckpt_sync_rels,
> -                        longest_msecs / 1000, (int) (longest_msecs % 1000),
> -                        average_msecs / 1000, (int) (average_msecs % 1000),
> -                        (int) (PrevCheckPointDistance / 1024.0),
> -                        (int) (CheckPointDistanceEstimate / 1024.0))));
> +        appendStringInfo(&logmsg,
> +                        _("restartpoint complete: wrote %d buffers (%.1f%%); "
> +                          "%d WAL file(s) added, %d removed, %d recycled; "
> +                          "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
> +                          "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
> +                          "distance=%d kB, estimate=%d kB"),
> +                          CheckpointStats.ckpt_bufs_written,
> +                          (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
> +                          CheckpointStats.ckpt_segs_added,
> +                          CheckpointStats.ckpt_segs_removed,
> +                          CheckpointStats.ckpt_segs_recycled,
> +                          write_msecs / 1000, (int) (write_msecs % 1000),
> +                          sync_msecs / 1000, (int) (sync_msecs % 1000),
> +                          total_msecs / 1000, (int) (total_msecs % 1000),
> +                          CheckpointStats.ckpt_sync_rels,
> +                          longest_msecs / 1000, (int) (longest_msecs % 1000),
> +                          average_msecs / 1000, (int) (average_msecs % 1000),
> +                          (int) (PrevCheckPointDistance / 1024.0),
> +                          (int) (CheckPointDistanceEstimate / 1024.0));
>      else
> -        ereport(LOG,
> -                (errmsg("checkpoint complete: wrote %d buffers (%.1f%%); "
> -                        "%d WAL file(s) added, %d removed, %d recycled; "
> -                        "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
> -                        "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
> -                        "distance=%d kB, estimate=%d kB",
> -                        CheckpointStats.ckpt_bufs_written,
> -                        (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
> -                        CheckpointStats.ckpt_segs_added,
> -                        CheckpointStats.ckpt_segs_removed,
> -                        CheckpointStats.ckpt_segs_recycled,
> -                        write_msecs / 1000, (int) (write_msecs % 1000),
> -                        sync_msecs / 1000, (int) (sync_msecs % 1000),
> -                        total_msecs / 1000, (int) (total_msecs % 1000),
> -                        CheckpointStats.ckpt_sync_rels,
> -                        longest_msecs / 1000, (int) (longest_msecs % 1000),
> -                        average_msecs / 1000, (int) (average_msecs % 1000),
> -                        (int) (PrevCheckPointDistance / 1024.0),
> -                        (int) (CheckPointDistanceEstimate / 1024.0))));
> +        appendStringInfo(&logmsg,
> +                        _("checkpoint complete: wrote %d buffers (%.1f%%); "
> +                          "%d WAL file(s) added, %d removed, %d recycled; "
> +                          "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
> +                          "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
> +                          "distance=%d kB, estimate=%d kB"),
> +                          CheckpointStats.ckpt_bufs_written,
> +                          (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
> +                          CheckpointStats.ckpt_segs_added,
> +                          CheckpointStats.ckpt_segs_removed,
> +                          CheckpointStats.ckpt_segs_recycled,
> +                          write_msecs / 1000, (int) (write_msecs % 1000),
> +                          sync_msecs / 1000, (int) (sync_msecs % 1000),
> +                          total_msecs / 1000, (int) (total_msecs % 1000),
> +                          CheckpointStats.ckpt_sync_rels,
> +                          longest_msecs / 1000, (int) (longest_msecs % 1000),
> +                          average_msecs / 1000, (int) (average_msecs % 1000),
> +                          (int) (PrevCheckPointDistance / 1024.0),
> +                          (int) (CheckPointDistanceEstimate / 1024.0));

before we further change this (as done in this patch) we should deduplicate
these huge statements in a separate patch / commit.


> +    if (CheckpointStats.repl_snap_files_rmvd_cnt > 0)
> +    {
> +        long t_msecs;
> +
> +        t_msecs = TimestampDifferenceMilliseconds(CheckpointStats.repl_snap_start_t,
> +                                                  CheckpointStats.repl_snap_end_t);
> +
> +        appendStringInfo(&logmsg,
> +                        _("; logical decoding snapshot file(s) removed=" UINT64_FORMAT ", time=%ld.%03d s, cutoff
LSN=%X/%X"),
> +                          CheckpointStats.repl_snap_files_rmvd_cnt,
> +                          t_msecs / 1000, (int) (t_msecs % 1000),
> +                          LSN_FORMAT_ARGS(CheckpointStats.repl_snap_cutoff_lsn));
> +    }

As discussed in a bunch of other threads, we typically prefer to cast to long
long and use %ll instead of using UINT64_FORMAT these days.


> +    if (CheckpointStats.repl_map_files_rmvd_cnt ||
> +        CheckpointStats.repl_map_files_syncd_cnt > 0)
> +    {
> +        long t_msecs;
> +
> +        t_msecs = TimestampDifferenceMilliseconds(CheckpointStats.repl_snap_start_t,
> +                                                  CheckpointStats.repl_snap_end_t);
> +
> +        appendStringInfo(&logmsg,
> +                        _("; logical decoding rewrite mapping file(s) removed=" UINT64_FORMAT ", synced="
UINT64_FORMAT", time=%ld.%03d s, cutoff LSN=%X/%X"),
 
> +                          CheckpointStats.repl_map_files_rmvd_cnt,
> +                          CheckpointStats.repl_map_files_syncd_cnt,
> +                          t_msecs / 1000, (int) (t_msecs % 1000),
> +                          LSN_FORMAT_ARGS(CheckpointStats.repl_map_cutoff_lsn));
> +    }
> +
> +    ereport(LOG, errmsg_internal("%s", logmsg.data));
> +    pfree(logmsg.data);
>  }

This practically doubles the size of the log message - doesn't that seem a bit
disproportionate? Can we make this more dense? "logical decoding rewrite
mapping file(s) removed=" and "logical decoding snapshot file(s) removed=" is
quite long.


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Adding CI to our tree
Next
From: Andres Freund
Date:
Subject: Re: Fix unsigned output for signed values in SLRU error reporting