Re: add checkpoint stats of snapshot and mapping files of pg_logical dir - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: add checkpoint stats of snapshot and mapping files of pg_logical dir
Msg-id CALj2ACWjjcmsdtDp7y2k6Q7gyq_2+9Ccrt_G=ZVEAuc7RE8evw@mail.gmail.com
In response to Re: add checkpoint stats of snapshot and mapping files of pg_logical dir  (Michael Paquier <michael@paquier.xyz>)
Responses Re: add checkpoint stats of snapshot and mapping files of pg_logical dir
List pgsql-hackers
On Tue, Apr 26, 2022 at 6:31 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Apr 25, 2022 at 01:34:38PM -0700, Nathan Bossart wrote:
> > I took another look at the example output, and I think I agree that logging
> > the total time for logical decoding operations is probably the best path
> > forward.  This information would be enough to clue an administrator into
> > the possible causes of lengthy checkpoints, but it also wouldn't disrupt
> > the readability of the log statement too much.
>
> +       /* translator: the placeholders after first %s show restartpoint/checkpoint options */
> +                       (errmsg("%s starting:%s%s%s%s%s%s%s%s",
> +                                       restartpoint ? _("restartpoint") : _("checkpoint"),
>
> 0001 breaks translation, as "checkpoint/restartpoint" and "starting"
> would be treated as separate terms to translate.  That would not matter
> for English, but it does in French where we'd say "début du
> checkpoint".  You could fix that by adding "starting" to each
> refactored term or build a string.  0002 does the latter, so my take
> is that you should begin using a StringInfo in 0001.

Thanks for reviewing. I've addressed the review comments; please see
the attached v10 patch. Note that we can't use StringInfo here: it
relies on palloc, the checkpointer's memory context doesn't allow
pallocs inside a critical section, and a checkpoint can sometimes run
inside a critical section.
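
To illustrate the constraint, here is a minimal standalone sketch, not
the code in the v10 patch. It formats the message into a caller-supplied
fixed-size buffer, so nothing is allocated on the heap, and it keeps
"checkpoint starting" and "restartpoint starting" as whole phrases so
each can be translated as one unit. The function name and the SKETCH_*
flag values are hypothetical; they are not PostgreSQL's CHECKPOINT_*
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flag bits for this sketch only. */
#define SKETCH_CKPT_IMMEDIATE	0x01
#define SKETCH_CKPT_FORCE		0x02
#define SKETCH_CKPT_WAL			0x04
#define SKETCH_CKPT_TIME		0x08

/*
 * Format the "starting" message into buf without any heap allocation.
 * Returns what snprintf returns: the length the full message needs,
 * which is >= buflen if the output was truncated.
 */
static int
format_start_message(char *buf, size_t buflen, bool restartpoint, int flags)
{
	const char *immediate = (flags & SKETCH_CKPT_IMMEDIATE) ? " immediate" : "";
	const char *force = (flags & SKETCH_CKPT_FORCE) ? " force" : "";
	const char *wal = (flags & SKETCH_CKPT_WAL) ? " wal" : "";
	const char *time = (flags & SKETCH_CKPT_TIME) ? " time" : "";

	assert(buf != NULL && buflen > 0);

	/* Two complete format strings, so neither phrase is split up. */
	if (restartpoint)
		return snprintf(buf, buflen, "restartpoint starting:%s%s%s%s",
						immediate, force, wal, time);

	return snprintf(buf, buflen, "checkpoint starting:%s%s%s%s",
					immediate, force, wal, time);
}
```

A caller would pass a stack buffer, for example char buf[128], and
compare the return value against sizeof(buf) to detect truncation.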

I've also added the total number of WAL files a checkpoint has
processed (that is, scanned in the pg_wal directory) while removing
old WAL files. This helps estimate pg_wal disk usage at the time of a
particular checkpoint, which is especially useful when debugging.
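
As a rough sketch of that estimate, assuming every processed entry is a
full WAL segment and the default 16 MB segment size (both assumptions,
as is the helper name), the 76 processed files in the sample output
correspond to about 1216 MB. Whether the count includes files removed
during the same checkpoint is not stated here, so treat the result as
an approximate upper bound.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default WAL segment size (16 MB); clusters can be built with another size. */
#define SKETCH_WAL_SEGMENT_BYTES	(UINT64_C(16) * 1024 * 1024)

/*
 * Rough pg_wal size estimate: number of files scanned by the checkpoint
 * times the segment size. Hypothetical helper, not part of the patch.
 */
static uint64_t
estimate_pg_wal_bytes(uint64_t processed_files, uint64_t segment_bytes)
{
	assert(segment_bytes > 0);

	return processed_files * segment_bytes;
}
```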

Sample output:
2022-07-19 10:33:45.378 UTC [3027866] LOG:  checkpoint starting: wal
2022-07-19 10:33:51.434 UTC [3027866] LOG:  checkpoint complete: wrote
50 buffers (0.3%); 0 WAL file(s) added, 12 removed, 35 recycled, 76
processed; write=3.651 s, sync=0.011 s, total=6.136 s; sync files=11,
longest=0.004 s, average=0.001 s; distance=770045 kB, estimate=770045
kB; lsn=0/95000260, redo lsn=0/79000068; logical decoding file(s)
processing=0.007 s

Regards,
Bharath Rupireddy.
