Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display) - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)
Msg-id CALj2ACUVP-gdH8-ZYbL_k_HECjDhhS-6AnrVX=SC7y6Z1=i+TA@mail.gmail.com
In response to Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)
List pgsql-hackers
On Fri, May 13, 2022 at 6:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Apr 29, 2022 at 5:11 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > Here's the rebased v9 patch.
>
> This seems like it has enormous overlap with the existing
> functionality that we have from log_startup_progress_interval.
>
> I think that facility is also better-designed than this one. It prints
> out a message based on elapsed time, whereas this patch prints out a
> message based progress through the WAL. That means that if WAL replay
> isn't actually advancing for some reason, you just won't get any log
> messages and you don't know whether it's advancing slowly or not at
> all or the server is just hung. With that facility you can distinguish
> those cases.
>
> Also, if for some reason we do think that amount of WAL replayed is
> the right metric, rather than time, why would we only allow high=1
> segment and low=128 segments, rather than say any number of MB or GB
> that the user would like to configure?
>
> I suggest that if log_startup_progress_interval doesn't meet your
> needs here, we should try to understand why not and maybe enhance it,
> instead of adding a separate facility.

Thanks Robert!

In a production environment (with proper management of server logs,
of course), one can set log_wal_traffic to "high" and emit the info
needed to answer customer questions such as: "How far along is the
server in recovery? Approximately how much time did recovery of each
WAL file take? How much time will it take to recover all the WAL
files? What's the rate of recovery, i.e. time per WAL file?"
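
As a rough illustration of how those questions could be answered from
per-WAL-file messages, here is a minimal sketch. It assumes one message
is emitted per replayed WAL file and that its timestamp has already
been extracted; the function name, the input shape and the sample
numbers are all hypothetical and not part of the patch.

```python
from datetime import datetime, timedelta


def recovery_stats(completion_times, segments_remaining):
    """Summarize replay speed from per-segment completion timestamps.

    completion_times: datetimes, one per replayed WAL file, in replay order
    (hypothetical input; the patch does not define this format).
    segments_remaining: number of WAL files still to replay (assumed known).
    """
    if len(completion_times) < 2:
        raise ValueError("need at least two timestamps to measure a rate")

    # Time spent on each WAL file = gap between consecutive completions.
    per_file = [
        later - earlier
        for earlier, later in zip(completion_times, completion_times[1:])
    ]
    average = sum(per_file, timedelta()) / len(per_file)

    # Naive estimate: assume the remaining files replay at the average rate.
    eta = average * segments_remaining
    return per_file, average, eta


# Made-up sample: four completion timestamps, ten WAL files still pending.
start = datetime(2022, 5, 13, 18, 0, 0)
times = [start + timedelta(seconds=s) for s in (0, 4, 10, 18)]
per_file, average, eta = recovery_stats(times, segments_remaining=10)
```

The estimate is only as good as the assumption that the remaining WAL
files replay at the same average rate as the ones already seen.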

By contrast, the ereport_startup_progress facility emits log messages
only if an operation takes longer than the configured
log_startup_progress_interval, which may not serve the above purpose.
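
To make the difference between the two reporting styles concrete, here
is a simplified model of both. It is not how ereport_startup_progress
or the proposed patch is implemented; the function names and the sample
traces are assumptions chosen only to show when each style would log.

```python
def time_based_reports(events, interval):
    """Return the times at which an elapsed-time reporter would log.

    events: list of (time_in_seconds, segments_replayed_so_far) samples,
    ordered by time. A report fires whenever at least `interval` seconds
    have passed since the previous report (or since the start).
    """
    reports = []
    last = events[0][0]
    for now, _ in events[1:]:
        if now - last >= interval:
            reports.append(now)
            last = now
    return reports


def per_segment_reports(events):
    """Return the times at which a per-WAL-file reporter would log."""
    reports = []
    done = events[0][1]
    for now, segments in events[1:]:
        if segments > done:
            reports.append(now)
            done = segments
    return reports


# Fast replay: three WAL files finish within 3 seconds.
fast = [(0, 0), (1, 1), (2, 2), (3, 3)]
# Stalled replay: one WAL file finishes, then nothing advances.
stalled = [(0, 0), (1, 1), (11, 1), (21, 1), (31, 1)]

fast_by_time = time_based_reports(fast, interval=10)
fast_by_segment = per_segment_reports(fast)
stalled_by_time = time_based_reports(stalled, interval=10)
stalled_by_segment = per_segment_reports(stalled)
```

In this model the fast replay produces per-file messages but no
time-based ones, while the stalled replay keeps producing time-based
messages after the per-file ones stop.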

Actually, IMO a generic GUC log_file_processing_traffic = {none,
medium, high} would help the server emit logs for all the critical
file processing operations: WAL file recovery (as proposed in this
thread), temp file processing during server startup or restarts
(log_startup_progress_interval can't be used here as the postmaster
doesn't register for any timeouts) [1], snapshot and mapping file
processing during checkpoint, temp relation files, removing old WAL
files and so on.

Thoughts?

[1] https://www.postgresql.org/message-id/CALj2ACW-ELOF5JT2zPavs95wbZ0BrLPrqvSZ7Ac%2BpjxCkmXtEQ%40mail.gmail.com

Regards,
Bharath Rupireddy.


