Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display) - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)
Date
Msg-id 202205050837.bxnkj2jkst5g@alvherre.pgsql
In response to Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)
List pgsql-hackers
On 2022-May-05, Bharath Rupireddy wrote:

> On Fri, Apr 29, 2022 at 4:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > Did we ever consider the idea of using a new pg_stat_wal_activity_progress
> > view or something like that, using the backend_progress.c functionality?
> > I don't see it mentioned in the thread.
> 
> IMO, progress reporting works well on a running server and at the
> moment. The WAL recovery/replay can happen even before the server
> opens up for connections

It's definitely true that you wouldn't be able to use it while the
server is not accepting connections.

> and the progress report view can't be used
> for later analysis like how much time the restoring WAL files from
> archive location took

This is true too -- progress report doesn't store historical data, only
current status.

> and also the WAL file names can't be reported in progress reporting
> mechanism

Also true.

> (only integer columns; of course, if required, we can add text columns
> to pg_stat_get_progress_info).

Yeah, I don't think adding text columns is terribly easy, because the
whole point of the progress reporting infrastructure is that it can be
updated very cheaply as atomic operations, and if you want to transmit
text columns, that's no longer possible.
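
To illustrate the point about cheap updates, here is a minimal sketch in C11. It is not PostgreSQL's actual backend_progress.c code; the names DemoProgressSlot, demo_progress_update and demo_progress_read, and the slot count, are hypothetical. It only shows why a fixed-width integer counter can be published with a single store, while a file name, which spans many bytes, cannot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_NUM_PROGRESS_PARAMS 20     /* hypothetical slot count */

/* Hypothetical shared progress slot: fixed-width integer counters only. */
typedef struct DemoProgressSlot
{
    _Atomic int64_t params[DEMO_NUM_PROGRESS_PARAMS];
} DemoProgressSlot;

/*
 * Writer side: one atomic store per counter.  No lock is taken and no
 * memory is allocated, so this is cheap enough to call from a hot loop.
 */
static void
demo_progress_update(DemoProgressSlot *slot, int index, int64_t value)
{
    assert(index >= 0 && index < DEMO_NUM_PROGRESS_PARAMS);
    atomic_store_explicit(&slot->params[index], value, memory_order_relaxed);
}

/*
 * Reader side: one atomic load per counter.  The reader never observes a
 * half-written value because each counter is a single atomic object.
 *
 * A text column (say, a WAL file name) has no equivalent: it occupies many
 * bytes, so a reader could see a torn mix of the old and new strings unless
 * the writer takes a lock or uses some versioning scheme, and that extra
 * work is exactly the cost the integer-only design avoids.
 */
static int64_t
demo_progress_read(DemoProgressSlot *slot, int index)
{
    assert(index >= 0 && index < DEMO_NUM_PROGRESS_PARAMS);
    return atomic_load_explicit(&slot->params[index], memory_order_relaxed);
}
```

A writer would call demo_progress_update(slot, 0, blocks_done) as it goes, and a monitoring reader would call demo_progress_read(slot, 0) at any time without coordinating with it.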

> Having the recovery info in server logs might help.

I suppose it might.

> I think reporting a long-running file processing operation (removing
> or syncing) within postgres is a generic problem (for snapshot,
> mapping, temporary (pgsql_tmp), temp relation files, old WAL file
> processing, WAL file processing during recovery etc.) and needs to be
> solved

I agree up to here.

> in two ways: 1) logging progress into server logs (which helps
> for analysis and report when the server isn't available for
> connections, crash recovery), a generic GUC
> log_file_processing_traffic = {none, medium, high} might help here
> (also proposed in [1]) and 2) pg_stat_file_processing_progress
> (extending progress reporting pg_stat_get_progress_info to have few
> text columns for current file name and directory path).

I think using the server log to store telemetry data is not a great fit.
It bloats the log files and can be so slow as to block other operations
in the server.  Server logs should normally be considered critical info
that's not okay to lose; telemetry tends to be of secondary importance
and in a pinch you can drop a few messages without hurting too much.
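
The difference in delivery guarantees can be sketched as a bounded buffer that drops readings when full instead of blocking the producer. This is only an illustration of the trade-off, not a design proposed in this thread; DemoMetricQueue, demo_metric_push, demo_metric_pop and the capacity are all hypothetical, and the sketch is single-threaded for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_QUEUE_CAPACITY 4           /* hypothetical, tiny on purpose */

typedef struct DemoMetricQueue
{
    int64_t     items[DEMO_QUEUE_CAPACITY];
    size_t      head;                   /* index of the oldest reading */
    size_t      count;                  /* readings currently queued */
    uint64_t    dropped;                /* readings discarded when full */
} DemoMetricQueue;

/*
 * Telemetry producer: never waits.  If the queue is full the reading is
 * counted as dropped and the caller carries on with its real work.
 */
static bool
demo_metric_push(DemoMetricQueue *q, int64_t value)
{
    assert(q != NULL);
    if (q->count == DEMO_QUEUE_CAPACITY)
    {
        q->dropped++;
        return false;
    }
    q->items[(q->head + q->count) % DEMO_QUEUE_CAPACITY] = value;
    q->count++;
    return true;
}

/* Consumer: takes the oldest reading, if there is one. */
static bool
demo_metric_pop(DemoMetricQueue *q, int64_t *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0)
        return false;
    *out = q->items[q->head];
    q->head = (q->head + 1) % DEMO_QUEUE_CAPACITY;
    q->count--;
    return true;
}
```

A server log message cannot be handled this way, since losing it is not acceptable, so the writer has to wait for the log destination; a metric reading can simply be counted as lost.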

We've done moderately okay so far with having some system views where
some telemetry readings can be obtained, but there are several drawbacks to
that approach that we should at some point solve.  My opinion on this is
that we need to bite the bullet and develop separate infrastructure for
reporting server metrics.

That said, I'm not opposed to having a patch somewhat as posted.  I just
think that we should look into a new mechanism going forward.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
