On 2022-May-05, Bharath Rupireddy wrote:
> On Fri, Apr 29, 2022 at 4:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > Did we ever consider the idea of using a new pg_stat_wal_activity_progress
> > view or something like that, using the backend_progress.c functionality?
> > I don't see it mentioned in the thread.
>
> IMO, progress reporting works well on a running server and at the
> moment. The WAL recovery/replay can happen even before the server
> opens up for connections
It's definitely true that you wouldn't be able to use it for when the
server is not accepting connections.
> and the progress report view can't be used
> for later analysis like how much time the restoring WAL files from
> archive location took
This is true too -- progress report doesn't store historical data, only
current status.
> and also the WAL file names can't be reported in progress reporting
> mechanism
Also true.
> (only integers columns, of course if required we can add text columns
> to pg_stat_get_progress_info).
Yeah, I don't think adding text columns is terribly easy, because the
whole point of the progress reporting infrastructure is that it can be
updated very cheaply as atomic operations, and if you want to transmit
text columns, that's no longer possible.
> Having the recovery info in server logs might help.
I suppose it might.
> I think reporting a long-running file processing operation (removing
> or syncing) within postgres is a generic problem (for snapshot,
> mapping, temporary (pgsql_tmp), temp relation files, old WAL file
> processing, WAL file processing during recovery etc.) and needs to be
> solved
I agree up to here.
> in two ways: 1) logging progress into server logs (which helps
> for analysis and report when the server isn't available for
> connections, crash recovery), a generic GUC
> log_file_processing_traffic = {none, medium, high} might help here
> (also proposed in [1]) and 2) pg_stat_file_processing_progress
> (extending progress reporting pg_stat_get_progress_info to have few
> text columns for current file name and directory path).
I think using the server log to store telemetry data is not a great fit.
It bloats the log files and can be so slow as to block other operations
in the server. Server logs should normally be considered critical info
that's not okay to lose; telemetry tends to be of secondary importance
and in a pinch you can drop a few messages without hurting too much.
We've done moderately okay so far with having some system views where
some telemetry readings can be obtained, but there several drawbacks to
that approach that we should at some point solve. My opinion on this is
that we need to bite the bullet and develop separate infrastructure for
reporting server metrics.
That said, I'm not opposed to having a patch somewhat as posted. I just
think that we should look into a new mechanism going forward.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/