Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display
Msg-id CALj2ACV=HtMzQnWARnQDx4L-L95KdmTNskz6McaO+tPYPiF=uw@mail.gmail.com
In response to Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display  (Michael Paquier <michael@paquier.xyz>)
Responses Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Wed, Nov 17, 2021 at 8:01 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Nov 16, 2021 at 01:26:49PM -0300, Alvaro Herrera wrote:
> > My opinion is that adding these things willy-nilly is not a solution to
> > any actual problem.  Adding a few additional log lines that are
> > low-volume at DEBUG1 might be useful, but below that (DEBUG2 etc) it's
> > not good for anything other than specific development, IMO.  At least
> > this particular one for streaming replication I think we should not
> > include.
>
> Looking at v2, I think that this leaves the additions of the DEBUG1
> entries in SendBaseBackup() and WalRcvWaitForStartPosition(), then.
> The one in pgarch.c does not provide any additional information as the
> segment to-be-archived should be part of the command.

Thank you all for the inputs. Here's the patch that I've come up with.

Upon further thought, having at least these messages at LOG level [1]
would help in knowing what's happening with the system while it is in
recovery. Although LOG-level messages may fill up the server logs, a
good log consumption and rotation mechanism (I'm sure every major
postgres vendor has one) should be sufficient to allay that concern.

These LOG messages would help us know how long the restore command
takes to fetch a WAL file, which WAL file the server is currently
recovering, and where it is recovering it from. Customers often ask
questions like: when will my server come up? How long will the
recovery of my server take?

As a developer or admin, one can monitor these logs and do a bunch of things:
1) see how many WAL files are left to be recovered by looking at the WAL
files in the archive location, in the pg_wal directory, or on the primary
2) approximately estimate when the server will come up, or how much longer
the recovery will take: from the previous LOG messages one can count the
number of WAL files the server recovered over a minute, and combine that
with the number of left-over WAL files calculated in (1).

[1]
ereport(LOG,
        (errmsg("waiting for WAL segment \"%s\" from archive",
                xlogfname)));

ereport(LOG,
        (errmsg("restored WAL segment \"%s\" from archive",
                xlogfname)));

ereport(LOG,
        (errmsg("recovering WAL segment \"%s\" from source \"%s\"",
                xlogfname, srcname)));

Regards,
Bharath Rupireddy.

