Thread: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

Hi,

As discussed in [1], isn't it a better idea to add some of the activity
messages [2] such as recovery, archive, backup, streaming etc. to the
server logs at LOG level? They are currently being set in the ps display,
which is good if postgres is being run on a standalone box/VM
where users can see the ps display, but it doesn't help much in case
postgres is being run in a cloud environment where users don't
have access to the ps display output. Moreover, the ps display is
transient and will not help with analysis after an issue occurs.

Having the above messages in the server logs will be useful to
understand how the system is/was doing/progressing with these
(sometimes time-intensive) operations.

Thoughts?

[1] - https://www.postgresql.org/message-id/CALj2ACWE6D%2BbbKnA9tQ0d-pPN1n4xrsFPOmiURm2my3qt0w3OA%40mail.gmail.com
[2] -
snprintf(activitymsg, sizeof(activitymsg), "waiting for %s",
snprintf(activitymsg, sizeof(activitymsg), "recovering %s",
snprintf(activitymsg, sizeof(activitymsg), "performing %s%s%s",
snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
snprintf(activitymsg, sizeof(activitymsg), "failed on %s", xlog);
elog(DEBUG1, "archived write-ahead log file \"%s\"", xlog);
snprintf(activitymsg, sizeof(activitymsg), "sending backup \"%s\"",
snprintf(activitymsg, sizeof(activitymsg), "restarting at %X/%X",
snprintf(activitymsg, sizeof(activitymsg), "streaming %X/%X",
snprintf(activitymsg, sizeof(activitymsg), "streaming %X/%X",

Regards,
Bharath Rupireddy.



On 11/10/21, 9:43 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> As discussed in [1], isn't it a better idea to add some of the activity
> messages [2] such as recovery, archive, backup, streaming etc. to the
> server logs at LOG level? They are currently being set in the ps display,
> which is good if postgres is being run on a standalone box/VM
> where users can see the ps display, but it doesn't help much in case
> postgres is being run in a cloud environment where users don't
> have access to the ps display output. Moreover, the ps display is
> transient and will not help with analysis after an issue occurs.
>
> Having the above messages in the server logs will be useful to
> understand how the system is/was doing/progressing with these
> (sometimes time-intensive) operations.

I think this would make the logs far too noisy for many servers.  For
archiving alone, this could cause tens of thousands more log messages
per hour on a busy system.  I think you can already see such
information at a debug level, anyway.

Nathan


On Wed, Nov 10, 2021, at 2:38 PM, Bharath Rupireddy wrote:
> As discussed in [1], isn't it a better idea to add some of the activity
> messages [2] such as recovery, archive, backup, streaming etc. to the
> server logs at LOG level? They are currently being set in the ps display,
> which is good if postgres is being run on a standalone box/VM
> where users can see the ps display, but it doesn't help much in case
> postgres is being run in a cloud environment where users don't
> have access to the ps display output. Moreover, the ps display is
> transient and will not help with analysis after an issue occurs.

Besides recovery, the other activities already provide information through
views. I might be missing something but it seems the current views already
provide the information that ps displays.

> Having the above messages in the server logs will be useful to
> understand how the system is/was doing/progressing with these
> (sometimes time-intensive) operations.

It could be useful for a root cause analysis; however, you are also increasing
the number of LOG messages in a system that doesn't/won't require such
information. It is fine to add additional DEBUG messages if there isn't a
similar one yet. If at least the message level were module-controlled, you
could modify a setting to gather more messages from a specific module.
Unfortunately, that's not possible, so we should avoid superfluous messages.


--
Euler Taveira

On Thu, Nov 11, 2021 at 12:01 AM Bossart, Nathan <bossartn@amazon.com> wrote:
>
> On 11/10/21, 9:43 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> > As discussed in [1], isn't it a better idea to add some of the activity
> > messages [2] such as recovery, archive, backup, streaming etc. to the
> > server logs at LOG level? They are currently being set in the ps display,
> > which is good if postgres is being run on a standalone box/VM
> > where users can see the ps display, but it doesn't help much in case
> > postgres is being run in a cloud environment where users don't
> > have access to the ps display output. Moreover, the ps display is
> > transient and will not help with analysis after an issue occurs.
> >
> > Having the above messages in the server logs will be useful to
> > understand how the system is/was doing/progressing with these
> > (sometimes time-intensive) operations.
>
> I think this would make the logs far too noisy for many servers.  For
> archiving alone, this could cause tens of thousands more log messages
> per hour on a busy system.  I think you can already see such
> information at a debug level, anyway.

You are right: having them at LOG level may blow up the log files and
disk storage. Having them at DEBUG1 like the other messages would at
least help more than having them only as an activitymsg in the ps display.

And not all of the activitymsgs are logged at a DEBUG level; at least I
don't see them (on a quick search).
snprintf(activitymsg, sizeof(activitymsg), "waiting for %s",
snprintf(activitymsg, sizeof(activitymsg), "recovering %s",
snprintf(activitymsg, sizeof(activitymsg), "performing %s%s%s",
snprintf(activitymsg, sizeof(activitymsg), "sending backup \"%s\"",
snprintf(activitymsg, sizeof(activitymsg), "restarting at %X/%X",
snprintf(activitymsg, sizeof(activitymsg), "streaming %X/%X",

But for the snprintf(activitymsg, sizeof(activitymsg), "archiving %s",
xlog); we have elog(DEBUG1, "archived write-ahead log file \"%s\"",
xlog); after the archive command. It would also be good to have a similar
debug message before the archive command execution, so we can get to know
whether the archive command was executed or not and, if so, how much time
it took, etc.

I'm not sure whether it is okay to do the following: for each of the
activitymsgs, emit an elog(DEBUG1) message before the command execution
and another elog(DEBUG1) message after it.
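For illustration, a rough sketch of that before/after pattern for the
archiver (hypothetical placement around the system() call in pgarch.c;
only the two DEBUG1 calls are new, the rest mimics the existing code):

ereport(DEBUG1,
        (errmsg_internal("executing archive command \"%s\"",
                         xlogarchcmd)));

rc = system(xlogarchcmd);

if (rc == 0)
    ereport(DEBUG1,
            (errmsg_internal("archived write-ahead log file \"%s\"",
                             xlog)));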

Thoughts?

Regards,
Bharath Rupireddy.



On Thu, Nov 11, 2021 at 1:40 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Wed, Nov 10, 2021, at 2:38 PM, Bharath Rupireddy wrote:
>
> As discussed in [1], isn't it a better idea to add some of the activity
> messages [2] such as recovery, archive, backup, streaming etc. to the
> server logs at LOG level? They are currently being set in the ps display,
> which is good if postgres is being run on a standalone box/VM
> where users can see the ps display, but it doesn't help much in case
> postgres is being run in a cloud environment where users don't
> have access to the ps display output. Moreover, the ps display is
> transient and will not help with analysis after an issue occurs.
>
> Besides recovery, the other activities already provide information through
> views. I might be missing something but it seems the current views already
> provide the information that ps displays.
>
> Having the above messages in the server logs will be useful to
> understand how the system is/was doing/progressing with these
> (sometimes time-intensive) operations.
>
> It could be useful for a root cause analysis; however, you are also increasing
> the number of LOG messages in a system that doesn't/won't require such
> information. It is fine to add additional DEBUG messages if there isn't a
> similar one yet. If at least the message level were module-controlled, you
> could modify a setting to gather more messages from a specific module.
> Unfortunately, that's not possible, so we should avoid superfluous messages.

Yes, at least having the messages at DEBUG1 level would help with the
RCA after an issue occurs. Please see my response upthread at [1].

[1] - https://www.postgresql.org/message-id/CALj2ACXVwnHhek9bo9wgamtW9SBinj4ohF1f1Yf3Su2o5WarWg%40mail.gmail.com

Regards,
Bharath Rupireddy.



On 2021-Nov-10, Bossart, Nathan wrote:

> On 11/10/21, 9:43 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> > As discussed in [1], isn't it a better idea to add some of the activity
> > messages [2] such as recovery, archive, backup, streaming etc. to the
> > server logs at LOG level?

> I think this would make the logs far too noisy for many servers.  For
> archiving alone, this could cause tens of thousands more log messages
> per hour on a busy system.  I think you can already see such
> information at a debug level, anyway.

Yeah.  If we had some sort of ring buffer in which to store these
messages, the user could examine them through a view; they would still
be accessible in a running server, but they would not be written to the
server log.
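To sketch the idea (all names here are hypothetical; a real version
would live in shared memory and need locking):

#define ACTIVITY_RING_SLOTS 128
#define ACTIVITY_MSG_LEN    128

typedef struct ActivityRing
{
    uint32      next;       /* next slot to overwrite */
    char        msgs[ACTIVITY_RING_SLOTS][ACTIVITY_MSG_LEN];
} ActivityRing;

static void
activity_ring_put(ActivityRing *ring, const char *msg)
{
    strlcpy(ring->msgs[ring->next % ACTIVITY_RING_SLOTS],
            msg, ACTIVITY_MSG_LEN);
    ring->next++;
}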

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



On 11/10/21, 9:51 PM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> But for the snprintf(activitymsg, sizeof(activitymsg), "archiving %s",
> xlog); we have elog(DEBUG1, "archived write-ahead log file \"%s\"",
> xlog); after the archive command. It would also be good to have a similar
> debug message before the archive command execution, so we can get to know
> whether the archive command was executed or not and, if so, how much time
> it took, etc.

There is a DEBUG3 that emits "executing archive command..." prior to
executing the archive command.

> I'm not sure whether it is okay to do the following: for each of the
> activitymsgs, emit an elog(DEBUG1) message before the command execution
> and another elog(DEBUG1) message after it.

The general idea seems reasonable to me.

Nathan


On 11/11/21, 4:38 AM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:
> Yeah.  If we had some sort of ring buffer in which to store these
> messages, the user could examine them through a view; they would still
> be accessible in a running server, but they would not be written to the
> server log.

That's an interesting idea.  It'd be nice to have a way to see more
debugging information without exploding the server log.

Nathan


On Thu, Nov 11, 2021, at 2:50 AM, Bharath Rupireddy wrote:
> But for the snprintf(activitymsg, sizeof(activitymsg), "archiving %s",
> xlog); we have elog(DEBUG1, "archived write-ahead log file \"%s\"",
> xlog); after the archive command. It would also be good to have a similar
> debug message before the archive command execution, so we can get to know
> whether the archive command was executed or not and, if so, how much time
> it took, etc.

If you are only interested in how much time it took to archive a WAL file, you
don't need 2 messages (before and after); one message is sufficient if you
calculate the difference and then print it as part of the existing message.
Calculate the difference only if the log level is at least DEBUG1 (same as the
log message).
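A minimal sketch of that single timed message, assuming it sits around
the archiver's system() call (the INSTR_TIME_* macros are the existing
ones from src/include/portability/instr_time.h):

instr_time  start,
            duration;
int         rc;

INSTR_TIME_SET_CURRENT(start);
rc = system(xlogarchcmd);

if (rc == 0 && message_level_is_interesting(DEBUG1))
{
    INSTR_TIME_SET_CURRENT(duration);
    INSTR_TIME_SUBTRACT(duration, start);
    ereport(DEBUG1,
            (errmsg_internal("archived write-ahead log file \"%s\" in %.3f ms",
                             xlog, INSTR_TIME_GET_MILLISEC(duration))));
}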

Alternatively, for messages related to system(), e.g. archive and recovery, you
could measure the execution time in your own script/program.

I like Alvaro's idea of implementing a ring buffer for this kind of activity.
This implementation could be applied to checkpoints, archiving, recovery,
replication, base backup, WAL usage, connections and others.


--
Euler Taveira

On Thu, Nov 11, 2021 at 03:18:16PM +0000, Bossart, Nathan wrote:
> On 11/11/21, 4:38 AM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:
> > Yeah.  If we had some sort of ring buffer in which to store these
> > messages, the user could examine them through a view; they would still
> > be accessible in a running server, but they would not be written to the
> > server log.
> 
> That's an interesting idea.  It'd be nice to have a way to see more
> debugging information without exploding the server log.

Uh, that's what dmesg does, which I mentioned earlier.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




On Thu, Nov 11, 2021 at 6:07 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2021-Nov-10, Bossart, Nathan wrote:
>
> > On 11/10/21, 9:43 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > As discussed in [1], isn't it a better idea to add some of the activity
> > > messages [2] such as recovery, archive, backup, streaming etc. to the
> > > server logs at LOG level?
>
> > I think this would make the logs far too noisy for many servers.  For
> > archiving alone, this could cause tens of thousands more log messages
> > per hour on a busy system.  I think you can already see such
> > information at a debug level, anyway.
>
> Yeah.  If we had some sort of ring buffer in which to store these
> messages, the user could examine them through a view; they would still
> be accessible in a running server, but they would not be written to the
> server log.

That's a good idea. How about also adding some GUC(s) for the archive- and
recovery-related log messages, just like we have for checkpoints, autovacuum
etc.? Maybe something like log_archive, log_recovery etc.
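For example, a hypothetical boolean entry modeled on the existing
log_checkpoints one in guc.c (the name, group and description here are
illustrative only, with a matching "static bool log_recovery" variable):

{
    {"log_recovery", PGC_SIGHUP, LOGGING_WHAT,
        gettext_noop("Logs WAL recovery activity."),
        NULL
    },
    &log_recovery,
    false,
    NULL, NULL, NULL
},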

--
With Regards,
Ashutosh Sharma.

On Thu, Nov 11, 2021 at 8:41 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> > I'm not sure whether it is okay to do the following: for each of the
> > activitymsgs, emit an elog(DEBUG1) message before the command execution
> > and another elog(DEBUG1) message after it.
>
> The general idea seems reasonable to me.

Here's a simple patch that I could come up with. Please have a look.

Regards,
Bharath Rupireddy.

Attachment
A bunch of these now execute snprintf()s unnecessarily.  I think these
should be made conditional on message_level_is_interesting(DEBUG1) to
avoid that overhead.
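Something along these lines, say (message_level_is_interesting() is the
existing helper in elog.c; the snippet only sketches the pattern):

if (message_level_is_interesting(DEBUG1))
{
    char        activitymsg[50];

    snprintf(activitymsg, sizeof(activitymsg), "waiting for %s",
             xlogfname);
    ereport(DEBUG1, (errmsg_internal("%s", activitymsg)));
}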

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"You're _really_ hosed if the person doing the hiring doesn't understand
relational systems: you end up with a whole raft of programmers, none of
whom has had a Date with the clue stick."              (Andrew Sullivan)



On Sat, Nov 13, 2021 at 9:04 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> A bunch of these now execute snprintf()s unnecessarily.  I think these
> should be made conditional on message_level_is_interesting(DEBUG1) to
> avoid that overhead.

Thanks. Attaching the v2 to avoid that by directly using the message in ereport instead of activitymsg.

Regards,
Bharath Rupireddy.
Attachment
On 11/14/21, 4:18 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> Thanks. Attaching the v2 to avoid that by directly using the message
> in ereport instead of activitymsg.

+            ereport(DEBUG1,
+                    (errmsg_internal("waiting for %s", xlogfname)));

As a general comment, I think we should include more detail than what
goes in the ps title.  While we probably want to keep the ps title
brief, I don't see any issue with including a bit more context in
debug statements.

+    ereport(DEBUG1,
+            (errmsg_internal("executing archive command \"%s\" for archving %s",
+                             xlogarchcmd, xlog)));

I wonder if it's necessary to include the "for archiving" part in this
one.  Typically, the xlogarchcmd will include it somewhere.

Nathan


On Mon, Nov 15, 2021 at 10:36:09PM +0000, Bossart, Nathan wrote:
> +    ereport(DEBUG1,
> +            (errmsg_internal("executing archive command \"%s\" for archving %s",
> +                             xlogarchcmd, xlog)));
>
> I wonder if it's necessary to include the "for archiving" part in this
> one.  Typically, the xlogarchcmd will include it somewhere.

Don't see any need to add that either (s/archving/archiving/, btw).

@@ -1000,6 +1004,10 @@ XLogWalRcvFlush(bool dying, TimeLineID tli)
+               ereport(DEBUG1,
+                               (errmsg_internal("streaming %X/%X",
+ LSN_FORMAT_ARGS(LogstreamResult.Write))));
Having the same wording for the WAL receiver and the WAL sender could
be confusing when it comes to a cascading standby node, particularly
if log_line_prefix does not include the process PID.

+       ereport(DEBUG1,
+                       (errmsg_internal("streaming %X/%X",
+                                                         LSN_FORMAT_ARGS(sentPtr))));
Anyway, those two are going to make the logs much more noisy, no?
The same could be said about XLogFileRead(), joining Nathan's point
upthread.  So I cannot get excited by this change.
--
Michael

Attachment
On 11/15/21, 7:14 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> +       ereport(DEBUG1,
> +                       (errmsg_internal("streaming %X/%X",
> +                                                         LSN_FORMAT_ARGS(sentPtr))));
> Anyway, those two are going to make the logs much more noisy, no?
> The same could be said about XLogFileRead(), joining Nathan's point
> upthread.  So I cannot get excited by this change.

Yeah, this might even be too noisy for DEBUG5.

Nathan


On 2021-Nov-16, Bossart, Nathan wrote:

> On 11/15/21, 7:14 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> > +       ereport(DEBUG1,
> > +                       (errmsg_internal("streaming %X/%X",
> > +                                                         LSN_FORMAT_ARGS(sentPtr))));
> > Anyway, those two are going to make the logs much more noisy, no?
> > The same could be said about XLogFileRead(), joining Nathan's point
> > upthread.  So I cannot get excited by this change.
> 
> Yeah, this might even be too noisy for DEBUG5.

Nod.  And if you're at DEBUG5, the log contains so much other crap that
it is unusable for any purposes anyway.

My opinion is that adding these things willy-nilly is not a solution to
any actual problem.  Adding a few additional log lines that are
low-volume at DEBUG1 might be useful, but below that (DEBUG2 etc) it's
not good for anything other than specific development, IMO.  At least
this particular one for streaming replication I think we should not
include.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



On Tue, Nov 16, 2021 at 01:26:49PM -0300, Alvaro Herrera wrote:
> My opinion is that adding these things willy-nilly is not a solution to
> any actual problem.  Adding a few additional log lines that are
> low-volume at DEBUG1 might be useful, but below that (DEBUG2 etc) it's
> not good for anything other than specific development, IMO.  At least
> this particular one for streaming replication I think we should not
> include.

Looking at v2, I think that this leaves the additions of the DEBUG1
entries in SendBaseBackup() and WalRcvWaitForStartPosition(), then.
The one in pgarch.c does not provide any additional information as the
segment to-be-archived should be part of the command.
--
Michael

Attachment
On Wed, Nov 17, 2021 at 8:01 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Nov 16, 2021 at 01:26:49PM -0300, Alvaro Herrera wrote:
> > My opinion is that adding these things willy-nilly is not a solution to
> > any actual problem.  Adding a few additional log lines that are
> > low-volume at DEBUG1 might be useful, but below that (DEBUG2 etc) it's
> > not good for anything other than specific development, IMO.  At least
> > this particular one for streaming replication I think we should not
> > include.
>
> Looking at v2, I think that this leaves the additions of the DEBUG1
> entries in SendBaseBackup() and WalRcvWaitForStartPosition(), then.
> The one in pgarch.c does not provide any additional information as the
> segment to-be-archived should be part of the command.

Thank you all for the inputs. Here's the patch that I've come up with.

Upon thinking further, having at least the messages in [1] at LOG level
would be helpful to know what's happening with the system while in
recovery. Although these messages at LOG level might seem to fill up
the server logs, a good log consumption and rotation mechanism
(I'm sure every major postgres vendor has one) would be
sufficient to allay that concern.

These LOG messages would help us know how much time a restore command
takes to fetch a WAL file, which WAL file the server is currently
recovering, and where it is recovering from. Customers
often ask questions like: when will my server come up? How much time
will the recovery of my server take?

As a developer or admin, one can monitor these logs and do a bunch of things:
1) see how many WAL files are left to be recovered by looking at the WAL
files in the archive location or the pg_wal directory or on the primary;
2) provide an approximate estimate of when the server will come up or how
much longer the recovery will take: from the previous LOG messages one
can derive the number of WAL files the server recovered per minute, and
combine that with the number of left-over WAL files calculated in (1).
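For example (illustrative numbers only): if the logs show about 60 WAL
files recovered per minute and (1) shows 600 files still left, the
remaining recovery time is roughly 600 / 60 = 10 minutes.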

[1]
ereport(LOG,
        (errmsg("waiting for WAL segment \"%s\" from archive",
                xlogfname)));

ereport(LOG,
        (errmsg("restored WAL segment \"%s\" from archive",
                xlogfname)));

ereport(LOG,
        (errmsg("recovering WAL segment \"%s\" from source \"%s\"",
                xlogfname, srcname)));

Regards,
Bharath Rupireddy.

Attachment
On Tue, Dec 07, 2021 at 06:28:24PM +0530, Bharath Rupireddy wrote:
> Upon thinking further, having at least the messages in [1] at LOG level
> would be helpful to know what's happening with the system while in
> recovery. Although these messages at LOG level might seem to fill up
> the server logs, a good log consumption and rotation mechanism
> (I'm sure every major postgres vendor has one) would be
> sufficient to allay that concern.

+      ereport(LOG,
+              (errmsg("recovering WAL segment \"%s\" from source \"%s\"",
+               xlogfname, srcname)));
Isn't this approach an issue for translations?  It seems to me that
terms like "stream" or "archive" had better be translated properly,
meaning that this needs a fully-constructed sentence.
--
Michael

Attachment
On Wed, Dec 8, 2021 at 10:05 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Dec 07, 2021 at 06:28:24PM +0530, Bharath Rupireddy wrote:
> > Upon thinking further, having at least the messages in [1] at LOG level
> > would be helpful to know what's happening with the system while in
> > recovery. Although these messages at LOG level might seem to fill up
> > the server logs, a good log consumption and rotation mechanism
> > (I'm sure every major postgres vendor has one) would be
> > sufficient to allay that concern.
>
> +      ereport(LOG,
> +              (errmsg("recovering WAL segment \"%s\" from source \"%s\"",
> +               xlogfname, srcname)));
> Isn't this approach an issue for translations?  It seems to me that
> terms like "stream" or "archive" had better be translated properly,
> meaning that this needs a fully-constructed sentence.

Thanks for taking a look at the patch. How about the attached v4?

I added a CF entry - https://commitfest.postgresql.org/36/3443/

Regards,
Bharath Rupireddy.

Attachment
On Wed, Dec 08, 2021 at 10:47:57AM +0530, Bharath Rupireddy wrote:
> Thanks for taking a look at the patch. How about the attached v4?
>
> I added a CF entry - https://commitfest.postgresql.org/36/3443/

+       else if (source == XLOG_FROM_STREAM)
+           ereport(LOG,
+                   (errmsg("recovering WAL segment \"%s\" received from primary",
+                           xlogfname)));
This is incorrect when using a cascading standby.  And perhaps this
could use a switch/case?

While quickly testing, I got reminded that the LOG for a segment
retrieved from the local pg_wal would generate some noise when running
for example the bootstrap process.  Is this one really interesting
compared to the two others?
--
Michael

Attachment
On Wed, Dec 8, 2021 at 12:52 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Dec 08, 2021 at 10:47:57AM +0530, Bharath Rupireddy wrote:
> > Thanks for taking a look at the patch. How about the attached v4?
> >
> > I added a CF entry - https://commitfest.postgresql.org/36/3443/
>
> +       else if (source == XLOG_FROM_STREAM)
> +           ereport(LOG,
> +                   (errmsg("recovering WAL segment \"%s\" received from primary",
> +                           xlogfname)));
> This is incorrect when using a cascading standby.

I visualize the setup like this: primary -> standby -> cascading standby

For a standby, "received from primary" makes sense. For a cascading
standby too it makes sense, because the standby still acts as the primary
for the cascading standby, no? And we don't distinguish any other existing
messages for a standby acting as a cascading standby, right? The
comments around the enum XLOG_FROM_STREAM and its usage still say
from "primary".

Do you want me to add "received from stream" in general?

> And perhaps this could use a switch/case?

Yes, I will change it to switch-case in the next version.

> While quickly testing, I got reminded that the LOG for a segment
> retrieved from the local pg_wal would generate some noise when running
> for example the bootstrap process.  Is this one really interesting
> compared to the two others?

Yes. I observed that for a simple primary-standby setup, inserting
10 million rows generated around 40 WAL files and so 40 LOG messages in
the standby server logs; in a typical production environment with a
heavy-write workload, the amount of LOG generation is going to be much
higher. And we don't want to have them as DEBUG1 messages and have to
enable DEBUG1 just for the recovery messages one is interested in.

How about we invent a new GUC log_recovery, with default set to false,
similar to log_checkpoints? And, these messages can be like:
ereport(log_recovery ? LOG : DEBUG1, .....)? This GUC can be useful to
add some recovery stats in future as well. Thoughts?

Regards,
Bharath Rupireddy.



On Wed, Dec 08, 2021 at 02:00:12PM +0530, Bharath Rupireddy wrote:
> For a standby, "received from primary" makes sense. For a cascading
> standby too it makes sense, because the standby still acts as the primary
> for the cascading standby, no? And we don't distinguish any other existing
> messages for a standby acting as a cascading standby, right? The
> comments around the enum XLOG_FROM_STREAM and its usage still say
> from "primary".

Quoting the docs:
https://www.postgresql.org/docs/devel/glossary.html#GLOSSARY-PRIMARY-SERVER
And a standby is no such thing.

> Do you want me to add "received from stream" in general?

"from stream" would be fine, IMHO.

> How about we invent a new GUC log_recovery, with default set to false,
> similar to log_checkpoints? And, these messages can be like:
> ereport(log_recovery ? LOG : DEBUG1, .....)? This GUC can be useful to
> add some recovery stats in future as well. Thoughts?

This discussion does not justify a new GUC at this stage IMO.
--
Michael

Attachment
On Wed, Dec 8, 2021 at 2:08 PM Michael Paquier <michael@paquier.xyz> wrote:
> > Do you want me to add "received from stream" in general?
>
> "from stream" would be fine, IMHO.

Done in the attached v5 patch.

> > How about we invent a new GUC log_recovery, with default set to false,
> > similar to log_checkpoints? And, these messages can be like:
> > ereport(log_recovery ? LOG : DEBUG1, .....)? This GUC can be useful to
> > add some recovery stats in future as well. Thoughts?
>
> This discussion does not justify a new GUC at this stage IMO.

Having "log_recovery" GUC gives the flexibility to the users to enable
the recovery logs added by this patch if they are concerned otherwise
the logs just be at DEBUG1 level. Setting log_min_messages level to
DEBUG1 would definitely flood the server logs.

> While quickly testing, I got reminded that the LOG for a segment
> retrieved from the local pg_wal would generate some noise when running
> for example the bootstrap process.  Is this one really interesting
> compared to the two others?

If we just have the archive-related message at LOG level and the others at
DEBUG1 level, the log files end up with only partial recovery
information, missing the recovery messages for WAL fetched from the pg_wal
directory.

Another option is to just have all the messages (added in the patch)
at DEBUG1 level and let users deal with it.

IMO, the new GUC "log_recovery" is way better as it can be extended to
include some recovery stats info as well.

Thoughts?

Regards,
Bharath Rupireddy.

Attachment
On Wed, Dec 08, 2021 at 04:57:49PM +0530, Bharath Rupireddy wrote:
> Done in the attached v5 patch.

Thanks for the new version.  FWIW, as the information logged when
recovering a WAL segment from the local pg_wal could be rather
confusing at bootstrap phase, I would suggest to simplify things as of
the attached and call it a day.
--
Michael

Attachment
On Thu, Dec 9, 2021 at 7:56 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Dec 08, 2021 at 04:57:49PM +0530, Bharath Rupireddy wrote:
> > Done in the attached v5 patch.
>
> Thanks for the new version.  FWIW, as the information logged when
> recovering a WAL segment from the local pg_wal could be rather
> confusing at bootstrap phase, I would suggest to simplify things as of
> the attached and call it a day.

Thanks for the patch. It looks good to me. I have a suggestion: it's
better if we specify why we are not emitting any message when the WAL
segment is found in the pg_wal directory, something like "we
intentionally do not emit any message when the required WAL
segment is found in the pg_wal directory, to avoid noise in the server
logs".

+ case XLOG_FROM_PG_WAL:
+ case XLOG_FROM_ANY:
+ break;

Regards,
Bharath Rupireddy.



On Thu, Dec 9, 2021 at 11:28 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Dec 9, 2021 at 7:56 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Wed, Dec 08, 2021 at 04:57:49PM +0530, Bharath Rupireddy wrote:
> > > Done in the attached v5 patch.
> >
> > Thanks for the new version.  FWIW, as the information logged when
> > recovering a WAL segment from the local pg_wal could be rather
> > confusing at bootstrap phase, I would suggest to simplify things as of
> > the attached and call it a day.
>
> Thanks for the patch. It looks good to me. I have a suggestion: it's
> better if we specify why we are not emitting any message when the WAL
> segment is found in the pg_wal directory, something like "we
> intentionally do not emit any message when the required WAL
> segment is found in the pg_wal directory, to avoid noise in the server
> logs".
>
> + case XLOG_FROM_PG_WAL:
> + case XLOG_FROM_ANY:
> + break;

I just want to call this out: an insertion of 10 million rows in the
primary generates a bunch of messages in the standby [1] within 40 sec
(size of the standby server log grows by 5K).

[1]
2021-12-08 08:11:53.906 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000004" from stream
2021-12-08 08:11:55.864 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000005" from stream
2021-12-08 08:11:56.713 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000006" from stream
2021-12-08 08:11:57.495 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000007" from stream
2021-12-08 08:11:58.301 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000008" from stream
2021-12-08 08:11:59.062 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000009" from stream
2021-12-08 08:11:59.853 UTC [1219898] LOG:  recovering WAL segment
"00000001000000000000000A" from stream
2021-12-08 08:12:00.573 UTC [1219898] LOG:  recovering WAL segment
"00000001000000000000000B" from stream
2021-12-08 08:12:01.346 UTC [1219898] LOG:  recovering WAL segment
"00000001000000000000000C" from stream
2021-12-08 08:12:02.048 UTC [1219898] LOG:  recovering WAL segment
"00000001000000000000000D" from stream
2021-12-08 08:12:02.798 UTC [1219898] LOG:  recovering WAL segment
"00000001000000000000000E" from stream
2021-12-08 08:12:03.544 UTC [1219898] LOG:  recovering WAL segment
"00000001000000000000000F" from stream
2021-12-08 08:12:04.310 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000010" from stream
2021-12-08 08:12:05.096 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000011" from stream
2021-12-08 08:12:06.003 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000012" from stream
2021-12-08 08:12:06.762 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000013" from stream
2021-12-08 08:12:07.595 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000014" from stream
2021-12-08 08:12:08.431 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000015" from stream
2021-12-08 08:12:09.246 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000016" from stream
2021-12-08 08:12:10.144 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000017" from stream
2021-12-08 08:12:14.944 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000018" from stream
2021-12-08 08:12:15.784 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000019" from stream
2021-12-08 08:12:16.757 UTC [1219898] LOG:  recovering WAL segment
"00000001000000000000001A" from stream
2021-12-08 08:12:17.571 UTC [1219898] LOG:  recovering WAL segment
"00000001000000000000001B" from stream
2021-12-08 08:12:18.375 UTC [1219898] LOG:  recovering WAL segment
"00000001000000000000001C" from stream
2021-12-08 08:12:19.179 UTC [1219898] LOG:  recovering WAL segment
"00000001000000000000001D" from stream
2021-12-08 08:12:19.964 UTC [1219898] LOG:  recovering WAL segment
"00000001000000000000001E" from stream
2021-12-08 08:12:20.839 UTC [1219898] LOG:  recovering WAL segment
"00000001000000000000001F" from stream
2021-12-08 08:12:21.782 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000020" from stream
2021-12-08 08:12:24.422 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000021" from stream
2021-12-08 08:12:25.164 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000022" from stream
2021-12-08 08:12:25.989 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000023" from stream
2021-12-08 08:12:26.874 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000024" from stream
2021-12-08 08:12:27.592 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000025" from stream
2021-12-08 08:12:28.471 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000026" from stream
2021-12-08 08:12:29.158 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000027" from stream
2021-12-08 08:12:30.017 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000028" from stream
2021-12-08 08:12:30.778 UTC [1219898] LOG:  recovering WAL segment
"000000010000000000000029" from stream
2021-12-08 08:12:31.859 UTC [1219898] LOG:  recovering WAL segment
"00000001000000000000002A" from stream

Regards,
Bharath Rupireddy.



On 2021-Dec-09, Bharath Rupireddy wrote:

> I just want to call this out: an insertion of 10 million rows in the
> primary generates a bunch of messages in the standby [1] within 40 sec
> (size of the standby server log grows by 5K).

Hmm, that does sound excessive to me in terms of log bloat increase.
Remember the discussion about turning log_checkpoints on by default?


-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Las mujeres son como hondas:  mientras más resistencia tienen,
 más lejos puedes llegar con ellas"  (Jonas Nightingale, Leap of Faith)



On Thu, Dec 9, 2021 at 6:00 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2021-Dec-09, Bharath Rupireddy wrote:
>
> > I just want to call this out: an insertion of 10 million rows in the
> > primary generates a bunch of messages in the standby [1] within 40 sec
> > (size of the standby server log grows by 5K).
>
> Hmm, that does sound excessive to me in terms of log bloat increase.
> Remember the discussion about turning log_checkpoints on by default?

The number of LOG messages generated when the log_checkpoints GUC is
set to on is quite small, hardly 4 messages per checkpoint (every 5 min). I
think enabling log_checkpoints is still acceptable, as most of the
hackers agreed in another thread [1] that its advantages outweigh the
costs and it doesn't seem to bloat the server logs (at most 12
checkpoints/hour x 4 messages x 24 hours = 1152 messages in a day).

I'm not sure if it is worth having a log_recovery GUC: if enabled, the
recovery messages can be emitted at LOG level, otherwise at DEBUG1 level.
The log_recovery GUC can also be used to collect and emit some recovery
stats, like log_checkpoints.

[1] - https://www.postgresql.org/message-id/CA%2BTgmoaDFpFmNQuaWj6%2B78CPVBrF_WPT1wFHBTvio%3DtRmxzUcQ%40mail.gmail.com

Regards,
Bharath Rupireddy.



On 2021-Dec-09, Bharath Rupireddy wrote:

> On Thu, Dec 9, 2021 at 6:00 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > On 2021-Dec-09, Bharath Rupireddy wrote:
> >
> > > I just want to call this out: an insertion of 10 million rows in the
> > > primary generates a bunch of messages in the standby [1] within 40 sec
> > > (size of the standby server log grows by 5K).
> >
> > Hmm, that does sound excessive to me in terms of log bloat increase.
> > Remember the discussion about turning log_checkpoints on by default?
> 
> The number of LOG messages generated when the log_checkpoints GUC is
> set to on is quite small, hardly 4 messages per checkpoint (every 5 min). I
> think enabling log_checkpoints is still acceptable, as most of the
> hackers agreed in another thread [1] that its advantages outweigh the
> costs and it doesn't seem to bloat the server logs (at most 12
> checkpoints/hour x 4 messages x 24 hours = 1152 messages in a day).

My point is that it was argued, in that thread, that setting
log_checkpoints to on by default would cause too much additional log
volume.  That argument was shot down, but in this thread we're arguing
that we want five kilobytes of messages in forty seconds.  That sounds
much less acceptable to me, so making it default behavior or
unconditional behavior is not going to fly very high.

> I'm not sure if it is worth having a log_recovery GUC: if enabled, the
> recovery messages can be emitted at LOG level, otherwise at DEBUG1 level.

Maybe some tunable like
log_wal_traffic = { none, medium, high }
where "none" is current behavior of no noise, "medium" gets (say) once
every 256 segments (e.g., when going from XFF to (X+1)00), "high" gets
you one message per segment.
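Roughly like this (all names hypothetical):

typedef enum LogWalTraffic
{
    LOG_WAL_TRAFFIC_NONE,
    LOG_WAL_TRAFFIC_MEDIUM,
    LOG_WAL_TRAFFIC_HIGH
} LogWalTraffic;

static int  log_wal_traffic = LOG_WAL_TRAFFIC_NONE;

/* decide whether recovery of this segment should be logged */
static bool
ShouldLogWalTraffic(XLogSegNo segno)
{
    switch (log_wal_traffic)
    {
        case LOG_WAL_TRAFFIC_HIGH:
            return true;
        case LOG_WAL_TRAFFIC_MEDIUM:
            return (segno % 256) == 0;  /* XFF -> (X+1)00 boundary */
        default:
            return false;
    }
}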

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



On Thu, Dec 9, 2021 at 9:28 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2021-Dec-09, Bharath Rupireddy wrote:
>
> > On Thu, Dec 9, 2021 at 6:00 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > >
> > > On 2021-Dec-09, Bharath Rupireddy wrote:
> > >
> > > > I just want to call this out: an insertion of 10 million rows in the
> > > > primary generates a bunch of messages in the standby [1] within 40 sec
> > > > (size of the standby server log grows by 5K).
> > >
> > > Hmm, that does sound excessive to me in terms of log bloat increase.
> > > Remember the discussion about turning log_checkpoints on by default?
> >
> > The number of LOG messages generated when the log_checkpoints GUC is
> > set to on is quite small, hardly 4 messages per checkpoint (every 5 min). I
> > think enabling log_checkpoints is still acceptable, as most of the
> > hackers agreed in another thread [1] that its advantages outweigh the
> > costs and it doesn't seem to bloat the server logs (at most 12
> > checkpoints/hour x 4 messages x 24 hours = 1152 messages in a day).
>
> My point is that it was argued, in that thread, that setting
> log_checkpoints to on by default would cause too much additional log
> volume.  That argument was shot down, but in this thread we're arguing
> that we want five kilobytes of messages in forty seconds.  That sounds
> much less acceptable to me, so making it default behavior or
> unconditional behavior is not going to fly very high.
>
> > I'm not sure if it is worth having a log_recovery GUC: if enabled, the
> > recovery messages can be emitted at LOG level, otherwise at DEBUG1 level.
>
> Maybe some tunable like
> log_wal_traffic = { none, medium, high }
> where "none" is current behavior of no noise, "medium" gets (say) once
> every 256 segments (e.g., when going from XFF to (X+1)00), "high" gets
> you one message per segment.

Yeah, that can be an option.

Another idea could be to use the infrastructure laid out by commit
9ce346e [1]. With ereport_startup_progress, we can emit the LOGs (of
the WAL file currently being recovered) every log_startup_progress_interval
seconds/milliseconds.

One problem is that ereport_startup_progress doesn't work in
StandbyMode; maybe we can remove this restriction unless we have a
major reason for not allowing it on the standby.
            /* Prepare to report progress of the redo phase. */
            if (!StandbyMode)
                begin_startup_progress_phase();
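For reference, a usage sketch (ereport_startup_progress is the existing
macro from commit 9ce346e; it supplies the elapsed seconds/hundredths for
the first two format specifiers and emits nothing until the interval
timer has expired):

ereport_startup_progress("redo in progress, elapsed time: %ld.%02d s, current LSN: %X/%X",
                         LSN_FORMAT_ARGS(xlogreader->ReadRecPtr));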

Thoughts?

[1]
commit 9ce346eabf350a130bba46be3f8c50ba28506969
Author: Robert Haas <rhaas@postgresql.org>
Date:   Mon Oct 25 11:51:57 2021 -0400

    Report progress of startup operations that take a long time.

    Discussion:
https://postgr.es/m/CA+TgmoaHQrgDFOBwgY16XCoMtXxsrVGFB2jNCvb7-ubuEe1MGg@mail.gmail.com
    Discussion:
https://postgr.es/m/CAMm1aWaHF7VE69572_OLQ+MgpT5RUiUDgF1x5RrtkJBLdpRj3Q@mail.gmail.com

Regards,
Bharath Rupireddy.



On Fri, Dec 24, 2021 at 7:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Sun, Dec 12, 2021 at 06:08:05PM +0530, Bharath Rupireddy wrote:
> > Another idea could be to use the infrastructure laid out by commit
> > 9ce346e [1]. With ereport_startup_progress, we can emit the LOGs (of
> > the WAL file currently being recovered) every log_startup_progress_interval
> > seconds/milliseconds.
> >
> > One problem is that ereport_startup_progress doesn't work in
> > StandbyMode; maybe we can remove this restriction unless we have a
> > major reason for not allowing it on the standby.
> >             /* Prepare to report progress of the redo phase. */
> >             if (!StandbyMode)
> >                 begin_startup_progress_phase();
>
> The relevant conversation starts here.
> https://www.postgresql.org/message-id/20210814214700.GO10479%40telsasoft.com
>
> There was a lot of confusion in that thread, though.
>
> The understanding was that it didn't make sense for that feature to
> continuously log messages on a standby (every 10sec by default).  That seems
> like too much - the issue of a checkpoint logged every 5min was enough of a
> hurdle.
>
> If you're talking about a new feature that uses the infrastructure from 9ce3,
> but is controlled by a separate GUC like log_wal_traffic, that could be okay.

Do you see any problems using the same GUC
log_startup_progress_interval and ereport_startup_progress API
introduced by 9ce346e? I prefer this instead of a new GUC. Thoughts?

Regards,
Bharath Rupireddy.



On Thu, Dec 9, 2021 at 9:28 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> Maybe some tunable like
> log_wal_traffic = { none, medium, high }
> where "none" is current behavior of no noise, "medium" gets (say) once
> every 256 segments (e.g., when going from XFF to (X+1)00), "high" gets
> you one message per segment.

On Fri, Dec 24, 2021 at 7:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> If you're talking about a new feature that uses the infrastructure from 9ce3,
> but is controlled by a separate GUC like log_wal_traffic, that could be okay.

Thanks for the suggestion. I've added a new GUC log_wal_traffic as
suggested above. PSA  v7 patch.

Regards,
Bharath Rupireddy.

Attachment
On Wed, Dec 29, 2021 at 6:50 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Dec 9, 2021 at 9:28 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > Maybe some tunable like
> > log_wal_traffic = { none, medium, high }
> > where "none" is current behavior of no noise, "medium" gets (say) once
> > every 256 segments (e.g., when going from XFF to (X+1)00), "high" gets
> > you one message per segment.
>
> On Fri, Dec 24, 2021 at 7:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > If you're talking about a new feature that uses the infrastructure from 9ce3,
> > but is controlled by a separate GUC like log_wal_traffic, that could be okay.
>
> Thanks for the suggestion. I've added a new GUC log_wal_traffic as
> suggested above. PSA  v7 patch.

Here's the rebased v8 patch; please review it.

https://commitfest.postgresql.org/37/3443/

Regards,
Bharath Rupireddy.

Attachment
On Sun, Feb 27, 2022 at 12:50 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Dec 29, 2021 at 6:50 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Thu, Dec 9, 2021 at 9:28 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > >
> > > Maybe some tunable like
> > > log_wal_traffic = { none, medium, high }
> > > where "none" is current behavior of no noise, "medium" gets (say) once
> > > every 256 segments (e.g., when going from XFF to (X+1)00), "high" gets
> > > you one message per segment.
> >
> > On Fri, Dec 24, 2021 at 7:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > >
> > > If you're talking about a new feature that uses the infrastructure from 9ce3,
> > > but is controlled by a separate GUC like log_wal_traffic, that could be okay.
> >
> > Thanks for the suggestion. I've added a new GUC log_wal_traffic as
> > suggested above. PSA  v7 patch.
>
> Here's the rebased v8 patch; please review it.
>
> https://commitfest.postgresql.org/37/3443/

Here's the rebased v9 patch.

Regards,
Bharath Rupireddy.

Attachment
Did we ever consider the idea of using a new pg_stat_wal_activity_progress
view or something like that, using the backend_progress.c functionality?
I don't see it mentioned in the thread.

It's not an *exact* fit, since this is not about one "command" being
executed by a "backend", but it seems a near enough fit, and it would be
unobtrusive enough that we don't need to worry about the progress-update
rate or have a GUC to determine how frequently to update (with a gauge
that's heavily workload-dependent and is likely to make little sense to
some users -- consider differently-sized WAL segments, for one thing).
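A sketch of how that might look (the command type and parameter index
are hypothetical; the pgstat_progress_* calls are the existing
backend_progress.c API):

/* once, when recovery starts */
pgstat_progress_start_command(PROGRESS_COMMAND_WAL_RECOVERY, InvalidOid);

/* once per replayed segment, a cheap atomic int64 update */
pgstat_progress_update_param(0, (int64) segno);

/* when recovery ends */
pgstat_progress_end_command();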

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



On Fri, Apr 29, 2022 at 4:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> Did we ever consider the idea of using a new pg_stat_wal_activity_progress
> view or something like that, using the backend_progress.c functionality?
> I don't see it mentioned in the thread.

IMO, progress reporting works well on a running server and at the
moment. The WAL recovery/replay can happen even before the server
opens up for connections and the progress report view can't be used
for later analysis like how much time restoring WAL files from the
archive location took, and also the WAL file names can't be reported in
the progress reporting mechanism (only integer columns, of course if
required we can add text columns to pg_stat_get_progress_info). Having
the recovery info in server logs might help.

> It's not an *exact* fit, since this is not about one "command" being
> executed by a "backend", but it seems a near enough fit, and it would be
> unobtrusive enough that we don't need to worry about the progress-update
> rate or have a GUC to determine how frequently to update (with a gauge
> that's heavily workload-dependent and is likely to make little sense to
> some users -- consider differently-sized WAL segments, for one thing).

I think reporting a long-running file processing operation (removing
or syncing) within postgres is a generic problem (for snapshot,
mapping, temporary (pgsql_tmp), temp relation files, old WAL file
processing, WAL file processing during recovery etc.) and needs to be
solved in two ways: 1) logging progress into the server logs (which helps
with analysis and reporting when the server isn't available for
connections, e.g. during crash recovery); a generic GUC
log_file_processing_traffic = {none, medium, high} might help here
(also proposed in [1]); and 2) pg_stat_file_processing_progress
(extending the progress reporting pg_stat_get_progress_info to have a few
text columns for the current file name and directory path).

Thoughts?

Regards,
Bharath Rupireddy.



On 2022-May-05, Bharath Rupireddy wrote:

> On Fri, Apr 29, 2022 at 4:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > Did we ever consider the idea of using a new pg_stat_wal_activity_progress
> > view or something like that, using the backend_progress.c functionality?
> > I don't see it mentioned in the thread.
> 
> IMO, progress reporting works well on a running server and at the
> moment. The WAL recovery/replay can happen even before the server
> opens up for connections

It's definitely true that you wouldn't be able to use it for when the
server is not accepting connections.

> and the progress report view can't be used
> for later analysis like how much time restoring WAL files from the
> archive location took

This is true too -- progress report doesn't store historical data, only
current status.

> and also the WAL file names can't be reported in the progress reporting
> mechanism

Also true.

> (only integer columns, of course if required we can add text columns
> to pg_stat_get_progress_info).

Yeah, I don't think adding text columns is terribly easy, because the
whole point of the progress reporting infrastructure is that it can be
updated very cheaply as atomic operations, and if you want to transmit
text columns, that's no longer possible.

> Having the recovery info in server logs might help.

I suppose it might.

> I think reporting a long-running file processing operation (removing
> or syncing) within postgres is a generic problem (for snapshot,
> mapping, temporary (pgsql_tmp), temp relation files, old WAL file
> processing, WAL file processing during recovery etc.) and needs to be
> solved

I agree up to here.

> in two ways: 1) logging progress into the server logs (which helps
> with analysis and reporting when the server isn't available for
> connections, e.g. during crash recovery); a generic GUC
> log_file_processing_traffic = {none, medium, high} might help here
> (also proposed in [1]); and 2) pg_stat_file_processing_progress
> (extending the progress reporting pg_stat_get_progress_info to have a few
> text columns for the current file name and directory path).

I think using the server log to store telemetry data is not a great fit.
It bloats the log files and can be so slow as to block other operations
in the server.  Server logs should normally be considered critical info
that's not okay to lose; telemetry tends to be of secondary importance
and in a pinch you can drop a few messages without hurting too much.

We've done moderately okay so far with having some system views where
some telemetry readings can be obtained, but there are several drawbacks to
that approach that we should at some point solve.  My opinion on this is
that we need to bite the bullet and develop separate infrastructure for
reporting server metrics.

That said, I'm not opposed to having a patch somewhat as posted.  I just
think that we should look into a new mechanism going forward.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



On Thu, May 5, 2022 at 2:07 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2022-May-05, Bharath Rupireddy wrote:
>
> > On Fri, Apr 29, 2022 at 4:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > >
> > > Did we ever consider the idea of using a new pg_stat_wal_activity_progress
> > > view or something like that, using the backend_progress.c functionality?
> > > I don't see it mentioned in the thread.
> >
> > IMO, progress reporting works well on a running server and at the
> > moment. The WAL recovery/replay can happen even before the server
> > opens up for connections
>
> It's definitely true that you wouldn't be able to use it for when the
> server is not accepting connections.
>
> > and the progress report view can't be used
> > for later analysis like how much time restoring WAL files from the
> > archive location took
>
> This is true too -- progress report doesn't store historical data, only
> current status.
>
> > and also the WAL file names can't be reported in the progress reporting
> > mechanism
>
> Also true.
>
> > (only integer columns, of course if required we can add text columns
> > to pg_stat_get_progress_info).
>
> Yeah, I don't think adding text columns is terribly easy, because the
> whole point of the progress reporting infrastructure is that it can be
> updated very cheaply as atomic operations, and if you want to transmit
> text columns, that's no longer possible.
>
> > Having the recovery info in server logs might help.
>
> I suppose it might.
>
> > I think reporting a long-running file processing operation (removing
> > or syncing) within postgres is a generic problem (for snapshot,
> > mapping, temporary (pgsql_tmp), temp relation files, old WAL file
> > processing, WAL file processing during recovery etc.) and needs to be
> > solved
>
> I agree up to here.
>
> > in two ways: 1) logging progress into the server logs (which helps
> > with analysis and reporting when the server isn't available for
> > connections, e.g. during crash recovery); a generic GUC
> > log_file_processing_traffic = {none, medium, high} might help here
> > (also proposed in [1]); and 2) pg_stat_file_processing_progress
> > (extending the progress reporting pg_stat_get_progress_info to have a few
> > text columns for the current file name and directory path).
>
> I think using the server log to store telemetry data is not a great fit.
> It bloats the log files and can be so slow as to block other operations
> in the server.  Server logs should normally be considered critical info
> that's not okay to lose; telemetry tends to be of secondary importance
> and in a pinch you can drop a few messages without hurting too much.
>
> We've done moderately okay so far with having some system views where
> some telemetry readings can be obtained, but there are several drawbacks to
> that approach that we should at some point solve.  My opinion on this is
> that we need to bite the bullet and develop separate infrastructure for
> reporting server metrics.
>
> I just
> think that we should look into a new mechanism going forward.

I completely agree that we must have new ways to report important
server metrics in an easily consumable fashion. It could be something
like the server computing metrics (time taken, the file being processed
etc.) for important/time-taking operations such as execution of the
archive command, the restore command, recovery, checkpoints, old WAL file
removal/recycling, and processing of various files - snapshot, mapping,
pgsql_tmp files, temp relation files etc. - and writing these metrics to
a file or a stats table in the database itself (this table can have an
automatic cleanup mechanism, or a limit on the number of rows or the size
of the table), so that clients can read those metrics and use them in
whichever way they want. To start with, a simple structure for the
metrics can be {OPERATION text, START_TIME timestamp, END_TIME
timestamp, plus a few flexible text, double and/or integer columns that
each operation can use to store metrics}.

I can start a discussion in a separate thread to seek more thoughts on
the server metrics part.

> That said, I'm not opposed to having a patch somewhat as posted.

Thanks. I'm wondering if we should have a generic GUC
log_file_processing_traffic = {none, medium, high} that might help
report other time-taking file processing operations such as [1].

[1] https://www.postgresql.org/message-id/CALj2ACW-ELOF5JT2zPavs95wbZ0BrLPrqvSZ7Ac%2BpjxCkmXtEQ%40mail.gmail.com

Regards,
Bharath Rupireddy.



On Fri, Apr 29, 2022 at 5:11 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> Here's the rebased v9 patch.

This seems like it has enormous overlap with the existing
functionality that we have from log_startup_progress_interval.

I think that facility is also better-designed than this one. It prints
out a message based on elapsed time, whereas this patch prints out a
message based progress through the WAL. That means that if WAL replay
isn't actually advancing for some reason, you just won't get any log
messages and you don't know whether it's advancing slowly or not at
all or the server is just hung. With that facility you can distinguish
those cases.

Also, if for some reason we do think that amount of WAL replayed is
the right metric, rather than time, why would we only allow high=1
segment and low=128 segments, rather than say any number of MB or GB
that the user would like to configure?

I suggest that if log_startup_progress_interval doesn't meet your
needs here, we should try to understand why not and maybe enhance it,
instead of adding a separate facility.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



On Fri, May 13, 2022 at 6:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Apr 29, 2022 at 5:11 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > Here's the rebased v9 patch.
>
> This seems like it has enormous overlap with the existing
> functionality that we have from log_startup_progress_interval.
>
> I think that facility is also better-designed than this one. It prints
> out a message based on elapsed time, whereas this patch prints out a
> message based on progress through the WAL. That means that if WAL
> replay isn't actually advancing for some reason, you just won't get
> any log messages, and you don't know whether it's advancing slowly,
> not at all, or the server is just hung. With that facility you can
> distinguish those cases.
>
> Also, if for some reason we do think that amount of WAL replayed is
> the right metric, rather than time, why would we only allow high=1
> segment and low=128 segments, rather than say any number of MB or GB
> that the user would like to configure?
>
> I suggest that if log_startup_progress_interval doesn't meet your
> needs here, we should try to understand why not and maybe enhance it,
> instead of adding a separate facility.

Thanks Robert!

In a production environment (with proper management of server logs,
of course) one can set log_wal_traffic to "high" and emit the
information required to answer customer questions like: "How far
along is the server in recovery? Approximately how long did recovery
of each WAL file take? How long will it take to recover all the WAL
files? What's the rate of recovery - time per WAL file?" and so on.

The ereport_startup_progress facility, on the other hand, emits log
messages only if "some" operation takes longer than the configured
log_startup_progress_interval, which may not serve the above
purpose.

Actually, IMO a generic GUC log_file_processing_traffic = {none,
medium, high} would help the server emit logs for all the critical
file processing operations - WAL file recovery (as proposed here in
this thread), temp file processing during server startup or restarts
(log_startup_progress_interval can't be used there, as the postmaster
doesn't register for any timeouts) [1], snapshot and mapping file
processing during checkpoint, temp relation files, removal of old WAL
files and so on. A rough usage sketch follows.
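
For instance, the GUC could simply gate messages that the code already
produces, roughly like this (a sketch only; the GUC variable and enum
values are the hypothetical ones sketched above, and xlogfname stands
for the WAL file name in that code path):

/* Emit the existing restore message only at "medium" traffic or above. */
if (log_file_processing_traffic >= LOG_FILE_PROCESSING_TRAFFIC_MEDIUM)
    ereport(LOG,
            (errmsg("restored log file \"%s\" from archive",
                    xlogfname)));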

Thoughts?

[1] https://www.postgresql.org/message-id/CALj2ACW-ELOF5JT2zPavs95wbZ0BrLPrqvSZ7Ac%2BpjxCkmXtEQ%40mail.gmail.com

Regards,
Bharath Rupireddy.



On Tue, May 17, 2022 at 1:32 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> In a production environment (with proper management of server logs,
> of course) one can set log_wal_traffic to "high" and emit the
> information required to answer customer questions like: "How far
> along is the server in recovery? Approximately how long did recovery
> of each WAL file take? How long will it take to recover all the WAL
> files? What's the rate of recovery - time per WAL file?" and so on.
>
> The ereport_startup_progress facility, on the other hand, emits log
> messages only if "some" operation takes longer than the configured
> log_startup_progress_interval, which may not serve the above
> purpose.

I think you are misunderstanding what an "operation" is here. If you
take the time to test out that facility, I think you will find that it
is quite capable of answering all of those questions.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



On Fri, May 13, 2022 at 09:10:52AM -0400, Robert Haas wrote:
> I suggest that if log_startup_progress_interval doesn't meet your
> needs here, we should try to understand why not and maybe enhance it,
> instead of adding a separate facility.

+1.  AFAICT it should be possible to make the log_startup_progress_interval
machinery extensible enough to be reused for several other tasks that can
take a while but have little visibility.
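
For reference, the pattern that machinery uses today is roughly the
following (a sketch of the existing startup-process APIs, not a
proposal):

/* At the start of a long-running phase: */
begin_startup_progress_phase();

/* Then, inside the phase's main loop: */
ereport_startup_progress("redo in progress, elapsed time: %ld.%02d s, "
                         "current LSN: %X/%X",
                         LSN_FORMAT_ARGS(xlogreader->ReadRecPtr));

Reusing it elsewhere would presumably mean making that phase/timeout
machinery available outside the startup process.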

Since this thread has been inactive for over 2 months, I'm marking the
commitfest entry as Waiting on Author.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



On Fri, May 13, 2022 at 6:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Apr 29, 2022 at 5:11 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > Here's the rebased v9 patch.
>
> This seems like it has enormous overlap with the existing
> functionality that we have from log_startup_progress_interval.
>
> I think that facility is also better-designed than this one. It prints
> out a message based on elapsed time, whereas this patch prints out a
> message based on progress through the WAL. That means that if WAL
> replay isn't actually advancing for some reason, you just won't get
> any log messages, and you don't know whether it's advancing slowly,
> not at all, or the server is just hung. With that facility you can
> distinguish those cases.
>
> Also, if for some reason we do think that amount of WAL replayed is
> the right metric, rather than time, why would we only allow high=1
> segment and low=128 segments, rather than say any number of MB or GB
> that the user would like to configure?
>
> I suggest that if log_startup_progress_interval doesn't meet your
> needs here, we should try to understand why not and maybe enhance it,
> instead of adding a separate facility.

After thinking for a while, I agree with Robert and others that we
could leverage the existing log_startup_progress_interval mechanism
for reporting which WAL file is currently being replayed. I added the
current TLI (which helps to construct the WAL file name from the
current LSN) and the current WAL file source (which tells where the
WAL file was fetched from) to the existing "redo in progress, elapsed
time:..." message. This serves very well for identifying issues such
as the restore command taking a long time (longer than
log_startup_progress_interval, for instance), the WAL replay rate on
a standby or primary during long recoveries, and so on.
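
With the patch, the enhanced message could look something like this
(values and exact wording are illustrative, not final):

LOG:  redo in progress, elapsed time: 130.00 s, current LSN: 0/4E29D10, TLI: 1, source: archive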

However, ereport_startup_progress is disabled on standbys so that it
doesn't bloat the server logs. I believe the "redo in progress,
elapsed time:..." message can provide important information/metrics
for standbys too, and there's no way for users to enable it on
standbys today. For instance, users could see how well a standby is
faring in replay by looking at two or more such messages. If enabled
with the default log_startup_progress_interval of 10 sec, a standby
would emit 8640 messages per day (86400 sec / 10 sec), which is too
many. I'm not sure whether we would be okay with changing the default
log_startup_progress_interval to, say, 1 min (1440 messages per day)
or 5 min (288 messages per day). In production environments, users
may or may not care whether recovery takes just 10 sec, but they
really do care when it takes on the order of minutes. Basically, I
would like to enable the "redo in progress, elapsed time:..." message
for standbys too.

Thoughts?

PSA the v10 patch with the enhanced "redo in progress, elapsed
time:..." message. Note that it's not a final patch, though.

Regards,
Bharath Rupireddy.
