Thread: wait event and archive_command

wait event and archive_command

From
Fujii Masao
Date:
Hi,

I'd like to propose to add new wait event reported while archiver process
is executing archive_command. This would be helpful to observe
what archiver is doing and check whether it has some troubles or not.
Thought? PoC patch attached.

Also how about adding wait events for other commands like
archive_cleanup_command, restore_command and recovery_end_command?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment

Re: wait event and archive_command

From
Bharath Rupireddy
Date:
On Thu, Oct 21, 2021 at 7:28 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> Hi,
>
> I'd like to propose to add new wait event reported while archiver process
> is executing archive_command. This would be helpful to observe
> what archiver is doing and check whether it has some troubles or not.
> Thought? PoC patch attached.
>
> Also how about adding wait events for other commands like
> archive_cleanup_command, restore_command and recovery_end_command?

+1 for the wait event.

The following activitymsg that are being set to ps display in
XLogFileRead and pgarch_archiveXlog have come up for one of our
internal team discussions recently:

            snprintf(activitymsg, sizeof(activitymsg), "waiting for %s",
                     xlogfname);
            set_ps_display(activitymsg);

        snprintf(activitymsg, sizeof(activitymsg), "recovering %s",
                 xlogfname);
        set_ps_display(activitymsg);

    snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
    set_ps_display(activitymsg);

The ps display info might be useful if we run postgres on a stand
alone box and there's someone monitoring at the ps output, but it
doesn't help debugging after an issue has occurred. How about we have
the following statements which will be useful for someone to look at
the server logs and know what was/is happening during the recovery and
archiving. IMO, we should also have the elog statement.

elog(LOG, "waiting for %s", xlogfname);
elog(LOG, "recovering %s"", xlogfname);
elog(LOG, "archiving %s", xlog);

Another idea could be to have a hook emitting the above info to
outside components, but a hook just for this purpose isn't a great
idea IMO.

Thoughts?

Regards,
Bharath Rupireddy.



Re: wait event and archive_command

From
Michael Paquier
Date:
On Thu, Oct 21, 2021 at 10:57:50PM +0900, Fujii Masao wrote:
> Also how about adding wait events for other commands like
> archive_cleanup_command, restore_command and recovery_end_command?

+1 to add something for all of them as we track the startup process in
pg_stat_activity.  Thinking with a larger picture, this comes down to
the usage of system().  We could introduce a small wrapper of system()
that takes as argument a wait event for the backend.
--
Michael

Attachment

Re: wait event and archive_command

From
Fujii Masao
Date:

On 2021/10/21 23:55, Bharath Rupireddy wrote:
>> Also how about adding wait events for other commands like
>> archive_cleanup_command, restore_command and recovery_end_command?
> 
> +1 for the wait event.

Thanks!
I added the wait events for also restore_command, etc into the patch.
I attached that updated version of the patch.


> The following activitymsg that are being set to ps display in
> XLogFileRead and pgarch_archiveXlog have come up for one of our
> internal team discussions recently:
> 
>              snprintf(activitymsg, sizeof(activitymsg), "waiting for %s",
>                       xlogfname);
>              set_ps_display(activitymsg);
> 
>          snprintf(activitymsg, sizeof(activitymsg), "recovering %s",
>                   xlogfname);
>          set_ps_display(activitymsg);
> 
>      snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
>      set_ps_display(activitymsg);
> 
> The ps display info might be useful if we run postgres on a stand
> alone box and there's someone monitoring at the ps output, but it
> doesn't help debugging after an issue has occurred. How about we have
> the following statements which will be useful for someone to look at
> the server logs and know what was/is happening during the recovery and
> archiving.

If an issue occurs while the command is executing,
the error message is logged, isn't it? Isn't that enough for your case?


> IMO, we should also have the elog statement.
> 
> elog(LOG, "waiting for %s", xlogfname);
> elog(LOG, "recovering %s"", xlogfname);
> elog(LOG, "archiving %s", xlog);

I'm afraid that some people think that it's noisy to always log those messages.


> Another idea could be to have a hook emitting the above info to
> outside components, but a hook just for this purpose isn't a great
> idea IMO.

Yes, this idea sounds overkill to me.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment

Re: wait event and archive_command

From
Fujii Masao
Date:

On 2021/10/22 18:32, Michael Paquier wrote:
> On Thu, Oct 21, 2021 at 10:57:50PM +0900, Fujii Masao wrote:
>> Also how about adding wait events for other commands like
>> archive_cleanup_command, restore_command and recovery_end_command?
> 
> +1 to add something for all of them as we track the startup process in
> pg_stat_activity.  Thinking with a larger picture, this comes down to
> the usage of system().  We could introduce a small wrapper of system()
> that takes as argument a wait event for the backend.

That's an idea, but as far as I implemented the patch, introduing such wrapper
function doesn't seem to simplify or improve the source code.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: wait event and archive_command

From
Bharath Rupireddy
Date:
On Mon, Nov 1, 2021 at 2:31 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2021/10/21 23:55, Bharath Rupireddy wrote:
> >> Also how about adding wait events for other commands like
> >> archive_cleanup_command, restore_command and recovery_end_command?
> >
> > +1 for the wait event.
>
> Thanks!
> I added the wait events for also restore_command, etc into the patch.
> I attached that updated version of the patch.

Thanks for the patch. It looks good to me other than the following comment:

1) Can't we determine the wait event type based on commandName in
ExecuteRecoveryCommand instead of passing it as an extra param?

Regards,
Bharath Rupireddy.



Re: wait event and archive_command

From
Sascha Kuhl
Date:
Great, so great. Thanks you

Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> schrieb am Mi., 10. Nov. 2021, 12:20:
On Mon, Nov 1, 2021 at 2:31 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2021/10/21 23:55, Bharath Rupireddy wrote:
> >> Also how about adding wait events for other commands like
> >> archive_cleanup_command, restore_command and recovery_end_command?
> >
> > +1 for the wait event.
>
> Thanks!
> I added the wait events for also restore_command, etc into the patch.
> I attached that updated version of the patch.

Thanks for the patch. It looks good to me other than the following comment:

1) Can't we determine the wait event type based on commandName in
ExecuteRecoveryCommand instead of passing it as an extra param?

Regards,
Bharath Rupireddy.


Re: wait event and archive_command

From
Bharath Rupireddy
Date:
On Mon, Nov 1, 2021 at 2:31 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > The following activitymsg that are being set to ps display in
> > XLogFileRead and pgarch_archiveXlog have come up for one of our
> > internal team discussions recently:
> >
> >              snprintf(activitymsg, sizeof(activitymsg), "waiting for %s",
> >                       xlogfname);
> >              set_ps_display(activitymsg);
> >
> >          snprintf(activitymsg, sizeof(activitymsg), "recovering %s",
> >                   xlogfname);
> >          set_ps_display(activitymsg);
> >
> >      snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
> >      set_ps_display(activitymsg);
> >
> > The ps display info might be useful if we run postgres on a stand
> > alone box and there's someone monitoring at the ps output, but it
> > doesn't help debugging after an issue has occurred. How about we have
> > the following statements which will be useful for someone to look at
> > the server logs and know what was/is happening during the recovery and
> > archiving.
>
> If an issue occurs while the command is executing,
> the error message is logged, isn't it? Isn't that enough for your case?

You are right when an issue occurs. However, these messages will be
useful 1) if the recovery or archiving is taking a lot of time and one
would want to understand how it is progressing. 2) if these commands
pass but an issue occurs in some other area of the code. IMO, we
should have these as LOG messages instead of just setting in the ps
display for temporary purposes which doesn't work well with the
postgres on cloud where users/admins/developers don't get to see the
ps display.

> > IMO, we should also have the elog statement.
> >
> > elog(LOG, "waiting for %s", xlogfname);
> > elog(LOG, "recovering %s"", xlogfname);
> > elog(LOG, "archiving %s", xlog);
>
> I'm afraid that some people think that it's noisy to always log those messages.

I don't think these are noisy messages at all. In fact, they will be
useful to answer (if not exact answers, but an approximation) some of
the customer queries like "what is happening in my server during the
recovery/archiving phase? how much more time recovery might take?".
Today, the server emits lot of LOGs, adding these will not blow up the
server logs at all if the log rotation policy is configured
appropriately.

Having said the above, I plan to discuss these things in a separate thread.

Regards,
Bharath Rupireddy.



Re: wait event and archive_command

From
Bharath Rupireddy
Date:
On Wed, Nov 10, 2021 at 5:00 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Nov 1, 2021 at 2:31 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > > The following activitymsg that are being set to ps display in
> > > XLogFileRead and pgarch_archiveXlog have come up for one of our
> > > internal team discussions recently:
> > >
> > >              snprintf(activitymsg, sizeof(activitymsg), "waiting for %s",
> > >                       xlogfname);
> > >              set_ps_display(activitymsg);
> > >
> > >          snprintf(activitymsg, sizeof(activitymsg), "recovering %s",
> > >                   xlogfname);
> > >          set_ps_display(activitymsg);
> > >
> > >      snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
> > >      set_ps_display(activitymsg);
> > >
> > > The ps display info might be useful if we run postgres on a stand
> > > alone box and there's someone monitoring at the ps output, but it
> > > doesn't help debugging after an issue has occurred. How about we have
> > > the following statements which will be useful for someone to look at
> > > the server logs and know what was/is happening during the recovery and
> > > archiving.
> >
> > If an issue occurs while the command is executing,
> > the error message is logged, isn't it? Isn't that enough for your case?
>
> You are right when an issue occurs. However, these messages will be
> useful 1) if the recovery or archiving is taking a lot of time and one
> would want to understand how it is progressing. 2) if these commands
> pass but an issue occurs in some other area of the code. IMO, we
> should have these as LOG messages instead of just setting in the ps
> display for temporary purposes which doesn't work well with the
> postgres on cloud where users/admins/developers don't get to see the
> ps display.
>
> > > IMO, we should also have the elog statement.
> > >
> > > elog(LOG, "waiting for %s", xlogfname);
> > > elog(LOG, "recovering %s"", xlogfname);
> > > elog(LOG, "archiving %s", xlog);
> >
> > I'm afraid that some people think that it's noisy to always log those messages.
>
> I don't think these are noisy messages at all. In fact, they will be
> useful to answer (if not exact answers, but an approximation) some of
> the customer queries like "what is happening in my server during the
> recovery/archiving phase? how much more time recovery might take?".
> Today, the server emits lot of LOGs, adding these will not blow up the
> server logs at all if the log rotation policy is configured
> appropriately.
>
> Having said the above, I plan to discuss these things in a separate thread.

Just for the records - I've started a new thread for the above
discussion -
https://www.postgresql.org/message-id/CALj2ACUfMU%3Dahxivfy%2BZmpVZccd5PASG-_10mLpM55_Y_h4-VA%40mail.gmail.com

Regards,
Bharath Rupireddy.



Re: wait event and archive_command

From
Fujii Masao
Date:

On 2021/11/10 20:19, Bharath Rupireddy wrote:
> Thanks for the patch. It looks good to me other than the following comment:

Thanks for the review!


> 1) Can't we determine the wait event type based on commandName in
> ExecuteRecoveryCommand instead of passing it as an extra param?

Yes, that's possible. But isn't it uglier to make ExecuteRecoveryCommand() have
the map of command name and wait event? So I feel inclined to avoid adding
something like the following code into the function... Thought?

if (strcmp(commandName, "recovery_end_command") == 0)
     wait_event_info = WAIT_EVENT_RECOVERY_END_COMMAND;
else if (strcmp(commandName, "archive_command_command") == 0)
...

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: wait event and archive_command

From
Michael Paquier
Date:
On Thu, Nov 18, 2021 at 11:23:17AM +0900, Fujii Masao wrote:
> Yes, that's possible. But isn't it uglier to make ExecuteRecoveryCommand() have
> the map of command name and wait event? So I feel inclined to avoid adding
> something like the following code into the function... Thought?

FWIW, I find cleaner, and less bug-prone, the approach taken by
Fujii-san's patch to have the wait event set as an argument of the
function rather than trying to guess it from the command data.
--
Michael

Attachment

Re: wait event and archive_command

From
Bharath Rupireddy
Date:
On Thu, Nov 18, 2021 at 7:53 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > 1) Can't we determine the wait event type based on commandName in
> > ExecuteRecoveryCommand instead of passing it as an extra param?
>
> Yes, that's possible. But isn't it uglier to make ExecuteRecoveryCommand() have
> the map of command name and wait event? So I feel inclined to avoid adding
> something like the following code into the function... Thought?
>
> if (strcmp(commandName, "recovery_end_command") == 0)
>      wait_event_info = WAIT_EVENT_RECOVERY_END_COMMAND;
> else if (strcmp(commandName, "archive_command_command") == 0)

Yeah let's not do that. I'm fine with the
wait_event_for_archive_command_v2.patch as is.

Regards,
Bharath Rupireddy.



Re: wait event and archive_command

From
Michael Paquier
Date:
On Thu, Nov 18, 2021 at 10:04:57AM +0530, Bharath Rupireddy wrote:
> Yeah let's not do that. I'm fine with the
> wait_event_for_archive_command_v2.patch as is.

Switched the patch as RfC, then.
--
Michael

Attachment

Re: wait event and archive_command

From
Fujii Masao
Date:

On 2021/11/19 16:54, Michael Paquier wrote:
> On Thu, Nov 18, 2021 at 10:04:57AM +0530, Bharath Rupireddy wrote:
>> Yeah let's not do that. I'm fine with the
>> wait_event_for_archive_command_v2.patch as is.
> 
> Switched the patch as RfC, then.

Thanks! Barring any objection, I will commit the patch.

Regards,


-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: wait event and archive_command

From
Fujii Masao
Date:

On 2021/11/20 0:19, Fujii Masao wrote:
> 
> 
> On 2021/11/19 16:54, Michael Paquier wrote:
>> On Thu, Nov 18, 2021 at 10:04:57AM +0530, Bharath Rupireddy wrote:
>>> Yeah let's not do that. I'm fine with the
>>> wait_event_for_archive_command_v2.patch as is.
>>
>> Switched the patch as RfC, then.
> 
> Thanks! Barring any objection, I will commit the patch.

Pushed. Thanks!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION