Thread: wait event and archive_command
Hi, I'd like to propose to add new wait event reported while archiver process is executing archive_command. This would be helpful to observe what archiver is doing and check whether it has some troubles or not. Thought? PoC patch attached. Also how about adding wait events for other commands like archive_cleanup_command, restore_command and recovery_end_command? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Attachment
On Thu, Oct 21, 2021 at 7:28 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > Hi, > > I'd like to propose to add new wait event reported while archiver process > is executing archive_command. This would be helpful to observe > what archiver is doing and check whether it has some troubles or not. > Thought? PoC patch attached. > > Also how about adding wait events for other commands like > archive_cleanup_command, restore_command and recovery_end_command? +1 for the wait event. The following activitymsg that are being set to ps display in XLogFileRead and pgarch_archiveXlog have come up for one of our internal team discussions recently: snprintf(activitymsg, sizeof(activitymsg), "waiting for %s", xlogfname); set_ps_display(activitymsg); snprintf(activitymsg, sizeof(activitymsg), "recovering %s", xlogfname); set_ps_display(activitymsg); snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog); set_ps_display(activitymsg); The ps display info might be useful if we run postgres on a stand alone box and there's someone monitoring at the ps output, but it doesn't help debugging after an issue has occurred. How about we have the following statements which will be useful for someone to look at the server logs and know what was/is happening during the recovery and archiving. IMO, we should also have the elog statement. elog(LOG, "waiting for %s", xlogfname); elog(LOG, "recovering %s"", xlogfname); elog(LOG, "archiving %s", xlog); Another idea could be to have a hook emitting the above info to outside components, but a hook just for this purpose isn't a great idea IMO. Thoughts? Regards, Bharath Rupireddy.
On Thu, Oct 21, 2021 at 10:57:50PM +0900, Fujii Masao wrote: > Also how about adding wait events for other commands like > archive_cleanup_command, restore_command and recovery_end_command? +1 to add something for all of them as we track the startup process in pg_stat_activity. Thinking with a larger picture, this comes down to the usage of system(). We could introduce a small wrapper of system() that takes as argument a wait event for the backend. -- Michael
Attachment
On 2021/10/21 23:55, Bharath Rupireddy wrote: >> Also how about adding wait events for other commands like >> archive_cleanup_command, restore_command and recovery_end_command? > > +1 for the wait event. Thanks! I added the wait events for also restore_command, etc into the patch. I attached that updated version of the patch. > The following activitymsg that are being set to ps display in > XLogFileRead and pgarch_archiveXlog have come up for one of our > internal team discussions recently: > > snprintf(activitymsg, sizeof(activitymsg), "waiting for %s", > xlogfname); > set_ps_display(activitymsg); > > snprintf(activitymsg, sizeof(activitymsg), "recovering %s", > xlogfname); > set_ps_display(activitymsg); > > snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog); > set_ps_display(activitymsg); > > The ps display info might be useful if we run postgres on a stand > alone box and there's someone monitoring at the ps output, but it > doesn't help debugging after an issue has occurred. How about we have > the following statements which will be useful for someone to look at > the server logs and know what was/is happening during the recovery and > archiving. If an issue occurs while the command is executing, the error message is logged, isn't it? Isn't that enough for your case? > IMO, we should also have the elog statement. > > elog(LOG, "waiting for %s", xlogfname); > elog(LOG, "recovering %s"", xlogfname); > elog(LOG, "archiving %s", xlog); I'm afraid that some people think that it's noisy to always log those messages. > Another idea could be to have a hook emitting the above info to > outside components, but a hook just for this purpose isn't a great > idea IMO. Yes, this idea sounds overkill to me. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Attachment
On 2021/10/22 18:32, Michael Paquier wrote: > On Thu, Oct 21, 2021 at 10:57:50PM +0900, Fujii Masao wrote: >> Also how about adding wait events for other commands like >> archive_cleanup_command, restore_command and recovery_end_command? > > +1 to add something for all of them as we track the startup process in > pg_stat_activity. Thinking with a larger picture, this comes down to > the usage of system(). We could introduce a small wrapper of system() > that takes as argument a wait event for the backend. That's an idea, but as far as I implemented the patch, introduing such wrapper function doesn't seem to simplify or improve the source code. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Mon, Nov 1, 2021 at 2:31 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > On 2021/10/21 23:55, Bharath Rupireddy wrote: > >> Also how about adding wait events for other commands like > >> archive_cleanup_command, restore_command and recovery_end_command? > > > > +1 for the wait event. > > Thanks! > I added the wait events for also restore_command, etc into the patch. > I attached that updated version of the patch. Thanks for the patch. It looks good to me other than the following comment: 1) Can't we determine the wait event type based on commandName in ExecuteRecoveryCommand instead of passing it as an extra param? Regards, Bharath Rupireddy.
Great, so great. Thanks you
Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> schrieb am Mi., 10. Nov. 2021, 12:20:
On Mon, Nov 1, 2021 at 2:31 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2021/10/21 23:55, Bharath Rupireddy wrote:
> >> Also how about adding wait events for other commands like
> >> archive_cleanup_command, restore_command and recovery_end_command?
> >
> > +1 for the wait event.
>
> Thanks!
> I added the wait events for also restore_command, etc into the patch.
> I attached that updated version of the patch.
Thanks for the patch. It looks good to me other than the following comment:
1) Can't we determine the wait event type based on commandName in
ExecuteRecoveryCommand instead of passing it as an extra param?
Regards,
Bharath Rupireddy.
On Mon, Nov 1, 2021 at 2:31 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > The following activitymsg that are being set to ps display in > > XLogFileRead and pgarch_archiveXlog have come up for one of our > > internal team discussions recently: > > > > snprintf(activitymsg, sizeof(activitymsg), "waiting for %s", > > xlogfname); > > set_ps_display(activitymsg); > > > > snprintf(activitymsg, sizeof(activitymsg), "recovering %s", > > xlogfname); > > set_ps_display(activitymsg); > > > > snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog); > > set_ps_display(activitymsg); > > > > The ps display info might be useful if we run postgres on a stand > > alone box and there's someone monitoring at the ps output, but it > > doesn't help debugging after an issue has occurred. How about we have > > the following statements which will be useful for someone to look at > > the server logs and know what was/is happening during the recovery and > > archiving. > > If an issue occurs while the command is executing, > the error message is logged, isn't it? Isn't that enough for your case? You are right when an issue occurs. However, these messages will be useful 1) if the recovery or archiving is taking a lot of time and one would want to understand how it is progressing. 2) if these commands pass but an issue occurs in some other area of the code. IMO, we should have these as LOG messages instead of just setting in the ps display for temporary purposes which doesn't work well with the postgres on cloud where users/admins/developers don't get to see the ps display. > > IMO, we should also have the elog statement. > > > > elog(LOG, "waiting for %s", xlogfname); > > elog(LOG, "recovering %s"", xlogfname); > > elog(LOG, "archiving %s", xlog); > > I'm afraid that some people think that it's noisy to always log those messages. I don't think these are noisy messages at all. In fact, they will be useful to answer (if not exact answers, but an approximation) some of the customer queries like "what is happening in my server during the recovery/archiving phase? how much more time recovery might take?". Today, the server emits lot of LOGs, adding these will not blow up the server logs at all if the log rotation policy is configured appropriately. Having said the above, I plan to discuss these things in a separate thread. Regards, Bharath Rupireddy.
On Wed, Nov 10, 2021 at 5:00 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Mon, Nov 1, 2021 at 2:31 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > The following activitymsg that are being set to ps display in > > > XLogFileRead and pgarch_archiveXlog have come up for one of our > > > internal team discussions recently: > > > > > > snprintf(activitymsg, sizeof(activitymsg), "waiting for %s", > > > xlogfname); > > > set_ps_display(activitymsg); > > > > > > snprintf(activitymsg, sizeof(activitymsg), "recovering %s", > > > xlogfname); > > > set_ps_display(activitymsg); > > > > > > snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog); > > > set_ps_display(activitymsg); > > > > > > The ps display info might be useful if we run postgres on a stand > > > alone box and there's someone monitoring at the ps output, but it > > > doesn't help debugging after an issue has occurred. How about we have > > > the following statements which will be useful for someone to look at > > > the server logs and know what was/is happening during the recovery and > > > archiving. > > > > If an issue occurs while the command is executing, > > the error message is logged, isn't it? Isn't that enough for your case? > > You are right when an issue occurs. However, these messages will be > useful 1) if the recovery or archiving is taking a lot of time and one > would want to understand how it is progressing. 2) if these commands > pass but an issue occurs in some other area of the code. IMO, we > should have these as LOG messages instead of just setting in the ps > display for temporary purposes which doesn't work well with the > postgres on cloud where users/admins/developers don't get to see the > ps display. > > > > IMO, we should also have the elog statement. > > > > > > elog(LOG, "waiting for %s", xlogfname); > > > elog(LOG, "recovering %s"", xlogfname); > > > elog(LOG, "archiving %s", xlog); > > > > I'm afraid that some people think that it's noisy to always log those messages. > > I don't think these are noisy messages at all. In fact, they will be > useful to answer (if not exact answers, but an approximation) some of > the customer queries like "what is happening in my server during the > recovery/archiving phase? how much more time recovery might take?". > Today, the server emits lot of LOGs, adding these will not blow up the > server logs at all if the log rotation policy is configured > appropriately. > > Having said the above, I plan to discuss these things in a separate thread. Just for the records - I've started a new thread for the above discussion - https://www.postgresql.org/message-id/CALj2ACUfMU%3Dahxivfy%2BZmpVZccd5PASG-_10mLpM55_Y_h4-VA%40mail.gmail.com Regards, Bharath Rupireddy.
On 2021/11/10 20:19, Bharath Rupireddy wrote: > Thanks for the patch. It looks good to me other than the following comment: Thanks for the review! > 1) Can't we determine the wait event type based on commandName in > ExecuteRecoveryCommand instead of passing it as an extra param? Yes, that's possible. But isn't it uglier to make ExecuteRecoveryCommand() have the map of command name and wait event? So I feel inclined to avoid adding something like the following code into the function... Thought? if (strcmp(commandName, "recovery_end_command") == 0) wait_event_info = WAIT_EVENT_RECOVERY_END_COMMAND; else if (strcmp(commandName, "archive_command_command") == 0) ... Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Thu, Nov 18, 2021 at 11:23:17AM +0900, Fujii Masao wrote: > Yes, that's possible. But isn't it uglier to make ExecuteRecoveryCommand() have > the map of command name and wait event? So I feel inclined to avoid adding > something like the following code into the function... Thought? FWIW, I find cleaner, and less bug-prone, the approach taken by Fujii-san's patch to have the wait event set as an argument of the function rather than trying to guess it from the command data. -- Michael
Attachment
On Thu, Nov 18, 2021 at 7:53 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > 1) Can't we determine the wait event type based on commandName in > > ExecuteRecoveryCommand instead of passing it as an extra param? > > Yes, that's possible. But isn't it uglier to make ExecuteRecoveryCommand() have > the map of command name and wait event? So I feel inclined to avoid adding > something like the following code into the function... Thought? > > if (strcmp(commandName, "recovery_end_command") == 0) > wait_event_info = WAIT_EVENT_RECOVERY_END_COMMAND; > else if (strcmp(commandName, "archive_command_command") == 0) Yeah let's not do that. I'm fine with the wait_event_for_archive_command_v2.patch as is. Regards, Bharath Rupireddy.
On Thu, Nov 18, 2021 at 10:04:57AM +0530, Bharath Rupireddy wrote: > Yeah let's not do that. I'm fine with the > wait_event_for_archive_command_v2.patch as is. Switched the patch as RfC, then. -- Michael
Attachment
On 2021/11/19 16:54, Michael Paquier wrote: > On Thu, Nov 18, 2021 at 10:04:57AM +0530, Bharath Rupireddy wrote: >> Yeah let's not do that. I'm fine with the >> wait_event_for_archive_command_v2.patch as is. > > Switched the patch as RfC, then. Thanks! Barring any objection, I will commit the patch. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2021/11/20 0:19, Fujii Masao wrote: > > > On 2021/11/19 16:54, Michael Paquier wrote: >> On Thu, Nov 18, 2021 at 10:04:57AM +0530, Bharath Rupireddy wrote: >>> Yeah let's not do that. I'm fine with the >>> wait_event_for_archive_command_v2.patch as is. >> >> Switched the patch as RfC, then. > > Thanks! Barring any objection, I will commit the patch. Pushed. Thanks! Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION