Re: when the startup process doesn't (logging startup delays) - Mailing list pgsql-hackers

From Nitin Jadhav
Subject Re: when the startup process doesn't (logging startup delays)
Date
Msg-id CAMm1aWZi5FV=m2hvkTj5bjQ03DYece3HMJGngdXbRiSuaY2CWw@mail.gmail.com
Whole thread Raw
In response to Re: when the startup process doesn't (logging startup delays)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: when the startup process doesn't (logging startup delays)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
> I was fooling around with a test setup today, working on an unrelated
> problem, and this happened:
>
> 2021-10-28 14:21:23.145 EDT [92010] LOG:  resetting unlogged relations
> (init), elapsed time: 0.00 s, current path: base/13020

Nice catch and interesting case.

> That's not supposed to happen. I assume the problem is that the
> timeout for the previous phase fired just as we were beginning a new
> one, and the code got confused. I think we probably need to do
> something like this to make sure that the timeout from one operation
> can't trigger a log message for the next:
>
> diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
> index 28e68dd871..47ec737888 100644
> --- a/src/backend/postmaster/startup.c
> +++ b/src/backend/postmaster/startup.c
> @@ -320,6 +320,8 @@ begin_startup_progress_phase(void)
>     if (log_startup_progress_interval == 0)
>         return;
>
> +    disable_timeout(STARTUP_PROGRESS_TIMEOUT, false);
> +    startup_progress_timer_expired = false;
>     startup_progress_phase_start_time = GetCurrentTimestamp();
>     fin_time = TimestampTzPlusMilliseconds(startup_progress_phase_start_time,
>                                            log_startup_progress_interval);
>
> Thoughts?

Yes. I agree that this is not an expected behaviour and I do agree
that, probably the timeout for the previous phase fired just as we
were beginning a new one. For each operation, we call
begin_startup_progress_phase() before starting the operation and
one/multiple calls to ereport_startup_progress() to report the
progress and intentionally we have not added any functionality to
disable the timer at the end of the operation. The timer remains
active and there may be multiple calls to
startup_progress_timeout_handler() which sets a flag to true. So
whenever we call a begin_startup_progress_phase() for the next
operation, we do enable the timer (In my understanding it will
internally disable the old timer and schedule a new one) but the flag
is already set to true by the previous timer. Hence the next call to
ereport_startup_progress() logs a message. So I feel just setting
'startup_progress_timer_expired' to false in
begin_startup_progress_phase() would work. Please correct me if I am
wrong.

Thanks & Regards,
Nitin Jadhav

On Thu, Oct 28, 2021 at 11:59 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Oct 25, 2021 at 11:56 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > This version looks fine, so I have committed it (and my
> > enable_timeout_every patch also, as a necessary prerequisite).
>
> I was fooling around with a test setup today, working on an unrelated
> problem, and this happened:
>
> 2021-10-28 14:21:23.145 EDT [92010] LOG:  resetting unlogged relations
> (init), elapsed time: 0.00 s, current path: base/13020
>
> That's not supposed to happen. I assume the problem is that the
> timeout for the previous phase fired just as we were beginning a new
> one, and the code got confused. I think we probably need to do
> something like this to make sure that the timeout from one operation
> can't trigger a log message for the next:
>
> diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
> index 28e68dd871..47ec737888 100644
> --- a/src/backend/postmaster/startup.c
> +++ b/src/backend/postmaster/startup.c
> @@ -320,6 +320,8 @@ begin_startup_progress_phase(void)
>      if (log_startup_progress_interval == 0)
>          return;
>
> +    disable_timeout(STARTUP_PROGRESS_TIMEOUT, false);
> +    startup_progress_timer_expired = false;
>      startup_progress_phase_start_time = GetCurrentTimestamp();
>      fin_time = TimestampTzPlusMilliseconds(startup_progress_phase_start_time,
>                                             log_startup_progress_interval);
>
> Thoughts?
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Teach pg_receivewal to use lz4 compression
Next
From: Daniel Gustafsson
Date:
Subject: Re: Support for NSS as a libpq TLS backend