Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module
Date
Msg-id CALj2ACVh2m4o+8T-Y9oXN22RUgGBAqcKcQdyA2+fgt4faeoGHQ@mail.gmail.com
Whole thread Raw
In response to Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On Wed, Nov 25, 2020 at 3:29 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> On 2020/11/20 19:33, Bharath Rupireddy wrote:
> > On Wed, Nov 18, 2020 at 5:52 PM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>> wrote:
> >  >
> >  > > handle_sigterm() and die() are similar except that die() has extra
> >  > > handling(below) for exiting immediately when waiting for input/command
> >  > > from the client.
> >  > >      /*
> >  > >       * If we're in single user mode, we want to quit immediately - we can't
> >  > >       * rely on latches as they wouldn't work when stdin/stdout is a file.
> >  > >       * Rather ugly, but it's unlikely to be worthwhile to invest much more
> >  > >       * effort just for the benefit of single user mode.
> >  > >       */
> >  > >      if (DoingCommandRead && whereToSendOutput != DestRemote)
> >  > >          ProcessInterrupts();
> >  > >
> >  > > Having this extra handling is correct for normal backends as they can
> >  > > connect directly to clients for reading input commands, the above if
> >  > > condition may become true and ProcessInterrupts() may be called to
> >  > > handle the SIGTERM immediately.
> >  > >
> >  > > Note that DoingCommandRead can never be true in bg workers as it's
> >  > > being set to true only in normal backend PostgresMain(). If we use
> >  > > die()(instead of custom SIGTERM handlers such as handle_sigterm()) in
> >  > > a bg worker/non-backend process, there are no chances that the above
> >  > > part of the code gets hit and the interrupts are processed
> >  > > immediately. And also here are the bg worker process that use die()
> >  > > instead of their own custom handlers: parallel workers, autoprewarm
> >  > > worker, autovacuum worker, logical replication launcher and apply
> >  > > worker, wal sender process
> >  > >
> >  > > I think we can also use die() instead of handle_sigterm() in
> >  > > test_shm_mq/worker.c.
> >  > >
> >  > > I attached a patch for this change.
> >  >
> >  > Thanks for the patch! It looks good to me.
> >  >
> >
> > Thanks.
>
> When I read the patch again, I found that, with the patch, the shutdown
> of worker_spi causes to report the following FATAL message.
>
>      FATAL:  terminating connection due to administrator command
>
> Isn't this message confusing because it's not a connection? If so,
> we need to update ProcessInterrupts() so that the proper message is
> reported like other bgworkers do.
>

This is also true for all the bgworker that use the die() handler. How about doing it the way bgworker_die() does in ProcessInterrupts()? This would give meaningful information. Thoughts? If okay, I can make a separate patch.

        else if (RecoveryConflictPending)
        {
            /* Currently there is only one non-retryable recovery conflict */
            Assert(RecoveryConflictReason == PROCSIG_RECOVERY_CONFLICT_DATABASE);
            pgstat_report_recovery_conflict(RecoveryConflictReason);
            ereport(FATAL,
                    (errcode(ERRCODE_DATABASE_DROPPED),
                     errmsg("terminating connection due to conflict with recovery"),
                     errdetail_recovery_conflict()));
        }
       else if (IsBackgroundWorker)
       {
               ereport(FATAL,
                           (errcode(ERRCODE_ADMIN_SHUTDOWN),
                            errmsg("terminating background worker \"%s\" due to administrator command",
                            MyBgworkerEntry->bgw_type)));
       }

        else
            ereport(FATAL,
                    (errcode(ERRCODE_ADMIN_SHUTDOWN),
                     errmsg("terminating connection due to administrator command")));

>
> BTW, though this is not directly related to this topic, it's strange to me
> that the normal shutdown of bgworker causes to emit FATAL message
> and the message "background worker ... exited with exit code 1"
> at the normal shutdown. I've heard sometimes that users coplained that
> it's confusing to report that message for logical replication launcher
> at the normal shutdown. Maybe the mechanism to shutdown bgworker
> normally seems to have not been implemented well yet.
>

What do you mean by normal shutdown of bgworker? Is it that bgworker has exited successfully with exit code 0 or for some reason with exit code other than 0? Or is it when the postmaster is shutdown normally?

IIUC, when a bgworker exists either normally with an exit code 0 or other than 0, then CleanupBackgroundWorker() is called in postmaster, a message(like below) is prepared, and the LogChildExit() is called with either DEBUG1 or LOG level and for instance the message you specified gets printed "background worker ... exited with exit code 1". I have not seen a FATAL message similar to "background worker ... exited with exit code 1" at the normal shutdown.

snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""), rw->rw_worker.bgw_type);

LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG, namebuf, pid, exitstatus);

Am I missing something?

If my analysis is right, then for instance, when a logical replication launcher is exited, it logs "background worker "logical replication launcher" exited with exit code X" with either DEBUG1 or LOG level but not with FATAL level.

>
> Without the patch, since worker_spi cannot handle the shutdown request
> promptly while it's executing the backend code, maybe we can say this is
> a bug. But worker_spi is a just example of the code, I'm not thinking to
> do back-patch for now. Thought?
>

+1 to not backpatch it.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Pavel Borisov
Date:
Subject: Re: Is postgres ready for 2038?
Next
From: Anastasia Lubennikova
Date:
Subject: Re: Improper use about DatumGetInt32