Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module
Date
Msg-id 8a7c4f21-ccee-b454-a265-c6b788b5c5d1@oss.nttdata.com
Whole thread Raw
In response to Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers

On 2020/11/25 23:38, Bharath Rupireddy wrote:
> On Wed, Nov 25, 2020 at 3:29 PM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>>
wrote:
>  >
>  > On 2020/11/20 19:33, Bharath Rupireddy wrote:
>  > > On Wed, Nov 18, 2020 at 5:52 PM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>
<mailto:masao.fujii@oss.nttdata.com<mailto:masao.fujii@oss.nttdata.com>>> wrote:
 
>  > >  >
>  > >  > > handle_sigterm() and die() are similar except that die() has extra
>  > >  > > handling(below) for exiting immediately when waiting for input/command
>  > >  > > from the client.
>  > >  > >      /*
>  > >  > >       * If we're in single user mode, we want to quit immediately - we can't
>  > >  > >       * rely on latches as they wouldn't work when stdin/stdout is a file.
>  > >  > >       * Rather ugly, but it's unlikely to be worthwhile to invest much more
>  > >  > >       * effort just for the benefit of single user mode.
>  > >  > >       */
>  > >  > >      if (DoingCommandRead && whereToSendOutput != DestRemote)
>  > >  > >          ProcessInterrupts();
>  > >  > >
>  > >  > > Having this extra handling is correct for normal backends as they can
>  > >  > > connect directly to clients for reading input commands, the above if
>  > >  > > condition may become true and ProcessInterrupts() may be called to
>  > >  > > handle the SIGTERM immediately.
>  > >  > >
>  > >  > > Note that DoingCommandRead can never be true in bg workers as it's
>  > >  > > being set to true only in normal backend PostgresMain(). If we use
>  > >  > > die()(instead of custom SIGTERM handlers such as handle_sigterm()) in
>  > >  > > a bg worker/non-backend process, there are no chances that the above
>  > >  > > part of the code gets hit and the interrupts are processed
>  > >  > > immediately. And also here are the bg worker process that use die()
>  > >  > > instead of their own custom handlers: parallel workers, autoprewarm
>  > >  > > worker, autovacuum worker, logical replication launcher and apply
>  > >  > > worker, wal sender process
>  > >  > >
>  > >  > > I think we can also use die() instead of handle_sigterm() in
>  > >  > > test_shm_mq/worker.c.
>  > >  > >
>  > >  > > I attached a patch for this change.
>  > >  >
>  > >  > Thanks for the patch! It looks good to me.
>  > >  >
>  > >
>  > > Thanks.
>  >
>  > When I read the patch again, I found that, with the patch, the shutdown
>  > of worker_spi causes to report the following FATAL message.
>  >
>  >      FATAL:  terminating connection due to administrator command
>  >
>  > Isn't this message confusing because it's not a connection? If so,
>  > we need to update ProcessInterrupts() so that the proper message is
>  > reported like other bgworkers do.
>  >
> 
> This is also true for all the bgworker that use the die() handler. How about doing it the way bgworker_die() does in
ProcessInterrupts()?This would give meaningful information. Thoughts? If okay, I can make a separate patch.
 

+1
Thanks!


> 
>          else if (RecoveryConflictPending)
>          {
>              /* Currently there is only one non-retryable recovery conflict */
>              Assert(RecoveryConflictReason == PROCSIG_RECOVERY_CONFLICT_DATABASE);
>              pgstat_report_recovery_conflict(RecoveryConflictReason);
>              ereport(FATAL,
>                      (errcode(ERRCODE_DATABASE_DROPPED),
>                       errmsg("terminating connection due to conflict with recovery"),
>                       errdetail_recovery_conflict()));
>          }
> *       else if (IsBackgroundWorker)
>         {
>                 ereport(FATAL,
>                             (errcode(ERRCODE_ADMIN_SHUTDOWN),
>                              errmsg("terminating background worker \"%s\" due to administrator command",
>                              MyBgworkerEntry->bgw_type)));
>         }*
>          else
>              ereport(FATAL,
>                      (errcode(ERRCODE_ADMIN_SHUTDOWN),
>                       errmsg("terminating connection due to administrator command")));
> 
>  >
>  > BTW, though this is not directly related to this topic, it's strange to me
>  > that the normal shutdown of bgworker causes to emit FATAL message
>  > and the message "background worker ... exited with exit code 1"
>  > at the normal shutdown. I've heard sometimes that users coplained that
>  > it's confusing to report that message for logical replication launcher
>  > at the normal shutdown. Maybe the mechanism to shutdown bgworker
>  > normally seems to have not been implemented well yet.
>  >
> 
> What do you mean by normal shutdown of bgworker? Is it that bgworker has exited successfully with exit code 0 or for
somereason with exit code other than 0? Or is it when the postmaster is shutdown normally?
 
> 
> IIUC, when a bgworker exists either normally with an exit code 0 or other than 0, then CleanupBackgroundWorker() is
calledin postmaster, a message(like below) is prepared, and the LogChildExit() is called with either DEBUG1 or LOG
leveland for instance the message you specified gets printed "background worker ... exited with exit code 1". I have
notseen a FATAL message similar to "background worker ... exited with exit code 1" at the normal shutdown.
 
> 
> snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""), rw->rw_worker.bgw_type);
> 
> LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG, namebuf, pid, exitstatus);
> 
> Am I missing something?
> 
> If my analysis is right, then for instance, when a logical replication launcher is exited, it logs "background worker
"logicalreplication launcher" exited with exit code X" with either DEBUG1 or LOG level but not with FATAL level.
 

Yes, it's not with FATAL level. But that message looks like that it's
reporting error message. This is why we sometimes received
the complaints (e.g., [1][2]) about that message.

[1]
https://postgr.es/m/CAKFQuwZHcZnnwoFaXX2YKNT3KhaNr_+vd95Oo=f045zfn7Tetw@mail.gmail.com

[2]
https://postgr.es/m/CAEepm=1c3hG1g3iKYwfa_PDsT49RBaBJsaot_qNhPSCXBm9rzA@mail.gmail.com

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



pgsql-hackers by date:

Previous
From: Paul Förster
Date:
Subject: Re: configure and DocBook XML
Next
From: Alvaro Herrera
Date:
Subject: Re: configure and DocBook XML