Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested. - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.
Date
Msg-id 3d721f7d-3c8f-6fc2-d0a9-94c299d6a01b@oss.nttdata.com
Whole thread Raw
In response to Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.  (Masahiro Ikeda <ikedamsh@oss.nttdata.com>)
Responses Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.  (Masahiro Ikeda <ikedamsh@oss.nttdata.com>)
List pgsql-hackers

On 2021/03/26 9:25, Masahiro Ikeda wrote:
> 
> On 2021/03/25 21:23, Fujii Masao wrote:
>> On 2021/03/25 9:58, Andres Freund wrote:
>>> Outside of very
>>> narrow circumstances a normal exit shouldn't use _exit(1). Neither 1 no
>>> _exit() (as opposed to exit()) seem appropriate. This seems like a bad
>>> idea.
>>
>> So you're thinking that the stats collector should do proc_exit(0)
>> or something even when it receives immediate shutdown request?
>>
>> One idea to avoid using _exit(1) is to change the SIGQUIT handler
>> so that it just sets the flag. Then if the stats collector detects that
>> the flag is set in the main loop, it gets out of the loop,
>> skips writing the permanent stats file and then exits with exit(0).
>> That is, normal and immediate shutdown requests are treated
>> almost the same way in the stats collector. Only the difference of
>> them is whether it saves the stats to the file or not. Thought?
> 
> Although I don't oppose the idea to change from _exit(1) to proc_exit(0), the
> reason why I used _exit(1) is that I thought the behavior to skip writing the
> stat counters is not normal exit. Actually, other background processes use
> _exit(2) instead of _exit(0) or proc_exit(0) in immediate shutdown although
> the status code is different because they touch shared memory.

True.


> 
> 
>>> Also, won't this lead to postmaster now starting to complain about
>>> pgstat having crashed in an immediate shutdown? Right now only exit
>>> status 0 is expected:
>>>
>>>          if (pid == PgStatPID)
>>>          {
>>>              PgStatPID = 0;
>>>              if (!EXIT_STATUS_0(exitstatus))
>>>                  LogChildExit(LOG, _("statistics collector process"),
>>>                               pid, exitstatus);
>>>              if (pmState == PM_RUN || pmState == PM_HOT_STANDBY)
>>>                  PgStatPID = pgstat_start();
>>>              continue;
>>>          }
>>
>> No. In immediate shutdown case, pmdie() sets "Shutdown" variable to
>> ImmdiateShutdown and HandleChildCrash() doesn't complain that in that case
>> because of the following.
>>
>>      /*
>>       * We only log messages and send signals if this is the first process
>>       * crash and we're not doing an immediate shutdown; otherwise, we're only
>>       * here to update postmaster's idea of live processes.  If we have already
>>       * signaled children, nonzero exit status is to be expected, so don't
>>       * clutter log.
>>       */
>>      take_action = !FatalError && Shutdown != ImmediateShutdown;
> 
> IIUC, in immediate shutdown case (and other cases too?), HandleChildCrash() is
> never invoked due to the exit of pgstat. My understanding of Andres-san's
> comment is that is it ok to show like the following log message?
> 
> ```
> LOG:  statistics collector process (PID 64991) exited with exit code 1
> ```
> 
> Surely, other processes don't output the log like it. So, I agree to suppress
> the log message.

Yes. I was mistakenly thinking that HandleChildCrash() was called
even when the stats collector exits with non-zero code, like other processes.
But that's not true.

How should we do this? HandleChildCrash() calls LogChildExit()
when FatalError = false and Shutdown != ImmediateShutdown.
We should use the same conditions for the stats collector as follows?

                 if (pid == PgStatPID)
                 {
                         PgStatPID = 0;
-                       if (!EXIT_STATUS_0(exitstatus))
+                       if (!EXIT_STATUS_0(exitstatus) && !FatalError &&
+                               Shutdown != ImmediateShutdown)
                                 LogChildExit(LOG, _("statistics collector process"),
                                                          pid, exitstatus);

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: DETAIL for wrong scram password
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: wal stats questions