Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested. - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.
Date
Msg-id 633cbcc3-08cf-2560-e5cf-91f567282dbc@oss.nttdata.com
Whole thread Raw
In response to Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.  (Andres Freund <andres@anarazel.de>)
Responses Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.  (Masahiro Ikeda <ikedamsh@oss.nttdata.com>)
Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers

On 2021/03/25 9:58, Andres Freund wrote:
> Hi,
> 
> On 2021-03-24 18:36:14 +0900, Fujii Masao wrote:
>> Barring any objection, I'm thinking to commit this patch.
> 
> To which branches?

I was thinking to push the patch to the master branch
because this is not a bug fix.

> I am *strongly* opposed to backpatching this.

+1

> This strikes me as a quite a misleading function name.

Yeah, better name is always welcome :)

> Outside of very
> narrow circumstances a normal exit shouldn't use _exit(1). Neither 1 no
> _exit() (as opposed to exit()) seem appropriate. This seems like a bad
> idea.

So you're thinking that the stats collector should do proc_exit(0)
or something even when it receives immediate shutdown request?

One idea to avoid using _exit(1) is to change the SIGQUIT handler
so that it just sets the flag. Then if the stats collector detects that
the flag is set in the main loop, it gets out of the loop,
skips writing the permanent stats file and then exits with exit(0).
That is, normal and immediate shutdown requests are treated
almost the same way in the stats collector. Only the difference of
them is whether it saves the stats to the file or not. Thought?

> Also, won't this lead to postmaster now starting to complain about
> pgstat having crashed in an immediate shutdown? Right now only exit
> status 0 is expected:
> 
>         if (pid == PgStatPID)
>         {
>             PgStatPID = 0;
>             if (!EXIT_STATUS_0(exitstatus))
>                 LogChildExit(LOG, _("statistics collector process"),
>                              pid, exitstatus);
>             if (pmState == PM_RUN || pmState == PM_HOT_STANDBY)
>                 PgStatPID = pgstat_start();
>             continue;
>         }

No. In immediate shutdown case, pmdie() sets "Shutdown" variable to
ImmdiateShutdown and HandleChildCrash() doesn't complain that in that case
because of the following.

    /*
     * We only log messages and send signals if this is the first process
     * crash and we're not doing an immediate shutdown; otherwise, we're only
     * here to update postmaster's idea of live processes.  If we have already
     * signaled children, nonzero exit status is to be expected, so don't
     * clutter log.
     */
    take_action = !FatalError && Shutdown != ImmediateShutdown;

>> + * The statistics collector is started by the postmaster as soon as the
>> + * startup subprocess finishes, or as soon as the postmaster is ready
>> + * to accept read only connections during archive recovery.  It remains
>> + * alive until the postmaster commands it to terminate.  Normal
>> + * termination is by SIGUSR2 after the checkpointer must exit(0),
>> + * which instructs the statistics collector to save the final statistics
>> + * to reuse at next startup and then exit(0).
> 
> I can't parse "...after the checkpointer must exit(0), which instructs..."

What about changing that to the following?

------------------------
Normal termination is requested after checkpointer exits. It's by SIGUSR2,
which instructs the statistics collector to save the final statistics and
then exit(0).
------------------------

>> + * Emergency termination is by SIGQUIT; like any backend, the statistics
>> + * collector will exit quickly without saving the final statistics.  It's
>> + * ok because the startup process will remove all statistics at next
>> + * startup after emergency termination.
> 
> Normally there won't be any stats to remove, no? The permanent stats
> file has been removed when the stats collector started up.

Yes. In normal case, when the stats collector starts up, it loads the stats
from the file and remove the file. OTOH, when the recovery is necessary,
instead the startup process calls pgstat_reset_all() and removes the stats files.

You're thinking that the above comments are confusing?
If so, what about the following?

----------------------
Emergency termination is by SIGQUIT; the statistics collector
will exit quickly without saving the final statistics. In this case
the statistics files don't need to be written because the next startup
will trigger a crash recovery and remove all statistics files forcibly
even if they exist.
----------------------

Regards,


-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



pgsql-hackers by date:

Previous
From: "Daniel Verite"
Date:
Subject: Re: insensitive collations
Next
From: David Steele
Date:
Subject: Re: [PATCH 1/1] Initial mach based shared memory support.