Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested. - Mailing list pgsql-hackers

From Masahiro Ikeda
Subject Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.
Date
Msg-id c96d8989100e4bce4fa586064aa7e0e9@oss.nttdata.com
Whole thread Raw
In response to Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers
On 2021-03-18 13:37, Fujii Masao wrote:
> On 2021/03/18 11:59, kuroda.hayato@fujitsu.com wrote:
>> Dear Ikeda-san,
>> 
>> I confirmed new patch and no problem was found. Thanks.
>> (I'm not a native English speaker, so I cannot check your comments 
>> correctly, sorry)
> 
> One user-visible side-effect by this change is; with the patch, the 
> stats is
> cleared when only the stats collector is killed (with SIGQUIT) 
> accidentally
> and restarted by postmaster later.

Thanks for your comments.

As you said, the temporary stats files don't removed if the stats 
collector is killed with SIGQUIT.
So, if the user change the GUC parameter "stats_temp_directory" after 
immediate shutdown,
temporary stats file can't be removed forever.

But, I think this case is rarely happened and unimportant. Actually, 
pgstat_write_statsfiles()
didn't check error of unlink() and the same problem is occurred if the 
server is crashed now.
The documentation said following. I think it's enough.

```
    For better performance, <varname>stats_temp_directory</varname> can 
be
    pointed at a RAM-based file system, decreasing physical I/O 
requirements.
    When the server shuts down cleanly, a permanent copy of the 
statistics
    data is stored in the <filename>pg_stat</filename> subdirectory, so 
that
    statistics can be retained across server restarts.  When recovery is
    performed at server start (e.g., after immediate shutdown, server 
crash,
    and point-in-time recovery), all statistics counters are reset.
```


> On the other than, currently the stats is
> written in that case and subsequently-restarted stats collector can use
> that stats file. I'm not sure if we need to keep supporting this
> behavior, though.

I don't have any strong opinion this behaivor is useless too.

Since the reinitialized phase is not executed when only the stats 
collector is crashed
(since it didn't touch the shared memory), if the permanent stats file 
is exists, the
stats collector can use it. But, IIUC, the case is rare.

The case is happened by operation mistake which a operator sends the 
SIGQUIT signal to
the stats collector. Please let me know if there are more important 
case.

But, if SIGKILL is sent by operator, the stats can't be rescure now 
because the permanent stats
files can't be written before exiting. Since the case which can rescure 
the stats is rare,
I think it's ok to initialize the stats even if SIGQUIT is sent.

If to support this feature, we need to implement the following first.

> (2) As another aspect, it needs to change the behavior removing all 
> stats files because autovacuum
>    uses the stats. There are some ideas, for example writing the stats 
> files every X minutes
>    (using wal or another mechanism) and even if a crash occurs, the 
> startup process can restore
>    the stats with slightly low freshness. But, it needs to find a way 
> how to handle the stats files
>    when deleting on PITR rewind or stats collector crash happens.



> When only the stats collector exits by SIGQUIT, with the patch
> FreeWaitEventSet() is also skipped. Is this ok?

Thanks, I fixed it.


> -     * Loop to process messages until we get SIGQUIT or detect ungraceful
> -     * death of our parent postmaster.
> +     * Loop to process messages until we get SIGTERM or SIGQUIT of our 
> parent
> +     * postmaster.
> 
> "SIGTERM or SIGQUIT of our parent postmaster" should be
> "SIGTERM, SIGQUIT, or detect ungraceful death of our parent 
> postmaster"?

Yes, I fixed it.


> +SignalHandlerForUnsafeExit(SIGNAL_ARGS)
> 
> I don't think SignalHandlerForUnsafeExit is good name. Because that's 
> not
> "unsafe" exit. No? Even after this signal handler is triggered, the 
> server is
> still running normally and a process that exits will be restarted 
> later. What
> about SignalHandlerForNonCrashExit or SignalHandlerForNonFatalExit?

OK, I fixed.
I changed to the SignalPgstatHandlerForNonCrashExit() to add 
FreeWaitEventSet()
in the handler for the stats collector.


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

pgsql-hackers by date:

Previous
From: "iwata.aya@fujitsu.com"
Date:
Subject: RE: libpq debug log
Next
From: Thomas Munro
Date:
Subject: Re: fdatasync performance problem with large number of DB files