Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested. - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.
Date
Msg-id 5289df2d-acce-ca30-9a5e-ab75f621cc29@oss.nttdata.com
In response to Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.  (Masahiro Ikeda <ikedamsh@oss.nttdata.com>)
Responses Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.  (Masahiro Ikeda <ikedamsh@oss.nttdata.com>)
List pgsql-hackers

On 2021/03/25 9:31, Masahiro Ikeda wrote:
> 
> 
> On 2021/03/24 18:36, Fujii Masao wrote:
>>
>>
>> On 2021/03/24 3:51, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2021-03-23 15:50:46 +0900, Fujii Masao wrote:
>>>> This makes me wonder whether, if we collect the statistics about WAL writing
>>>> from walreceiver as we discussed in the other thread, the stats collector
>>>> should be started at an earlier stage. IIUC walreceiver can be started before
>>>> PMSIGNAL_BEGIN_HOT_STANDBY is sent.
>>>
>>> FWIW, in the shared memory stats patch the stats subsystem is
>>> initialized early on by the startup process.
>>
>> This is good news!
> 
> Fujii-san, Andres-san,
> Thanks for your comments!
> 
> I hadn't thought about the start order. From that point of view, I noticed that
> the current source code has two other concerns.
> 
> 
> 1. This problem is not specific to the wal receiver.
> 
> The problem that a process starts before the stats collector is launched
> during archive recovery affects not only the wal receiver but also the
> checkpointer and the bgwriter. Before starting redo, the startup process sends
> the "PMSIGNAL_RECOVERY_STARTED" signal to the postmaster to launch the
> checkpointer and the bgwriter so that restartpoints can be performed.
> 
> Although the socket for communication between the stats collector and the
> other processes is created at an earlier stage via pgstat_init(), I agree that
> starting the stats collector at an earlier stage is the defensive choice. BTW,
> in my environment (Linux, net.core.rmem_default = 212992), the socket can buffer
> almost 300 WAL stats messages. This means that, as you said, if the redo phase
> takes too long, messages can easily be lost.
> 
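
As a rough sanity check of the buffering figure above, here is a minimal
sketch. It is not taken from the PostgreSQL source; the function names are
made up for illustration, and it assumes the kernel charges every queued
message the same amount of receive-buffer space. It reads the receive-buffer
size of a fresh UDP socket and derives the per-message cost implied by the
two numbers quoted above (212992 bytes, about 300 messages).

```python
import socket


def receive_buffer_bytes() -> int:
    """Return the receive-buffer size the kernel reports for a new UDP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


def implied_cost_per_message(buffer_bytes: int, queued_messages: int) -> int:
    """Buffer space charged per queued message, assuming a uniform cost."""
    return buffer_bytes // queued_messages


def messages_that_fit(buffer_bytes: int, cost_per_message: int) -> int:
    """Number of messages that fit before the kernel starts dropping them."""
    return buffer_bytes // cost_per_message


if __name__ == "__main__":
    # Figures quoted in the mail above: rmem_default and the observed capacity.
    quoted_buffer = 212992
    quoted_capacity = 300

    cost = implied_cost_per_message(quoted_buffer, quoted_capacity)
    print("implied cost per queued message:", cost, "bytes")

    # Apply that assumed cost to whatever this machine reports.
    local_buffer = receive_buffer_bytes()
    print("local SO_RCVBUF:", local_buffer, "bytes")
    print("estimated messages buffered:", messages_that_fit(local_buffer, cost))
```

The implied cost includes whatever per-datagram overhead the kernel accounts
for, so it says nothing exact about the size of the message itself. It only
shows that the buffer is bounded and small compared with a long redo phase.
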
> 
> 2. The stats are cleared during the redo phase.
> 
> The statistics can be reset after the wal receiver, the checkpointer and
> the wal writer are started in the redo phase. So, it's not enough to start the
> stats collector at an earlier stage. We need to fix it.
> 
> 
> 
> (I hope I am not missing something.)
> Thanks to Andres-san's work ([1]), the above problems will be handled by the
> shared memory stats patch. The first problem will be resolved because the stats
> are collected in shared memory, so the stats collector process itself becomes
> unnecessary. The second problem will be resolved by removing the reset code,
> because the temporary stats file won't be generated, and if the permanent stats
> file is corrupted, it is just recreated.

Yes. So should we wait for the shared memory stats patch to be committed
before working further on the walreceiver stats patch?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
