On 2021/03/25 9:31, Masahiro Ikeda wrote:
>
>
> On 2021/03/24 18:36, Fujii Masao wrote:
>>
>>
>> On 2021/03/24 3:51, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2021-03-23 15:50:46 +0900, Fujii Masao wrote:
>>>> This fact makes me wonder whether, if we collect the statistics about WAL
>>>> writing from walreceiver as we discussed in the other thread, the stats
>>>> collector should be started at an earlier stage. IIUC walreceiver can be
>>>> started before PMSIGNAL_BEGIN_HOT_STANDBY is sent.
>>>
>>> FWIW, in the shared memory stats patch the stats subsystem is
>>> initialized early on by the startup process.
>>
>> This is good news!
>
> Fujii-san, Andres-san,
> Thanks for your comments!
>
> I hadn't thought about the startup order. From that point of view, I noticed
> that the current source code has two other concerns.
>
>
> 1. This problem is not specific to the wal receiver.
>
> The problem of a process starting before the stats collector is launched
> during archive recovery applies not only to the wal receiver but also to the
> checkpointer and the bgwriter. Before starting redo, the startup process sends
> the PMSIGNAL_RECOVERY_STARTED signal to the postmaster, which launches the
> checkpointer and the bgwriter so that restartpoints can be created.
>
> Although the socket for communication between the stats collector and the
> other processes is created at an earlier stage via pgstat_init(), I agree that
> starting the stats collector earlier is the defensive choice. BTW, in my
> environment (Linux, net.core.rmem_default = 212992), the socket can buffer
> almost 300 WAL stats messages. This means that, as you said, if the redo phase
> takes too long, messages can easily be lost.
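
As a rough aid for the numbers above, here is a minimal Python sketch of the
arithmetic. It only uses the two figures quoted in the mail (rmem_default of
212992 bytes and roughly 300 buffered messages); the function names and the
rate of 2 messages per second are hypothetical, chosen purely for illustration,
and the per-message cost is an average implied by those figures, not a
measured PostgreSQL value.

```python
import socket

RMEM_DEFAULT = 212992    # net.core.rmem_default quoted in the mail
OBSERVED_MESSAGES = 300  # "almost 300" WAL stats messages, from the mail


def implied_cost_per_message(rcvbuf_bytes: int, messages: int) -> float:
    """Average receive-buffer bytes consumed per queued message."""
    return rcvbuf_bytes / messages


def seconds_until_full(rcvbuf_bytes: int, cost_per_message: float,
                       messages_per_second: float) -> float:
    """Time until an unread socket buffer fills at a steady send rate."""
    capacity = rcvbuf_bytes / cost_per_message
    return capacity / messages_per_second


def current_rcvbuf() -> int:
    """SO_RCVBUF that this host assigns to a freshly created UDP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    cost = implied_cost_per_message(RMEM_DEFAULT, OBSERVED_MESSAGES)
    print(f"about {cost:.0f} bytes of buffer per message")
    # Hypothetical rate: 2 messages per second while nobody reads the socket.
    full_after = seconds_until_full(RMEM_DEFAULT, cost, 2.0)
    print(f"buffer full after about {full_after:.0f} s")
    print(f"SO_RCVBUF on this host: {current_rcvbuf()}")
```

The point of the sketch is only that the capacity is fixed while the redo
phase has no upper bound, so any steady message rate eventually overflows the
buffer if the collector is not yet reading from the socket.
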
>
>
> 2. The stats are cleared during the redo phase.
>
> The statistics can be reset after the wal receiver, the checkpointer and
> the wal writer have started in the redo phase. So it's not enough for the
> stats collector to be started at an earlier stage. We need to fix it.
>
>
>
> (I hope I am not missing something.)
> Thanks to Andres-san's work ([1]), the above problems will be handled in the
> shared memory stats patch. The first problem will be resolved because the stats
> are collected in shared memory, so the stats collector process itself becomes
> unnecessary. The second problem will be resolved by removing the reset code,
> because the temporary stats file won't be generated, and if the permanent
> stats file is corrupted, it is simply recreated.

Yes. So should we wait for the shared memory stats patch to be committed
before working further on the walreceiver stats patch?

Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION