On 2021/03/24 18:36, Fujii Masao wrote:
>
>
> On 2021/03/24 3:51, Andres Freund wrote:
>> Hi,
>>
>> On 2021-03-23 15:50:46 +0900, Fujii Masao wrote:
>>> This fact makes me wonder that if we collect the statistics about WAL writing
>>> from walreceiver as we discussed in other thread, the stats collector should
>>> be invoked at more earlier stage. IIUC walreceiver can be invoked before
>>> PMSIGNAL_BEGIN_HOT_STANDBY is sent.
>>
>> FWIW, in the shared memory stats patch the stats subsystem is
>> initialized early on by the startup process.
>
> This is good news!
Fujii-san, Andres-san,
Thanks for your comments!
I hadn't thought about the start order. From that point of view, I noticed
that the current source code has two other concerns.
1. This problem is not limited to the wal receiver.
The problem that a process can start before the stats collector is launched
during archive recovery affects not only the wal receiver but also the
checkpointer and the bgwriter. Before starting redo, the startup process
sends the "PMSIGNAL_RECOVERY_STARTED" signal to the postmaster to launch the
checkpointer and the bgwriter so that they can create restartpoints.
Although the socket for communication between the stats collector and the
other processes is created at an earlier stage via pgstat_init(), I agree that
starting the stats collector at an earlier stage is a good defensive measure.
BTW, in my environment (Linux, net.core.rmem_default = 212992), the socket can
buffer almost 300 WAL stats messages. This means that, as you said, if the
redo phase is too long, messages can easily be lost.
2. The stats are cleared during the redo phase.
The statistics can be reset after the wal receiver, the checkpointer and
the wal writer have been started in the redo phase. So, it's not enough to
launch the stats collector at an earlier stage. We need to fix this as well.
(I hope I am not missing something.)
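For reference, the buffering estimate in point 1 can be reproduced with a
little arithmetic. The following is only a rough sketch: the function names
are hypothetical, and the per-message cost is not the real size of a WAL stats
message but the value implied by the two numbers above (a 212992-byte receive
buffer holding almost 300 messages), which also includes whatever bookkeeping
overhead the kernel charges per datagram.

```python
import socket


def default_rcvbuf_bytes() -> int:
    """Return the receive buffer size the OS assigns to a fresh UDP socket.

    On Linux this is expected to reflect net.core.rmem_default.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


def implied_cost_per_message(rcvbuf_bytes: int, observed_messages: int) -> float:
    """Average buffer space consumed per message, derived from an observation."""
    if observed_messages <= 0:
        raise ValueError("observed_messages must be positive")
    return rcvbuf_bytes / observed_messages


def max_buffered_messages(rcvbuf_bytes: int, cost_per_message: int) -> int:
    """How many messages fit before the kernel starts dropping datagrams.

    cost_per_message is an assumption supplied by the caller, not a
    measured PostgreSQL message size.
    """
    if cost_per_message <= 0:
        raise ValueError("cost_per_message must be positive")
    return rcvbuf_bytes // cost_per_message
```

With the numbers from my environment, implied_cost_per_message(212992, 300)
comes to roughly 710 bytes, and max_buffered_messages(212992, 710) gives 299.
Anything sent beyond that while nobody reads the socket would be dropped.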
Thanks to Andres-san's work ([1]), the above problems will be handled in the
shared memory stats patch. The first problem will be resolved because the
stats are collected in shared memory, so the stats collector process itself
becomes unnecessary. The second problem will be resolved by removing the reset
code, because the temporary stats file won't be generated, and if the
permanent stats file is corrupted, it is simply recreated.
[1]
https://github.com/anarazel/postgres/compare/master...shmstat-before-split-2021-03-22
Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION