On Tue, Sep 21, 2021 at 9:12 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> > IIUC, these are called after processing each WAL record so not
> > sure how is it possible in your case that these are not reached?
>
> I don't know. As you say, to pinpoint the problem we would have to debug the WalSndKeepaliveIfNecessary function.
>
> > I was curious to know if the walsender has exited before walreceiver
>
> During the last tests we made we didn't observe any timeout of the wal sender process.
>
> > Do you mean you are planning to change from 1 minute to 5 minutes?
>
> We set wal_sender_timeout/wal_receiver_timeout to 5 minutes and launched a new test. The result is surprising and rather
> positive: there is no timeout any more in the log, and the 20 GB of snap files are removed in less than 5 minutes.
> How can that behaviour be explained? Why are the snap files suddenly consumed so quickly?
>
I think it is because we decide that the data in those snap files
doesn't need to be sent at xact end, so we remove them.
> I chose the values for the wal_sender_timeout/wal_receiver_timeout parameters arbitrarily. Are these values appropriate
> from your point of view?
>
It is difficult to say what the appropriate values for these
parameters are unless we debug WalSndKeepaliveIfNecessary() to
find out why it didn't send a keepalive when expected. Would you be
able to make code changes and test them? Alternatively, I can make the
changes and send a patch if you can test it. If not, is it possible
for you to send a reproducible test case?
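
To make concrete what would need to be checked, here is a minimal
sketch (in Python, not the actual C code) of the decision that
WalSndKeepaliveIfNecessary() makes, as far as I understand it. The
function name keepalive_due, its parameters, and the rule that a
keepalive is due once half of wal_sender_timeout has elapsed since the
last reply are assumptions for illustration; they should be checked
against walsender.c before relying on them.

```python
# Simplified model of the keepalive decision; NOT PostgreSQL source code.
# All names here are hypothetical and exist only for illustration.

def keepalive_due(now_ms, last_reply_ms, wal_sender_timeout_ms,
                  waiting_for_ping_response=False):
    """Return True if a keepalive requesting a reply should be sent now."""
    # Timeouts disabled, or no reply seen yet: nothing to do.
    if wal_sender_timeout_ms <= 0 or last_reply_ms <= 0:
        return False
    # A ping is already outstanding; do not send another one.
    if waiting_for_ping_response:
        return False
    # Assumed rule: ping once half of the timeout has elapsed
    # since the last reply from the receiver.
    ping_time_ms = last_reply_ms + wal_sender_timeout_ms // 2
    return now_ms >= ping_time_ms


if __name__ == "__main__":
    last_reply = 1_000  # arbitrary timestamp in milliseconds
    for timeout in (60_000, 300_000):  # 1 minute and 5 minutes
        for elapsed in (20_000, 40_000, 160_000):
            due = keepalive_due(last_reply + elapsed, last_reply, timeout)
            print(f"timeout={timeout} elapsed={elapsed} due={due}")
```

If this model is right, the check can only fire when the function is
actually reached during decoding, so logging the inputs at this point
would be the first thing to try.
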
--
With Regards,
Amit Kapila.