On Tue, Sep 21, 2021 at 1:52 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> If I understand, the instruction to send keep alive by the wal sender has not been reached in the for loop, for what reason?
> ...
> /* Check for replication timeout. */
> WalSndCheckTimeOut();
>
> /* Send keepalive if the time has come */
> WalSndKeepaliveIfNecessary();
> ...
>
Are you sure that these functions have not been called? Or is it the
case that they are called but, for some reason, the keep-alive is not
sent? IIUC, these are called after processing each WAL record, so I am
not sure how it is possible that they are not reached in your case.
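
To make the timing concrete, here is a rough, self-contained sketch of
the two checks as I understand them. This is not the walsender.c code:
the SenderState struct and both helper names are made up for
illustration, and timeout_ms merely plays the role of
wal_sender_timeout. The assumption is that a keep-alive is requested
once half of the timeout has passed without a reply, and the connection
is given up after the full timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the sender's timing state, in milliseconds. */
typedef struct SenderState
{
    int64_t last_reply_ms;        /* when the receiver last replied */
    int64_t timeout_ms;           /* plays the role of wal_sender_timeout */
    bool    keepalive_requested;  /* a ping is outstanding and unanswered */
} SenderState;

/* True once a full timeout has passed without any reply. */
bool
timeout_expired(const SenderState *s, int64_t now_ms)
{
    assert(s != NULL);
    if (s->timeout_ms <= 0)
        return false;             /* timeout disabled */
    return now_ms >= s->last_reply_ms + s->timeout_ms;
}

/* True when half the timeout has passed and no ping is outstanding. */
bool
keepalive_due(const SenderState *s, int64_t now_ms)
{
    assert(s != NULL);
    if (s->timeout_ms <= 0 || s->keepalive_requested)
        return false;             /* disabled, or already waiting for a reply */
    return now_ms >= s->last_reply_ms + s->timeout_ms / 2;
}
```

With a 60000 ms timeout and the last reply at time 0, keepalive_due()
becomes true at 30000 ms and timeout_expired() at 60000 ms. So if the
loop does not get back to these checks for more than a minute, the
timeout can fire even though both checks are present in the code.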

> The data load is performed on a table which is not replicated, I do not understand why the whole transaction linked to an insert is copied to snap files given that table does not take part of the logical replication.
>
It is because we don't know until the end of the transaction (when we
start sending the data) whether the table will be replicated or not. I
think the new 'streaming' feature introduced in PG-14 will help here,
as it lets us avoid writing the data of such tables to snap/spill
files. See the 'streaming' option in the CREATE SUBSCRIPTION docs [1].

> We are going to do a test by modifying parameters wal_sender_timeout/wal_receiver_timeout from 1' to 5'. The problem is that these parameters are global and changing them will also impact the physical replication.
>
Do you mean you are planning to change them from 1 minute to 5 minutes?
I agree that these parameters are global, and I think your approach of
finding the root cause is the right one here because otherwise, under a
similar or heavier workload, we might end up in the same situation.

> Concerning the walsender timeout, when the worker is started again after a timeout, it will trigger a new walsender associated with it.
>
Right, I know that, but I was curious to know whether the walsender
exited before the walreceiver.

[1] - https://www.postgresql.org/docs/devel/sql-createsubscription.html
--
With Regards,
Amit Kapila.