Re: Logical replication timeout problem - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Logical replication timeout problem
Msg-id CAA4eK1JYB2eLNTs48szQ8oJXW9GLcrxT0yWETQS0gF2sVBxMYA@mail.gmail.com
In response to Re: Logical replication timeout problem  (Fabrice Chapuis <fabrice636861@gmail.com>)
Responses Re: Logical replication timeout problem
List pgsql-hackers
On Tue, Sep 21, 2021 at 1:52 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> If I understand, the instruction to send keep alive by the wal sender has not been reached in the for loop, for what reason?
> ...
> /* Check for replication timeout. */
>   WalSndCheckTimeOut();
>
> /* Send keepalive if the time has come */
>   WalSndKeepaliveIfNecessary();
> ...
>

Are you sure that these functions have not been called? Or is it the
case that they are called but, for some reason, the keepalive is not
sent? IIUC, they are called after processing each WAL record, so I am
not sure how it is possible that they are not reached in your case.

> The data load is performed on a table which is not replicated, I do not understand why the whole transaction linked to an insert is copied to snap files given that table does not take part of the logical replication.
>

It is because we don't know till the end of the transaction (where we
start sending the data) whether the table will be replicated or not. I
think specifically for this purpose the new 'streaming' feature
introduced in PG-14 will help us to avoid writing data of such tables
to snap/spill files. See 'streaming' option in Create Subscription
docs [1].
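
As a rough sketch of what enabling it could look like on the subscriber
(the subscription name, publication name, and connection string are
placeholders I made up, not taken from your setup; I believe both
publisher and subscriber need to be on PG-14 or later for this):

```sql
-- Hypothetical names and connection string; adjust to the actual setup.
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher.example.com dbname=mydb user=repuser'
    PUBLICATION mypub
    WITH (streaming = on);

-- Or, for a subscription that already exists:
ALTER SUBSCRIPTION mysub SET (streaming = on);
```

With streaming on, changes of an in-progress transaction are sent once
it exceeds logical_decoding_work_mem, rather than the whole transaction
being decoded on the publisher first.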

> We are going to do a test by modifying parameters wal_sender_timeout/wal_receiver_timeout from 1' to 5'. The problem is that these parameters are global and changing them will also impact the physical replication.
>

Do you mean you are planning to change them from 1 minute to 5 minutes? I
agree that these parameters are global, and I think your approach of
finding the root cause is the right one here because otherwise a
similar or heavier workload might lead to the same situation.
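
For the test itself, assuming the change is indeed from 1 minute to 5
minutes, something like the following should work; this is only a
sketch, and which node each command runs on is my assumption about
your setup:

```sql
-- On the publisher (the node running the walsender):
ALTER SYSTEM SET wal_sender_timeout = '5min';
SELECT pg_reload_conf();

-- On the subscriber (the node running the walreceiver/apply worker):
ALTER SYSTEM SET wal_receiver_timeout = '5min';
SELECT pg_reload_conf();

-- Verify the value in a new session on each node:
SHOW wal_sender_timeout;
SHOW wal_receiver_timeout;
```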

> Concerning the walsender timeout, when the worker is started again after a timeout, it will trigger a new walsender associated with it.
>

Right, I know that, but I was curious to know whether the walsender had
exited before the walreceiver.

[1] - https://www.postgresql.org/docs/devel/sql-createsubscription.html

-- 
With Regards,
Amit Kapila.


