On Tue, Jan 31, 2023 at 4:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Thanks, the patch looks good to me. I have slightly adjusted one of
> the comments and ran pgindent. See attached. As mentioned in the
> commit message, we shouldn't backpatch this as this requires a new
> callback and moreover, users can increase the wal_sender_timeout and
> wal_receiver_timeout to avoid this problem. What do you think?
The callback and the implementation are all in core. What's the risk
you see in backpatching it?
Customers can adjust the timeouts, but only after the receiver has
timed out a few times. Replication remains broken until they notice it
and adjust the timeouts, and by that time WAL has piled up. It also
takes a few attempts to find a large enough timeout, since the time
taken to decode a transaction cannot be estimated beforehand. All that makes
it worth back-patching if it's possible. We had a customer who piled
up GBs of WAL before realising that this was the problem. Their system
almost came to a halt because of that.
--
Best Wishes,
Ashutosh Bapat