walsender timeout on logical replication set - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject walsender timeout on logical replication set
Date
Msg-id 20210913.103107.813489310351696839.horikyota.ntt@gmail.com
List pgsql-hackers
Hello.

As reported in [1], it seems that walsender can hit a timeout in
certain cases.  It has not been clearly confirmed, but I suspect there
is a case where LogicalRepApplyLoop keeps running its innermost loop
without receiving a keepalive packet for longer than
wal_sender_timeout (not wal_receiver_timeout).  Of course, if the
subscriber is underpowered, that can be resolved by giving it
sufficient processing power.  But if this happens between servers with
equal processing power, it is reasonable to "fix" it.  Theoretically, I
think this can happen with equally-powered servers if the connecting
network is sufficiently fast, because sending reordered changes is
relatively simpler and faster than applying the changes on the
subscriber.

I think we don't want to call GetCurrentTimestamp on every iteration
of the innermost loop.  Even if we call it only every N iterations, I
can't come up with a proper N that fits every workload.  So one
possible solution would be to use SIGALRM.  Is it worth doing?  Or is
there any other way?

Even if we don't fix this, might we need to add a description of this
restriction to the documentation?

Any thoughts?

[1] https://www.postgresql.org/message-id/CAEDsCzhBtkNDLM46_fo_HirFYE2Mb3ucbZrYqG59ocWqWy7-xA%40mail.gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


