walsender timeout on logical replication set - Mailing list pgsql-hackers

From	Kyotaro Horiguchi
Subject	walsender timeout on logical replication set
Date	September 13, 2021 04:31:07
Msg-id	20210913.103107.813489310351696839.horikyota.ntt@gmail.com
Responses	Re: walsender timeout on logical replication set
List	pgsql-hackers

Hello.

As reported in [1], it seems that walsender can hit a timeout in
certain cases.  It has not been clearly confirmed, but I suspect there
is a case where LogicalRepApplyLoop keeps running its innermost loop
without receiving a keepalive packet for longer than
wal_sender_timeout (not wal_receiver_timeout).  Of course, if the
subscriber is simply underpowered, that can be resolved by giving it
sufficient processing power.  But if it happens between servers with
equal processing power, it is reasonable to "fix" this.  In theory, I
think this can happen with equally powered servers if the network
connecting them is sufficiently fast, because sending reordered
changes is relatively simple and faster than applying the changes on
the subscriber.

I don't think we want to call GetCurrentTimestamp on every iteration
of the innermost loop.  Even if we called it only every N iterations,
I can't come up with an N that fits every workload.  So one possible
solution would be to use SIGALRM.  Is it worth doing?  Or is there any
other way?
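
To make the SIGALRM idea concrete, here is a minimal standalone sketch.
It is not PostgreSQL code and not a proposed patch: it uses plain POSIX
sigaction() and setitimer() rather than the backend's own timeout
machinery, and the names feedback_due, start_feedback_timer,
stop_feedback_timer and apply_changes_until_feedback are hypothetical,
made up for illustration.  The point is only that the loop tests a
flag set by the signal handler instead of reading the clock itself.

```c
#define _XOPEN_SOURCE 700       /* expose sigaction() and setitimer() */

#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Set by the SIGALRM handler; the loop only has to test it. */
static volatile sig_atomic_t feedback_due = 0;

static void
handle_sigalrm(int signo)
{
    (void) signo;
    feedback_due = 1;           /* do nothing else in the handler */
}

/*
 * Arm a repeating timer that raises SIGALRM every interval_ms
 * milliseconds.  Returns 0 on success, -1 on failure.
 */
int
start_feedback_timer(long interval_ms)
{
    struct sigaction sa;
    struct itimerval tv;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_sigalrm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    tv.it_interval.tv_sec = interval_ms / 1000;
    tv.it_interval.tv_usec = (interval_ms % 1000) * 1000;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_REAL, &tv, NULL);
}

/* Disarm the timer. */
void
stop_feedback_timer(void)
{
    struct itimerval tv;

    memset(&tv, 0, sizeof(tv));
    setitimer(ITIMER_REAL, &tv, NULL);
}

/*
 * Hypothetical stand-in for the innermost apply loop.  It "applies"
 * changes until min_feedbacks replies have been "sent", and returns
 * the number of changes applied.  The timer must already be armed.
 */
long
apply_changes_until_feedback(int min_feedbacks, int *feedbacks_sent)
{
    long applied = 0;

    *feedbacks_sent = 0;
    while (*feedbacks_sent < min_feedbacks)
    {
        applied++;              /* placeholder: apply one change */

        if (feedback_due)       /* cheap test, no clock read */
        {
            feedback_due = 0;
            (*feedbacks_sent)++;    /* placeholder: send a reply */
        }
    }
    return applied;
}
```

A caller would arm the timer with an interval well below
wal_sender_timeout, for example start_feedback_timer(10) in a test
program, run the loop, and call stop_feedback_timer() afterwards.  The
per-iteration cost is a single read of a volatile flag, so there is no
N to tune, which is the attraction over counting iterations.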

Even if we don't fix this, we might need to add a description of
this restriction to the documentation?

Any thoughts?

[1] https://www.postgresql.org/message-id/CAEDsCzhBtkNDLM46_fo_HirFYE2Mb3ucbZrYqG59ocWqWy7-xA%40mail.gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
