Re: PostgreSQL 10.5 : Logical replication timeout results in PANIC inpg_wal "No space left on device" - Mailing list pgsql-admin

From Achilleas Mantzios
Subject Re: PostgreSQL 10.5 : Logical replication timeout results in PANIC inpg_wal "No space left on device"
Date
Msg-id 48415b2a-ea0f-cf5e-1145-17e8797a6e79@matrix.gatewaynet.com
In response to Re: PostgreSQL 10.5 : Logical replication timeout results in PANIC inpg_wal "No space left on device"  (Rui DeSousa <rui@crazybean.net>)
Responses Re: PostgreSQL 10.5 : Logical replication timeout results in PANIC inpg_wal "No space left on device"  (Rui DeSousa <rui@crazybean.net>)
List pgsql-admin
On 13/11/18 5:35 PM, Rui DeSousa wrote:


On Nov 13, 2018, at 7:00 AM, Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:

Is there a way for the WAL receiver to not have detected the termination of the replication stream?

The teardown of the network socket on the upstream server should send a reset packet to the downstream server, and at that point the WAL receiver would close its connection.  Are there any firewalls, routers, rules, etc. between the nodes that could have dropped the packet?

No



Shouldn't the WAL receiver normally detect this and try again after wal_retrieve_retry_interval?

Not really. If the connection has already been torn down, the upstream server would send another reset packet on the next request, and in that case it would.  However, if request packets are not reaching the upstream server, for example because a firewall is silently dropping them (personally I believe firewalls should always send reset packets to friendly hosts), then the TCP/IP send queue builds up with the request packets instead. At that point you are waiting on the OS to terminate the connection, which can take a day or two depending on your TCP/IP settings.


Again no dropping, no firewall.
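
For anyone following the thread: the OS-level timeout mentioned in the quoted paragraph is governed by per-socket TCP settings. Below is a minimal Python sketch, not taken from PostgreSQL, of turning on TCP keepalive so that a silently dead peer is noticed sooner. The helper name enable_keepalive and the values 60/10/5 are only illustrative assumptions, and the TCP_KEEP* option names are the Linux ones, so they are skipped where the platform does not define them.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Hypothetical helper: turn on TCP keepalive and tune it where possible."""
    # SO_KEEPALIVE asks the OS to probe an idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux option names; not every platform exposes all three.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    )
    for name, value in tuning:
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock


sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

PostgreSQL exposes comparable knobs as tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count; whether they are relevant to the case in this thread is not established here.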

What you want to use instead is wal_receiver_timeout, to detect the case where the upstream server either no longer exists or a firewall, etc. is silently dropping packets.

Once again, from my original message:
"while setting up logical replication since August we had seen early on the need to increase max_receiver_timeout and max_sender_timeout from 60sec to 5mins"

So with wal_receiver_timeout='5 min', the receiver never detected any timeout.
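
To make explicit what a receive timeout is expected to do in principle, here is a small Python sketch of the general idea: give up if the peer stays silent for longer than a limit. It is only an illustration under stated assumptions, not how the WAL receiver is implemented; recv_or_timeout is a hypothetical helper, and a local socket pair stands in for the replication connection.

```python
import socket


def recv_or_timeout(sock, timeout_s):
    """Return received bytes, or None if nothing arrives within timeout_s seconds."""
    sock.settimeout(timeout_s)
    try:
        return sock.recv(4096)
    except socket.timeout:
        # The peer was silent for the whole interval: treat the link as dead.
        return None


# A connected local pair stands in for the upstream/downstream link.
receiver, sender = socket.socketpair()

silent = recv_or_timeout(receiver, 0.2)   # nothing was sent yet
sender.sendall(b"keepalive")
heard = recv_or_timeout(receiver, 0.2)    # data is now waiting

receiver.close()
sender.close()
```

In this sketch a silent peer yields None after the interval, which is the behaviour one would expect a 5 minute wal_receiver_timeout to produce, and which was not observed here.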





-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
