Thread: [GENERAL] the database system is shutting down - terminating walsender process due to replication timeout
Greetings everyone,
I'm looking for a bit of help understanding a particular behavior we are seeing with our PostgreSQL 9.6 instance. After we issued a service shutdown with "systemctl stop" on RHEL 7, the instance started behaving oddly: for the first time, it would not shut down quickly.
From the logs we could see that standby nodes trying to connect were rejected because the database was shutting down. Only after wal_sender_timeout and wal_receiver_timeout (default 60s) were reached did the database finally shut down. It seems that a walsender process was preventing the shutdown of the master database until the timeout was reached, a behavior we had not experienced before.
Does anyone know why this would happen?
We have one standby node in our primary site (same subnet as the master DB) and two standbys in a remote site. WAL archiving to the remote site is enabled with an rsync command that worked normally during this time and generally completes within a couple of seconds, definitely less than a minute. So we have more or less ruled out WAL archiving.
Would shutting down the remote-site standbys first prevent this kind of delay? They usually lag by a couple of seconds in terms of replication (pg_xlog) location.
Here is the postgres log from the master that was being shut down.
2017-10-26 22:04:36 CDT [8763]: [1-1] "Local site standby IP"(49606) user=replication_user ,db=[unknown] FATAL: the database system is shutting down
[OMITTED - a bunch of "FATAL: the database system is shutting down" messages]
2017-10-26 22:05:00 CDT [2669]: [1-1] "Disaster recovery site standby IP" (55498) user=replication_user,db=[unknown] LOG: terminating walsender process due to replication timeout
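For reference, here is a minimal Python sketch of how the gap between the two log lines shown above can be measured. The two timestamps are copied from the excerpt; the function and constant names are only illustrative. The excerpt does not include the moment the shutdown was requested, so the result is at best a lower bound on how long the shutdown was held up.

```python
from datetime import datetime

# Timestamps copied from the two log lines quoted above (both CDT).
FIRST_SHOWN_REJECTION = "2017-10-26 22:04:36"
WALSENDER_TERMINATED = "2017-10-26 22:05:00"


def seconds_between(start: str, end: str) -> float:
    """Return the number of seconds from start to end.

    Both arguments use the 'YYYY-MM-DD HH:MM:SS' layout seen in the log
    prefix, with the time zone abbreviation left out.
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


elapsed = seconds_between(FIRST_SHOWN_REJECTION, WALSENDER_TERMINATED)
print(f"{elapsed:.0f} s between the two log lines shown")
```

Running the same calculation from the time "systemctl stop" was issued would show whether the total delay matches the 60s timeout.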
Zarko