Hello,
Today, I received email notifications from a server telling me the replication was lagging.
The application monitors the delay. In the past, I received this notification a few times under high load, but never from this server.
However, the replication was not lagging. It stopped.
There was a single line in the log of the standby server:
FATAL: terminating walreceiver process due to administrator command
I did not find anything related in the master’s log. The rest of that log was all about slow statements.
As far as I can tell, this has never happened before. The standby server was running, but the replication had stopped. I restarted the server 42 minutes after that line appeared in the log. The replication caught up in 3-5 seconds.
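Since the monitor reported a stopped replication as a lag, a check along the following lines could tell the two cases apart. This is only a minimal sketch, not what the application currently does: the function name, the threshold and the returned labels are made up for illustration. It assumes the monitor runs the two queries on the standby; pg_stat_wal_receiver returns no row when no walreceiver process is running, and pg_last_xact_replay_timestamp() gives the commit time of the last replayed transaction.

```python
from typing import Optional

# Queries a monitor could run on the standby (PostgreSQL 10 names).
WAL_RECEIVER_SQL = "SELECT status FROM pg_stat_wal_receiver"
REPLAY_DELAY_SQL = (
    "SELECT extract(epoch FROM now() - pg_last_xact_replay_timestamp())"
)


def classify_replication(receiver_status: Optional[str],
                         replay_delay_s: Optional[float],
                         lag_threshold_s: float = 60.0) -> str:
    """Classify the standby state from the results of the two queries.

    receiver_status: the status column, or None if the view returned no row.
    replay_delay_s:  seconds since the last replayed commit, or None.
    lag_threshold_s: hypothetical alert threshold, chosen for illustration.
    """
    if receiver_status is None:
        # No walreceiver process at all: replication has stopped.
        return "stopped"
    if receiver_status != "streaming":
        # The walreceiver exists but is starting, waiting, stopping, etc.
        return "not streaming"
    # Note: this delay also grows while the master is idle, so a large
    # value alone does not prove that the standby is behind.
    if replay_delay_s is not None and replay_delay_s > lag_threshold_s:
        return "lagging"
    return "ok"
```

With inputs like those of this incident (no walreceiver row, about 42 minutes since the last replayed commit), classify_replication(None, 2520.0) returns "stopped" rather than "lagging".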
Only I have access to the servers.
I did not stop the replication process. I don’t even know how to do it.
There is no cron task that would do such a thing. Only one application accesses the database. I wrote it, so I know it didn’t do it either.
I have been running the servers for years with more or less the same configuration.
As far as I can see, whenever the same line appears in earlier logs, the database was being shut down as well. This is the only occurrence that stands alone.
Recent changes on the servers:
* On 11 January, I upgraded from 10.5 to 10.6_2
* A few days ago, I set up a new server that replicates one table from the same master. This is a huge table, but it’s rarely written. The replication works. That is when I started using logical replication. The server where the replication stopped uses asynchronous streaming replication.
* When I set up the logical replication, I increased wal_sender_timeout
I found nothing related in the logs (/var/log/messages, /var/log/all.log, dmesg). This slave is probably the least loaded server of the group.
FreeBSD xxx 11.2-RELEASE-p8 FreeBSD 11.2-RELEASE-p8 #0: Tue Jan 8 21:35:12 UTC 2019 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
/boot/loader.conf:
# PostgreSQL
kern.ipc.semmni=256
kern.ipc.semmns=512
kern.ipc.semmnu=256
Everything else is either FreeBSD default or unrelated.
There is a lot of free memory, and I mean free, not merely usable: 3 GB of RAM has not even been touched since the last boot.
M.