FATAL: terminating walreceiver process due to administrator command - Mailing list pgsql-bugs

From Maeldron T.
Subject FATAL: terminating walreceiver process due to administrator command
Date
Msg-id CAKatfSnQP4gwpGNPxT6Gg-HFL9T6yefYaiSGhx=j5mrgOGV1Rg@mail.gmail.com
Responses Re: FATAL: terminating walreceiver process due to administrator command
List pgsql-bugs
Hello,

Today, I received email notifications from a server telling me the replication was lagging.

The application monitors the replication delay. In the past, I received this notification a few times under high load, but never from this server.

However, the replication was not lagging. It stopped.
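To make the difference between "lagging" and "stopped" concrete, here is a minimal Python sketch of a check a monitor could run on a streaming standby. It is not taken from the application described here. The function name classify_standby and the 60-second threshold are hypothetical. It assumes the standby receives WAL only by streaming (no archive recovery), so a missing row in pg_stat_wal_receiver means the WAL receiver is not running.

```python
from typing import Optional

# Queries to run on the standby (PostgreSQL 9.6 or later).
# pg_stat_wal_receiver has one row while a WAL receiver process exists, none otherwise.
RECEIVER_QUERY = "SELECT count(*) FROM pg_stat_wal_receiver"
# Seconds since the last replayed transaction; NULL if nothing has been replayed yet.
# Note: this value also grows while the master is simply idle.
DELAY_QUERY = "SELECT extract(epoch FROM now() - pg_last_xact_replay_timestamp())"


def classify_standby(receiver_rows: int,
                     delay_seconds: Optional[float],
                     max_delay: float = 60.0) -> str:
    """Classify a streaming standby from the results of the two queries above.

    receiver_rows  -- result of RECEIVER_QUERY
    delay_seconds  -- result of DELAY_QUERY (None when the query returns NULL)
    max_delay      -- hypothetical alert threshold in seconds
    """
    if receiver_rows == 0:
        # No WAL receiver process: replication is not merely slow, it has stopped.
        return "stopped"
    if delay_seconds is not None and delay_seconds > max_delay:
        return "lagging"
    return "ok"
```

With a check like this, a standby whose WAL receiver has exited is reported as "stopped" regardless of the measured delay, instead of as ever-growing lag.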

There was a single line in the log of the standby server:

FATAL:  terminating walreceiver process due to administrator command

I did not find anything related in the master’s log. The rest of the log was all about the slow statements.

As far as I can tell, this has never happened before. The standby server was running, but the replication had stopped. I restarted the server 42 minutes after that line appeared in the log. The replication caught up in 3-5 seconds.

Only I have access to the servers.

I did not stop the replication process. I don’t even know how to do it.

There is no cron task that would do such a thing. Only one application accesses the database. I wrote it, so I know it didn’t do it either.

I have been running the servers for years with more or less the same configuration.

As far as I can see, whenever the same line appears in earlier logs, the database was being shut down as well. This is the only occurrence that stands alone.

Recent changes on the servers:

* On 11 January, I upgraded from 10.5 to 10.6_2

* A few days ago, I set up a new server that replicates one table from the same master. This is a huge table, but it’s rarely written. That replication works. It’s the first time I have used logical replication. The server where the replication stopped uses asynchronous streaming replication.

* When I set up the logical replication, I increased wal_sender_timeout.

I found nothing related in the logs (/var/log/messages, /var/log/all.log, dmesg). This slave is probably the least loaded server of the group.

FreeBSD xxx 11.2-RELEASE-p8 FreeBSD 11.2-RELEASE-p8 #0: Tue Jan  8 21:35:12 UTC 2019     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

/boot/loader.conf:
# PostgreSQL
kern.ipc.semmni=256
kern.ipc.semmns=512
kern.ipc.semmnu=256

Everything else is either FreeBSD default or unrelated.

There is a lot of free memory. I don’t mean merely usable, but actually free: 3 GB of RAM has not even been touched since the last boot.

M.


