Hi,
Recently I have been having some strange problems with pg_basebackup. About once a week the backup fails with an error like this:
2020-02-11 23:25:40,447 INFO pg_basebackup: could not read COPY data: server closed the connection unexpectedly
2020-02-11 23:25:40,447 INFO This probably means the server terminated abnormally
2020-02-11 23:25:40,447 INFO before or while processing the request.
2020-02-11 23:25:40,447 ERROR Error creating basebackup! RC: 1
On the database side, the logs show the matching error at the same time:
2020-02-11 23:25:40 UTC [25790]: [1-1] user=replicator,db=[unknown] LOG: could not send data to client: Connection reset by peer
2020-02-11 23:25:40 UTC [25790]: [2-1] user=replicator,db=[unknown] ERROR: base backup could not send data, aborting backup
2020-02-11 23:25:40 UTC [25790]: [3-1] user=replicator,db=[unknown] LOG: could not send data to client: Broken pipe
2020-02-11 23:25:40 UTC [25790]: [4-1] user=replicator,db=[unknown] FATAL: connection to client lost
2020-02-11 23:25:40 UTC [29824]: [1-1] user=replicator,db=[unknown] LOG: unexpected EOF on standby connection
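Note that the server lines above come from two backend PIDs (25790 and 29824). To line up events per connection across longer logs, a small sketch like the following could group lines by PID. It assumes every line follows exactly the prefix format shown above; the regular expression and the function name group_by_pid are mine, not something PostgreSQL provides.

```python
import re
from collections import defaultdict

# Assumed line format, copied from the excerpt above:
# 2020-02-11 23:25:40 UTC [25790]: [1-1] user=replicator,db=[unknown] LOG: message
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \w+) "
    r"\[(?P<pid>\d+)\]: \[\d+-\d+\] "
    r"user=(?P<user>[^,]*),db=(?P<db>\S*) "
    r"(?P<level>[A-Z]+): (?P<msg>.*)$"
)

def group_by_pid(lines):
    """Group (timestamp, level, message) tuples by backend PID.

    Lines that do not match the assumed prefix are silently skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            groups[int(m["pid"])].append((m["ts"], m["level"], m["msg"]))
    return dict(groups)
```

Feeding it the lines of the server log returns a dictionary keyed by PID, which makes it easier to see which connection failed first.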
The problem started occurring after a hardware upgrade (RAM + SSD) and an OS upgrade to Ubuntu 18.04. The server and the backup process run in separate Docker containers on the same machine. It happens randomly on multiple servers with the same configuration, so it is probably not hardware related. It also happens equally often on PostgreSQL 9.4 and 9.6, using the same Docker images that worked flawlessly on the previous installation.
I have been investigating the issue for at least a month and have found nothing unusual in any log or metric before or after the event. I suspect it is related to an OS or Docker parameter that is misconfigured.
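One way to compare OS-level settings between the old and new installations would be to dump the kernel TCP parameters in each container and on the host. The following is only a minimal sketch: the choice of parameters is my assumption about what might matter for a dropped connection, and read_tcp_params is a hypothetical helper name. The files themselves are the standard Linux ones under /proc/sys/net/ipv4.

```python
from pathlib import Path

# Kernel TCP parameters to compare; this selection is an assumption,
# not a confirmed cause of the dropped connections.
TCP_PARAMS = (
    "tcp_keepalive_time",
    "tcp_keepalive_intvl",
    "tcp_keepalive_probes",
    "tcp_retries2",
)

def read_tcp_params(base="/proc/sys/net/ipv4"):
    """Return {parameter: int value, or None if it cannot be read}."""
    values = {}
    for name in TCP_PARAMS:
        path = Path(base) / name
        try:
            values[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            values[name] = None  # missing or unreadable in this environment
    return values

if __name__ == "__main__":
    for name, value in read_tcp_params().items():
        print(f"{name} = {value}")
```

Running it in the database container, in the backup container, and on the host would show whether any of these values differ between environments.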
Would increasing the database log level give me any more info about what caused the connection to close?
Regards,
Mladen Marinović