Ninad Shah <nshah.postgres@gmail.com> writes:
> What I observed is that a couple of hours pass between the two lines below.
> 115454656/1304172127 kB (8%), 0/1 tablespace
> (...atastaging/base/115868/154220.2)
> pg_basebackup: could not read COPY data: could not receive data from server:
> Connection timed out
We have heard reports of network connections dropping while pg_basebackup
is busy doing something disk-intensive such as fsync'ing. The apparent
2-hour delay here does not mean that pg_basebackup was out to lunch for
2 hours; more likely that reflects the TCP timeout delay before the kernel
realizes that the connection is lost. The actual blame probably resides
with some firewall or router that has a short timeout for idle
connections.
I'd try turning on fairly aggressive TCP keepalive settings for the
connection, say keepalives_idle=30 or so.
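As a rough sketch of what that could look like: the keepalive settings are
ordinary libpq connection parameters, so they can be passed to pg_basebackup
in a connection string via -d. The host name, user name, and target directory
below are placeholders, and the keepalives_interval and keepalives_count
values are illustrative assumptions; only keepalives_idle=30 comes from the
suggestion above.

```python
import shlex


def keepalive_conninfo(host, user, idle=30, interval=10, count=3):
    """Build a libpq connection string with client-side TCP keepalives.

    idle=30 mirrors the suggestion above; interval and count are
    assumed values chosen only for illustration.
    Values are emitted unquoted, so they must not contain spaces.
    """
    params = {
        "host": host,
        "user": user,
        "keepalives": 1,                   # enable TCP keepalives (libpq default)
        "keepalives_idle": idle,           # idle seconds before the first probe
        "keepalives_interval": interval,   # seconds between unanswered probes
        "keepalives_count": count,         # lost probes before giving up
    }
    return " ".join(f"{key}={value}" for key, value in params.items())


def basebackup_command(conninfo, target_dir):
    """Return the pg_basebackup argv: -d conninfo, -D target dir, -P progress."""
    return ["pg_basebackup", "-d", conninfo, "-D", target_dir, "-P"]


if __name__ == "__main__":
    # Placeholder host, user, and directory; substitute real values.
    conninfo = keepalive_conninfo("primary.example.com", "replicator")
    print(shlex.join(basebackup_command(conninfo, "/path/to/backup")))
```

With settings like these the kernel sends a probe after 30 seconds of
silence, which should keep an idle-connection timer in a firewall or router
from expiring, and a genuinely dead connection would be noticed in roughly a
minute rather than hours. Whether the operating system honors each parameter
depends on the platform.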
regards, tom lane