Greetings,
I would also like to thank everyone for looking into this.
On Sat, Jan 26, 2019 at 01:45:46PM +0100, Magnus Hagander wrote:
> One workaround you could perhaps look at here is to run pg_basebackup
> with --no-sync. That way there will be no fsyncs issued while running. You
> will then of course have to take care of syncing all the files to disk
> after it's done, but a network filesystem might be happier in dealing with
> a large "batch-sync" like that rather than piece-by-piece sync.
Thanks for the pointer; I was not aware of this flag. I ran two rounds of tests with --no-sync, and the backup failed at a much later point in time, which suggests that the bottleneck is in fact the Ceph metadata server. We're now looking into ways of improving this. (This is a 15TB cluster with a few hundred thousand tables which generates 4 WAL segments per second on average, so throttling the transfer rate is not a good option either.)
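For anyone who wants to try the same workaround, here is a minimal sketch of the "batch-sync" step that has to follow a --no-sync backup. It is not what we run in production: the helper names (run_basebackup, fsync_path, fsync_tree) and the target directory are made up for illustration, it assumes a plain-format backup on a POSIX system where a directory can be opened read-only and fsync'ed, and I have not measured whether CephFS copes better with this pattern.

```python
import os
import subprocess


def run_basebackup(target_dir):
    """Take a base backup without per-file fsyncs.

    Connection options are omitted; add -h/-p/-U as needed.
    This function is shown for context and is not called below.
    """
    subprocess.run(
        ["pg_basebackup", "-D", target_dir, "--no-sync"],
        check=True,
    )


def fsync_path(path):
    """Open a file or directory read-only and flush it to stable storage."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def fsync_tree(root):
    """fsync every regular file under root, then each directory, bottom-up.

    Symbolic links (e.g. tablespace links) are skipped, not followed.
    Returns the number of files that were synced.
    """
    synced = 0
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue
            fsync_path(full)
            synced += 1
        # Sync the directory itself so its entries are durable too.
        fsync_path(dirpath)
    return synced
```

Calling run_basebackup("/path/to/backup") followed by fsync_tree("/path/to/backup") would be the intended order; the path is a placeholder. Tablespaces stored outside the backup directory would need a separate pass.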
On Sat, Jan 26, 2019 at 4:23 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> The docs could be improved to describe that better..
I had an off-list discussion with Stephen Frost about a possible documentation update. In his opinion, the behaviour I was trying to describe sounds a lot like a bug, and documenting a bug is not good practice.
Upon further examination of WalSndKeepaliveIfNecessary, I found that "requesting an immediate reply" is implemented by setting the socket to non-blocking mode and issuing a flush. I find it hard to believe there is a scenario where the client can react to that keepalive in time (unless, of course, I have misunderstood something). So the question is: will we ever actually wait for wal_sender_timeout to elapse before terminating the connection?
Regards,
Nick.