On Sat, Jan 26, 2019 at 1:35 PM Nick B <nbedxp@gmail.com> wrote:
On Sat, Jan 26, 2019 at 4:23 AM Michael Paquier <michael@paquier.xyz> wrote:
> These are a bit irregular. Which files are taking that long to
> complete while others are way faster? It may be something that we
> could improve on the base backup side as there is no actual point in
> syncing segments while the backup is running and we could delay that
> at the end of the backup (if I recall that stuff correctly).
I don't have a good sample for these. One instance of this happening is below:

....
0.000125 fsync(7) = 0 <0.016677>
0.000039 fsync(7) = 0 <0.000005>
# 2048 writes for total of 16777216 bytes (16MB)
0.000618 write(7, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192 <0.000021>
0.000078 fsync(8) = 0 <57.609720>
57.609830 fsync(8) = 0 <0.000007>
Again, it is a problem with our network file system that we are still investigating. I'm not sure this can be improved easily, since pg_basebackup shares this code with walreceiver.
One workaround you could perhaps look at here is to run pg_basebackup with --no-sync. That way there will be no fsyncs issued while running. You will then of course have to take care of syncing all the files to disk after it's done, but a network filesystem might be happier in dealing with a large "batch-sync" like that rather than piece-by-piece sync.
(yes, I realize that wasn't your original question, just wanted to make sure it was a workaround you had considered)
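To make the "sync afterwards" step of the --no-sync workaround concrete, here is a rough sketch of what the post-backup pass could look like. It is not what pg_basebackup does internally. The helper names (fsync_path, sync_tree) and the backup path in the usage note are made up for illustration. It assumes a plain-format backup on a Linux-like system where a read-only descriptor can be fsync'ed, and it does not follow symlinks (for example tablespace links).

```python
import os


def fsync_path(path: str) -> None:
    """Open a file or directory read-only and flush it to stable storage."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def sync_tree(root: str) -> int:
    """fsync every regular file and directory under root.

    Returns the number of paths synced. Symlinks are skipped, so linked
    tablespace directories would need a separate call (assumption).
    """
    synced = 0
    # Bottom-up walk: files first, then the directory that contains them,
    # so directory entries are flushed after their contents.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            fsync_path(full)
            synced += 1
        fsync_path(dirpath)
        synced += 1
    return synced
```

Usage would be along the lines of running pg_basebackup -D /path/to/backup --no-sync and then calling sync_tree("/path/to/backup") before treating the backup as durable. Whether the network filesystem handles this batch of fsyncs better than the piece-by-piece ones is something you would have to measure.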