On Thu, Jan 3, 2019 at 3:46 PM Stephen Frost <sfrost@snowman.net> wrote:
Greetings Chuck,
* Chuck Martin (clmartin@theombudsman.com) wrote:
> Using iperf, the transfer speed between the two servers (from the main to
> the standby) was 938 Mbits/sec. If I understand the units correctly, it is
> close to what it can be.
That does look like the rate it should be going at, but it should only take about 2 hours to copy 750GB at that rate.
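The "about 2 hours" figure is straightforward arithmetic. Here is a minimal sketch, assuming decimal units (1 GB = 10^9 bytes, 1 Mbit = 10^6 bits) and ignoring protocol overhead. With binary units (GiB) the result comes out closer to 1.9 hours.

```python
def transfer_hours(size_gb: float, rate_mbit_s: float) -> float:
    """Idealized time to move size_gb (decimal GB) at rate_mbit_s (megabits/s)."""
    size_bits = size_gb * 1e9 * 8            # GB -> bytes -> bits
    seconds = size_bits / (rate_mbit_s * 1e6)
    return seconds / 3600


print(f"{transfer_hours(750, 938):.2f} hours")  # 1.78 hours
```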
That’s what I was expecting.
How much WAL does this system generate though...? If you're generating a very large amount, then it's possible the WAL streaming is actually clogging up the network and causing the data files to copy quite slowly. You'd have to be generating quite a bit of WAL though.
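One way to answer that is to sample pg_current_wal_lsn() on the primary twice, some known interval apart (the function is pg_current_xlog_location() before PostgreSQL 10), and take the difference. In SQL, pg_wal_lsn_diff() does that subtraction directly. The sketch below does the same arithmetic in Python on the text form of an LSN ("hi/lo" in hex). The LSN values and the 60-second interval are made up for illustration.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to an absolute byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_rate_mb_per_s(lsn_start: str, lsn_end: str, elapsed_s: float) -> float:
    """Average WAL generation rate between two samples, in MB/s (10**6 bytes)."""
    return (lsn_to_bytes(lsn_end) - lsn_to_bytes(lsn_start)) / elapsed_s / 1e6


# Hypothetical samples taken 60 seconds apart on the primary.
rate = wal_rate_mb_per_s("16/B374D848", "16/B8000000", 60)
print(f"{rate:.2f} MB/s of WAL")
```

For comparison, a 938 Mbit/s link carries roughly 117 MB/s, so a WAL rate that is a sizeable fraction of that would compete noticeably with the data-file copy.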
It shouldn’t be excessive, but I’ll look closely at that.
> Your earlier suggestion was to do the pg_basebackup locally and rsync it
> over. Maybe that would be faster. At this point, it is saying it is 6%
> through, over 24 hours after being started.
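To put "6% after 24 hours" next to the iperf number, here is a rough sketch of the implied throughput, assuming the percentage is of the roughly 750GB total and that the rate so far stays steady. Both inputs are approximate, so treat the result as an order of magnitude.

```python
def effective_rate_mbit_s(total_gb: float, fraction_done: float, elapsed_h: float) -> float:
    """Average throughput implied by a progress fraction, in megabits/s."""
    bits_done = total_gb * fraction_done * 1e9 * 8
    return bits_done / (elapsed_h * 3600) / 1e6


def projected_total_hours(fraction_done: float, elapsed_h: float) -> float:
    """Total run time if the current average rate holds."""
    return elapsed_h / fraction_done


print(f"{effective_rate_mbit_s(750, 0.06, 24):.1f} Mbit/s")  # 4.2 Mbit/s
print(f"{projected_total_hours(0.06, 24):.0f} hours")        # 400 hours
```

That is a few Mbit/s against a measured 938 Mbit/s, so something other than raw link capacity is limiting the copy.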
For building out a replica, I'd tend to use my backups anyway instead of using pg_basebackup. Provided you have good backups and reasonable WAL retention, restoring a backup and then letting it replay WAL from the archive until it can catch up with the primary works very well. If you have a very high rate of WAL, then you might consider taking a full backup and then an incremental backup. The incremental is much faster, and it reduces the WAL required to only what is generated from the time the incremental backup starts until the replica has caught up with the primary.
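A back-of-the-envelope model of why the incremental helps, under simplifying assumptions: a steady WAL rate on the primary, a steady replay rate on the replica, and a backlog counted from the start of the most recent backup used until replay begins. The replica only catches up if it replays faster than the primary generates WAL. All numbers are hypothetical.

```python
def wal_backlog_gb(wal_rate_mb_s: float, hours: float) -> float:
    """WAL generated over `hours` at a steady rate, in GB (10**9 bytes)."""
    return wal_rate_mb_s * hours * 3600 / 1000


def catch_up_hours(backlog_gb: float, replay_mb_s: float, wal_rate_mb_s: float) -> float:
    """Time to drain a WAL backlog while the primary keeps writing.

    Solves backlog + wal_rate * t = replay_rate * t for t.
    """
    net_mb_s = replay_mb_s - wal_rate_mb_s
    if net_mb_s <= 0:
        return float("inf")  # replay never outpaces generation
    return backlog_gb * 1000 / net_mb_s / 3600


# Hypothetical: 5 MB/s of WAL, replica replays at 50 MB/s.
rate, replay = 5.0, 50.0
full_only = wal_backlog_gb(rate, hours=48)   # full backup started two days ago
with_incr = wal_backlog_gb(rate, hours=2)    # incremental started two hours ago
print(f"full only: {full_only:.0f} GB, catch-up {catch_up_hours(full_only, replay, rate):.1f} h")
print(f"with incremental: {with_incr:.0f} GB, catch-up {catch_up_hours(with_incr, replay, rate):.1f} h")
```

With these made-up inputs the first line reports 864 GB and 5.3 h, the second 36 GB and 0.2 h. Real replay speed varies with workload, so this only shows the shape of the tradeoff.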
There are a few different backup tools out there which can do parallel backup and in-transit compression. That loads up the primary's CPUs with processes doing compression, but it should reduce the overall time if the bottleneck is the network.
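A simplified sketch of that tradeoff, assuming compression and network transfer run as a pipeline, so the slower stage dominates. The 3:1 compression ratio and 40 MB/s per-core compression speed are hypothetical; 117 MB/s is roughly the 938 Mbit/s iperf measured.

```python
def backup_hours(size_gb: float, link_mb_s: float, ratio: float = 1.0,
                 cores: int = 1, per_core_mb_s: float = float("inf")) -> float:
    """Rough wall-clock time when compression and transfer are pipelined.

    The slower stage dominates: compressing size_gb across `cores` workers,
    or pushing size_gb / ratio over the link. The default per_core_mb_s of
    infinity models "no compression cost".
    """
    size_mb = size_gb * 1000
    compress_s = size_mb / (cores * per_core_mb_s)
    network_s = size_mb / ratio / link_mb_s
    return max(compress_s, network_s) / 3600


print(f"{backup_hours(750, 117):.2f}")                                    # 1.78, uncompressed
print(f"{backup_hours(750, 117, ratio=3, cores=1, per_core_mb_s=40):.2f}")  # 5.21, CPU-bound
print(f"{backup_hours(750, 117, ratio=3, cores=8, per_core_mb_s=40):.2f}")  # 0.65, parallel
```

Under these assumptions a single compression process makes things slower, while spreading compression across several cores lets the smaller compressed stream win. That is why parallelism and compression tend to come together in those tools.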