Thread: getting pg_basebackup to use remote destination

getting pg_basebackup to use remote destination

From
Chuck Martin
Date:
I thought I knew how to do this, but I apparently don't. I have to set up a new server as a standby for a PG 11.1 server. The main server has a lot more resources than the standby. What I want to do is run pg_basebackup on the main server with the output going to the data directory on the new server. But when I give this command:

pg_basebackup -D "ssh root@10.0.1.16:/mnt/dbraid/data" -P -v -X s


it instead writes to my root drive which doesn't have the space, so it fails and deletes the partial backup. 


While I think I could figure out how to back up to a local directory and then rsync it to the new server, I'd like to avoid that due to the 750 GB size.


Is there a way to tell pg_basebackup to use a remote destination for output? Or do I have to run pg_basebackup on the standby server?


And while I'm asking, has anyone yet written a guide/tutorial for PG 11 replication? Everything I find online is very old.


Chuck Martin
Avondale Software

Re: getting pg_basebackup to use remote destination

From
Adrian Klaver
Date:
On 12/29/18 11:04 AM, Chuck Martin wrote:
> I thought I knew how to do this, but I apparently don't. I have to set 
> up a new server as a standby for a PG 11.1 server. The main server has a 
> lot more resources than the standby. What I want to do is run 
> pg_basebackup on the main server with the output going to the data 
> directory on the new server. But when I give this command:
> 
> pg_basebackup -D "ssh root@10.0.1.16:/mnt/dbraid/data" -P -v -X s
> 
> 
> it instead writes to my root drive which doesn't have the space, so it 
> fails and deletes the partial backup.
> 
> 
> While I think I could figure out how to backup to a local directory then 
> rsync it to the new server, I'd like to avoid that due to the 750GB size.
> 
> 
> Is there a way to tell pg_basebackup to use a remote destination for 
> output? Or do I have to run pg_basebackup on the standby server?

Not that I know of.

I would run it from the standby as a good deal of the heavy lifting is 
done on the main server anyway. In either case the standby will incur 
roughly the same load, namely receiving the data and writing it to disk.
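
For example, run from the standby, something along these lines writes the backup straight into its data directory (host and user are placeholders; a sketch, not a tested recipe):

pg_basebackup -h <primary-host> -U <replication-user> -D /mnt/dbraid/data -P -v -X stream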

> 
> 
> And while I'm asking, has anyone yet written a guide/tutorial for PG 11 
> replication? Everything I find online is very old.

Maybe?:
https://www.postgresql.org/docs/11/high-availability.html

> 
> 
> Chuck Martin
> Avondale Software


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: getting pg_basebackup to use remote destination

From
Peter Eisentraut
Date:
On 29/12/2018 20:04, Chuck Martin wrote:
> I thought I knew how to do this, but I apparently don't. I have to set
> up a new server as a standby for a PG 11.1 server. The main server has a
> lot more resources than the standby. What I want to do is run
> pg_basebackup on the main server with the output going to the data
> directory on the new server. But when I give this command:
> 
> pg_basebackup -D "ssh root@10.0.1.16:/mnt/dbraid/data" -P -v -X s
> 
> 
> it instead writes to my root drive which doesn't have the space, so it
> fails and deletes the partial backup. 

What you might be thinking of is the "old" method of doing base backups
before pg_basebackup:  Call pg_start_backup() and then do file system
operations (tar, scp, whatever) to move the data files to where you want
them.  This is mostly obsolete.  You should run pg_basebackup on the
host where you want to set up your standby.
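
For reference, the old method was roughly this, run on the primary (a sketch only; $PGDATA and the destination path are placeholders, and you still have to arrange for the WAL needed to make the copy consistent, which pg_basebackup does for you):

psql -c "SELECT pg_start_backup('clone', true);"
rsync -a --exclude postmaster.pid --exclude pg_wal $PGDATA/ root@10.0.1.16:/mnt/dbraid/data/
psql -c "SELECT pg_stop_backup();"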

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: getting pg_basebackup to use remote destination

From
Chuck Martin
Date:


On Sun, Dec 30, 2018 at 11:20 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 29/12/2018 20:04, Chuck Martin wrote:
> I thought I knew how to do this, but I apparently don't. I have to set
> up a new server as a standby for a PG 11.1 server. The main server has a
> lot more resources than the standby. What I want to do is run
> pg_basebackup on the main server with the output going to the data
> directory on the new server. But when I give this command:
>
> pg_basebackup -D "ssh root@10.0.1.16:/mnt/dbraid/data" -P -v -X s
>
>
> it instead writes to my root drive which doesn't have the space, so it
> fails and deletes the partial backup. 

What you might be thinking of is the "old" method of doing base backups
before pg_basebackup:  Call pg_start_backup() and then do file system
operations (tar, scp, whatever) to move the data files to where you want
them.  This is mostly obsolete.  You should run pg_basebackup on the
host where you want to set up your standby

Thanks. It’s been a while since I set up replication, not to mention several Postgres versions. I’ve started pg_basebackup from the standby. It failed once due to an ssh error, but I reloaded sshd and started again. It may take a while; it’s about 750 GB.

--
Chuck Martin
Avondale Software

Re: getting pg_basebackup to use remote destination

From
Francisco Olarte
Date:
On Sun, Dec 30, 2018 at 5:52 PM Chuck Martin <clmartin@theombudsman.com> wrote:
......
> I’ve started pg_basebackup from the standby. It failed once due to an ssh error, but I reloaded sshd and started again. It may take a while; it’s about 750 GB.

IIRC pg_basebackup uses a regular postgres connection rather than
ssh, so I suppose it was the terminal session you were running it
from that failed. If that is your problem you could ( should, IMO )
install something like tmux / screen, which is a real life saver for
cases like this. ( I'd recommend tmux if you've never used either;
they are terminal multiplexers which let you open a session, run
some terminal sessions under it, and connect/disconnect/reconnect to
them. In particular, when your ssh connection drops they just detach
the session, so you ssh back in and reattach. They are great for
long-running terminal commands like pg_basebackup / pg_dump; I
routinely use them to launch a long-running process and reconnect
periodically to check on it. )
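
A minimal sketch of that workflow (session name, host, and paths are placeholders):

# on the standby, start a named session and run the copy inside it
tmux new -s basebackup
pg_basebackup -h <primary-host> -U postgres -D /mnt/dbraid/data -P -v -X stream
# detach with Ctrl-b d; if the ssh connection drops, log back in and run:
tmux attach -t basebackup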

Francisco Olarte.


Re: getting pg_basebackup to use remote destination

From
Jeff Janes
Date:
On Sat, Dec 29, 2018 at 2:05 PM Chuck Martin <clmartin@theombudsman.com> wrote:
I thought I knew how to do this, but I apparently don't. I have to set up a new server as a standby for a PG 11.1 server. The main server has a lot more resources than the standby. What I want to do is run pg_basebackup on the main server with the output going to the data directory on the new server.

pg_basebackup consumes few resources on the standby anyway in the mode you are running it, other than network and disk.  And those are inevitable given your end goal, so if you could do what you want, I think it still wouldn't do what you want.

If you really want to spare the network, you can run compression on the server side and then decompress on the standby.  Currently you can't compress on the server when invoking it on the standby, so (run this on the main server):

pg_basebackup -D - -Ft -X none |pxz | ssh 10.0.1.16 "tar -xJf - -C /somewhere/data_test"

Unfortunately you can't use this along with -X stream or -X fetch.
 
Really I would probably compress to a file and then use scp/rsync, rather than streaming into ssh.  That way if ssh gets interrupted, you don't lose all the work.
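
A sketch of that variant, run on the primary (the spool path is a placeholder; as with the pipe above, -X none means the WAL still has to come from streaming or the archive once the standby starts):

pg_basebackup -D - -Ft -X none -v | pxz > /backup/base.tar.xz
rsync /backup/base.tar.xz 10.0.1.16:/tmp/
# then on the standby:
tar -xJf /tmp/base.tar.xz -C /mnt/dbraid/data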

Cheers,

Jeff

Re: getting pg_basebackup to use remote destination

From
Chuck Martin
Date:
Maybe I need to rethink this and take Jeff's advice. I executed this:

pg_basebackup -h [main server's URL] -U postgres -P -v -X s -D /mnt/dbraid/data

8 hours ago, and it is now still at 1%. Should it be that slow? The database in question is about 750 GB, and both servers are on the same gigabit Ethernet network.

Chuck Martin
Avondale Software


On Sun, Dec 30, 2018 at 3:28 PM Jeff Janes <jeff.janes@gmail.com> wrote:
On Sat, Dec 29, 2018 at 2:05 PM Chuck Martin <clmartin@theombudsman.com> wrote:
I thought I knew how to do this, but I apparently don't. I have to set up a new server as a standby for a PG 11.1 server. The main server has a lot more resources than the standby. What I want to do is run pg_basebackup on the main server with the output going to the data directory on the new server.

pg_basebackup consumes few resources on the standby anyway in the mode you are running it, other than network and disk.  And those are inevitable given your end goal, so if you could do what you want, I think it still wouldn't do what you want.

If you really want to spare the network, you can run compression on the server side and then decompress on the standby.  Currently you can't compress on the server when invoking it on the standby, so (run this on the main server):

pg_basebackup -D - -Ft -X none |pxz | ssh 10.0.1.16 "tar -xJf - -C /somewhere/data_test"

Unfortunately you can't use this along with -X stream or -X fetch.
 
Really I would probably compress to a file and then use scp/rsync, rather than streaming into ssh.  That way if ssh gets interrupted, you don't lose all the work.

Cheers,

Jeff

Re: getting pg_basebackup to use remote destination

From
Jeff Janes
Date:
On Sun, Dec 30, 2018 at 6:17 PM Chuck Martin <clmartin@theombudsman.com> wrote:
Maybe I need to rethink this and take Jeff's advice. I executed this:

pg_basebackup -h [main server's URL] -U postgres -P -v -X s -D /mnt/dbraid/data

8 hours ago, and it is now still at 1%. Should it be that slow? The database in question is about 750 GB, and both servers are on the same gigabit Ethernet network.

Over gigabit ethernet, it should not be that slow.  Unless the network is saturated with other traffic or something.  Might be time to call in the network engineers.  Can you transfer static files at high speeds between those two hosts using scp or rsync?  (Or use some other technique to take PostgreSQL out of the loop and see if your network is performing as it should)

Are you seeing transfers at a constant slow rate, or are there long freezes or something?  Maybe the initial checkpoint was extremely slow?  Unfortunately the -P option (even with -v) doesn't make this easy to figure out, so alas it's back to an old-school stopwatch and pen and paper (or a spreadsheet).
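
For instance, a quick raw-throughput check that takes PostgreSQL out of the loop (file size and addresses are placeholders):

# on the primary: make a 1 GB test file and copy it over, watching the scp transfer rate
dd if=/dev/zero of=/tmp/nettest bs=1M count=1024
scp /tmp/nettest 10.0.1.16:/tmp/
# or run "iperf -s" on the standby and then, on the primary:
iperf -c 10.0.1.16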

Cheers,

Jeff

Re: getting pg_basebackup to use remote destination

From
Chuck Martin
Date:
On Mon, Dec 31, 2018 at 12:05 PM Jeff Janes <jeff.janes@gmail.com> wrote:
On Sun, Dec 30, 2018 at 6:17 PM Chuck Martin <clmartin@theombudsman.com> wrote:
Maybe I need to rethink this and take Jeff's advice. I executed this:

pg_basebackup -h [main server's URL] -U postgres -P -v -X s -D /mnt/dbraid/data

8 hours ago, and it is now still at 1%. Should it be that slow? The database in question is about 750 GB, and both servers are on the same gigabit Ethernet network.

Over gigabit ethernet, it should not be that slow.  Unless the network is saturated with other traffic or something.  Might be time to call in the network engineers.  Can you transfer static files at high speeds between those two hosts using scp or rsync?  (Or use some other technique to take PostgreSQL out of the loop and see if your network is performing as it should)

Are you seeing transfers at a constant slow rate, or are there long freezes or something?  Maybe the initial checkpoint was extremely slow?  Unfortunately the -P option (even with -v) doesn't make this easy to figure out, so alas it's back to an old-school stopwatch and pen and paper (or a spreadsheet).

Cheers,

Jeff
Using iperf, the transfer speed between the two servers (from the main to the standby) was 938 Mbits/sec. If I understand the units correctly, it is close to what it can be. 

Your earlier suggestion was to do the pg_basebackup locally and rsync it over. Maybe that would be faster. At this point, it is saying it is 6% through, over 24 hours after being started.

Chuck Martin
Avondale Software
 

Re: getting pg_basebackup to use remote destination

From
Stephen Frost
Date:
Greetings Chuck,

* Chuck Martin (clmartin@theombudsman.com) wrote:
> Using iperf, the transfer speed between the two servers (from the main to
> the standby) was 938 Mbits/sec. If I understand the units correctly, it is
> close to what it can be.

That does look like the rate it should be going at, but at that rate it
should only take about two hours to copy 750GB (938 Mbit/s is roughly
117 MB/s, and 750GB / 117 MB/s works out to about 1.8 hours).

How much WAL does this system generate though...?  If you're generating
a very large amount then it's possible the WAL streaming is actually
clogging up the network and causing the rate of copy on the data files
to be quite slow.  You'd have to be generating quite a bit of WAL
though.
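
One rough way to gauge that on the primary (a sketch; assumes psql can connect locally with sufficient privileges):

# sample the current WAL position, wait a minute, then see how many bytes were written
START=$(psql -Atc "SELECT pg_current_wal_lsn()")
sleep 60
psql -Atc "SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '$START') AS wal_bytes_per_minute"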

> Your earlier suggestion was to do the pg_basebackup locally and rsync it
> over. Maybe that would be faster. At this point, it is saying it is 6%
> through, over 24 hours after being started.

For building out a replica, I'd tend to use my backups anyway instead of
using pg_basebackup.  Provided you have good backups and reasonable WAL
retention, restoring a backup and then letting it replay WAL from the
archive until it can catch up with the primary works very well.  If you
have a very high rate of WAL then you might consider taking a full backup
and then an incremental backup on top of it; the incremental is much
faster, and it cuts the WAL required down to only what is generated
between the start of the incremental backup and the point where the
replica catches up to the primary.

There are a few different backup tools out there which can do parallel
backup and in-transit compression.  That loads up the primary's CPUs with
compression work, but it should reduce the overall time if the bottleneck
is the network.
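
On PG 11 the replica side of that looks roughly like this: restore the base backup into the data directory, then drop in a recovery.conf along these lines before starting the server (the archive path and connection settings are placeholders):

cat > /mnt/dbraid/data/recovery.conf <<'EOF'
standby_mode = 'on'
restore_command = 'cp /path/to/wal_archive/%f "%p"'
primary_conninfo = 'host=<primary-host> user=<replication-user>'
recovery_target_timeline = 'latest'
EOF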

Thanks!

Stephen


Re: getting pg_basebackup to use remote destination

From
Chuck Martin
Date:
On Thu, Jan 3, 2019 at 3:46 PM Stephen Frost <sfrost@snowman.net> wrote:
Greetings Chuck,

* Chuck Martin (clmartin@theombudsman.com) wrote:
> Using iperf, the transfer speed between the two servers (from the main to
> the standby) was 938 Mbits/sec. If I understand the units correctly, it is
> close to what it can be.

That does look like the rate it should be going at, but it should only
take about 2 hours to copy 750GB at that rate.

That’s what I was expecting.

How much WAL does this system generate though...?  If you're generating
a very large amount then it's possible the WAL streaming is actually
clogging up the network and causing the rate of copy on the data files
to be quite slow.  You'd have to be generating quite a bit of WAL
though.

It shouldn’t be excessive, but I’ll look closely at that.


> Your earlier suggestion was to do the pg_basebackup locally and rsync it
> over. Maybe that would be faster. At this point, it is saying it is 6%
> through, over 24 hours after being started.

For building out a replica, I'd tend to use my backups anyway instead of
using pg_basebackup.  Provided you have good backups and reasonable WAL
retention, restoring a backup and then letting it replay WAL from the
archive until it can catch up with the primary works very well.  If you
have a very high rate of WAL then you might consider taking a full backup
and then an incremental backup on top of it; the incremental is much
faster, and it cuts the WAL required down to only what is generated
between the start of the incremental backup and the point where the
replica catches up to the primary.

There are a few different backup tools out there which can do parallel
backup and in-transit compression.  That loads up the primary's CPUs with
compression work, but it should reduce the overall time if the bottleneck
is the network.

I’ll check out some solutions this weekend.

I appreciate the tips. 

Chuck


--
Chuck Martin
Avondale Software