Thread: archive_command too slow.

archive_command too slow.

From: Joao Junior

Hi friends,

I am running 2 Linux machines, kernel  3.13.0-45-generic #74-Ubuntu SMP.
PostgreSQL version 9.4 on both machines, in a Hot Standby scenario.

Master-slave using WAL file shipping, not streaming replication.

The archive_command from master is:

archive_command = '/usr/bin/rsync -a -e "ssh" "%p" slave:/data2/postgres/standby/main/incoming/"%f"' #


The recovery.conf from slave is:
standby_mode = 'on'
restore_command = 'cp /data2/postgres/standby/main/incoming/%f "%p"'


We have an intensive write workload generating, for example, 1577 WAL segments per hour, i.e. ~26 segments per minute.

The slave is far behind the master, by more than 20 hours.
I can see that the WAL segments on the master are in the "ready" state, waiting for archive_command to do its job.
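
To put a number on the backlog, the .ready marker files can be counted on the master (a sketch; the data directory path is an assumption for a default 9.4 install):

$ ls /var/lib/postgresql/9.4/main/pg_xlog/archive_status/*.ready | wc -l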



The slave is waiting for those WAL files, as its log shows:

2016-11-02 18:57:48 UTC::@:[15698]: LOG:  unexpected pageaddr C955/C5000000 in log segment 000000010000C96000000023, offset 0
2016-11-02 18:57:54 UTC::@:[15698]: LOG:  restored log file "000000010000C96000000022" from archive
2016-11-02 18:57:54 UTC::@:[15698]: LOG:  restored log file "000000010000C96000000023" from archive
2016-11-02 18:57:54 UTC::@:[15698]: LOG:  restored log file "000000010000C96000000024" from archive
cp: cannot stat ‘/data2/postgres/standby/main/incoming/000000010000C96000000025’: No such file or directory
2016-11-02 18:57:54 UTC::@:[15698]: LOG:  unexpected pageaddr C956/71000000 in log segment 000000010000C96000000025, offset 0
2016-11-02 18:57:58 UTC::@:[15698]: LOG:  restored log file "000000010000C96000000024" from archive
cp: cannot stat ‘/data2/postgres/standby/main/incoming/000000010000C96000000025’: No such file or directory


It seems that archive_command is very slow compared with the rate at which WAL segments are generated.
Any suggestions? Should I use another strategy to increase the speed of the archiving process?


Best Regards,

Re: archive_command too slow.

From: Jeff Janes

On Wed, Nov 2, 2016 at 12:06 PM, Joao Junior <jcoj2006@gmail.com> wrote:
> Hi friends,
>
> I am running 2 Linux machines, kernel  3.13.0-45-generic #74-Ubuntu SMP.
> PostgreSQL version 9.4 on both machines, in a Hot Standby scenario.
>
> Master-slave using WAL file shipping, not streaming replication.
>
> The archive_command from master is:
>
> archive_command = '/usr/bin/rsync -a -e "ssh" "%p" slave:/data2/postgres/standby/main/incoming/"%f"' #


How long does it take just to set up the ssh tunnel?

$ time ssh slave hostname

In my hands, this takes about 0.5 seconds, every time.  If you need to archive 26 segments per minute, that much overhead is going to consume a substantial fraction of your time budget.
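
To put numbers on that: 26 segments per minute is a budget of 60/26 ≈ 2.3 seconds per segment, so 0.5 seconds of connection setup alone consumes over 20% of it.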

How much network bandwidth do you have?  If you scp a big chunk of files in one command over to the slave (not into a production directory of it, of course), how fast does that go?

$ time rsync datadir/pg_xlog/000000010000C9600000004? slave:/tmp/foo/


...

 

> It seems that archive_command is very slow compared with the rate at which WAL segments are generated.
> Any suggestions? Should I use another strategy to increase the speed of the archiving process?


If network throughput is the problem, use compression, or get a faster network.
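
For example, rsync can compress in transit with -z (a sketch of the one-flag change, untested on this setup):

archive_command = '/usr/bin/rsync -az -e "ssh" "%p" slave:/data2/postgres/standby/main/incoming/"%f"'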

If setting up the ssh tunnel is the problem, you could assess whether you really need that security, or compile a custom postgresql with larger WAL file sizes, or write a fancy archive_command which first archives the files to a local directory, and then transfers them in chunks to the slave.  Or maybe use streaming rather than file shipping.
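
A minimal sketch of the archive-locally-then-ship idea (the /data2/wal_spool directory and the per-minute schedule are hypothetical, and the batch job must only delete spooled files the slave has durably received):

# postgresql.conf: archiving to a local directory is fast and rarely falls behind
archive_command = 'cp "%p" /data2/wal_spool/"%f"'

# run from cron, e.g. once a minute: ship everything over a single connection
$ rsync -az --remove-source-files /data2/wal_spool/ slave:/data2/postgres/standby/main/incoming/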


Cheers,

Jeff

Re: archive_command too slow.

From: Claudio Freire

On Fri, Nov 4, 2016 at 1:19 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> If setting up the ssh tunnel is the problem, you could assess whether you
> really need that security, or compile a custom postgresql with larger WAL
> file sizes, or write a fancy archive_command which first archives the files
> to a local directory, and then transfers them in chunks to the slave.  Or
> maybe use streaming rather than file shipping.

Another option is to use ssh's ControlMaster and ControlPersist
features to keep the SSH tunnel alive between commands.

You'd have to set the RSYNC_CONNECT_PROG environment variable in your
archive command for that, and include the relevant ssh options in the
command.
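
For instance, something like this in ~/.ssh/config for the postgres user (a sketch; the socket path and timeout are arbitrary):

Host slave
    ControlMaster auto
    ControlPath ~/.ssh/ctl-%r@%h-%p
    ControlPersist 10m

The existing rsync -e "ssh" archive_command then reuses the persistent connection instead of paying the setup cost for every segment.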


Re: archive_command too slow.

From: Stephen Frost

Greetings,

* Joao Junior (jcoj2006@gmail.com) wrote:
> I am running 2 Linux machines, kernel  3.13.0-45-generic #74-Ubuntu SMP.
> PostgreSQL version 9.4 on both machines, in a Hot Standby scenario.
>
> Master-slave using WAL file shipping, not streaming replication.
>
> The archive_command from master is:
>
> archive_command = '/usr/bin/rsync -a -e "ssh" "%p"
> slave:/data2/postgres/standby/main/incoming/"%f"' #

Admittedly, you're talking about this for only WAL replay, but I wanted
to point out that this is not a safe archive_command setting because
you are not forcing an fsync() on the remote side to be sure that the
WAL is saved out to disk before telling PG that it can remove that WAL
segment.

What that means is that a crash on the remote system at just the wrong
time will create a hole in your WAL, possibly meaning that you wouldn't
be able to replay WAL on the remote side past that point and would have
to rebuild/re-sync your replica.

There's a lot more than this that needs to be done carefully and
correctly to ensure proper backups.

In short, don't try to write your own software for doing this.  Use one
of the existing tools which do the correct things for you.  In
particular, when it comes to dealing with archive_command being a
bottleneck, pgbackrest has an async mode where it will keep open a
long-running SSH session to the remote side to avoid the overhead
associated with re-exec'ing SSH.  The ControlMaster approach helps too,
but you still have to exec rsync and ssh and it's ultimately expensive
to even do that.  Unfortunately, restore_command is single-threaded and
is going to be pretty expensive too, though pgbackrest's archive-get
tries to be as fast as it can be.
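
Roughly, the async setup looks like this (a sketch only; the stanza name is hypothetical and option names vary across pgbackrest versions, so check the documentation):

# /etc/pgbackrest.conf
[main]
archive-async=y

# postgresql.conf
archive_command = 'pgbackrest --stanza=main archive-push %p'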

pgbackrest also has the ability to perform incremental backups and
incremental restores, in parallel, allowing you to periodically bring
your replica up to speed very quickly using file transfers instead of
using WAL replay.  This does mean that you have a period of downtime for
your replica, of course, but that might be a worthwhile trade-off.

For pure backups, another approach is to use pg_receivexlog and a tool
like barman which supports verifying that the WAL for a given backup has
reached the remote side.
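
For example (a sketch; host, user, and target directory are assumptions):

$ pg_receivexlog -h master -U replication -D /backups/wal_stream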

Thanks!

Stephen
