Thread: archive_command too slow.
Hi friends,
I am running 2 Linux machines, kernel 3.13.0-45-generic #74-Ubuntu SMP.
PostgreSQL version 9.4 on both machines, in a Hot Standby scenario.
Master-Slave using WAL files, not streaming replication.
The archive_command from master is:
archive_command = '/usr/bin/rsync -a -e "ssh" "%p" slave:/data2/postgres/standby/main/incoming/"%f"' #
The recovery.conf from slave is:
standby_mode = 'on'
restore_command = 'cp /data2/postgres/standby/main/incoming/%f "%p"'
We have intensive write activity generating, for example, 1577 WAL segments per hour ≈ 26 segments per minute.
The slave is far behind the master, more than 20 hours.
I can see that many WAL segments on the master are in the ready state, waiting for archive_command to do its job.
The slave is waiting for the wal files as described above.
2016-11-02 18:57:48 UTC::@:[15698]: LOG: unexpected pageaddr C955/C5000000 in log segment 000000010000C96000000023, offset 0
2016-11-02 18:57:54 UTC::@:[15698]: LOG: restored log file "000000010000C96000000022" from archive
2016-11-02 18:57:54 UTC::@:[15698]: LOG: restored log file "000000010000C96000000023" from archive
2016-11-02 18:57:54 UTC::@:[15698]: LOG: restored log file "000000010000C96000000024" from archive
cp: cannot stat ‘/data2/postgres/standby/main/incoming/000000010000C96000000025’: No such file or directory
2016-11-02 18:57:54 UTC::@:[15698]: LOG: unexpected pageaddr C956/71000000 in log segment 000000010000C96000000025, offset 0
2016-11-02 18:57:58 UTC::@:[15698]: LOG: restored log file "000000010000C96000000024" from archive
cp: cannot stat ‘/data2/postgres/standby/main/incoming/000000010000C96000000025’: No such file or directory
It seems that archive_command is too slow compared with the rate at which WAL segments are generated.
Any suggestions? Should I use another strategy to speed up the archive_command process?
Best Regards,
On Wed, Nov 2, 2016 at 12:06 PM, Joao Junior <jcoj2006@gmail.com> wrote:
> Hi friends,
> I am running 2 Linux machines, kernel 3.13.0-45-generic #74-Ubuntu SMP.
> Postgresql version 9.4 in both machine, in a Hot Standby cenario.
> Master-Slave using WAL files, not streaming replication.
> The archive_command from master is:
> archive_command = '/usr/bin/rsync -a -e "ssh" "%p" slave:/data2/postgres/standby/main/incoming/"%f"' #
How long does it take just to set up the ssh tunnel?
$ time ssh slave hostname
In my hands, this takes about 0.5 seconds, every time. If you need to archive 26 segments per minute, that much overhead alone eats roughly 13 seconds of every minute, a substantial fraction of your time budget.
How much network bandwidth do you have? If you scp a big chunk of files over to the slave in one command (not into a production directory, of course), how fast does that go?
$ time rsync datadir/pg_xlog/000000010000C9600000004? slave:/tmp/foo/
...
> It seems that archive_command is very slowly compared with the amount of WAL
> segments generated. Any suggestions??? Should I use another strategy to
> increase the archive_command process speed???
If network throughput is the problem, use compression, or get a faster network.
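The compression route can be as simple as adding rsync's in-transit compression flag to the existing command; a sketch, assuming the same paths as the OP's setting (WAL segments usually compress well):

```
# postgresql.conf on the master -- hypothetical variant of the command above,
# with -z added so rsync compresses the stream over the wire:
archive_command = '/usr/bin/rsync -az -e "ssh" "%p" slave:/data2/postgres/standby/main/incoming/"%f"'
```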
If setting up the ssh tunnel is the problem, you could assess whether you really need that security, or compile a custom postgresql with larger WAL file sizes, or write a fancy archive_command which first archives the files to a local directory, and then transfers them in chunks to the slave. Or maybe use streaming rather than file shipping.
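The "archive locally first, ship in chunks" idea can be sketched as a small shell function; the script name, spool path, and shipper details are hypothetical, and a real version would also need error handling and cleanup of shipped files:

```shell
#!/bin/sh
# Hypothetical stage 1 of a two-stage archive: archive_command only copies
# the segment into a local spool directory (fast, no network), and a separate
# job (cron, etc.) rsyncs the whole spool to the slave over one connection.
#
# postgresql.conf would point at a wrapper around this, e.g.:
#   archive_command = '/usr/local/bin/spool_wal.sh "%p" "%f"'

spool_wal() {
    # $1 = full path of the WAL segment (%p), $2 = its file name (%f)
    spool="${SPOOL:-/var/spool/pg_wal_outgoing}"
    mkdir -p "$spool" &&
    cp "$1" "$spool/$2.tmp" &&
    mv "$spool/$2.tmp" "$spool/$2"  # rename last: the shipper never sees partial files
}
```

Stage 2 would then be something along the lines of `rsync -az "$spool"/ slave:/data2/postgres/standby/main/incoming/`, removing files only after a successful transfer.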
Cheers,
Jeff
On Fri, Nov 4, 2016 at 1:19 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> If setting up the ssh tunnel is the problem, you could assess whether you
> really need that security, or compile a custom postgresql with larger WAL
> file sizes, or write a fancy archive_command which first archives the files
> to a local directory, and then transfers them in chunks to the slave. Or
> maybe use streaming rather than file shipping.

Another option is to use ssh's ControlMaster and ControlPersist features to keep the SSH tunnel alive between commands. You'd have to set up the RSYNC_CONNECT_PROG environment variable in your archive command for that and include the relevant options for ssh in the command.
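A sketch of the ssh client configuration this refers to; the Host alias "slave" matches the archive_command above, and the 10-minute persistence value is arbitrary:

```
# ~/.ssh/config for the postgres user on the master
Host slave
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h-%p
    ControlPersist 10m
```

With this in place, the first ssh/rsync invocation opens a master connection and later ones multiplex over it, so each archive_command run skips the connection setup.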
Greetings,

* Joao Junior (jcoj2006@gmail.com) wrote:
> I am running 2 Linux machines, kernel 3.13.0-45-generic #74-Ubuntu SMP.
> Postgresql version 9.4 in both machine, in a Hot Standby cenario.
>
> Master-Slave using WAL files, not streaming replication.
>
> The archive_command from master is:
>
> archive_command = '/usr/bin/rsync -a -e "ssh" "%p"
> slave:/data2/postgres/standby/main/incoming/"%f"' #

Admittedly, you're talking about this only for WAL replay, but I wanted to point out that this is not a safe archive_command setting, because you are not forcing an fsync() on the remote side to be sure that the WAL is saved out to disk before telling PG that it can remove that WAL segment. What that means is that a crash on the remote system at just the wrong time will create a hole in your WAL, possibly meaning that you wouldn't be able to replay WAL on the remote side past that point and would have to rebuild/re-sync your replica.

There's a lot more that needs to be done carefully and correctly to ensure proper backups than this. In short, don't try to write your own software for doing this; use one of the existing tools which do the correct things for you.

In particular, when it comes to archive_command being a bottleneck, pgbackrest has an async mode where it will keep open a long-running SSH session to the remote side to avoid the overhead associated with re-exec'ing SSH. The ControlMaster approach helps too, but you still have to exec rsync and ssh, and it's ultimately expensive even to do that. Unfortunately, restore_command is single-threaded and is going to be pretty expensive too, though pgbackrest's archive-get tries to be as fast as it can be.

pgbackrest also has the ability to perform incremental backups and incremental restores, in parallel, allowing you to periodically bring your replica up to speed very quickly using file transfers instead of WAL replay.
This does mean that you have a period of downtime for your replica, of course, but that might be a worthwhile trade-off.

For pure backups, another approach is to use pg_receivexlog and a tool like barman which supports verifying that the WAL for a given backup has reached the remote side.

Thanks!

Stephen
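For reference, the pg_receivexlog approach mentioned above amounts to one long-running command on the archive host; the host name, user, and target directory here are hypothetical, and a replication-capable role and pg_hba.conf entry are assumed:

```
# Stream WAL from the master over a replication connection (9.4 naming;
# the tool was renamed pg_receivewal in PostgreSQL 10):
pg_receivexlog -h master -U replicator -D /data2/postgres/wal_archive
```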