Thread: Hourly backup using pg_basebackup
Hi all,

We have a Python script called by cron on an hourly basis to back up our production database. Currently, the script invokes pg_dump and takes more than an hour to complete, so the script checks whether it's already running and exits if so. I want to change the script to use pg_basebackup instead, since that's so much faster.

My problem is, however, that while I'd like to just have it build a tarball, maybe compressed, I can't use the "-X s" option for the WAL segments. I think I understand why I can't use the streaming option with "-Ft" specified. I'm just concerned about the docs saying that the backup may have problems with fetch, as a WAL segment may have expired. Manual testing shows that the DB needs about 11 minutes to back up with pg_basebackup, and our wal_keep_segments setting is 6. That said, an hour's worth of WAL segments should be available, but the six that were there at the beginning of the backup are not the same six there at the end. I don't think this is really a problem, but I'd like to get it confirmed. Wouldn't the backup actually have to take more than an hour for this to be an issue?

Thanks in advance,
Jay
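For illustration, here's a minimal sketch of the kind of wrapper I mean; the paths, the role name, and the locking approach are all hypothetical, not our actual script:

#!/usr/bin/env python
"""Hourly base backup wrapper -- a sketch only, not the production script."""
import fcntl
import subprocess
import sys
import time

LOCKFILE = "/var/run/pg_hourly_backup.lock"  # hypothetical path
BACKUP_DIR = "/backups/base"                 # hypothetical path

def main():
    # Take an exclusive, non-blocking lock so a second cron invocation
    # exits immediately while the previous run is still going.
    lock = open(LOCKFILE, "w")
    try:
        fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError:
        sys.exit(0)  # previous backup still running

    target = "%s/%s" % (BACKUP_DIR, time.strftime("%Y%m%d%H"))
    # -Ft = tar format, -z = gzip it, -X fetch = collect the needed WAL
    # at the end of the backup (streaming can't be combined with tar
    # format on this version).
    subprocess.check_call([
        "pg_basebackup", "-D", target,
        "-Ft", "-z", "-X", "fetch",
        "-U", "backup_user",  # hypothetical replication role
    ])

if __name__ == "__main__":
    main()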
Why can't you stream?
Did you read "The connection must be made with a superuser or a user having REPLICATION permissions (see Section 20.2), and pg_hba.conf must explicitly permit the replication connection. The server must also be configured with max_wal_senders set high enough to leave at least one session available for the backup."
http://www.postgresql.org/docs/9.3/static/app-pgbasebackup.html
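Setting that up might look roughly like this sketch (role name, password, and connection string are hypothetical; the pg_hba.conf line is shown in the comments):

# Sketch: create a role that pg_basebackup can connect as, and check
# max_wal_senders. Role name, password, and DSN are hypothetical.
# pg_hba.conf additionally needs a line whose database field is
# literally "replication", e.g.:
#   host  replication  backup_user  192.0.2.0/24  md5
import psycopg2

conn = psycopg2.connect("dbname=postgres user=postgres")
conn.autocommit = True
cur = conn.cursor()
# REPLICATION is the permission the quoted docs refer to.
cur.execute("CREATE ROLE backup_user LOGIN REPLICATION PASSWORD 'secret'")
# max_wal_senders must leave at least one free slot for the backup.
cur.execute("SHOW max_wal_senders")
print("max_wal_senders = %s" % cur.fetchone()[0])
cur.close()
conn.close()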
Also, why not do one base backup and set up archiving through the day? You could do an incremental in the middle of the day (if you really wanted to), which may make the recovery a bit shorter. Or you could set up a streaming slave, stop Postgres there, and tar/compress the data directory. Watch wal_keep_segments there as well, though, or set a restore_command if you are archiving.
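To sketch the archiving half of that suggestion (the script path and archive directory are hypothetical; postgresql.conf would point at it with something like archive_mode = on and archive_command = '/usr/local/bin/archive_wal.py "%p" "%f"'):

#!/usr/bin/env python
"""Hypothetical archive_command helper; the server expands %p to the
WAL file's path and %f to its file name."""
import os
import shutil
import sys

ARCHIVE_DIR = "/backups/wal_archive"  # hypothetical location

def main(wal_path, wal_name):
    dest = os.path.join(ARCHIVE_DIR, wal_name)
    # Refuse to overwrite: archive_command must exit non-zero rather
    # than clobber a segment that was already archived.
    if os.path.exists(dest):
        return 1
    shutil.copy2(wal_path, dest)
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1], sys.argv[2]))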
On Fri, Feb 6, 2015 at 4:53 PM, John Scalia <jayknowsunix@gmail.com> wrote:
> We have a Python script called by cron on an hourly basis to back up our production database. Currently, the script invokes pg_dump and takes more than an hour to complete, so the script checks whether it's already running and exits if so. I want to change the script to use pg_basebackup instead, since that's so much faster.
Have you considered using incremental backup (continuous archiving) instead of such a small backup window?
See [1]
> My problem is, however, that while I'd like to just have it build a tarball, maybe compressed, I can't use the "-X s" option for the WAL segments. I think I understand why I can't use the streaming option with "-Ft" specified. I'm just concerned about the docs saying that the backup may have problems with fetch, as a WAL segment may have expired. Manual testing shows that the DB needs about 11 minutes to back up with pg_basebackup, and our wal_keep_segments setting is 6. That said, an hour's worth of WAL segments should be available, but the six that were there at the beginning of the backup are not the same six there at the end. I don't think this is really a problem, but I'd like to get it confirmed. Wouldn't the backup actually have to take more than an hour for this to be an issue?
If you use archiving [1], you don't need to worry about saving the segments within the backup; just let it be done through archive_command. Isn't that an option for you? If not, why?
[1] http://www.postgresql.org/docs/current/static/continuous-archiving.html
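For completeness, the recovery side might look roughly like this sketch (helper name and paths are hypothetical, mirroring the archiving sketch earlier in the thread; recovery.conf would set restore_command = '/usr/local/bin/restore_wal.py "%f" "%p"'):

#!/usr/bin/env python
"""Hypothetical restore_command helper; the server expands %f to the
segment name being requested and %p to the path it should be copied to."""
import os
import shutil
import sys

ARCHIVE_DIR = "/backups/wal_archive"  # must match the archiving script

def main(wal_name, wal_path):
    src = os.path.join(ARCHIVE_DIR, wal_name)
    if not os.path.exists(src):
        return 1  # non-zero tells recovery the segment isn't available
    shutil.copy2(src, wal_path)
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1], sys.argv[2]))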
Best regards,
--
Matheus de Oliveira
Database Analyst
Dextra Sistemas - MPS.Br level F!
www.dextra.com.br/postgres
On 2/6/2015 2:19 PM, Mathis, Jason wrote:
> Why can't you stream?

Yes, the script runs as postgres, and I placed an entry in the pg_hba.conf file for postgres with replication. The problem I've hit is that when you specify "-Ft" to pg_basebackup to make a tarball, it will error if you also specify "-X s". I think that's because "-X s" spawns another process to stream the transaction log in parallel, and you can't really write a single tar file from multiple processes.
> Also, why not do one base backup and set up archiving through the day?

The reason is more political than anything else, and the result of a contract obligation. You know, the kind that gets negotiated without discussing the matter with the DBA? I'm just thinking that as long as the most recent couple of WAL segments are included in the tarball, the backup should be restorable without issue. Just trying to confirm that.
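To put rough numbers on the wal_keep_segments concern, here's a back-of-the-envelope sketch; the WAL generation rate below is a pure assumption to replace with a real measurement:

# With the default 16 MB segment size and wal_keep_segments = 6,
# roughly 96 MB of recent WAL is kept. "-X fetch" copies, at the END
# of the run, all WAL written since the backup STARTED -- so what
# matters is how much WAL the server generates during the ~11-minute
# backup, not the one-hour cron interval.

SEGMENT_MB = 16            # default WAL segment size
WAL_KEEP_SEGMENTS = 6      # the setting discussed above
BACKUP_MINUTES = 11        # measured pg_basebackup duration
WAL_MB_PER_MINUTE = 5.0    # assumption -- measure on the real server

retained = SEGMENT_MB * WAL_KEEP_SEGMENTS
generated = WAL_MB_PER_MINUTE * BACKUP_MINUTES
print("retained %d MB, generated %.0f MB during the backup" % (retained, generated))
print("fetch at risk of missing segments: %s" % (generated > retained))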
On 2/6/2015 2:25 PM, Matheus de Oliveira wrote:
> If you use archiving [1], you don't need to worry about saving the segments within the backup; just let it be done through archive_command. Isn't that an option for you? If not, why?

Oh, yes, I did fail to mention that the system where I'm trying this is the primary in a streaming replication cluster with two hot standby servers. I've mentioned, without much traction, that we don't really even need a backup with three servers ready to do the work without delays. Like I said in the last post, it's all political.
John Scalia <jayknowsunix@gmail.com> writes:

> Oh, yes, I did fail to mention that the system where I'm trying this is the primary in a streaming replication cluster with two hot standby servers. I've mentioned, without much traction, that we don't really even need a backup with three servers ready to do the work without delays. Like I said in the last post, it's all political.

Er, yes you do. Suppose someone or something drops a table erroneously? Do that and all of your streamers are trash.

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800