Thread: Hourly backup using pg_basebackup

Hourly backup using pg_basebackup

From
John Scalia
Date:
Hi all,

We have a python script called by cron on an hourly basis to back up our production database. Currently, the script invokes pg_dump and takes more than an hour to complete, so the script checks whether it's already running and exits if so. I want to change the script to use pg_basebackup instead, since that's so much faster.

My problem, however, is that while I'd like to just have it build a tarball, maybe compressed, I can't use the "-X s" option for the WAL segments. I think I understand why I can't use the streaming option with "-Ft" specified. I'm just concerned about the docs saying that the backup may have problems with fetch, as a WAL segment may have expired. Manual testing shows that the DB needs about 11 minutes to back up with pg_basebackup, and our wal_keep_segments setting is 6. That said, an hour's worth of WAL segments should be available, but the six that were there at the beginning of the backup are not the same six there at the end. I don't think this is really a problem, but I'd like to get it confirmed. Wouldn't the backup actually have to take more than an hour for this to be an issue?
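
For what it's worth, the change I'm planning looks roughly like the sketch below. The paths, user, and lock handling are only placeholders for illustration, not our actual script:

#!/usr/bin/env python
# Rough sketch of the planned hourly cron job; all paths here are placeholders.
import os
import subprocess
import sys
import time

LOCKFILE = "/tmp/pg_hourly_backup.lock"               # placeholder lock path
BACKUP_DIR = "/backups/hourly/" + time.strftime("%Y%m%d%H")

# Same idea as the pg_dump version: bail out if a previous run is still going.
if os.path.exists(LOCKFILE):
    sys.exit(0)

open(LOCKFILE, "w").close()
try:
    # -Ft writes tar output, -z gzips it, and -X fetch collects the needed
    # WAL at the end of the backup (streaming isn't allowed with tar format
    # here, which is the limitation I'm describing above).
    subprocess.check_call([
        "pg_basebackup",
        "-D", BACKUP_DIR,
        "-Ft", "-z",
        "-X", "fetch",
        "-U", "postgres",
    ])
finally:
    os.remove(LOCKFILE)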

Thanks in advance,
Jay


Re: Hourly backup using pg_basebackup

From
"Mathis, Jason"
Date:
Why can't you stream? 

Did you read "The connection must be made with a superuser or a user having REPLICATION permissions (see Section 20.2), and pg_hba.conf must explicitly permit the replication connection. The server must also be configured with max_wal_senders set high enough to leave at least one session available for the backup." 

http://www.postgresql.org/docs/9.3/static/app-pgbasebackup.html

Also, why not do one base backup and set up archiving through the day? You could do an incremental in the middle of the day (if you really wanted to), which may make recovery a bit shorter. Or you could set up a streaming slave, stop postgres there, and tar/compress its data directory. Watch wal_keep_segments on that standby as well, though, or set a restore_command if you are archiving.
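
For the standby approach, something along these lines would do it; this is completely untested and the paths are made up:

#!/usr/bin/env python
# Rough sketch: tar up a stopped standby's data directory. Paths are made up.
import subprocess
import tarfile
import time

PG_CTL = "/usr/pgsql-9.3/bin/pg_ctl"        # example install location
DATADIR = "/var/lib/pgsql/9.3/data"         # example data directory
OUTFILE = "/backups/standby-" + time.strftime("%Y%m%d%H%M") + ".tar.gz"

# Stop the standby cleanly so the data directory is consistent on disk.
subprocess.check_call([PG_CTL, "-D", DATADIR, "stop", "-m", "fast"])
try:
    # Compress the whole data directory into a single tarball.
    with tarfile.open(OUTFILE, "w:gz") as tar:
        tar.add(DATADIR, arcname="data")
finally:
    # Restart the standby even if the tar step failed.
    subprocess.check_call([PG_CTL, "-D", DATADIR, "start", "-w"])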




On Fri, Feb 6, 2015 at 12:53 PM, John Scalia <jayknowsunix@gmail.com> wrote:
Hi all,

We have a python script called by cron on an hourly basis to back up our production database. Currently, the script invokes pg_dump and takes more than an hour to complete, so the script checks whether it's already running and exits if so. I want to change the script to use pg_basebackup instead, since that's so much faster.

My problem, however, is that while I'd like to just have it build a tarball, maybe compressed, I can't use the "-X s" option for the WAL segments. I think I understand why I can't use the streaming option with "-Ft" specified. I'm just concerned about the docs saying that the backup may have problems with fetch, as a WAL segment may have expired. Manual testing shows that the DB needs about 11 minutes to back up with pg_basebackup, and our wal_keep_segments setting is 6. That said, an hour's worth of WAL segments should be available, but the six that were there at the beginning of the backup are not the same six there at the end. I don't think this is really a problem, but I'd like to get it confirmed. Wouldn't the backup actually have to take more than an hour for this to be an issue?

Thanks in advance,
Jay



Re: Hourly backup using pg_basebackup

From
Matheus de Oliveira
Date:

On Fri, Feb 6, 2015 at 4:53 PM, John Scalia <jayknowsunix@gmail.com> wrote:
We have a python script called by cron on an hourly basis to back up our production database. Currently, the script invokes pg_dump and takes more than an hour to complete, so the script checks whether it's already running and exits if so. I want to change the script to use pg_basebackup instead, since that's so much faster.


Have you considered using incremental backup (continuous archiving) instead of such a small backup window?

See [1]

My problem, however, is that while I'd like to just have it build a tarball, maybe compressed, I can't use the "-X s" option for the WAL segments. I think I understand why I can't use the streaming option with "-Ft" specified. I'm just concerned about the docs saying that the backup may have problems with fetch, as a WAL segment may have expired. Manual testing shows that the DB needs about 11 minutes to back up with pg_basebackup, and our wal_keep_segments setting is 6. That said, an hour's worth of WAL segments should be available, but the six that were there at the beginning of the backup are not the same six there at the end. I don't think this is really a problem, but I'd like to get it confirmed. Wouldn't the backup actually have to take more than an hour for this to be an issue?

If you use archiving [1], you don't need to worry about saving the segments within the backup; just let it be done through archive_command. Isn't that an option for you? If not, why?


[1] http://www.postgresql.org/docs/current/static/continuous-archiving.html
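
For example, archive_command can just call a small script that copies each segment somewhere safe. The script and destination directory below are only an example, not something you must use:

#!/usr/bin/env python
# Example archiver called by archive_command; PostgreSQL passes %p (path) and %f (file name).
# In postgresql.conf you would set, for instance:
#   archive_mode = on
#   archive_command = '/usr/local/bin/archive_wal.py "%p" "%f"'
import os
import shutil
import sys

ARCHIVE_DIR = "/backups/wal_archive"    # example destination, use your own

wal_path, wal_name = sys.argv[1], sys.argv[2]
dest = os.path.join(ARCHIVE_DIR, wal_name)
if os.path.exists(dest):
    sys.exit(1)     # never overwrite; a nonzero exit tells the server archiving failed
shutil.copy2(wal_path, dest + ".tmp")
os.rename(dest + ".tmp", dest)          # atomic rename so a partial copy is never picked up
sys.exit(0)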

Best regards,
--
Matheus de Oliveira
Database Analyst
Dextra Sistemas - MPS.Br level F!
www.dextra.com.br/postgres

Re: Hourly backup using pg_basebackup

From
John Scalia
Date:
On 2/6/2015 2:19 PM, Mathis, Jason wrote:
Why can't you stream? 

Did you read "The connection must be made with a superuser or a user having REPLICATION permissions (see Section 20.2), and pg_hba.conf must explicitly permit the replication connection. The server must also be configured with max_wal_senders set high enough to leave at least one session available for the backup." 

Yes, the script runs as postgres and I placed an entry in pg_hba.conf for postgres with replication. The problem I've hit is that when you specify "-Ft" to pg_basebackup to make a tarball, it errors out if you also specify "-X s". I think that's because "-X s" spawns another process to stream the transaction log in parallel, and you can't really write a single tar file from multiple processes.
Also, why not do one base backup and set up archiving through the day? You could do an incremental in the middle of the day (if you really wanted to), which may make recovery a bit shorter. Or you could set up a streaming slave, stop postgres there, and tar/compress its data directory. Watch wal_keep_segments on that standby as well, though, or set a restore_command if you are archiving.

The reason is more political than anything else, and the result of a contract obligation. You know, the kind that gets negotiated without discussing the matter with the DBA? I'm just thinking that as long as the most recent couple of WAL segments are included in the tarball, the backup should be restorable without issue. Just trying to confirm that.

On Fri, Feb 6, 2015 at 12:53 PM, John Scalia <jayknowsunix@gmail.com> wrote:
Hi all,

We have a python script called by cron on an hourly basis to back up our production database. Currently, the script invokes pg_dump and takes more than an hour to complete, so the script checks whether it's already running and exits if so. I want to change the script to use pg_basebackup instead, since that's so much faster.

My problem, however, is that while I'd like to just have it build a tarball, maybe compressed, I can't use the "-X s" option for the WAL segments. I think I understand why I can't use the streaming option with "-Ft" specified. I'm just concerned about the docs saying that the backup may have problems with fetch, as a WAL segment may have expired. Manual testing shows that the DB needs about 11 minutes to back up with pg_basebackup, and our wal_keep_segments setting is 6. That said, an hour's worth of WAL segments should be available, but the six that were there at the beginning of the backup are not the same six there at the end. I don't think this is really a problem, but I'd like to get it confirmed. Wouldn't the backup actually have to take more than an hour for this to be an issue?

Thanks in advance,
Jay




Re: Hourly backup using pg_basebackup

From
John Scalia
Date:

On 2/6/2015 2:25 PM, Matheus de Oliveira wrote:

On Fri, Feb 6, 2015 at 4:53 PM, John Scalia <jayknowsunix@gmail.com> wrote:
We have a python script called by cron on an hourly basis to back up our production database. Currently, the script invokes pg_dump and takes more than an hour to complete, so the script checks whether it's already running and exits if so. I want to change the script to use pg_basebackup instead, since that's so much faster.


Have you considered using incremental backup (continuous archiving) instead of such a small backup window?

See [1]

My problem, however, is that while I'd like to just have it build a tarball, maybe compressed, I can't use the "-X s" option for the WAL segments. I think I understand why I can't use the streaming option with "-Ft" specified. I'm just concerned about the docs saying that the backup may have problems with fetch, as a WAL segment may have expired. Manual testing shows that the DB needs about 11 minutes to back up with pg_basebackup, and our wal_keep_segments setting is 6. That said, an hour's worth of WAL segments should be available, but the six that were there at the beginning of the backup are not the same six there at the end. I don't think this is really a problem, but I'd like to get it confirmed. Wouldn't the backup actually have to take more than an hour for this to be an issue?

If you use archiving [1], you don't need to worry about saving the segments within the backup; just let it be done through archive_command. Isn't that an option for you? If not, why?

Oh, yes, I did fail to mention that the system where I'm trying this is the primary in a streaming replication cluster with two hot standby servers. I've mentioned, without much traction, that we don't really even need a backup with three servers ready to do the work without delays. Like I said in the last post, it's all political.

[1] http://www.postgresql.org/docs/current/static/continuous-archiving.html

Best regards,
--
Matheus de Oliveira
Database Analyst
Dextra Sistemas - MPS.Br level F!
www.dextra.com.br/postgres


Re: Hourly backup using pg_basebackup

From
Jerry Sievers
Date:
John Scalia <jayknowsunix@gmail.com> writes:

> On 2/6/2015 2:25 PM, Matheus de Oliveira wrote:
>
>     On Fri, Feb 6, 2015 at 4:53 PM, John Scalia <jayknowsunix@gmail.com> wrote:
>
>         We have a python script called by cron on an hourly basis to back up our production database. Currently, the script invokes pg_dump and takes more than an hour to complete, so the script checks whether it's already running and exits if so. I want to change the script to use pg_basebackup instead, since that's so much faster.
>
>     Have you considered using incremental backup (continuous archiving) instead of such a small backup window?
>
>     See [1]
>
>         My problem, however, is that while I'd like to just have it build a tarball, maybe compressed, I can't use the "-X s" option for the WAL segments. I think I understand why I can't use the streaming option with "-Ft" specified. I'm just concerned about the docs saying that the backup may have problems with fetch, as a WAL segment may have expired. Manual testing shows that the DB needs about 11 minutes to back up with pg_basebackup, and our wal_keep_segments setting is 6. That said, an hour's worth of WAL segments should be available, but the six that were there at the beginning of the backup are not the same six there at the end. I don't think this is really a problem, but I'd like to get it confirmed. Wouldn't the backup actually have to take more than an hour for this to be an issue?
>
>     If you use archiving [1], you don't need to worry about saving the segments within the backup; just let it be done through archive_command. Isn't that an option for you? If not, why?
>
> Oh, yes, I did fail to mention that the system where I'm trying this is the primary in a streaming replication cluster with two hot standby servers. I've mentioned, without much traction, that we don't really even need a backup with three servers ready to do the work without delays. Like I said in the last post, it's all political.

Er, yes you do.

Suppose someone/thing drops a table erroneously?

Do that and all of your streamers, however many you have, are trash.


>
>     [1] http://www.postgresql.org/docs/current/static/continuous-archiving.html
>
>     Best regards,
>     --
>     Matheus de Oliveira
> Database Analyst
> Dextra Sistemas - MPS.Br level F!
>     www.dextra.com.br/postgres
>

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800