Thread: Best backup strategy for production systems

Best backup strategy for production systems

From: Oliver
Hi,
I'm a newbie with PostgreSQL. I've set up my first PostgreSQL instance; it is empty for now, with only the default postgres database.
It runs on Linux with two filesystems, one for data and another for archiving (I've enabled WAL archiving because the server will be used in production).
Could someone recommend a backup strategy, scripts and so on?
Can a base backup be taken while the system (Postgres) is up?
Would it be OK to take a base backup each week and back up the archived WAL each day?
Since I haven't configured backups (or archive deletion) yet, I've already hit my first problem: my archive filesystem (FS) is full and the archiver process reports "failed" for the last WAL file copy (which is expected, as the archive FS is full).
Please recommend what I should do now. Should I create another network FS for base backups and archive backups? Once I have my first base backup, can I then delete the archived files?
My archive FS is 20GB. I don't understand how a system with no load (it will be for production, but it has no databases yet, only postgres) could fill the FS in a few days. Is that normal?
Thanks beforehand.

Cheers...

Re: Best backup strategy for production systems

From: François Beausoleil
Hi!

On 2014-06-17, at 08:31, Oliver <ofabelo@gmail.com> wrote:

> Hi,
> I'm a newbie with PostgreSQL. I've set up my first PostgreSQL instance; it is empty for now, with only the default postgres database.
> It runs on Linux with two filesystems, one for data and another for archiving (I've enabled WAL archiving because the server will be used in production).
> Could someone recommend a backup strategy, scripts and so on?
> Can a base backup be taken while the system (Postgres) is up?
> Would it be OK to take a base backup each week and back up the archived WAL each day?
> Since I haven't configured backups (or archive deletion) yet, I've already hit my first problem: my archive filesystem (FS) is full and the archiver process reports "failed" for the last WAL file copy (which is expected, as the archive FS is full).
> Please recommend what I should do now. Should I create another network FS for base backups and archive backups? Once I have my first base backup, can I then delete the archived files?
> My archive FS is 20GB. I don't understand how a system with no load (it will be for production, but it has no databases yet, only postgres) could fill the FS in a few days. Is that normal?
> Thanks beforehand.

Welcome to PostgreSQL!

The PostgreSQL manual has a great section on backup and restore:
http://www.postgresql.org/docs/current/static/backup.html

I found value in « Instant PostgreSQL Backup and Restore How-To » at
http://www.packtpub.com/how-to-postgresql-backup-and-restore/book

Regarding your questions:

* Yes, base backups can be made while the server is up and running. PostgreSQL has a tool named pg_basebackup to do just that: http://www.postgresql.org/docs/current/static/app-pgbasebackup.html. I personally use OmniPITR to handle my base backups and continuous archiving: https://github.com/omniti-labs/omnipitr . There is also WAL-E, https://github.com/wal-e/wal-e , which backs up your data to S3 / Rackspace CloudFiles automatically. (A sample pg_basebackup invocation is sketched after this list.)

* Your WAL files are of no value once you have a new base backup: the new base backup supersedes all previous WAL files. You can think of a base backup as a snapshot; WAL files describe changes since the last snapshot. Depending on your rate of change, you can delete obsolete WAL files that are more than « a few days » older than the last base backup. I personally keep 3 weeks of WAL files and 2 weeks of base backups.

* The vacuum daemon will vacuum databases regularly, and checkpoints will also occur on a schedule, even on a system without activity. Those processes will generate some amount of WAL archives. WAL archives compress very well: 16MB down to 4MB is very typical on my system.

* My database is too big for pg_dump (3 TiB), so I don't use it, but I have weekly base backups, plus the WAL archives, which I keep for three weeks.
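
For what it's worth, a minimal pg_basebackup invocation could look like the sketch below. The host, user, and target directory are placeholders, and the server has to allow a replication connection for that user in pg_hba.conf:

    # run weekly (e.g. from cron) as the postgres user; writes a compressed
    # tar-format base backup into a dated directory
    pg_basebackup -h localhost -U postgres \
        -D /mnt/backups/base/$(date +%Y%m%d) \
        -Ft -z -P

Old base backup directories can then be rotated with ordinary filesystem tools once newer backups (and their WAL) are safely in place.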

Hope that helps!
François Beausoleil



Re: Best backup strategy for production systems

From: Oliver
Hi,
thank you very much for your reply.
OK, I've read the official documentation about backups again, slowly this time ;-)
Is it all right to use the same location for archived WAL files and for base backups? It will be on a different filesystem from $PGDATA.
Can OmniPITR be configured without having hot_standby? I have PostgreSQL configured in "archive" mode, archiving WAL files to a different filesystem/path.
About the many WAL files being generated: reading the documentation, I think I made a mistake:

The archive command is only invoked on completed WAL segments. Hence, if your server generates only little WAL traffic (or has slack periods where it does so), there could be a long delay between the completion of a transaction and its safe recording in archive storage. To put a limit on how old unarchived data can be, you can set archive_timeout to force the server to switch to a new WAL segment file at least that often. Note that archived files that are archived early due to a forced switch are still the same length as completely full files. It is therefore unwise to set a very short archive_timeout — it will bloat your archive storage. archive_timeout settings of a minute or so are usually reasonable.

So I had set my archive_timeout parameter to 60, and I understand now that this makes the server create a 16MB WAL file every minute, correct? Even when the files are not full (because there is no activity in the database), it will still create a 16MB WAL file each minute, and that is why my archive filesystem filled up so quickly. Correct? I've now set the parameter back to its original value, 0, so it is disabled.
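
A rough calculation seems to confirm it (assuming, at worst, one forced segment switch per minute and the default 16MB segment size):

    16 MB/segment x 60 segments/hour = 960 MB/hour
    20 GB / 960 MB per hour ≈ 21 hours

so at that rate a 20GB archive filesystem can fill up in well under a day.
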
About WAL files and their archived copies: do I have to delete both manually? There isn't any option in postgresql.conf to automatically delete archived WAL files older than a given age, is there? (Apart from doing it in archive_command.) Do you use Linux? Could you send me the archive_command or script that you use for copying/gzipping the files?
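
Would something along these lines be reasonable? (Just my own guess, adapted from the cp example in the documentation, with gzip added; /mnt/archive is a placeholder path, and the matching restore_command would have to gunzip.)

    # in postgresql.conf; %p is the path of the WAL file to archive, %f is its file name
    archive_command = 'test ! -f /mnt/archive/%f.gz && gzip -c %p > /mnt/archive/%f.gz'
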
Thanks beforehand.

Cheers...



Re: Best backup strategy for production systems

From: John R Pierce
On 6/19/2014 3:14 AM, Oliver wrote:
About WAL files and their archived copies: do I have to delete both manually? There isn't any option in postgresql.conf to automatically delete archived WAL files older than a given age, is there? (Apart from doing it in archive_command.) Do you use Linux? Could you send me the archive_command or script that you use for copying/gzipping the files?
Thanks beforehand.

you need ALL the archived WAL files since the start of the last base backup, or none of them are useful.
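
One safe way to prune is pg_archivecleanup from contrib: point it at the .backup history file that the last base backup left in the archive directory, and it removes every segment logically older than that point. A sketch, with an assumed archive path and an example history file name:

    # dry run: only print which archived segments would be removed
    pg_archivecleanup -n /mnt/archive 000000010000000000000010.00000028.backup
    # the same command without -n (e.g. from cron) actually deletes them
    pg_archivecleanup /mnt/archive 000000010000000000000010.00000028.backup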



-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: Best backup strategy for production systems

From: Borislav Ivanov
If your database is relatively small, I would recommend http://www.pgbarman.org/. It does binary backups and will take care of your WAL files. The latest version of pgbarman can also take backups from a slave using the pgespresso extension. Note that pgbarman runs over the streaming replication protocol.

If your database is big, go for pg_basebackup and archive_command. You can run this on a slave. pg_basebackup will give you the base backup, and during restore you can use restore_command, with recovery_target_time for example, to replay from the archived WAL files.
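
For example, the restore side could be sketched like this (recovery.conf as used on the 9.x series; the archive path and target time are placeholders):

    # recovery.conf, placed in the restored data directory
    restore_command = 'cp /mnt/archive/%f %p'
    recovery_target_time = '2014-06-19 12:00:00'

Postgres then replays the archived WAL on top of the base backup up to that timestamp.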



Re: Best backup strategy for production systems

From: Marti Raudsepp
On Fri, Jun 20, 2014 at 1:47 AM, Borislav Ivanov <bivanov@atlassian.com> wrote:
> If your database is relatively small, I would recommend
> http://www.pgbarman.org/. It does binary backups and will take care of your
> WAL files. The latest version of pgbarman can also take backups from a slave
> using the pgespresso extension. Note that pgbarman runs over the streaming
> replication protocol.

I would advise against barman for smaller installations. The setup is
complicated and there are too many moving parts (requires SSH hole
punching in two directions AND a streaming replication connection). I
found WAL-E to be much easier to manage.
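
For what it's worth, a WAL-E setup boils down to roughly this (the envdir path is a placeholder; the S3 prefix and credentials live in the envdir files):

    # postgresql.conf: ship each completed WAL segment to the bucket
    archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'

    # cron, e.g. weekly: push a new base backup
    envdir /etc/wal-e.d/env wal-e backup-push /var/lib/postgresql/9.3/main

    # cron: keep the two most recent base backups and prune WAL they no longer need
    envdir /etc/wal-e.d/env wal-e delete --confirm retain 2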

Regards,
Marti