Thread: Deleting old archived WAL files

Deleting old archived WAL files

From
Jaume Sabater
Date:
Hello everyone!

I am trying to setup a warm standby cluster with pgpool-II and PostgreSQL 8.3 on Debian Etch (with backported
packages),with replication and load balancing. I have enabled WAL archiving in both PostgreSQL instances as required by
pgpool-IIand configured the online recovery with help of rsync and ssh, which is working fine at the moment. My archive
commandis: 

archive_command = '/bin/cp %p /var/lib/postgresql/pg_xlog_archive/%f'

Now it turns out that, with just a few tests using pgbench, my /var/lib/postgresql/pg_xlog_archive directory is almost
1GB big, and this is just a computer running two Xen instances for testing and such. 

So I have been investigating, reading online docs and googling a lot, and I believe that, in order to get rid of old
WALarchives (and to be able to actually call those "old" ones), I should be doing something like this: 

1. Once every now and then, depending on how big that directory grows, I should run pg_start_backup('label') and
pg_stop_backup.

2. Note the created .backup file inside /var/lib/postgresql/pg_xlog_archive and delete all archived WAL files "older"
("lesser"than, alphabetically speaking) than the one referenced in the "START WAL LOCATION" label inside that .backup
file.

I could do this from a job in the cron, triggered every hour, that would check the size of the folder and run an
emergencybackup script to prevent the hard drive from filling up. Or with another job in the cron run every week or so.
Ora combination of both. 

Would this be a good procedure? I have not been able to find any information or examples on how to do with old WAL
backups.Everyone seems to be archiving those until the day of the armageddon... 

I would be doing nightly pg_dumps, too.

Thanks in advance.

--
Jaume Sabater
http://linuxsilo.net/

"Ubi sapientas ibi libertas"

Re: Deleting old archived WAL files

From
Chander Ganesan
Date:
Jaume Sabater wrote:
> Hello everyone!
>
> I am trying to setup a warm standby cluster with pgpool-II and PostgreSQL 8.3 on Debian Etch (with backported
packages),with replication and load balancing. I have enabled WAL archiving in both PostgreSQL instances as required by
pgpool-IIand configured the online recovery with help of rsync and ssh, which is working fine at the moment. My archive
commandis: 
>
> archive_command = '/bin/cp %p /var/lib/postgresql/pg_xlog_archive/%f'
>
> Now it turns out that, with just a few tests using pgbench, my /var/lib/postgresql/pg_xlog_archive directory is
almost1 GB big, and this is just a computer running two Xen instances for testing and such. 
>
> So I have been investigating, reading online docs and googling a lot, and I believe that, in order to get rid of old
WALarchives (and to be able to actually call those "old" ones), I should be doing something like this: 
>
> 1. Once every now and then, depending on how big that directory grows, I should run pg_start_backup('label') and
pg_stop_backup.
>
> 2. Note the created .backup file inside /var/lib/postgresql/pg_xlog_archive and delete all archived WAL files "older"
("lesser"than, alphabetically speaking) than the one referenced in the "START WAL LOCATION" label inside that .backup
file.
>
> I could do this from a job in the cron, triggered every hour, that would check the size of the folder and run an
emergencybackup script to prevent the hard drive from filling up. Or with another job in the cron run every week or so.
Ora combination of both. 
>
> Would this be a good procedure? I have not been able to find any information or examples on how to do with old WAL
backups.Everyone seems to be archiving those until the day of the armageddon... 
>
If you are running warm-standby, its presumable that your standby server
is "consuming" these files as they are being generated.  In such a case,
you can set "log_restartpoints" in your recovery.conf file, and use
pg_standby with the '%r' (restartwalfile) parameter so that it can
"prune" old WAL files when necessary.  In such a case, you wouldn't need
to do any pruning yourself, since pg_standby would do it for you, when
the standby server indicates that it is "safe" to remove those old files.

log_restartpoints='true'
restore_command='pg_standby /archive_dir %f %p %r'

In short, your system that is in "recovery mode" can decide which ones
it needs to get rid of, once it knows it no longer needs them.

Note: I believe the log_restartpoints option (and %r argument) were
added in 8.3 .

--
Chander Ganesan
Open Technology Group, Inc.
One Copley Parkway, Suite 210
Morrisville, NC  27560
919-463-0999/877-258-8987
http://www.otg-nc.com
Expert PostgreSQL, PostGIS, and other training delivered worldwide.


Re: Deleting old archived WAL files

From
Jaume Sabater
Date:
Chander Ganesan wrote:

> If you are running warm-standby, its presumable that your standby
> server is "consuming" these files as they are being generated.  In
> such a case, you can set "log_restartpoints" in your recovery.conf
> file, and use pg_standby with the '%r' (restartwalfile) parameter so
> that it can "prune" old WAL files when necessary.  In such a case,
> you wouldn't need to do any pruning yourself, since pg_standby would
> do it for you, when the standby server indicates that it is "safe" to
> remove those old files.
>
> log_restartpoints='true' restore_command='pg_standby /archive_dir %f
> %p %r'
>
> In short, your system that is in "recovery mode" can decide which
> ones it needs to get rid of, once it knows it no longer needs them.

Thanks for your answer. I believe I did not explain myself correctly.
pgpool-II, a middleware, is replicating all transactions into all the
nodes (via sending the SQL statements to all nodes). And does load
balancing of SELECT statements when possible.

Therefore, the online recovery process can happen on what could be
called the master node (the slave node failed) or in the slave node (the
master node failed). In both cases the same happens:

1. pg_start_backup
2. rsync data dir
3. pg_stop_backup
4. Recovery process, scp'ing the necessary archived WAL files from the
existing node to the being-recovered node.

So, what I mean is that it's not a pgsql to pgsql warm standby setup
(both pgsql nodes don't know about each other, and one is not feeding
the other with WAL files so that the other can be kept updated).

In other words, let's imagine it's just a single pgsql installation,
with no pgpool-II stuff, and I am archiving the WAL files because I
fancy it, or as an backup system.

But I don't want to keep archived WAL files forever, but just up to,
let's say, a few gigabytes or something. Therefore I thought I would
have to be doing pg_start_backup/pg_stop_backup "manually" and getting
rid of the now old files myself, as explained in my previous mail.

I hope I have explained myself correctly this time. I read about the
options you mentioned in the pgsql online documentation, but I did not
understand it was the thing I was looking for since there is no pgsql
instance "consuming" the logs.

Thanks again for your reply.

P.S. Sorry guys for the previous mail, as my webmail does not wrap lines.

--
Jaume Sabater
http://linuxsilo.net/

"Ubi sapientas ibi libertas"


Re: Deleting old archived WAL files

From
Chander Ganesan
Date:
Jaume Sabater wrote:
> Chander Ganesan wrote:
>
>
>> If you are running warm-standby, its presumable that your standby
>> server is "consuming" these files as they are being generated.  In
>> such a case, you can set "log_restartpoints" in your recovery.conf
>> file, and use pg_standby with the '%r' (restartwalfile) parameter so
>> that it can "prune" old WAL files when necessary.  In such a case,
>> you wouldn't need to do any pruning yourself, since pg_standby would
>> do it for you, when the standby server indicates that it is "safe" to
>> remove those old files.
>>
>> log_restartpoints='true' restore_command='pg_standby /archive_dir %f
>> %p %r'
>>
>> In short, your system that is in "recovery mode" can decide which
>> ones it needs to get rid of, once it knows it no longer needs them.
>>
>
> Thanks for your answer. I believe I did not explain myself correctly.
> pgpool-II, a middleware, is replicating all transactions into all the
> nodes (via sending the SQL statements to all nodes). And does load
> balancing of SELECT statements when possible.
>
Ahh...typically the term "warm standby" refers to a secondary server
that is not "hot" (i.e., queryable).  I assumed that you meant you were
using pgpool-II for pooing  and as a "switch" to move things over to the
other server.
> Therefore, the online recovery process can happen on what could be
> called the master node (the slave node failed) or in the slave node (the
> master node failed). In both cases the same happens:
>
> 1. pg_start_backup
> 2. rsync data dir
> 3. pg_stop_backup
> 4. Recovery process, scp'ing the necessary archived WAL files from the
> existing node to the being-recovered node.
>
So really, when you need to "recover", you can simply remove all the old
"wals" prior to issuing your pg_start_backup(), since they would not be
needed after your rsync.
> So, what I mean is that it's not a pgsql to pgsql warm standby setup
> (both pgsql nodes don't know about each other, and one is not feeding
> the other with WAL files so that the other can be kept updated).
>
Okay...
> In other words, let's imagine it's just a single pgsql installation,
> with no pgpool-II stuff, and I am archiving the WAL files because I
> fancy it, or as an backup system.
>
> But I don't want to keep archived WAL files forever, but just up to,
> let's say, a few gigabytes or something. Therefore I thought I would
> have to be doing pg_start_backup/pg_stop_backup "manually" and getting
> rid of the now old files myself, as explained in my previous mail.
>
Why?  If you haven't saved a backup that you made previously (using
pg_start_backup()), having archived wal files is useless.  I don't see
why you just don't arbitrarily remove them immediately.

WAL files are useful when you have a "starting point" (i.e., a backup
made using the PITR backup method), and a set of WAL files generated
"after" the starting point. If you don't preserve the start point (i.e.,
the PITR backup) then the WAL files are basically useless.

If you wanted to actually use them for something, you would issue a
pg_start_backup(), make a backup, and then preserve that backup (in case
you need to recover using the backup and subsequent WAL's at a later
date).  In such a case, you could remove the old WAL's when you
discarded the old backup (hopefully, after making a new one).

I think what you want to do is something like this:

1.  have an archive_command that looks something like this (we'll call
this 'archive.sh'):
if [ -f /var/lib/postgresql/pg_xlog_archive/regen_secondary.lock ]; then
   /bin/cp $1 /var/lib/postgresql/pg_xlog_archive/$2
   exit $?
fi
exit 0

Note: If you are using 8.3, you can even just disable the
archive_command when you aren't doing a recovery...

2.  When you are rebuilding the "secondary", issue:
    touch /var/lib/postgresql/pg_xlog_archive/regen_secondary.lock

3.  Set your archive command to "archive_command='archive.sh %p %f'"
4.  Perform a warm_standby backup of the node (pg_start_backup(), etc.);
5.  When you complete recovery (and have the second node up and running)
go ahead and issue:
    rm -f /var/lib/postgresql/pg_xlog_archive/*

This way you'll only "archive" WAL files when you actually need them,
and won't waste time/space copying WAL files that you'll never use anyways.

I'm not sure that I agree with your strategy in terms of preserving your
data in the case of some sort of failure, but if I understand your
problem correctly, this would serve as a pretty good solution...

As a side note, I'd recommend you read up on the standard issues that
apply when using PgPool-II in multi-master replication mode with query
balancing (if you aren't already aware of them) :-)  ...

--
Chander Ganesan
Open Technology Group, Inc.
One Copley Parkway, Suite 210
Morrisville, NC  27560
919-463-0999/877-258-8987
http://www.otg-nc.com
Expert PostgreSQL training - from basic to advanced, delivered worldwide!


Re: Deleting old archived WAL files

From
Simon Riggs
Date:
On Thu, 2008-10-30 at 16:17 -0400, Chander Ganesan wrote:

> Why?  If you haven't saved a backup that you made previously (using
> pg_start_backup()), having archived wal files is useless.  I don't see
> why you just don't arbitrarily remove them immediately.
>
> WAL files are useful when you have a "starting point" (i.e., a backup
> made using the PITR backup method), and a set of WAL files generated
> "after" the starting point. If you don't preserve the start point
> (i.e., the PITR backup) then the WAL files are basically useless.

They are still better than nothing at all. I haven't read the whole
thread, but suggesting deleting things like that makes me go all
twitchy. :-). Do a couple of proper backups and then delete em.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: Deleting old archived WAL files

From
Jaume Sabater
Date:
Chander Ganesan wrote:

> Ahh...typically the term "warm standby" refers to a secondary server
> that is not "hot" (i.e., queryable).  I assumed that you meant you were

You are right. My fault :'(

> So really, when you need to "recover", you can simply remove all the old
> "wals" prior to issuing your pg_start_backup(), since they would not be
> needed after your rsync.

[..]

> Why?  If you haven't saved a backup that you made previously (using
> pg_start_backup()), having archived wal files is useless.  I don't see
> why you just don't arbitrarily remove them immediately.

Aha. Now I understand the situation. So, since pgpool-II issues online recovery in two stages, as you say, I don't
reallyneed to do WAL archiving unless there is an online recovery process getting started or happening. Therefore what
Iwill do is modify my base-backup script to, as you suggest, activate WAL archiving. 

Now it all makes sense to me:

1. I do pg_start_backup
2. Therefore, from now onwards, I need to archive/store WAL files.
3. I do pg_stop_backup
4. Once the restoration is done, I don't need them anymore, so I can disable WAL archiving and delete the old, just
used,already used ones. 

Thank you very much for your explanation, Chander, you just made everything make sense.

> Note: If you are using 8.3, you can even just disable the
> archive_command when you aren't doing a recovery...

Yes, this is what I will do. I will look for how to do this on the online documentation (I have read that
enabling/disablingWAL archiving requires restart, but enabling/disabling archive_command only needs a reload). I guess
Iwill have to modify the /etc/postgresql/8.3/main/postgresql.conf file from the script, then ask the daemon for a
reload.But I'll see if there is another way to do this that does not require altering that configuration file. 

> I'm not sure that I agree with your strategy in terms of preserving your
> data in the case of some sort of failure, but if I understand your
> problem correctly, this would serve as a pretty good solution...

This database is almost read only, with scheduled updates of the contents. Therefore nightly pg_dumps are good enough
(sufficient)as a backup strategy. 

> As a side note, I'd recommend you read up on the standard issues that
> apply when using PgPool-II in multi-master replication mode with query
> balancing (if you aren't already aware of them) :-)  ...

Yes, I believe I have. I guess you mean not being able to use CURRENT_TIMESTAMP or SERIAL in queries and stuff like
that.We use the database as a data storage, all business logic remains in .NET classes or in Java classes, therefore I
ampretty much sure that I won't have problems. I am still in the process of checking the code in development with the
programmersso that there is no query not abiding by the rules stated in the pgpool-II readme. 

Thanks again for your help! It is very much appreciated.

--
Jaume Sabater
http://linuxsilo.net/

"Ubi sapientas ibi libertas"


Re: Deleting old archived WAL files

From
Jaume Sabater
Date:
Chander Ganesan wrote:

> Note: If you are using 8.3, you can even just disable the
> archive_command when you aren't doing a recovery...

Hmmm... after reading these pages:

http://www.postgresql.org/docs/8.3/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL
http://www.postgresql.org/docs/8.3/static/runtime-config-wal.html#GUC-ARCHIVE-COMMAND

I understand that, if the archive_command is set to an empty string, then WAL files will accumulate in pg_xlog/ until a
workingarchive_command is re-established. 

So, should I set the archive_command parameter to any of these?

1. exit 0
2. cp %p /dev/null

Or something similar. I mean this as I have no intention to preserve them unless I am doing an online recovery, in
whichcase I would actually set a meaningful value into that parameter: 

archive_command = '/bin/cp %p /var/lib/postgresql/pg_xlog_archive/%f'

--
Jaume Sabater
http://linuxsilo.net/

"Ubi sapientas ibi libertas"


Re: Deleting old archived WAL files

From
Jaume Sabater
Date:
Jaume Sabater wrote:

> So, should I set the archive_command parameter to any of these?
>
> 1. exit 0
> 2. cp %p /dev/null

Just checked and both worked fine. Did 30000 transactions per 10 concurrent clients with pgbench and the pg_xlog/
foldernever went beyond the initial size (files were rotated inside pg_xlog/, none archived, no errors in the
PostgreSQLlog). 

I think I'll stick with the "exit 0" one, as the cp would have to read the file (although it would probably be in the
cacheof the kernel as it was just written?). 

Now on how to change the archive_command in runtime without having to modify the postgresql.conf file, if possible...
:P

--
Jaume Sabater
http://linuxsilo.net/

"Ubi sapientas ibi libertas"