Thread: pg_xlog volume question
Hello listers, I'm designing a PITR backup / recovery scenario where I'm log-shipping to another location and recovering using a tar backup and log roll forward. The plan is to save a week's worth of logs and restore nightly using the accumulated logs to that point. A daily full backup is too expensive and time consuming - it takes over 36 hours to transport it over the network to the recovery site. So we're going with the weekly backup and saving the weeks worth of logs. As you well know, I will require a LOT of storage for the xlogs. I'm currently creating about 2GB of logs every hour. Is there a config parameter to reduce the amount that Xlog takes up? At this rate I'm looking at close to 400GB to restore the database at the end of the week. Any help appreciated. Mark Steben│Database Administrator│ @utoRevenue-R- "Join the Revenue-tion" 95 Ashley Ave. West Springfield, MA., 01089 413-243-4800 x1512 (Phone) │ 413-732-1824 (Fax) @utoRevenue is a registered trademark and a division of Dominion Enterprises
On Fri, 2009-02-06 at 12:52 -0500, Mark Steben wrote: > Hello listers, > > I'm designing a PITR backup / recovery scenario where I'm log-shipping to > another location and recovering using a tar backup and log roll forward. > The plan is to save a week's worth of logs and restore nightly using the > accumulated logs to that point. A daily full backup is too > expensive and time consuming - it takes over 36 hours to transport it over > the network to the recovery site. So we're going with the weekly backup and > saving the weeks worth of logs. > > As you well know, I will require a LOT of storage for the xlogs. I'm > currently creating about 2GB of logs every hour. Is there a config > parameter to reduce the amount that Xlog takes up? No. I would suggest using pg_standby or pitrtools (which requires pg_standby) to manage this. I would also suggest just restoring as the logs become available. If you can't process 2G in an hour over the network do it at night. Joshua D. Drake > -- PostgreSQL - XMPP: jdrake@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
>>> "Mark Steben" <msteben@autorevenue.com> wrote: > currently creating about 2GB of logs every hour. Is there a config > parameter to reduce the amount that Xlog takes up? We pipe ours through gzip as part of our archive script. There was a "gotcha", though -- an xlog is reused without clearing it first for performance reasons, which meant that even an xlog which had accumulated little or nothing could compress very poorly. I wrote pg_clearxlogtail to help us with this, and have posted source on pgfoundry. Just pipe through it before gzip and the unused portion is set to very compressible zero bytes with no discernible performance hit. Someone else took a more aggressive approach with pglesslog, so you may want to look at that. I haven't, because we have a solution that's working well enough for us, but apparently it dives down into the actual log records and strips some out which it determines are not needed for PITR recovery, and I think it might truncate the file rather than setting it to zeros. If you're already using one of these and compressing the files, there may not be much else you can do than to buy more drives. -Kevin
Thanks for quick response. I should have added that the 'recovery' site is really no more than a Second copy where reporting and ETL work is to be done - so we need The database to be available at least 6 -8 hours during the day. I've tried to wrestle with PG_STANDBY to allow daily availability And then to place back in 'recovery mode' for the next nights set of Logs But it doesn't lend itself to this type of application. Mark Steben│Database Administrator│ @utoRevenue-R- "Join the Revenue-tion" 95 Ashley Ave. West Springfield, MA., 01089 413-243-4800 x1512 (Phone) │ 413-732-1824 (Fax) @utoRevenue is a registered trademark and a division of Dominion Enterprises -----Original Message----- From: Joshua D. Drake [mailto:jd@commandprompt.com] Sent: Friday, February 06, 2009 1:13 PM To: Mark Steben Cc: pgsql-admin@postgresql.org Subject: Re: [ADMIN] pg_xlog volume question On Fri, 2009-02-06 at 12:52 -0500, Mark Steben wrote: > Hello listers, > > I'm designing a PITR backup / recovery scenario where I'm log-shipping to > another location and recovering using a tar backup and log roll forward. > The plan is to save a week's worth of logs and restore nightly using the > accumulated logs to that point. A daily full backup is too > expensive and time consuming - it takes over 36 hours to transport it over > the network to the recovery site. So we're going with the weekly backup and > saving the weeks worth of logs. > > As you well know, I will require a LOT of storage for the xlogs. I'm > currently creating about 2GB of logs every hour. Is there a config > parameter to reduce the amount that Xlog takes up? No. I would suggest using pg_standby or pitrtools (which requires pg_standby) to manage this. I would also suggest just restoring as the logs become available. If you can't process 2G in an hour over the network do it at night. Joshua D. Drake > -- PostgreSQL - XMPP: jdrake@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
On Fri, 2009-02-06 at 13:29 -0500, Mark Steben wrote: > Thanks for quick response. > > I should have added that the 'recovery' site is really no more than a > Second copy where reporting and ETL work is to be done - so we need > The database to be available at least 6 -8 hours during the day. > I've tried to wrestle with PG_STANDBY to allow daily availability > And then to place back in 'recovery mode' for the next nights set of > Logs But it doesn't lend itself to this type of application. Actually I would think it would if done correctly. If you are using PITR Tools (which again uses pg_standby) you would do this: cmd_standby -B (base backup) cmd_standby -S (standby mode) Cruise...... when you want to go into live mode: cmd_standby -F999 Which will bring the machine up and allow you to use it. Once the "use time" is over: cmd_standby -B (base backup) cmd_standby -S (standby mode) cmd_standby will use rsync to make your base backup so it only copies what has changed. https://projects.commandprompt.com/public/pitrtools It is BSD licensed. Joshua D. Drake -- PostgreSQL - XMPP: jdrake@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997