Thread: pg_xlog volume question

pg_xlog volume question

From
"Mark Steben"
Date:
Hello listers,

I'm designing a PITR backup / recovery scenario where I'm log-shipping to
another location and recovering using a tar backup and log roll forward.
The plan is to save a week's worth of logs and restore nightly using the
accumulated logs to that point.  A daily full backup is too
expensive and time consuming - it takes over 36 hours to transport it over
the network to the recovery site.  So we're going with the weekly backup and
saving the weeks worth of logs.

As you well know, I will require a LOT of storage for the xlogs.  I'm
currently creating about 2GB of logs every hour.  Is there a config
parameter to reduce the amount that Xlog takes up?  At this rate I'm looking
at close to 400GB to restore the database at the end of the week.

Any help appreciated.

Mark Steben│Database Administrator│
@utoRevenue-R- "Join the Revenue-tion"
95 Ashley Ave. West Springfield, MA., 01089
413-243-4800 x1512 (Phone) │ 413-732-1824 (Fax)
@utoRevenue is a registered trademark and a division of Dominion Enterprises





Re: pg_xlog volume question

From
"Joshua D. Drake"
Date:
On Fri, 2009-02-06 at 12:52 -0500, Mark Steben wrote:
> Hello listers,
>
> I'm designing a PITR backup / recovery scenario where I'm log-shipping to
> another location and recovering using a tar backup and log roll forward.
> The plan is to save a week's worth of logs and restore nightly using the
> accumulated logs to that point.  A daily full backup is too
> expensive and time consuming - it takes over 36 hours to transport it over
> the network to the recovery site.  So we're going with the weekly backup and
> saving the weeks worth of logs.
>
> As you well know, I will require a LOT of storage for the xlogs.  I'm
> currently creating about 2GB of logs every hour.  Is there a config
> parameter to reduce the amount that Xlog takes up?

No. I would suggest using pg_standby or pitrtools (which requires
pg_standby) to manage this. I would also suggest just restoring as the
logs become available. If you can't process 2G in an hour over the
network do it at night.

Joshua D. Drake


>
--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


Re: pg_xlog volume question

From
"Kevin Grittner"
Date:
>>> "Mark Steben" <msteben@autorevenue.com> wrote:
> currently creating about 2GB of logs every hour.  Is there a config
> parameter to reduce the amount that Xlog takes up?

We pipe ours through gzip as part of our archive script.  There was a
"gotcha", though -- an xlog is reused without clearing it first for
performance reasons, which meant that even an xlog which had
accumulated little or nothing could compress very poorly.  I wrote
pg_clearxlogtail to help us with this, and have posted source on
pgfoundry.  Just pipe through it before gzip and the unused portion is
set to very compressible zero bytes with no discernible performance
hit.

Someone else took a more aggressive approach with pglesslog, so you
may want to look at that.  I haven't, because we have a solution
that's working well enough for us, but apparently it dives down into
the actual log records and strips some out which it determines are not
needed for PITR recovery, and I think it might truncate the file
rather than setting it to zeros.

If you're already using one of these and compressing the files, there
may not be much else you can do than to buy more drives.

-Kevin

Re: pg_xlog volume question

From
"Mark Steben"
Date:
Thanks for quick response.

I should have added that the 'recovery' site is really no more than a
Second copy where reporting and ETL work is to be done - so we need
The database to be available at least 6 -8 hours during the day.
I've tried to wrestle with PG_STANDBY to allow daily availability
And then to place back in 'recovery mode' for the next nights set of
Logs But it doesn't lend itself to this type of application.

Mark Steben│Database Administrator│
@utoRevenue-R- "Join the Revenue-tion"
95 Ashley Ave. West Springfield, MA., 01089
413-243-4800 x1512 (Phone) │ 413-732-1824 (Fax)
@utoRevenue is a registered trademark and a division of Dominion Enterprises




-----Original Message-----
From: Joshua D. Drake [mailto:jd@commandprompt.com]
Sent: Friday, February 06, 2009 1:13 PM
To: Mark Steben
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] pg_xlog volume question

On Fri, 2009-02-06 at 12:52 -0500, Mark Steben wrote:
> Hello listers,
>
> I'm designing a PITR backup / recovery scenario where I'm log-shipping to
> another location and recovering using a tar backup and log roll forward.
> The plan is to save a week's worth of logs and restore nightly using the
> accumulated logs to that point.  A daily full backup is too
> expensive and time consuming - it takes over 36 hours to transport it over
> the network to the recovery site.  So we're going with the weekly backup
and
> saving the weeks worth of logs.
>
> As you well know, I will require a LOT of storage for the xlogs.  I'm
> currently creating about 2GB of logs every hour.  Is there a config
> parameter to reduce the amount that Xlog takes up?

No. I would suggest using pg_standby or pitrtools (which requires
pg_standby) to manage this. I would also suggest just restoring as the
logs become available. If you can't process 2G in an hour over the
network do it at night.

Joshua D. Drake


>
--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997



Re: pg_xlog volume question

From
"Joshua D. Drake"
Date:
On Fri, 2009-02-06 at 13:29 -0500, Mark Steben wrote:
> Thanks for quick response.
>
> I should have added that the 'recovery' site is really no more than a
> Second copy where reporting and ETL work is to be done - so we need
> The database to be available at least 6 -8 hours during the day.
> I've tried to wrestle with PG_STANDBY to allow daily availability
> And then to place back in 'recovery mode' for the next nights set of
> Logs But it doesn't lend itself to this type of application.

Actually I would think it would if done correctly. If you are using PITR
Tools (which again uses pg_standby) you would do this:

cmd_standby -B (base backup)
cmd_standby -S (standby mode)

Cruise...... when you want to go into live mode:

cmd_standby -F999

Which will bring the machine up and allow you to use it. Once the "use
time" is over:

cmd_standby -B (base backup)
cmd_standby -S (standby mode)

cmd_standby will use rsync to make your base backup so it only copies
what has changed.

https://projects.commandprompt.com/public/pitrtools

It is BSD licensed.

Joshua D. Drake

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997