Thread: backup strategies

backup strategies

From
"Richard P. Welty"
Date:
so the outfit i'm currently working for on a quasi-full time
basis has what amounts to an OLTP database server in colocation.
the footprint in the rack is very small, that is, there's no
DLT autoloader or anything of that sort in the rack.

the temporary backup solution was to do full dumps in cron
every night and use scp to move them to the office server.
this has clear scaling problems, but was an ok quick
hack back in march.
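
one night's worth of the quick hack looks roughly like this (the
database name, paths, and office hostname here are made up):

  # run from cron in the small hours: dump, compress, push to the office box
  DUMP=/backup/proddb-$(date +%Y%m%d).sql.gz
  pg_dump -U postgres proddb | gzip > "$DUMP"
  scp "$DUMP" backup@office.example.com:/backups/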

well, now it's may and the magnitude of the scaling problems
is becoming very obvious. the scp is taking an absurd
amount of time (the backup times themselves aren't all that
bad, but the office is behind a business road runner connection
and the scp is severely bandwidth limited.)

so i'm outlining a longer term solution, and would be
interested in suggestions/comments. part 1 is to reduce the
frequency of the full dumps, and start using a WAL based
incremental solution such as is outlined here:

  http://www.postgresql.org/docs/8.1/static/backup-online.html
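
as i read the 8.1 docs, setting archive_command is what turns
archiving on, so part 1 would amount to something like this in
postgresql.conf (the archive directory is made up):

  archive_command = 'test ! -f /backup/wal_archive/%f && cp %p /backup/wal_archive/%f'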

part 2 is going to be to set up amanda to roll this stuff to
a DLT drive in the office. i figure that i probably want to do
full backups of the incremental WAL files each time so i'm not
rummaging around the tape library trying to find all of them
should the worst case happen.
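
one way i can see to do that is a small staging step on the office
server before each amanda run, something like (paths are made up):

  # bundle everything archived so far into one dated tarball; amanda then
  # dumps that at level 0, so a restore only needs the most recent tape
  tar -cf /backup/staging/wal-archive-$(date +%Y%m%d).tar -C /backup/wal_archive .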

but what are the consequences of backing up a WAL file
if the archive process (probably scp in this case) is running
when the backup copy is made? the whole thing won't make it onto
tape; are there any downsides to running a recovery with
an incomplete WAL file?

and if there are better solutions, i'm interested in hearing
about them.

thanks,
  richard


Re: backup strategies

From
Bill Moran
Date:
"Richard P. Welty" <rwelty@averillpark.net> wrote:
>
> so the outfit i'm currently working for on a quasi-full time
> basis has what amounts to an OLTP database server in colocation.
> the footprint in the rack is very small, that is, there's no
> DLT autoloader or anything of that sort in the rack.
>
> the temporary backup solution was to do full dumps in cron
> every night and use scp to move them to the office server.
> this has clear scaling problems, but was an ok quick
> hack back in march.
>
> well, now it's may and the magnitude of the scaling problems
> is becoming very obvious. the scp is taking an absurd
> amount of time (the backup times themselves aren't all that
> bad, but the office is behind a business road runner connection
> and the scp is severely bandwidth limited.)

Have you looked into rsync?  I would think rsync could copy your
pg_dump very efficiently, since it should be able to skip over
parts that haven't changed since the previous run.  Make sure _not_
to compress the dump if you use rsync, to allow it to take
the most advantage of unchanged data.
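
Something along these lines, for example (paths and hostname are made
up); the dump stays uncompressed on disk, and rsync's --compress
squeezes the network stream instead:

  pg_dump -U postgres proddb > /backup/proddb.sql
  rsync -av --compress /backup/proddb.sql backup@office.example.com:/backups/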

> so i'm outlining a longer term solution, and would be
> interested in suggestions/comments. part 1 is to reduce the
> frequency of the full dumps, and start using a WAL based
> incremental solution such as is outlined here:
>
>   http://www.postgresql.org/docs/8.1/static/backup-online.html
>
> part 2 is going to be to set up amanda to roll this stuff to
> a DLT drive in the office. i figure that i probably want to do
> full backups of the incremental WAL files each time so i'm not
> rummaging around the tape library trying to find all of them
> should the worst case happen.
>
> but what are the consequences of backing up a WAL file
> if the archive process (probably scp in this case) is running
> when the backup copy is made? the whole thing won't make it onto
> tape; are there any downsides to running a recovery with
> an incomplete WAL file?

Make sure you do a "SELECT pg_start_backup()" before doing the copy
(don't forget the "SELECT pg_stop_backup()" when you're done)  That
tells PG that data after that point might change, and gives it
the information it needs to ensure it can use the WAL files no
matter what happens during.
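
For example, the base backup could be wrapped up roughly like this
(the label, paths, and data directory location are assumptions):

  psql -U postgres -c "SELECT pg_start_backup('nightly');"
  # the copy of the data directory goes between the start and stop calls
  tar -czf /backup/base-$(date +%Y%m%d).tar.gz -C /var/lib/pgsql data
  psql -U postgres -c "SELECT pg_stop_backup();"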

As an aside, you can only fit so many gallons into a 10 gallon
container.  You might simply have to accept that your requirements
now exceed the capacity of the RR connection and upgrade.

--
Bill Moran
http://www.potentialtech.com

Re: backup strategies

From
"Richard P. Welty"
Date:
Bill Moran wrote:
> As an aside, you can only fit so many gallons into a 10 gallon
> container.  You might simply have to accept that your requirements
> now exceed the capacity of the RR connection and upgrade.
>
actually, what it will come down to is the cost of an upgraded connection
vs $60/month rent for 3Us of rack space to place a DLT autoloader in the
colocation facility.

richard


Re: backup strategies

From
Francisco Reyes
Date:
Richard P. Welty writes:

> actually, what it will come down to is the cost of an upgraded
> connection vs $60/month
> rent for 3Us of rack space to place a DLT autoloader in the colocation
> facility.

How much data are you looking to back up?
There are companies that do rsync services.
Just saw one last night for something like $20 for 30GB.
It would go much faster than going to a DSL connection.

Depending on what OS you are using, you may have programs that allow you to
connect to Amazon's S3 service. That is fairly affordable.
http://amazon.com/S3-AWS-home-page-Money/b/ref=sc_fe_l_2/105-7566505-6512415?ie=UTF8&node=16427261&no=3435361&me=A36L942TSJ2AJA

http://tinyurl.com/3bkzdy
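
For instance, with the s3cmd command-line client (just one of the
available tools; the bucket name here is made up):

  s3cmd put /backup/proddb-20070515.sql.gz s3://example-backups/proddb-20070515.sql.gz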



Re: backup strategies

From
"Richard P. Welty"
Date:
Francisco Reyes wrote:
> Richard P. Welty writes:
>
>> actually, what it will come down to is the cost of an upgraded
>> connection vs $60/month
>> rent for 3Us of rack space to place a DLT autoloader in the
>> colocation facility.
>
> How much data are you looking to backup?
> There are companies that do rsync services.
> Just saw one last night for something like $20 for 30GB.
> It would go much faster than going to a DSL connection.
a couple of gig, not really all that much. the problem is that there is
an expectation of one or more persons/organizations going through
due diligence on the operation, and i'm not sure that a fuzzy
"somewhere online" file storage service will pass the smell test for
many of them, whereas physical tape cartridges stored offsite will
likely make them happy.

richard


Re: backup strategies

From
Francisco Reyes
Date:
Richard P. Welty writes:

> a couple of gig, not really all that much. the problem is that there is
> an expectation of one or more persons/organizations going through
> due diligence on the operation, and i'm not sure that a fuzzy
> "somewhere online" file storage service will pass the smell test for
> many of them, whereas physical tape cartridges stored offsite will
> likely make them happy.

I think you should worry much more about getting the procedure done right
than about making somebody happy.

Some of the problems with tape systems, in my humble opinion,
are:

1- More often than not there isn't a second tape unit somewhere to use
in case the physical location where the tape unit lives becomes unavailable.
Having tapes offsite is useless if you don't have a tape unit handy to read
them. Also, not only do you need a tape unit, you also need whatever
program was used to write the backups to tape.

2- If and when the tape unit dies you need to have a backup scheme until you
get the unit repaired.

3- Restore tests are usually not done often enough to make sure the process
is actually working. You would be surprised how often people have a system
they believe works, only to find out at restore time that it had been failing
for a long time.

I suggest you look into a multi-stage approach: one form of backup on tape,
plus a second approach such as a second machine where you regularly do test
restores. Amazon's S3 can also be a good second location.

Just today I was taking a glance at Python code to use S3 and it looked
pretty simple. I would, however, encrypt the data before sending it to S3.
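
For example, gpg can do symmetric encryption on the dump before it goes
anywhere near S3 (the filename is made up; passphrase handling is left out):

  gpg --symmetric --cipher-algo AES256 /backup/proddb-20070515.sql.gz
  # produces /backup/proddb-20070515.sql.gz.gpg, which is what gets uploaded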

Best of luck in whatever method you choose.

Re: backup strategies

From
Glen Parker
Date:
Richard P. Welty wrote:
> but what are the consequences of backing up a WAL file
> if the archive process (probably scp in this case) is running
> when the backup copy is made? the whole thing won't make it onto
> tape; are there any downsides to running a recovery with
> an incomplete WAL file?

The WAL file will obviously be no good.  Just do something like:

# copy into the archive under a temporary name, then rename it;
# the rename is atomic, so a half-copied file never carries the final name
cp "$WALFILEPATH" "$ARCHIVE/$WALFILENAME.NEW"
mv "$ARCHIVE/$WALFILENAME.NEW" "$ARCHIVE/$WALFILENAME"


Then simply exclude "*.NEW" from your backup.
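
If that copy-and-rename lives in a small wrapper script (the script
path here is made up), postgresql.conf can point at it with:

  # %p is the full path of the WAL segment, %f is just the file name
  archive_command = '/usr/local/bin/archive_wal.sh %p %f'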


-Glen