Thread: backup strategies
so the outfit i'm currently working for on a quasi-full time basis has
what amounts to an OLTP database server in colocation. the footprint in
the rack is very small, that is, there's no DLT autoloader or anything
of that sort in the rack.

the temporary backup solution was to do full dumps in cron every night
and use scp to move them to the office server. this has clear scaling
problems, but was an ok quick hack back in march.

well, now it's may and the magnitude of the scaling problems is becoming
very obvious. the scp is taking an absurd amount of time (the backup
times themselves aren't all that bad, but the office is behind a
business road runner connection and the scp is severely bandwidth
limited.)

so i'm outlining a longer term solution, and would be interested in
suggestions/comments. part 1 is to reduce the frequency of the full
dumps, and start using a WAL based incremental solution such as is
outlined here:

http://www.postgresql.org/docs/8.1/static/backup-online.html

part 2 is going to be to set up amanda to roll this stuff to a DLT drive
in the office. i figure that i probably want to do full backups of the
incremental WAL files each time so i'm not rummaging around the tape
library trying to find all of them should the worst case happen.

but what are the consequences of backing up a WAL file if the archive
process (probably scp in this case) is running when the backup copy is
made? the whole thing won't make it onto tape; are there any downsides
to running a recovery with an incomplete WAL file?

and if there are better solutions, i'm interested in hearing about them.

thanks,
  richard
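p.s. for concreteness, the postgresql.conf side of the WAL archiving i
have in mind would look something like this (host name and path are just
placeholders, not the real ones):

  # postgresql.conf on the colo box (8.1): ship each completed WAL
  # segment to the office server; %p is the full path of the segment,
  # %f is just the file name
  archive_command = 'scp %p backup@office.example.com:/backups/wal/%f'

postgres only treats a segment as archived when the command exits 0, so
a failed scp just gets retried later.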
"Richard P. Welty" <rwelty@averillpark.net> wrote: > > so the outfit i'm currently working for on a quasi-full time > basis has what amounts to an OLTP database server in colocation. > the footprint in the rack is very small, that is, there's no > DLT autoloader or anything of that sort in the rack. > > the temporary backup solution was to do full dumps in cron > every night and use scp to move them to the office server. > this has clear scaling problems, but was an ok quick > hack back in march. > > well, now it's may and the magnitude of the scaling problems > are becoming very obvious. the scp is taking an absurd > amount of time (the backup times themselves aren't all that > bad, but the office is behind a business road runner connection > and the scp is severely bandwidth limited.) Have you looked into rsync? I would think rsync could copy your pg_dump very efficiently, since it should be able to skip over parts that haven't changed since the previous run. Make sure _not_ to compress the dump if you use rsync, to allow it to take the most advantage of unchanged data. > so i'm outlining a longer term solution, and would be > interested in suggestions/comments. part 1 is to reduce the > frequency of the full dumps, and start using a WAL based > incremental solution such as is outlined here: > > http://www.postgresql.org/docs/8.1/static/backup-online.html > > part 2 is going to be to set up amanda to roll this stuff to > a DLT drive in the office. i figure that i probably want to do > full backups of the incremental WAL files each time so i'm not > rummaging around the tape library trying to find all of them > should the worst case happen. > > but what are the consequences of backing up a WAL file > if the archive process (probably scp in this case) is running > when the backup copy is made? the whole thing won't make it onto > tape, are there any downsides to running a recover with > an incomplete WAL file? Make sure you do a "SELECT pg_start_backup()" before doing the copy (don't forget the "SELECT pg_stop_backup()" when you're done) That tells PG that data after that point might change, and gives it the information it needs to ensure it can use the WAL files no matter what happens during. As an aside, you can only fit so many gallons into a 10 gallon container. You might simply have to accept that your requirements now exceed the capacity of the RR connection and upgrade. -- Bill Moran http://www.potentialtech.com
Bill Moran wrote:
> As an aside, you can only fit so many gallons into a 10 gallon container.
> You might simply have to accept that your requirements now exceed the
> capacity of the RR connection and upgrade.

actually, what it will come down to is the cost of an upgraded connection
vs $60/month rent for 3Us of rack space to place a DLT autoloader in the
colocation facility.

richard
Richard P. Welty writes:
> actually, what it will come down to is the cost of an upgraded connection
> vs $60/month rent for 3Us of rack space to place a DLT autoloader in the
> colocation facility.

How much data are you looking to back up?

There are companies that do rsync services. Just saw one last night for
something like $20 for 30GB. It would go much faster than going to a DSL
connection.

Depending on what OS you are using, you may have programs that allow you
to connect to Amazon's S3 service. That is fairly affordable.

http://amazon.com/S3-AWS-home-page-Money/b/ref=sc_fe_l_2/105-7566505-6512415?ie=UTF8&node=16427261&no=3435361&me=A36L942TSJ2AJA
http://tinyurl.com/3bkzdy
Francisco Reyes wrote:
> Richard P. Welty writes:
>
>> actually, what it will come down to is the cost of an upgraded connection
>> vs $60/month rent for 3Us of rack space to place a DLT autoloader in the
>> colocation facility.
>
> How much data are you looking to back up?
> There are companies that do rsync services.
> Just saw one last night for something like $20 for 30GB.
> It would go much faster than going to a DSL connection.

a couple of gig, not really all that much. the problem is that there is
an expectation of one or more persons/organizations going through due
diligence on the operation, and i'm not sure that a fuzzy "somewhere
online" file storage service will pass the smell test for many of them,
whereas physical tape cartridges stored offsite will likely make them
happy.

richard
Richard P. Welty writes:
> a couple of gig, not really all that much. the problem is that there is
> an expectation of one or more persons/organizations going through due
> diligence on the operation, and i'm not sure that a fuzzy "somewhere
> online" file storage service will pass the smell test for many of them,
> whereas physical tape cartridges stored offsite will likely make them
> happy.

I think you should worry much more about getting the procedure done right
than about making somebody happy. Some of the problems with tape systems,
in my humble opinion, are:

1- More often than not there isn't a second tape unit somewhere to use in
   case the physical location where the tape unit sits becomes
   unavailable. Having tapes offsite is useless if you don't have a tape
   unit handy to put the tapes in. And not only do you need a tape unit,
   you also need whatever program was used to do the backups to tape.

2- If and when the tape unit dies, you need a backup scheme to cover you
   until the unit is repaired.

3- Restore tests are usually not done often enough to make sure the
   process is actually working. You would be surprised how often people
   have a system they believe works... only to find out at restore time
   that it had been failing for a long time.

I suggest you look into a multi-stage approach: one form of backup to
tape, plus a second approach such as a second machine where you regularly
do restores. Amazon's S3 can also be a good second location. Just today I
was taking a glance at python code to use S3 and it looked pretty simple.
I would, however, encrypt the data before sending it to S3.

Best of luck in whatever method you choose.
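p.s. for the encryption step, something as simple as gpg symmetric
encryption would do. A sketch only; the passphrase file and paths are
made up, and the actual upload depends on whichever S3 client you settle
on:

  # encrypt the dump before it leaves the building
  gpg --batch --symmetric --cipher-algo AES256 \
      --passphrase-file /etc/backup-passphrase \
      -o /var/backups/yourdb.sql.gpg /var/backups/yourdb.sql

  # and to get it back later
  gpg --batch --decrypt --passphrase-file /etc/backup-passphrase \
      -o /var/backups/yourdb.sql /var/backups/yourdb.sql.gpg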
Richard P. Welty wrote:
> but what are the consequences of backing up a WAL file if the archive
> process (probably scp in this case) is running when the backup copy is
> made? the whole thing won't make it onto tape; are there any downsides
> to running a recovery with an incomplete WAL file?

The WAL file will obviously be no good. Just do something like:

  # copy under a temporary name, then rename into place (the rename is atomic)
  cp "$WALFILEPATH" "$ARCHIVE/$WALFILENAME.NEW"
  mv "$ARCHIVE/$WALFILENAME.NEW" "$ARCHIVE/$WALFILENAME"

Then simply exclude "*.NEW" from your backup.

-Glen
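p.s. if the archive step is the scp itself, the same trick works on the
far end. A sketch, with illustrative names, wired up as
archive_command = '/usr/local/bin/push_wal.sh %p %f':

  #!/bin/sh
  # push a completed WAL segment to the office box under a temporary
  # name, then rename it; the rename is atomic on the far side, so the
  # tape job only ever sees whole files (plus ignorable *.NEW leftovers)
  WALPATH="$1"     # %p: path to the WAL segment on the db server
  WALNAME="$2"     # %f: just the file name
  DEST=backup@office.example.com
  DIR=/backups/wal

  scp "$WALPATH" "$DEST:$DIR/$WALNAME.NEW" &&
      ssh "$DEST" "mv '$DIR/$WALNAME.NEW' '$DIR/$WALNAME'"

The exit status of that chain is what postgres sees, so a failed copy
just gets retried.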