Re: Basic question on recovery and disk snapshotting - Mailing list pgsql-general

From Jeff Janes
Subject Re: Basic question on recovery and disk snapshotting
Date
Msg-id CAMkU=1wLLj728cAAJ7UV40L0sVMGwkBuYUz0dTfKuWHt-5QWPA@mail.gmail.com
Whole thread Raw
In response to Re: Basic question on recovery and disk snapshotting  (Yang Zhang <yanghatespam@gmail.com>)
Responses Re: Basic question on recovery and disk snapshotting  (Yang Zhang <yanghatespam@gmail.com>)
List pgsql-general
On Saturday, April 27, 2013, Yang Zhang wrote:
On Sat, Apr 27, 2013 at 11:55 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Sat, Apr 27, 2013 at 10:40 AM, Yang Zhang <yanghatespam@gmail.com> wrote:
>> My question really boils down to: if we're interested in using COW
>> snapshotting (a common feature of modern filesystems and hosting
>> environments), would we necessarily need to ensure the data and
>> pg_xlog are on the same snapshotted volume?
>
>
> That would certainly make it easier.  But it shouldn't be necessary, as long
> as the xlog snapshot is taken after the cluster snapshot, and also as long
> as no xlog files which were written to after the last completed checkpoint
> prior to the cluster snapshot got recycled before the xlog snapshot.   As
> long as the snapshots run quickly and promptly one after the other, this
> should not be a problem, but you should certainly validate that a snapshot
> collection has all the xlogs it needs before accepting it as being good.  If
> you find some necessary xlog files are missing, you can turn up
> wal_keep_segments and try again.

This information is gold, thank you.

How do I validate that a snapshot collection has all the xlogs it needs?


I've always validated my backups by practicing restoring them.  It seems like the most rigorous way, and I figure I need the practice.  If that isn't feasible, the backup_label file created by pg_start_backup() will tell you by name which xlog is the first one you need.  If not, you can use pg_controldata to figure that out based on "Latest checkpoint's REDO location", keeping in mind that the part after the / is not zero padded on the left, so if it "short" you have to take that into account. I have no actual experience in doing this in practice.

 
>
>
> Your goal seems to be to *avoid* continuous archiving, so I wouldn't expect
> that part of the docs to touch on your issues.   But see the section
> "Standalone Hot Backups" which would allow you to use snapshots for the
> cluster "copy" part, and normal archiving for just the xlogs.  The volume of
> pg_xlog should be fairly small, so this seems to me like an attractive
> option.

Just to validate my understanding, are the two options as follows?

a. Checkpoint (optional but helps with time window?), snapshot
tablespaces/cluster/xlog, validate all necessary xlogs present.

Yes.  And doing the checkpoint immediately before has two good effects, it makes the recovery faster, and it maximizes the time until wal logs you need will start to be recycled.
 

b. Set wal_level/archive_mode/archive_command, pg_start_backup,
snapshot tablespaces/cluster, pg_stop_backup to archive xlog.

(a) sounds more appealing since it's treating recovery as crash
recovery rather than backup restore, and as such seems simpler and
lower-overhead (e.g. WAL verbosity, though I don't know how much that
overhead is).  

That brings up another point to consider.  If wal level is minimal, then tables which you bulk load in the same transaction as you created them or truncated them will not get any WAL records written.  (That is the main reason the WAL verbosity is reduced).  But that also means that if any of those operations is happening while you are taking your snapshot, those operations will be corrupted.  If the data and xlogs were part of the same atomic snapshot, this would not be a problem, as either the operation completed, or it never happened.  But with different snapshots, the data can get partially but not completely into the data-snapshot, but then the xlog record which says the data was completely written does gets into the xlog snapshot

If the system is very busy with different people doing different operations at the same time, it could be hard to coordinate a time to take the backup.

 
However, I'm not sure how complex that validation step
is.


The more I think about it, the more I doubt it is worth it.  It would be very easy to come up with a method that sort of works, some of the time, and then blows up spectacularly for no apparent reason.  Using the archiving, at least over the period during which the backup is being taken, is the truly supported way of doing it.  You can still use a snapshot of the data directory, but you are really just treating it as a fast copy operation.  That it is atomic is not important to the process.
 
>
> If you really don't want to use archiving, even just during the duration of
> the cluster snapshotting, then this is the part that addresses your
> questions:
>
> http://www.postgresql.org/docs/9.1/static/backup-file.html

I'm still interested in online backups, though - stopping the DB is a
no-go unfortunately.

I wonder if initiating a EBS snapshot of large volume isn't going to freeze the database up for a while anyway, just by starving it of IO.  It would be interesting to hear back on your experiences.

Cheers,

Jeff
 

pgsql-general by date:

Previous
From: Karthik GP
Date:
Subject: MKDIR_P@: Command not found error in regression test
Next
From: Tom Lane
Date:
Subject: Re: MKDIR_P@: Command not found error in regression test