Re: Basic question on recovery and disk snapshotting - Mailing list pgsql-general

From Yang Zhang
Subject Re: Basic question on recovery and disk snapshotting
Date
Msg-id CAKxBDU92PF3MD8CTJE=GwXmpMEbBVpSbML48iTHz_6s7kF5_XA@mail.gmail.com
Whole thread Raw
In response to Re: Basic question on recovery and disk snapshotting  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Basic question on recovery and disk snapshotting
List pgsql-general
On Sat, Apr 27, 2013 at 11:55 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Sat, Apr 27, 2013 at 10:40 AM, Yang Zhang <yanghatespam@gmail.com> wrote:
>> My question really boils down to: if we're interested in using COW
>> snapshotting (a common feature of modern filesystems and hosting
>> environments), would we necessarily need to ensure the data and
>> pg_xlog are on the same snapshotted volume?
>
>
> That would certainly make it easier.  But it shouldn't be necessary, as long
> as the xlog snapshot is taken after the cluster snapshot, and also as long
> as no xlog files which were written to after the last completed checkpoint
> prior to the cluster snapshot got recycled before the xlog snapshot.   As
> long as the snapshots run quickly and promptly one after the other, this
> should not be a problem, but you should certainly validate that a snapshot
> collection has all the xlogs it needs before accepting it as being good.  If
> you find some necessary xlog files are missing, you can turn up
> wal_keep_segments and try again.

This information is gold, thank you.

How do I validate that a snapshot collection has all the xlogs it needs?

>
>
>>
>>  If not, how should we be
>> taking the snapshots - should we be using pg_start_backup() and then
>> taking the snapshot of one before the other?  (What order?)  What if
>> we have tablespaces, do we take snapshots of those, followed by the
>> cluster directory, followed by pg_xlog?
>
>
> First the cluster directory (where "pg_control" is), then tablespaces, then
> pg_xlog.  pg_start_backup() shouldn't be necessary, unless you are running
> with full_page_writes off.  But it won't hurt, and if you don't use
> pg_start_backup you should probably run a checkpoint of your own immediately
> before starting.
>
>>
>> I read through
>> http://www.postgresql.org/docs/9.1/static/continuous-archiving.html
>> and it doesn't touch on these questions.
>
>
> Your goal seems to be to *avoid* continuous archiving, so I wouldn't expect
> that part of the docs to touch on your issues.   But see the section
> "Standalone Hot Backups" which would allow you to use snapshots for the
> cluster "copy" part, and normal archiving for just the xlogs.  The volume of
> pg_xlog should be fairly small, so this seems to me like an attractive
> option.

Just to validate my understanding, are the two options as follows?

a. Checkpoint (optional but helps with time window?), snapshot
tablespaces/cluster/xlog, validate all necessary xlogs present.

b. Set wal_level/archive_mode/archive_command, pg_start_backup,
snapshot tablespaces/cluster, pg_stop_backup to archive xlog.

(a) sounds more appealing since it's treating recovery as crash
recovery rather than backup restore, and as such seems simpler and
lower-overhead (e.g. WAL verbosity, though I don't know how much that
overhead is).  However, I'm not sure how complex that validation step
is.

>
> If you really don't want to use archiving, even just during the duration of
> the cluster snapshotting, then this is the part that addresses your
> questions:
>
> http://www.postgresql.org/docs/9.1/static/backup-file.html

I'm still interested in online backups, though - stopping the DB is a
no-go unfortunately.

--
Yang Zhang
http://yz.mit.edu/


pgsql-general by date:

Previous
From: Jeff Janes
Date:
Subject: Re: Basic question on recovery and disk snapshotting
Next
From: Alexander Reichstadt
Date:
Subject: DISTINCT ON changes sort order on its own it seems