Re: Feedback about hybrid SAN snap and rsync'd approach for large systemcloning - Mailing list pgsql-general

From Jerry Sievers
Subject Re: Feedback about hybrid SAN snap and rsync'd approach for large systemcloning
Date
Msg-id A1B67D24-5721-4597-8F54-9217C77028A0@comcast.net
Whole thread Raw
In response to Re: Feedback about hybrid SAN snap and rsync'd approach for large systemcloning  (Stephen Frost <sfrost@snowman.net>)
List pgsql-general
Hi Stephen, and thanks!  Please see below...

> Greetings,

> * Jerry Sievers (gsievers19@comcast.net) wrote:
> Suppose we have a DB cluster with an additional tablespace and we are
> able to make an atomic SAN snapshot of *only* the main cluster
> volume...

> The additional tablespace contains only UNLOGGED relations.

> We cannot snap that volume so we use rsync as follows...

> 1. pg_start_backup('foo');
> make SAN snapshot
> rsync the add'l tablespace
> pg_stop_backup()

> Surely you're also doing WAL archiving?  You shouldn't ever be using
> start/stop backup without also doing WAL archiving as you must capture
> all of the WAL between the two.

Oh, we're doing archiving all right, via pg_receivewal.

And please stop calling me Shirley :-)

> Now provision a new cluster around the snapshot and rsync'd volume,
> rejigger the pg_tblspc link if necessary... and start it up maybe or
> not having it remain as a streaming replica.

> Dedicated backup tools know how to do tablespace remapping.

Yep, I'm aware.  The fixing of symlinks isn't much of a concern here,
generally.

> It's been my experience that possibly bulky data in the additional
> tablespace does *not* need be rsync'd if we capture only the *_init
> files.

> That isn't a trivial thing to do with rsync tho and "try it and see if
> it works" is a really bad way to go about this- it's important to
> understand what the files are in the data directory, how PG uses them,
> how PG recovery works, etc.  That's why there are folks who write
> dedicated backup tools for PG (myself included).

Understood.  It's a legacy configuration that we're working with
that's used both for making full backups as well as frequently
refreshed non-prod systems.  Some of them are large enough to make
compelling avoidance of any unnecessary data materialization.

> Id' be curious to here feedback re the sanity of this approach.

> I wouldn't recommend it is the short answer.  Note that you must be
> doing WAL archiving and you really should (and will have to, as of 15)

Hmmm, we're using non-exclusive just as of my most recent
updates... but I wasn't aware that exclusive mode was going away
entirely w/V15.  

> use non-exclusive backup when you're doing this and be sure to copy the
> backup_label file from the pg_stop_backup/pg_backup_stop (in v15)
> results and put that into the snapshot.  If you use exclusive mode
> then if the system crashes for any reason during the rsync then you'll
> have a good chance of ending up with a system that won't come back up
> until you go muck around in the data directory.  It's not good and is
> why it's now been removed.

Agree and I've certainly had a few cases of that in the past.  Agree
that it'll foil the less seasoned DBA.

Yep and we do stash the labelfile output from pg_stop_backup in the
snap after it's mounted on whatever box.

> Also, you'd have to craft a pretty ugly rsync to make it not copy data
> from the unlogged tables, and when you end up with a logged table in
> that tablespace (and you will, unless you're actively monitoring for it
> and remove any that pop up...) you could end up with data corruption.

We are... but indeed the rsync include/exclude rules were fussy to
develop and IMO non-trivial to really verify, certainly not ideal.

For that matter, the event trigger that disallows creating logged
tables I'm not certain is foolproof.

> Don't think that you'll be able to use delta rsyncs with this either, as
> you may end up with files being changed without rsync realizing it and
> again end up with corruption.

We don't do that anyhow.

The snaps I'm talking about here are instantiated only once, then
written to tape or brought up as non-prod systems... then torn all the
way down.

> And would also like to know if perhaps *only* the directories under
> the rsync'd tablespace actually must be present for a successful
> recovery.

Forget I ever said the above which I'm pretty sure isn't workable and
wouldn't buy much anyhow.

> The above approach has worked numerous times even with origin
> systems having large, churny contents in the dedicated unlogged
> tablespace (which is on a much faster local NVME volume than the
> main SAN volume.)

> Dedicated backup tools already know how to recognize unlogged tables and
> skip them, along with being able to do incremental backup and delta
> restores, I'd strongly suggest you consider using such tools instead of
> trying to hack your own.

Point very well taken and it's about time my site had another look to
see if such a solution is a good fit and /or to reverify that our
backups are consistently recoverable.

Thx again for weighing in.

> Thanks,

> Stephen



pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: No psql md5 auth, psql 14.1 to PG 11
Next
From: Ram Pratap Maurya
Date:
Subject: RE: Huge archive log generate in Postgresql-13