Thread: Feedback about hybrid SAN snap and rsync'd approach for large systemcloning

Feedback about hybrid SAN snap and rsync'd approach for large systemcloning

From

Jerry Sievers

Date:

12 April 2022, 23:07:02

Suppose we have a DB cluster with an additional tablespace and we are
able to make an atomic SAN snapshot of *only* the main cluster
volume...

The additional tablespace contains only UNLOGGED relations.

We cannot snap that volume so we use rsync as follows...

1. pg_start_backup('foo');
make SAN snapshot
rsync the add'l tablespace
pg_stop_backup()

Now provision a new cluster around the snapshot and rsync'd volume,
rejigger the pg_tblspc link if necessary... and start it up maybe or
not having it remain as a streaming replica.

It's been my experience that possibly bulky data in the additional
tablespace does *not* need be rsync'd if we capture only the *_init
files.

Id' be curious to here feedback re the sanity of this approach.

And would also like to know if perhaps *only* the directories under
the rsync'd tablespace actually must be present for a successful
recovery.

The above approach has worked mumerous times even with origin systems
having large, churny contents in the dedicated unlogged tablespace
(which is on a much faster local NVME volume than the main SAN
volume.)

Thanks!

Re: Feedback about hybrid SAN snap and rsync'd approach for large systemcloning

From

Laurenz Albe

Date:

13 April 2022, 08:39:01

On Tue, 2022-04-12 at 18:07 -0500, Jerry Sievers wrote:
> Suppose we have a DB cluster with an additional tablespace and we are
> able to make an atomic SAN snapshot of *only* the main cluster
> volume...
> 
> The additional tablespace contains only UNLOGGED relations.
> 
> We cannot snap that volume so we use rsync as follows...
> 
> 1. pg_start_backup('foo');
> make SAN snapshot
> rsync the add'l tablespace
> pg_stop_backup()
> 
> Now provision a new cluster around the snapshot and rsync'd volume,
> rejigger the pg_tblspc link if necessary... and start it up maybe or
> not having it remain as a streaming replica.
> 
> It's been my experience that possibly bulky data in the additional
> tablespace does *not* need be rsync'd if we capture only the *_init
> files.
> 
> Id' be curious to here feedback re the sanity of this approach.

I thought that an unlogged table is rendered empty by recovery,
but perhaps I misunderstood what you are doing.

Yours,
Laurenz Albe

Re: Feedback about hybrid SAN snap and rsync'd approach for large systemcloning

From

Stephen Frost

Date:

18 April 2022, 18:39:21

Greetings,

* Jerry Sievers (gsievers19@comcast.net) wrote:
> Suppose we have a DB cluster with an additional tablespace and we are
> able to make an atomic SAN snapshot of *only* the main cluster
> volume...
>
> The additional tablespace contains only UNLOGGED relations.
>
> We cannot snap that volume so we use rsync as follows...
>
> 1. pg_start_backup('foo');
> make SAN snapshot
> rsync the add'l tablespace
> pg_stop_backup()

Surely you're also doing WAL archiving?  You shouldn't ever be using
start/stop backup without also doing WAL archiving as you must capture
all of the WAL between the two.

> Now provision a new cluster around the snapshot and rsync'd volume,
> rejigger the pg_tblspc link if necessary... and start it up maybe or
> not having it remain as a streaming replica.

Dedicated backup tools know how to do tablespace remapping.

> It's been my experience that possibly bulky data in the additional
> tablespace does *not* need be rsync'd if we capture only the *_init
> files.

That isn't a trivial thing to do with rsync tho and "try it and see if
it works" is a really bad way to go about this- it's important to
understand what the files are in the data directory, how PG uses them,
how PG recovery works, etc.  That's why there are folks who write
dedicated backup tools for PG (myself included).

> Id' be curious to here feedback re the sanity of this approach.

I wouldn't recommend it is the short answer.  Note that you must be
doing WAL archiving and you really should (and will have to, as of 15)
use non-exclusive backup when you're doing this and be sure to copy the
backup_label file from the pg_stop_backup/pg_backup_stop (in v15)
results and put that into the snapshot.  If you use exclusive mode
then if the system crashes for any reason during the rsync then you'll
have a good chance of ending up with a system that won't come back up
until you go muck around in the data directory.  It's not good and is
why it's now been removed.

Also, you'd have to craft a pretty ugly rsync to make it not copy data
from the unlogged tables, and when you end up with a logged table in
that tablespace (and you will, unless you're actively monitoring for it
and remove any that pop up...) you could end up with data corruption.

Don't think that you'll be able to use delta rsyncs with this either, as
you may end up with files being changed without rsync realizing it and
again end up with corruption.

> And would also like to know if perhaps *only* the directories under
> the rsync'd tablespace actually must be present for a successful
> recovery.
>
> The above approach has worked mumerous times even with origin systems
> having large, churny contents in the dedicated unlogged tablespace
> (which is on a much faster local NVME volume than the main SAN
> volume.)

Dedicated backup tools already know how to recognize unlogged tables and
skip them, along with being able to do incremental backup and delta
restores, I'd strongly suggest you consider using such tools instead of
trying to hack your own.

Thanks,

Stephen

Attachment

signature.asc

Re: Feedback about hybrid SAN snap and rsync'd approach for large systemcloning

From

Jerry Sievers

Date:

19 April 2022, 02:35:00

Hi Stephen, and thanks!  Please see below...

> Greetings,

> * Jerry Sievers (gsievers19@comcast.net) wrote:
> Suppose we have a DB cluster with an additional tablespace and we are
> able to make an atomic SAN snapshot of *only* the main cluster
> volume...

> The additional tablespace contains only UNLOGGED relations.

> We cannot snap that volume so we use rsync as follows...

> 1. pg_start_backup('foo');
> make SAN snapshot
> rsync the add'l tablespace
> pg_stop_backup()

> Surely you're also doing WAL archiving?  You shouldn't ever be using
> start/stop backup without also doing WAL archiving as you must capture
> all of the WAL between the two.

Oh, we're doing archiving all right, via pg_receivewal.

And please stop calling me Shirley :-)

> Now provision a new cluster around the snapshot and rsync'd volume,
> rejigger the pg_tblspc link if necessary... and start it up maybe or
> not having it remain as a streaming replica.

> Dedicated backup tools know how to do tablespace remapping.

Yep, I'm aware.  The fixing of symlinks isn't much of a concern here,
generally.

> It's been my experience that possibly bulky data in the additional
> tablespace does *not* need be rsync'd if we capture only the *_init
> files.

> That isn't a trivial thing to do with rsync tho and "try it and see if
> it works" is a really bad way to go about this- it's important to
> understand what the files are in the data directory, how PG uses them,
> how PG recovery works, etc.  That's why there are folks who write
> dedicated backup tools for PG (myself included).

Understood.  It's a legacy configuration that we're working with
that's used both for making full backups as well as frequently
refreshed non-prod systems.  Some of them are large enough to make
compelling avoidance of any unnecessary data materialization.

> Id' be curious to here feedback re the sanity of this approach.

> I wouldn't recommend it is the short answer.  Note that you must be
> doing WAL archiving and you really should (and will have to, as of 15)

Hmmm, we're using non-exclusive just as of my most recent
updates... but I wasn't aware that exclusive mode was going away
entirely w/V15.  

> use non-exclusive backup when you're doing this and be sure to copy the
> backup_label file from the pg_stop_backup/pg_backup_stop (in v15)
> results and put that into the snapshot.  If you use exclusive mode
> then if the system crashes for any reason during the rsync then you'll
> have a good chance of ending up with a system that won't come back up
> until you go muck around in the data directory.  It's not good and is
> why it's now been removed.

Agree and I've certainly had a few cases of that in the past.  Agree
that it'll foil the less seasoned DBA.

Yep and we do stash the labelfile output from pg_stop_backup in the
snap after it's mounted on whatever box.

> Also, you'd have to craft a pretty ugly rsync to make it not copy data
> from the unlogged tables, and when you end up with a logged table in
> that tablespace (and you will, unless you're actively monitoring for it
> and remove any that pop up...) you could end up with data corruption.

We are... but indeed the rsync include/exclude rules were fussy to
develop and IMO non-trivial to really verify, certainly not ideal.

For that matter, the event trigger that disallows creating logged
tables I'm not certain is foolproof.

> Don't think that you'll be able to use delta rsyncs with this either, as
> you may end up with files being changed without rsync realizing it and
> again end up with corruption.

We don't do that anyhow.

The snaps I'm talking about here are instantiated only once, then
written to tape or brought up as non-prod systems... then torn all the
way down.

> And would also like to know if perhaps *only* the directories under
> the rsync'd tablespace actually must be present for a successful
> recovery.

Forget I ever said the above which I'm pretty sure isn't workable and
wouldn't buy much anyhow.

> The above approach has worked numerous times even with origin
> systems having large, churny contents in the dedicated unlogged
> tablespace (which is on a much faster local NVME volume than the
> main SAN volume.)

> Dedicated backup tools already know how to recognize unlogged tables and
> skip them, along with being able to do incremental backup and delta
> restores, I'd strongly suggest you consider using such tools instead of
> trying to hack your own.

Point very well taken and it's about time my site had another look to
see if such a solution is a good fit and /or to reverify that our
backups are consistently recoverable.

Thx again for weighing in.

> Thanks,

> Stephen