Re: Feedback about hybrid SAN snap and rsync'd approach for large systemcloning - Mailing list pgsql-general

From Stephen Frost
Subject Re: Feedback about hybrid SAN snap and rsync'd approach for large systemcloning
Date
Msg-id 20220418183921.GM10577@tamriel.snowman.net
Whole thread Raw
In response to Feedback about hybrid SAN snap and rsync'd approach for large systemcloning  (Jerry Sievers <gsievers19@comcast.net>)
Responses Re: Feedback about hybrid SAN snap and rsync'd approach for large systemcloning  (Jerry Sievers <gsievers19@comcast.net>)
List pgsql-general
Greetings,

* Jerry Sievers (gsievers19@comcast.net) wrote:
> Suppose we have a DB cluster with an additional tablespace and we are
> able to make an atomic SAN snapshot of *only* the main cluster
> volume...
>
> The additional tablespace contains only UNLOGGED relations.
>
> We cannot snap that volume so we use rsync as follows...
>
> 1. pg_start_backup('foo');
> make SAN snapshot
> rsync the add'l tablespace
> pg_stop_backup()

Surely you're also doing WAL archiving?  You shouldn't ever be using
start/stop backup without also doing WAL archiving as you must capture
all of the WAL between the two.

> Now provision a new cluster around the snapshot and rsync'd volume,
> rejigger the pg_tblspc link if necessary... and start it up maybe or
> not having it remain as a streaming replica.

Dedicated backup tools know how to do tablespace remapping.

> It's been my experience that possibly bulky data in the additional
> tablespace does *not* need be rsync'd if we capture only the *_init
> files.

That isn't a trivial thing to do with rsync tho and "try it and see if
it works" is a really bad way to go about this- it's important to
understand what the files are in the data directory, how PG uses them,
how PG recovery works, etc.  That's why there are folks who write
dedicated backup tools for PG (myself included).

> Id' be curious to here feedback re the sanity of this approach.

I wouldn't recommend it is the short answer.  Note that you must be
doing WAL archiving and you really should (and will have to, as of 15)
use non-exclusive backup when you're doing this and be sure to copy the
backup_label file from the pg_stop_backup/pg_backup_stop (in v15)
results and put that into the snapshot.  If you use exclusive mode
then if the system crashes for any reason during the rsync then you'll
have a good chance of ending up with a system that won't come back up
until you go muck around in the data directory.  It's not good and is
why it's now been removed.

Also, you'd have to craft a pretty ugly rsync to make it not copy data
from the unlogged tables, and when you end up with a logged table in
that tablespace (and you will, unless you're actively monitoring for it
and remove any that pop up...) you could end up with data corruption.

Don't think that you'll be able to use delta rsyncs with this either, as
you may end up with files being changed without rsync realizing it and
again end up with corruption.

> And would also like to know if perhaps *only* the directories under
> the rsync'd tablespace actually must be present for a successful
> recovery.
>
> The above approach has worked mumerous times even with origin systems
> having large, churny contents in the dedicated unlogged tablespace
> (which is on a much faster local NVME volume than the main SAN
> volume.)

Dedicated backup tools already know how to recognize unlogged tables and
skip them, along with being able to do incremental backup and delta
restores, I'd strongly suggest you consider using such tools instead of
trying to hack your own.

Thanks,

Stephen

Attachment

pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: Facing issues with pgsql upgrade.
Next
From: Alban Hertroys
Date:
Subject: Re: Puzzline CROSS JOIN when doing RECURSIVE CTE