Thread: postgres large database backup

postgres large database backup

From
Atul Kumar
Date:
Hi, 

I have a 10TB database running on PostgreSQL 11 on CentOS 7 "on premises", and I need to schedule backups of this database in a faster way.

The scheduled backup will be used for PITR purposes.

So please let me know how I should take a quicker backup of my 10TB database. Is there any tool to take backups, and subsequently incremental backups, in a faster way, and to restore them for PITR in a faster way when required?

What should be the exact approach for scheduling such backups so that they can be restored in a faster way?



Regards.

Re: postgres large database backup

From
Michael Loftis
Date:
On Wed, Nov 30, 2022 at 8:40 AM Atul Kumar <akumar14871@gmail.com> wrote:
>
> Hi,
>
> I have a 10TB database running on PostgreSQL 11 on CentOS 7 "on premises", and I need to schedule backups of this database in a faster way.
>
> The scheduled backup will be used for PITR purposes.
>
> So please let me know how I should take a quicker backup of my 10TB database. Is there any tool to take backups, and subsequently incremental backups, in a faster way, and to restore them for PITR in a faster way when required?
>
> What should be the exact approach for scheduling such backups so that they can be restored in a faster way?

Faster than *what*?

If speed is the primary criterion, the fastest to restore would be filesystem snapshots: use pg_start_backup() to tell the DB cluster to be in a binary-ready backup mode, take the snapshot, then pg_stop_backup(), and capture the WALs generated alongside your FS snapshot, all on the same machine or on shared storage. To restore, bring back the old snapshot plus the captured WALs with the DB shut down/stopped; startup is then a normal "crash recovery", or you can select a PITR target/LSN within the short pg_start_backup() ... pg_stop_backup() window. If you're properly archiving WALs outside of JUST the full backup, you can PITR to any point after the full backup snapshot, but the more transactions/WAL recovery has to process to get to the desired point, the longer the recovery.
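A minimal sketch of that flow on PostgreSQL 11, assuming an exclusive low-level backup and a ZFS dataset named tank/pgdata (the dataset name, backup label, and paths are illustrative assumptions only):

psql -U postgres -c "SELECT pg_start_backup('fs_snapshot', true);"   # 'true' requests an immediate checkpoint
zfs snapshot tank/pgdata@nightly_$(date +%Y%m%d)                     # or an LVM/SAN snapshot of the data directory
psql -U postgres -c "SELECT pg_stop_backup();"                       # ends backup mode and switches WAL

Keep archiving WAL (archive_command) beyond the snapshot if you want to be able to PITR to points after it.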

pgbackrest can back up a PG cluster in multiple ways (including taking a base backup while actively streaming WALs, or acting as the WAL archiver), and a restore on the same machine as the backup repository would basically be limited by I/O (well, unless you've got all NVMe, in which case CPU, bus, or memory bandwidth constraints become the limiting factor).
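As a rough sketch of what that looks like (the stanza name "main", paths, and retention values are assumptions, not recommendations):

# postgresql.conf: let pgbackrest be the WAL archiver
#   archive_mode = on
#   archive_command = 'pgbackrest --stanza=main archive-push %p'

# /etc/pgbackrest/pgbackrest.conf
#   [global]
#   repo1-path=/backup/pgbackrest
#   repo1-retention-full=2
#   [main]
#   pg1-path=/var/lib/pgsql/11/data

pgbackrest --stanza=main stanza-create
pgbackrest --stanza=main --type=full backup
pgbackrest --stanza=main --delta --type=time "--target=2022-12-01 04:00:00" restore   # PITR restore example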

Basically, no matter how you back up, 10TB takes a long time to copy, and except in the "local FS snapshot" method I outlined above, your limiting factor is going to be how fast you can move the data back to where you need it.

For critical DBs of this nature I've actually done almost exactly the
method I just outlined, only the backup/snapshot process happens on a
replica.  *NORMAL* failure recovery in that replicated cluster is by
failovers, but, for actual backup restore due to disaster or need to
go back in time (which is...extremely rare...) there's some manual
intervention to bring up a snapshot and play back WALs to the point in
time that we want the DB cluster.

>
>
>
> Regards.



-- 

"Genius might be described as a supreme capacity for getting its possessors
into trouble of all kinds."
-- Samuel Butler



Re: postgres large database backup

From
Christophe Pettus
Date:

> On Nov 30, 2022, at 07:40, Atul Kumar <akumar14871@gmail.com> wrote:
> So please let me know how I should take a quicker backup of my 10TB database. Is there any tool to take backups, and subsequently incremental backups, in a faster way, and to restore them for PITR in a faster way when required?

For an initial backup, you'll need to copy 10TB to another medium; there's no getting around that. pgBackRest supports incremental and differential backups, which are especially useful if there are tables that are largely for archival purposes and do not get frequently modified:

    https://pgbackrest.org
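As a rough sketch of how the backup type is chosen per run (stanza name assumed; a full backup has to exist before the other types can be taken):

pgbackrest --stanza=main --type=full backup   # copy everything
pgbackrest --stanza=main --type=diff backup   # only what changed since the last full backup
pgbackrest --stanza=main --type=incr backup   # only what changed since the last backup of any type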


Re: postgres large database backup

From
Vijaykumar Jain
Date:

On Wed, Nov 30, 2022, 9:10 PM Atul Kumar <akumar14871@gmail.com> wrote:
Hi, 

I have a 10TB database running on PostgreSQL 11 on CentOS 7 "on premises", and I need to schedule backups of this database in a faster way.

The scheduled backup will be used for PITR purposes.

So please let me know how I should take a quicker backup of my 10TB database. Is there any tool to take backups, and subsequently incremental backups, in a faster way, and to restore them for PITR in a faster way when required?

What should be the exact approach for scheduling such backups so that they can be restored in a faster way?


We had a bare-metal server with 96 CPUs, 385 GB RAM, NVMe storage and a 10G network.
We used pgbackrest for full backups.
It supports PITR and differential backups.

A 28 TB DB took 2.5 hours to back up to remote storage, and a restore from the remote storage took 3.5 hours when restored immediately (a lot of additional time is spent afterwards on WAL replay to catch up).

pg_basebackup took 9 hours.
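Much of the throughput at that scale comes from running many compression/transfer workers in parallel; a sketch of the relevant pgbackrest knobs (the values here are illustrative, not what was used for the numbers above):

# /etc/pgbackrest/pgbackrest.conf
#   [global]
#   process-max=16        # parallel workers for backup and restore
#   compress-type=lz4     # cheaper on CPU than the default gzip

pgbackrest --stanza=main --type=full --process-max=16 backup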

Re: postgres large database backup

From
Mladen Gogala
Date:
On 11/30/22 10:40, Atul Kumar wrote:
Hi, 

I have a 10TB database running on PostgreSQL 11 on CentOS 7 "on premises", and I need to schedule backups of this database in a faster way.

The scheduled backup will be used for PITR purposes.

So please let me know how I should take a quicker backup of my 10TB database. Is there any tool to take backups, and subsequently incremental backups, in a faster way, and to restore them for PITR in a faster way when required?

What should be the exact approach for scheduling such backups so that they can be restored in a faster way?

Well, that depends on your configuration details. Is your DB located on a SAN device or JBOD? If it's SAN, then it can do snapshots. All enterprise backup utilities like NetBackup, TSM, Avamar or Commvault support snapshots. They can all do full and incremental backups, and they can also do PITR. Your database is an enterprise-sized database and needs an enterprise-level tool. If you want a freebie, pgbackrest is the best thing no money can buy. Pgbackrest supports parallelism, full and incremental backups.


-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com

Re: postgres large database backup

From
Ron
Date:
On 11/30/22 11:41, Vijaykumar Jain wrote:

On Wed, Nov 30, 2022, 9:10 PM Atul Kumar <akumar14871@gmail.com> wrote:
Hi, 

I have a 10TB database running on PostgreSQL 11 on CentOS 7 "on premises", and I need to schedule backups of this database in a faster way.

The scheduled backup will be used for PITR purposes.

So please let me know how I should take a quicker backup of my 10TB database. Is there any tool to take backups, and subsequently incremental backups, in a faster way, and to restore them for PITR in a faster way when required?

What should be the exact approach for scheduling such backups so that they can be restored in a faster way?


We had a bare-metal server with 96 CPUs, 385 GB RAM, NVMe storage and a 10G network.
We used pgbackrest for full backups.
It supports PITR and differential backups.

A 28 TB DB took 2.5 hours to back up to remote storage, and a restore from the remote storage took 3.5 hours when restored immediately (a lot of additional time is spent afterwards on WAL replay to catch up).

Another vote for pgbackrest.  It supports AES encryption, and multiple forms of compression.

--
Angular momentum makes the world go 'round.

Re: postgres large database backup

From
Hannes Erven
Date:
Am 30.11.22 um 20:01 schrieb Mladen Gogala:
> On 11/30/22 10:40, Atul Kumar wrote:
>> Hi,
>>
>> I have a 10TB database running on PostgreSQL 11 on CentOS 7
>> "on premises", and I need to schedule backups of this database
>> in a faster way.
>>
>> The scheduled backup will be used for PITR purposes.
>>
> Well, that depends on your configuration details. Is your DB located 
> on a SAN device or JBOD?
>

You could also use a filesystem that can do atomic snapshots - like ZFS.

However, I'm wondering why none of the previous respondents mentioned 
it? Sure, ZFS may have its own performance implications... but my 
experience is that it makes "safety snapshots" and "on-demand throw-away 
testing environments" very convenient to use.
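For example (dataset names are made up, and this assumes the whole cluster, including pg_wal, lives on that one dataset so the snapshot is self-consistent):

zfs snapshot tank/pgdata@before_migration          # instant safety snapshot
zfs rollback tank/pgdata@before_migration          # the way back, if things go wrong

zfs clone tank/pgdata@before_migration tank/pgdata_test   # writable throw-away copy for testing
zfs destroy tank/pgdata_test                               # discard it when done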


Best regards,

     -hannes





Re: postgres large database backup

From
Mladen Gogala
Date:
On 11/30/22 18:19, Hannes Erven wrote:
You could also use a filesystem that can do atomic snapshots - like ZFS.

Uh, oh. Not so sure about that. Here is a page from the world of the big O: https://blog.docbert.org/oracle-on-zfs/

However, similar can be said about ZFS. ZFS snapshots will slow down the I/O considerably. I would definitely prefer snapshots done in hardware and not in software.  My favorite file systems, depending on the type of disk, are F2FS and XFS.

-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com

Re: postgres large database backup

From
Michael Loftis
Date:


On Wed, Nov 30, 2022 at 18:03 Mladen Gogala <gogala.mladen@gmail.com> wrote:
On 11/30/22 18:19, Hannes Erven wrote:
You could also use a filesystem that can do atomic snapshots - like ZFS.

Uh, oh. Not so sure about that. Here is a page from the world of the big O: https://blog.docbert.org/oracle-on-zfs/

However, similar can be said about ZFS. ZFS snapshots will slow down the I/O considerably. I would definitely prefer snapshots done in hardware and not in software.  My favorite file systems, depending on the type of disk, are F2FS and XFS.


ZFS snapshots don’t typically have much if  any performance impact versus not having a snapshot (and already being on ZFS) because it’s already doing COW style semantics. 

Postgres write performance using ZFS is difficult because it's super important to match up the underlying I/O sizes (the device/ZFS ashift, the ZFS recordsize, and the DB's page/WAL page sizes), but not getting this right also causes performance issues without any snapshots, because again, COW. If you're constantly breaking a record block or sector there's going to be a big impact. It won't be any worse (in my own testing) regardless of whether you have snapshots or not. Snapshots on ZFS don't cause any crazy write amplification by themselves (I'm not sure they cause any extra writes at all; I'd have to do some sleuthing).

ZFS will yes be slower than a raw disk (but that’s not an option for Pg anyway), and may or may not be faster than a different  filesystem on a HW RAID volume or storage array volume. It absolutely takes more care/clue/tuning to get Pg write performance on ZFS, and ZFS does duplicate some of Pg’s resiliency so there is duplicate work going on.
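A sketch of the alignment being described, with the common starting values people use (assumptions, not a benchmarked recommendation):

zpool create -o ashift=12 tank mirror nvme0n1 nvme1n1          # ashift is fixed at pool creation; 12 = 4 KiB sectors, check your devices
zfs create -o recordsize=8K -o compression=lz4 tank/pgdata     # match the 8 KiB Postgres page size for the data directory
zfs get recordsize,compression tank/pgdata                     # verify the properties took effect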

I’d say really that 2016 article is meaningless as ZFS, Oracle, and Postgres have all evolved dramatically in six years. Even further since there’s nothing remotely like ASM for Postgres.


-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com
--

"Genius might be described as a supreme capacity for getting its possessors
into trouble of all kinds."
-- Samuel Butler

Re: postgres large database backup

From
Ron
Date:
On 11/30/22 19:41, Michael Loftis wrote:


On Wed, Nov 30, 2022 at 18:03 Mladen Gogala <gogala.mladen@gmail.com> wrote:
On 11/30/22 18:19, Hannes Erven wrote:
You could also use a filesystem that can do atomic snapshots - like ZFS.

Uh, oh. Not so sure about that. Here is a page from the world of the big O: https://blog.docbert.org/oracle-on-zfs/

However, similar can be said about ZFS. ZFS snapshots will slow down the I/O considerably. I would definitely prefer snapshots done in hardware and not in software.  My favorite file systems, depending on the type of disk, are F2FS and XFS.


ZFS snapshots don’t typically have much if  any performance impact versus not having a snapshot (and already being on ZFS) because it’s already doing COW style semantics. 

Postgres write performance using ZFS is difficult because it's super important to match up the underlying I/O sizes (the device/ZFS ashift, the ZFS recordsize, and the DB's page/WAL page sizes), but not getting this right also causes performance issues without any snapshots, because again, COW. If you're constantly breaking a record block or sector there's going to be a big impact. It won't be any worse (in my own testing) regardless of whether you have snapshots or not. Snapshots on ZFS don't cause any crazy write amplification by themselves (I'm not sure they cause any extra writes at all; I'd have to do some sleuthing).

ZFS will yes be slower than a raw disk (but that’s not an option for Pg anyway), and may or may not be faster than a different  filesystem on a HW RAID volume or storage array volume. It absolutely takes more care/clue/tuning to get Pg write performance on ZFS, and ZFS does duplicate some of Pg’s resiliency so there is duplicate work going on.

I wonder what percentage of /Big Databases/ (like the OP's and Vijaykumar's) are still on physical servers, as opposed to VMs connected to SANs. Even many physical servers are connected to SANs. (That is, of course, in the dreaded Enterprise environment.)

--
Angular momentum makes the world go 'round.

Re: postgres large database backup

From
Mladen Gogala
Date:
On 11/30/22 20:41, Michael Loftis wrote:

ZFS snapshots don’t typically have much if  any performance impact versus not having a snapshot (and already being on ZFS) because it’s already doing COW style semantics. 

Hi Michael,

I am not sure that such statement holds water. When a snapshot is taken, the amount of necessary I/O requests goes up dramatically. For every block that snapshot points to, it is necessary to read the block, write it to the spare location and then overwrite it, if you want to write to a block pointed by snapshot. That gives 3 I/O requests for every block written. NetApp is trying to optimize it by using 64MB blocks, but ZFS on Linux cannot do that, they have to use standard CoW because they don't have the benefit of their own hardware and OS. And the standard CoW is tripling the number of I/O requests for every write to the blocks pointed to by the snapshot, for every snapshot. CoW is a very expensive animal, with horns.

Regards

-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com

Re: postgres large database backup

From
Mladen Gogala
Date:
On 11/30/22 20:51, Ron wrote:
ZFS will yes be slower than a raw disk (but that’s not an option for Pg anyway), and may or may not be faster than a different  filesystem on a HW RAID volume or storage array volume. It absolutely takes more care/clue/tuning to get Pg write performance on ZFS, and ZFS does duplicate some of Pg’s resiliency so there is duplicate work going on.

Ron, if this raw device reference is about ASM, Oracle has a file system on top of ASM, called ACFS, and I have been able to run PostgreSQL on top of ACFS. The reason to do that is redundancy. ASM/ACFS with PostgreSQL behaves similarly to Veritas Cluster, when one Postgres cluster goes down, the other one is started. And you don't have to pay for it, unless you start using storage snapshots. That ACFS feature requires a commercial license.

-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com

Re: postgres large database backup

From
Vijaykumar Jain
Date:


On Thu, Dec 1, 2022, 7:11 PM Mladen Gogala <gogala.mladen@gmail.com> wrote:
On 11/30/22 20:41, Michael Loftis wrote:

ZFS snapshots don’t typically have much if  any performance impact versus not having a snapshot (and already being on ZFS) because it’s already doing COW style semantics. 

Hi Michael,

I am not sure that such statement holds water. When a snapshot is taken, the amount of necessary I/O requests goes up dramatically. For every block that snapshot points to, it is necessary to read the block, write it to the spare location and then overwrite it, if you want to write to a block pointed by snapshot. That gives 3 I/O requests for every block written. NetApp is trying to optimize it by using 64MB blocks, but ZFS on Linux cannot do that, they have to use standard CoW because they don't have the benefit of their own hardware and OS. And the standard CoW is tripling the number of I/O requests for every write to the blocks pointed to by the snapshot, for every snapshot. CoW is a very expensive animal, with horns.


I am not an expert in this area, but we have ZFS for specific instances which hold time-series/event-log data, and we also need compression.
One day there was a need to snapshot a 35 TB ZFS pool and send it across the network to a replica, because both disks in the mirror degraded around the same time. I do not recall the ZFS snapshot taking anything resource-intensive, and it was quick. I'll ask around for the actual time though.

We have more than 500 of these types of nodes with ZFS (each having 20 disks in mirrors, each 8 TB) for event logs with compression, and ZFS works just fine. This is a special setup where the data is assumed to be cold storage, hence the compression, so it was designed for heavy writes and occasional read queries, only for debugging.

Re: postgres large database backup

From
Vijaykumar Jain
Date:

I do not recall the ZFS snapshot taking anything resource-intensive, and it was quick. I'll ask around for the actual time.

Ok, just a small note: our ingestion pattern is write anywhere, read globally. So we did stop ingestion while the snapshot was taken, as we could afford it that way. Maybe the story is different when a snapshot is taken on live systems which generate a lot of delta.

Re: postgres large database backup

From
Michael Loftis
Date:


On Thu, Dec 1, 2022 at 06:40 Mladen Gogala <gogala.mladen@gmail.com> wrote:
On 11/30/22 20:41, Michael Loftis wrote:

ZFS snapshots don’t typically have much if  any performance impact versus not having a snapshot (and already being on ZFS) because it’s already doing COW style semantics. 

Hi Michael,

I am not sure that such statement holds water. When a snapshot is taken, the amount of necessary I/O requests goes up dramatically. For every block that snapshot points to, it is necessary to read the block, write it to the spare location and then overwrite it, if you want to write to a block pointed by snapshot. That gives 3 I/O requests for every block written. NetApp is trying to optimize it by using 64MB blocks, but ZFS on Linux cannot do that, they have to use standard CoW because they don't have the benefit of their own hardware and OS. And the standard CoW is tripling the number of I/O requests for every write to the blocks pointed to by the snapshot, for every snapshot. CoW is a very expensive animal, with horns.


Nope, ZFS does not behave that way. Yup, AFAIK all other snapshotting filesystems or volume managers do. One major architectural decision of ZFS is the atomicity of writes. Data at rest stays at rest. Thus it does NOT overwrite live data. Snapshots do not change the write path/behavior in ZFS. In ZFS writes are atomic: you're always writing new data to free space, and accounting for where the current record/volume block within a file or volume actually lives on disk. If a filesystem, volume manager, or RAID system is overwriting data, and in the middle of that process has an issue that breaks that write, and that data is also live data, you can't be atomic; you've now destroyed data (the RAID write hole is one concept of this).

That's why adding a snapshot isn't an additional cost for ZFS. For better or worse you're paying that snapshot cost already, because it already does not overwrite live data. If there's no snapshot, once the write is committed and the refcount is zero for the old blocks, and it's safe (TXG committed), those old blocks go back to the free pool to be potentially used again. There's a bunch of optimization to that and how it actually happens, but at the end of the day your writes do not overwrite your data in ZFS: writes of data get directed at free space, and eventually the on-disk structures get an atomic update that happens to say the data now lives here. In the time between all of that happening, the ZIL (which may live on its own special devices called SLOG -- this is why you often see the terms ZIL/journal/SLOG/log vdev used interchangeably) is the durable bit, but that's never normally read; it's only read back during recovery.

This is also where the ZFS filesystem property of recordsize or volblocksize (independently configurable on every filesystem/volume within a pool) is important for performance. If you clobber a whole record, ZFS isn't going to read anything extra when it gets around to committing; it knows the whole record changed and can safely write a whole new record (every 5s it goes about this TXG commit, so two 64k writes are still slower with a 128k recordsize, but still shouldn't pull in that 128k record). There are other optimizations there, but at the end of the day, as long as the chosen recordsize/volblocksize matches up to your writes, and your writes are aligned to that within your file or volume, you'll not see an extra read of the data as part of its normal flow of committing data. Snapshots don't change that.

Because of those architectural decisions, CoW behavior is part of ZFS' existing performance penalty, so when you look at that older Oracle ASM vs ZFS article, remember that that extra... what was it, 0.5ms?... is accounting for most, probably all, of the penalties for a snapshot too, if you want (or need) it. It's fundamental to how ZFS works and provides data durability+atomicity. This is why ZFS calls its snapshots essentially free: you're already paying the performance for them. What would ASM do if it had a snapshot to manage? Or a few dozen on the same data? Obviously during the first writes to those snapshotted areas you'd see it. Ongoing performance penalties with those snapshots? Maybe ASM has an optimization that saves that benchmark a bunch of time if there is no snapshot, but once one exists it takes a different write path and adds a performance penalty? What if a snapshot was taken in the middle of the benchmark? Yeah, there's going to be some extra IOPS when you take the snapshot to say "a snapshot now exists" for ZFS, but that doesn't dramatically change its underlying write path after that point.

That atomicity and data durability also means that even if you lose the SLOG devices (which hold the ZIL/journal; if you don't have a SLOG/log vdev then it's in-pool) you do not lose all the data. Only stuff that somehow remained uncommitted after the ZIL failed. Say you had some sort of hard fault/crash and the SLOG/ZIL devices were destroyed: you can still opt to mount the ZFS pool, and filesystems/volumes, without that ZIL, which could (well, would) still suck, but would be better than just losing everything. If the ZIL fails while the system is live, ZFS is going to do its best to hopefully get everything committed ASAP as soon as it knows something is wrong, and keep it that way. So on a SLOG/ZIL failure your performance WILL suffer (and boy it's UGLY, but at least it's not dead and destroyed). And because of the atomicity property, even if it has further failures during that window of time where it scrambles to commit, ZFS does not wreck the filesystem. If the devices are still available it'll still provide whatever data it can back to you.

So there's a very different approach to what's important with ZFS, it's not that performance isn't important, it's that your data is more important.  Performance is NOT ignored, but, to get that atomicity and durability you ARE paying some performance costs.  Is that worth it for YOUR database or files?  Only you as an admin can decide that.  No ZFS is NOT a great choice for every database or dataset!  For some workloads that penalty is not going to be acceptable.  

So writes in ZFS always go to the journal (ZIL) first, barring config property tweaks; once a journal entry is durable it's considered written, but uncommitted. If we crash at that point, journal recovery brings the pool back to that last written journal entry, so ZFS is never lying to the application or OS.

The data is always written back in an atomic manner, writing new or changed data to a free block, never to the existing block. So if it bombs in the middle of THAT, you're fine. When the whole (recordsize or volblocksize) block of sectors is written at once, or within a window of time, there's no read of the original data once it finishes coalescing and the writing process fully commits it. There are of course always the free-space/reference counts, but if they haven't changed they aren't written, and they're needed anyway to find where the data lives, so they are already present. So yeah, on a first write there may be an extra write in the area where that bookkeeping is kept (which may itself be coalesced with other writes towards it), but it's not like every write incurs that extra overhead to maintain the old block, especially if your write replaces that whole recordsize-sized block. After that, until a snapshot is added or removed, there are no more changes to the references to that old block. When there are no more references to a given block it goes into the free pool.

Having a snapshot doesn’t add more work. It’s a byproduct of the atomic write behavior that ZFS implements (always write to free blocks, always write a sector).  You're just asking ZFS to not free those blocks.


Regards

-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com

Re: postgres large database backup

From
Michael Loftis
Date:
On Thu, Dec 1, 2022 at 9:21 AM Michael Loftis <mloftis@wgops.com> wrote:
>
>
>
> On Thu, Dec 1, 2022 at 06:40 Mladen Gogala <gogala.mladen@gmail.com> wrote:
>>
>> On 11/30/22 20:41, Michael Loftis wrote:
>>
>>
>> ZFS snapshots don’t typically have much if any performance impact versus not having a snapshot (and already being on ZFS) because it’s already doing COW style semantics.
>>
>> Hi Michael,
>>
>> I am not sure that such statement holds water. When a snapshot is taken, the amount of necessary I/O requests goes up dramatically. For every block that snapshot points to, it is necessary to read the block, write it to the spare location and then overwrite it, if you want to write to a block pointed by snapshot. That gives 3 I/O requests for every block written. NetApp is trying to optimize it by using 64MB blocks, but ZFS on Linux cannot do that, they have to use standard CoW because they don't have the benefit of their own hardware and OS. And the standard CoW is tripling the number of I/O requests for every write to the blocks pointed to by the snapshot, for every snapshot. CoW is a very expensive animal, with horns.

And if you want to know more, ARS wrote a good ZFS 101 article -- the
write semantics I described in overview are on page three,
https://arstechnica.com/information-technology/2020/05/zfs-101-understanding-zfs-storage-and-performance/3/


--

"Genius might be described as a supreme capacity for getting its possessors
into trouble of all kinds."
-- Samuel Butler



Re: postgres large database backup

From
hvjunk
Date:
> On 01 Dec 2022, at 01:19, Hannes Erven <hannes@erven.at> wrote:
>
> You could also use a filesystem that can do atomic snapshots - like ZFS.
>
> However, I'm wondering why none of the previous respondents mentioned it?

the reason: 'cause most people are "stuck" on public clouds that do not provide that as a "solution" in their cookie-cutter (wrongly called "managed") solutions...

> Sure, ZFS may have its own performance implications... but my experience is that it makes "safety snapshots" and "on-demand throw-away testing environments" very convenient to use.

Using that myself, but also as a VM in a Proxmox environment, I've found (and restored with) Proxmox Backup Server a good single-snapshot backup solution.

My cases don't need PITR yet.


Re: postgres large database backup

From
Michael Loftis
Date:
On Thu, Dec 1, 2022 at 7:40 AM Vijaykumar Jain
<vijaykumarjain.github@gmail.com> wrote:
>
>
>> I do not recall the ZFS snapshot taking anything resource-intensive, and it was quick. I'll ask around for the actual time.
>
>
> Ok, just a small note: our ingestion pattern is write anywhere, read globally. So we did stop ingestion while the snapshot was taken, as we could afford it that way. Maybe the story is different when a snapshot is taken on live systems which generate a lot of delta.

A snapshot in ZFS would at worst copy the entire allocation tree and adjust ref counters, i.e. metadata, with no data copy. I don't know if it even works that hard to create a snapshot now; it might just make a marker. All I know is they've always been fast/cheap.
Differential zfs send|recv based off two snapshots is also pretty damn fast, because it knows what's shared and only sends what changed.
There have definitely been major changes in how snapshots are created over the years to make them even quicker (ISTR it's the "bookmarks" feature?).
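A sketch of that incremental send/receive (pool, dataset, and host names are made up; the receiving side is assumed to already hold @backup1 from an earlier full send):

zfs snapshot tank/pgdata@backup2
# send only the blocks that changed between the two snapshots to another host
zfs send -i tank/pgdata@backup1 tank/pgdata@backup2 | ssh backuphost zfs recv backup/pgdata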

This is just a small pool on my local/home NAS (TrueNAS Scale) of around 40T of data... Note that -r: it's not creating one snapshot but uhm *checks* 64 (-r also creates a snapshot of every volume/filesystem underneath that).
root@...:~ # time zfs snapshot -r tank@TESTSNAP0
0.000u 0.028s 0:00.32 6.2%      144+280k 0+0io 0pf+0w
root@...:~ #

I have no idea how many files are in there.  My personal home
directory and dev tree is in one of those, and I've got at least half
a dozen versions of the Linux Kernel, FreeBSD kernel, and other source
trees, and quite a few other Very Bushy(tm) source trees so it's quite
a fair amount of files.

So yeah, 28 msec, 64 snapshots... they're REALLY cheap to create, and since you pay the performance costs already, they're not very expensive to maintain. And the performance cost isn't awful, unlike in more traditional snapshot systems. I will say that is kind of an optimal case because I have a very fast NVMe SLOG/ZIL, and the box is otherwise effectively idle. Destroying the freshly created snapshot is about the same... So is destroying 6-month-old snapshots, though I don't have a bonkers amount of changed data in my pool.




--

"Genius might be described as a supreme capacity for getting its possessors
into trouble of all kinds."
-- Samuel Butler