Re: CREATE DATABASE with filesystem cloning - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: CREATE DATABASE with filesystem cloning
Msg-id eb02dd00-3fba-9611-d2eb-b99b7c1723cf@dunslane.net
In response to CREATE DATABASE with filesystem cloning  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On 2023-10-07 Sa 01:51, Thomas Munro wrote:
> Hello hackers,
>
> Here is an experimental POC of fast/cheap database cloning.  For
> clones from little template databases, no one cares much, but it might
> be useful to be able to create a snapshot or fork of a very large
> database for testing/experimentation like this:
>
>    create database foodb_snapshot20231007 template=foodb strategy=file_clone
>
> It should be a lot faster, and use less physical disk, than the two
> existing strategies on recent-ish XFS, BTRFS, very recent OpenZFS,
> APFS (= macOS), and it could in theory be extended to other systems
> that invented different system calls for this with more work (Solaris,
> Windows).  Then extra physical disk space will be consumed only as the
> two clones diverge.
>
> It's just like the old strategy=file_copy, except it asks the OS to do
> its best copying trick.  If you try it on a system that doesn't
> support copy-on-write, then copy_file_range() should fall back to
> plain old copy, but it might still be better than we could do, as it
> can push copy commands to network storage or physical storage.
>
> Therefore, the usual caveats from strategy=file_copy also apply here.
> Namely that it has to perform checkpoints which could be very
> expensive, and there are some quirks/brokenness about concurrent
> backups and PITR.  Which makes me wonder if it's worth pursuing this
> idea.  Thoughts?
>
> I tested on bleeding edge FreeBSD/ZFS, where you need to set sysctl
> vfs.zfs.bclone_enabled=1 to enable the optimisation, as it's still a
> very new feature that is still being rolled out.  The system call
> succeeds either way, but that controls whether the new database
> initially shares blocks on disk, or gets new copies.  I also tested on
> a Mac.  In both cases I could clone large databases in a fraction of a
> second.
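
For anyone following along, the "ask the OS to do its best copying
trick, else fall back to a plain copy" idea quoted above can be sketched
roughly as below. This is only an illustration in Python, not the code
from the patch: the function name copy_with_os_help and its return
strings are made up for the example, and it assumes a platform where
Python exposes os.copy_file_range (it takes the plain path otherwise).

```python
import os
import shutil


def copy_with_os_help(src_path, dst_path):
    """Copy a file, letting the kernel clone or offload the copy if it can.

    Hypothetical helper for illustration. Returns "copy_file_range" when the
    system call did the work and "plain" when an ordinary copy was used.
    """
    if not hasattr(os, "copy_file_range"):
        # Platform or Python build without the call: ordinary copy.
        shutil.copyfile(src_path, dst_path)
        return "plain"

    # Unbuffered file objects, so the descriptor offsets are the only state.
    with open(src_path, "rb", buffering=0) as src, \
            open(dst_path, "wb", buffering=0) as dst:
        remaining = os.fstat(src.fileno()).st_size
        try:
            while remaining > 0:
                # The kernel may share blocks (copy-on-write), offload the
                # copy to the storage, or simply copy the bytes itself.
                n = os.copy_file_range(src.fileno(), dst.fileno(),
                                       min(remaining, 1 << 30))
                if n == 0:
                    break  # source shrank underneath us
                remaining -= n
            return "copy_file_range"
        except OSError:
            # Call rejected (old kernel, unsupported filesystem, ...):
            # start over with a plain read/write loop.
            src.seek(0)
            dst.seek(0)
            dst.truncate()
            shutil.copyfileobj(src, dst)
            return "plain"
```

Whether the blocks end up shared is decided by the filesystem, so the
caller sees the same result either way; only the time taken and the
disk space used differ.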


I've had to disable COW on my BTRFS-resident buildfarm animals (see 
previous discussion re Direct I/O).


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com



