file cloning in pg_upgrade and CREATE DATABASE - Mailing list pgsql-hackers

From Peter Eisentraut
Subject file cloning in pg_upgrade and CREATE DATABASE
Date
Msg-id bc9ca382-b98d-0446-f699-8c5de2307ca7@2ndquadrant.com
Responses Re: file cloning in pg_upgrade and CREATE DATABASE  (Robert Haas <robertmhaas@gmail.com>)
Re: file cloning in pg_upgrade and CREATE DATABASE  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: file cloning in pg_upgrade and CREATE DATABASE  (Michael Paquier <michael@paquier.xyz>)
Re: file cloning in pg_upgrade and CREATE DATABASE  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
Here is another attempt at implementing file cloning for pg_upgrade and
CREATE DATABASE.  The idea is to take advantage of file systems that can
make copy-on-write clones, which would make the copy run much faster.
For pg_upgrade, this will give the performance of --link mode without
the associated drawbacks.

There have been patches proposed previously [0][1].  The concerns there
were mainly that they required a Linux-specific ioctl() call and only
worked for Btrfs.

Some new things have happened since then:

- XFS has (optional) reflink support.  This file system is probably more
widely used than Btrfs.

- Linux and glibc have a proper function to do this now.

- APFS on macOS supports file cloning.

So altogether this feature will be more widely usable and less ugly to
implement.  Note, however, that you currently need the very latest
glibc release, so it probably won't be available to you right now
unless you are using, for example, Fedora 28.  (This is the
copy_file_range() function that recently made us rename the function of
the same name in pg_rewind.)
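
To illustrate the general shape of such a copy routine, here is a
minimal sketch, not the code from the patch.  It is written in Python
only because os.copy_file_range() (Python 3.8 or later, on Linux) wraps
the same call.  The names copy_file, CHUNK_SIZE, and BUFFER_SIZE are
made up for this example.  The sketch asks the kernel to do the copy,
which lets a file system with reflink support share blocks, and falls
back to a plain read/write loop if the call is missing or fails.

```python
import os

CHUNK_SIZE = 1 << 30    # hypothetical upper bound per copy_file_range() call
BUFFER_SIZE = 1 << 16   # hypothetical buffer size for the fallback loop


def _copy_in_kernel(src_fd, dst_fd):
    # copy_file_range() may copy less than requested, so loop until it
    # reports end of file by returning 0.
    while True:
        copied = os.copy_file_range(src_fd, dst_fd, CHUNK_SIZE)
        if copied == 0:
            return


def _copy_read_write(src_fd, dst_fd):
    # Traditional copy loop through a user-space buffer.
    while True:
        buf = os.read(src_fd, BUFFER_SIZE)
        if not buf:
            return
        view = memoryview(buf)
        while len(view) > 0:
            written = os.write(dst_fd, view)
            view = view[written:]


def copy_file(src_path, dst_path):
    """Copy src_path to dst_path and return the name of the method used."""
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            if hasattr(os, "copy_file_range"):
                try:
                    _copy_in_kernel(src_fd, dst_fd)
                    return "copy_file_range"
                except OSError:
                    # Not supported here (old kernel, unusual file system,
                    # sandbox).  Start over with the plain loop; a genuine
                    # I/O error will show up again there.
                    os.lseek(src_fd, 0, os.SEEK_SET)
                    os.lseek(dst_fd, 0, os.SEEK_SET)
                    os.ftruncate(dst_fd, 0)
            _copy_read_write(src_fd, dst_fd)
            return "read_write"
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)
```

Whether the kernel actually makes a copy-on-write clone or just does a
faster in-kernel copy depends on the file system and kernel version; the
caller cannot tell from the return value alone.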

Some example measurements:

6 GB database, pg_upgrade unpatched 30 seconds, patched 3 seconds (XFS
and APFS)

similar for a CREATE DATABASE from a large template

Even if you don't have a file system with cloning support, the special
library calls make copying faster.  For example, on APFS, an unpatched
CREATE DATABASE takes 30 seconds in this test; with the library call
(but without cloning) it takes 10 seconds.

For amusement/bewilderment, without the recent flush optimization on
APFS, this takes 2 minutes 30 seconds.  I suppose this optimization is
now actually obsolete, since macOS will no longer hit that code.


[0]:
https://www.postgresql.org/message-id/flat/513C0E7C.5080606%40socialserve.com

[1]:
https://www.postgresql.org/message-id/flat/20140213030731.GE4831%40momjian.us
-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
