file cloning in pg_upgrade and CREATE DATABASE - Mailing list pgsql-hackers

From	Peter Eisentraut
Subject	file cloning in pg_upgrade and CREATE DATABASE
Date	February 21, 2018 09:00:04
Msg-id	bc9ca382-b98d-0446-f699-8c5de2307ca7@2ndquadrant.com
Responses	Re: file cloning in pg_upgrade and CREATE DATABASE (Robert Haas <robertmhaas@gmail.com>) Re: file cloning in pg_upgrade and CREATE DATABASE (Tomas Vondra <tomas.vondra@2ndquadrant.com>) Re: file cloning in pg_upgrade and CREATE DATABASE (Michael Paquier <michael@paquier.xyz>) Re: file cloning in pg_upgrade and CREATE DATABASE (Thomas Munro <thomas.munro@enterprisedb.com>)
List	pgsql-hackers

Here is another attempt at implementing file cloning for pg_upgrade and
CREATE DATABASE.  The idea is to take advantage of file systems that can
make copy-on-write clones, which would make the copy run much faster.
For pg_upgrade, this will give the performance of --link mode without
the associated drawbacks.

There have been patches proposed previously [0][1].  The concerns there
were mainly that they required a Linux-specific ioctl() call and only
worked for Btrfs.

Some new things have happened since then:

- XFS has (optional) reflink support.  This file system is probably more
widely used than Btrfs.

- Linux and glibc have a proper function to do this now.

- APFS on macOS supports file cloning.

So altogether this feature will be more widely usable and less ugly to
implement.  Note, however, that you will currently need literally the
latest glibc release, so it probably won't be accessible right now
unless you are using Fedora 28 for example.  (This is the
copy_file_range() function that had us recently rename the same function
in pg_rewind.)
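
As a rough illustration of what such a call looks like, here is a
minimal sketch.  It is not code from the attached patch.  The helper
names clone_or_copy_file() and copy_file_fallback(), the 1 GB chunk
size, and the set of errno values that trigger the read()/write()
fallback are all assumptions made up for this example.  On a file system
with reflink support the kernel may satisfy copy_file_range() by sharing
blocks; otherwise it does the copy inside the kernel.

```c
#define _GNU_SOURCE				/* exposes copy_file_range() in glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Arbitrary per-call limit for this sketch; the kernel may copy less. */
#define COPY_CHUNK ((size_t) 1 << 30)

/* Plain read()/write() loop, continuing from the current file offsets. */
static int
copy_file_fallback(int src_fd, int dst_fd)
{
	char		buf[65536];

	for (;;)
	{
		char	   *p = buf;
		ssize_t		nread = read(src_fd, buf, sizeof(buf));

		if (nread == 0)
			return 0;			/* end of file */
		if (nread < 0)
		{
			if (errno == EINTR)
				continue;
			return -1;
		}
		while (nread > 0)
		{
			ssize_t		nwritten = write(dst_fd, p, (size_t) nread);

			if (nwritten < 0)
			{
				if (errno == EINTR)
					continue;
				return -1;
			}
			p += nwritten;
			nread -= nwritten;
		}
	}
}

/*
 * Hypothetical helper: copy src to a new file dst.  Returns 0 on success,
 * -1 on error.  dst must not exist yet.
 */
int
clone_or_copy_file(const char *src, const char *dst)
{
	int			src_fd;
	int			dst_fd;
	int			rc = 0;

	assert(src != NULL && dst != NULL);

	if ((src_fd = open(src, O_RDONLY)) < 0)
		return -1;
	if ((dst_fd = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0600)) < 0)
	{
		close(src_fd);
		return -1;
	}

	for (;;)
	{
		/* NULL offsets: use and advance the descriptors' own positions. */
		ssize_t		n = copy_file_range(src_fd, NULL, dst_fd, NULL,
										COPY_CHUNK, 0);

		if (n > 0)
			continue;			/* more data may remain */
		if (n == 0)
			break;				/* end of file reached */
		if (errno == EINTR)
			continue;
		/* Assumed set of "not usable here" errors; copy by hand instead. */
		if (errno == ENOSYS || errno == EXDEV ||
			errno == EINVAL || errno == EOPNOTSUPP)
			rc = copy_file_fallback(src_fd, dst_fd);
		else
			rc = -1;
		break;
	}

	close(src_fd);
	if (close(dst_fd) != 0)
		rc = -1;
	if (rc != 0)
		unlink(dst);			/* do not leave a partial copy behind */
	return rc;
}
```

Passing NULL for both offset arguments keeps the descriptors' file
positions up to date, so the fallback loop can pick up where a partially
successful copy_file_range() left off.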

Some example measurements:

6 GB database: pg_upgrade takes 30 seconds unpatched, 3 seconds patched
(on both XFS and APFS).

The results are similar for a CREATE DATABASE from a large template.

Even if you don't have a file system with cloning support, the special
library calls make copying faster.  On APFS, for example, an unpatched
CREATE DATABASE of the database above takes 30 seconds; with the library
call (but without cloning) it takes 10 seconds.

For amusement/bewilderment, without the recent flush optimization on
APFS, this takes 2 minutes 30 seconds.  I suppose this optimization will
now actually be obsolete, since macOS will no longer hit that code.


[0]:
https://www.postgresql.org/message-id/flat/513C0E7C.5080606%40socialserve.com

[1]:
https://www.postgresql.org/message-id/flat/20140213030731.GE4831%40momjian.us
-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

0001-Use-file-cloning-in-pg_upgrade-and-CREATE-DATABASE.patch
