Re: file cloning in pg_upgrade and CREATE DATABASE - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: file cloning in pg_upgrade and CREATE DATABASE
Date
Msg-id 20180322023831.GG2490@paquier.xyz
Whole thread Raw
In response to Re: file cloning in pg_upgrade and CREATE DATABASE  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses Re: file cloning in pg_upgrade and CREATE DATABASE  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List pgsql-hackers
On Tue, Mar 20, 2018 at 10:55:04AM -0400, Peter Eisentraut wrote:
> On 3/19/18 22:58, Michael Paquier wrote:
>> - Extend copy_file so as it is able to use fcopyfile.
>
> fcopyfile() does not support cloning.  (This is not documented.)

You are right.  I have been reading the documentation here to get an
idea as I don't have a macos system at hand:
https://www.unix.com/man-page/osx/3/fcopyfile/

However I have bumped into that:
http://www.openradar.me/30706426

Future versions will be visibly fixed.

>> - Move the work done in pg_upgrade into a common API which can as well
>> be used by pg_rewind as well.  One place would be to have a
>> frontend-only API in src/common which does the leg work.  I would
>> recommend working only on file descriptors as well for consistency with
>> copy_file_range.
>
> pg_upgrade copies files, whereas pg_rewind needs to copy file ranges.
> So I don't think this is going to be a good match.
>
> We could add support for using Linux copy_file_range() in pg_rewind, but
> that would work a bit differently.  I also don't have a good sense of
> how to test the performance of that.

One simple way to test that would be to limit the time it takes to scan
the WAL segments on the target so as the filemap is computed quickly,
and create many, say gigabyte-size relations on the promoted source
which will need to be copied from the source to the target.

> Another thing to think about is that we go through some trouble to
> initialize new WAL files so that the disk space is fully allocated.  If
> we used file cloning calls in pg_rewind, that would potentially
> invalidate some of that.  At least, we'd have to think through this more
> carefully.

Agreed.  Let's keep in mind such things but come with a sane, first cut
of this patch based on the time remaining in this commit fest.

>> - Add proper wait events for the backend calls.  Those are missing for
>> copy_file_range and copyfile.
>
> done

+         <entry><literal>CopyFileCopy</literal></entry>
+         <entry>Waiting for a file copy operation (if the copying is done by
+         an operating system call rather than as separate read and write
+         operations).</entry>
CopyFileCopy is... Redundant.  Perhaps CopyFileSystem or CopyFileRange?

>> - For XLogFileCopy, the problem may be trickier as the tail of a segment
>> is filled with zeroes, so dropping it from the first version of the
>> patch sounds wiser.
>
> Seems like a possible follow-on project.  But see also under pg_rewind
> above.

No objections to do that in the future for both.

> Another oddity is that pg_upgrade uses CopyFile() on Windows, but the
> backend does not.  The Git log shows that the backend used to use
> CopyFile(), but that was then removed when the generic code was added,
> but when pg_upgrade was imported, it came with the CopyFile() call.

You mean 558730ac, right?

> I suspect the CopyFile() call can be quite a bit faster, so we should
> consider adding it back in.  Or if not, remove it from pg_upgrade.

Hm.  The proposed patch also removes an important property of what
happens now in copy_file: the copied files are periodically synced to
avoid spamming the cache, so for some loads wouldn't this cause a
performance regression?

At least on Linux it is possible to rely on sync_file_range which is
called via pg_flush_data, so it seems to me that we ought to roughly
keep the loop working on FLUSH_DISTANCE, and replace the calls of
read/write by copy_file_range.  copyfile is only able to do a complete
file copy, so we would also lose this property as well on Linux.  Even
for Windows using CopyFile would be a step backwards for the backend.

pg_upgrade is different though as it copies files fully, so using both
copyfile and copy_file_range makes sense.

At the end, it seems to me that using copy_file_range has some values as
you save a set of read/write calls, but copyfile comes with its
limitations, which I think will cause side issues, so I would recommend
dropping it from a first cut of the patch for the backend.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] taking stdbool.h into use
Next
From: Amit Langote
Date:
Subject: Re: reorganizing partitioning code