Thread: [patch] pg_copy - a command for reliable WAL archiving

[patch] pg_copy - a command for reliable WAL archiving

From
"MauMau"
Date:
Hello,

As I proposed before in the thread below, I've implemented a simple command
for reliable WAL archiving.  I would appreciate it if you could review and
test the patch.

http://www.postgresql.org/message-id/9C1EB95CA1F34DAB93DF549A51E3E874@maumau

Regards
MauMau

Attachment

Re: [patch] pg_copy - a command for reliable WAL archiving

From
Abhijit Menon-Sen
Date:
At 2014-06-17 23:26:37 +0900, maumau307@gmail.com wrote:
>
> Hello,
> 
> As I proposed before in the thread below, I've implemented a simple
> command for reliable WAL archiving.  I would appreciate it if you
> could review and test the patch.

Please add your patch to the next CF (i.e. 2014-08), so that it gets
review attention at the appropriate time.

-- Abhijit



Re: [patch] pg_copy - a command for reliable WAL archiving

From
Joe Conway
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 06/17/2014 07:26 AM, MauMau wrote:
> Hello,
> 
> As I proposed before in the thread below, I've implemented a simple
> command for reliable WAL archiving.  I would appreciate it if you could
> review and test the patch.
> 
> http://www.postgresql.org/message-id/9C1EB95CA1F34DAB93DF549A51E3E874@maumau

That first hunk refers to dblink -- I'm guessing it does not belong with
this patch.

Joe


- -- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJToFiiAAoJEDfy90M199hlseMP/Rpwneo5la7mJkHJA0BWUtj/
Hh8yNzzyBVPbKYuTEOZWi2yblFW6yA2dknrYD8RigS1qJFwLw+drFt5673Vi6jKW
Pf7Qn62Ck/U0lZTGXUU9akOhSx7BsKVwH8HvdARIp5DSV2n7/HFDtazi3hTtFq31
GHKA84lPwuynODN42eVez0YXdeRUX7K/s5rCRq154q3BrLPhEEvQwo/kfhet3F0A
utf0ymSPuX3RvpGDDPAZ1oQxTd/xA+qJhX0jrsRoJVaY40rufTgXPmAnNFOdWEds
fXs6ObRm+tHmeLhYVUXhODa1HPHBQMvElVeB3CGxhPUvhP2494sfoLOd7qs8Lblg
oUkYbIJee8Ir7NN34Gc69YW58sekPSVI4vXKu0PwF++Ubs4MYNNd7nP4Wu9N2ISw
p/GPIbx0uR3NbFGCoOLD0K3QHnX/b0FTWHzTTboZ+b69rNIDePpn3eGO2QEOLL/R
P/YkPma8SLyDvNnCzuSU3XDkFmWQsgK7xa7qpsBR1mbdF7zKPfOxCtoby/enSeuk
z7NJxtpHeUQkO7Pb3ZNk6gL+E8UAQihvdBgdBwBzj4qaZyAM5ec29aya3TtBbF+3
UoFX3m4tthR6mMWqizsdadSPvozDjMrhqSRFAjdBSX80Nfs2DVN1Hepp8NXvjRzV
b5RfV7x4yvtr92FFQboj
=+TWh
-----END PGP SIGNATURE-----



Re: [patch] pg_copy - a command for reliable WAL archiving

From
Abhijit Menon-Sen
Date:
At 2014-06-17 08:02:59 -0700, mail@joeconway.com wrote:
>
> That first hunk refers to dblink -- I'm guessing it does not belong
> with this patch.

Yes, it's a leftover of the dblink memory leak patch that's in this CF.

-- Abhijit



Re: [patch] pg_copy - a command for reliable WAL archiving

From
"MauMau"
Date:
From: "Joe Conway" <mail@joeconway.com>
> That first hunk refers to dblink -- I'm guessing it does not belong with
> this patch.

Ouch, what a careless mistake.  The attached one is clean.  I'll update the
CommitFest entry when I'm back home from work.

Regards
MauMau


Attachment

Re: [patch] pg_copy - a command for reliable WAL archiving

From
"MauMau"
Date:
I fixed some minor mistakes.

Regards
MauMau

Attachment

Re: [patch] pg_copy - a command for reliable WAL archiving

From
Fujii Masao
Date:
On Thu, Aug 14, 2014 at 1:52 PM, MauMau <maumau307@gmail.com> wrote:
> I fixed some minor mistakes.

What's the main purpose of this tool? If it's for WAL archiving, the tool name
"pg_copy" sounds too generic. We already have pg_archivecleanup, so maybe
"pg_archivecopy" or something is better for the consistency?

pg_copy in the patch copies the file to the destination in a
straightforward way,
i.e., directly copies the file to the dest file with actual name. This can cause
the problem which some people reported. The problem is that, when the server
crashes while WAL file is being archived by cp command, its partially-filled
WAL file remains at the archival area. This half-baked archive file can cause
various troubles. To address this, WAL file needs to be copied to the temporary
file at first, then renamed to the actual name. I think that pg_copy should
copy the WAL file in that way.

Currently pg_copy always syncs the archive file, and there is no way to disable
that. But I'm sure that not everyone want to sync the archive file. So I think
that it's better to add the option specifying whether to sync the file
or not, into
pg_copy.

Some users might want to specify whether to call posix_fadvise or not because
they might need to re-read the archvied files just after the archiving.
For example, network copy of the archived files from the archive area to
remote site for disaster recovery.

Do you recommend to use pg_copy for restore_command? If yes, it also should
be documented. And in the WAL restore case, the restored WAL files are re-read
soon by recovery, so posix_fadvise is not good in that case.

Direct I/O and posix_fadvise are used only for destination file. But why not
source file? That might be useful especially for restore_command case.

At last, the big question is, is there really no OS command which provides
the same functionality as pg_copy does? If there is, I'd like to avoid duplicate
work basically.

Regards,

-- 
Fujii Masao



Re: [patch] pg_copy - a command for reliable WAL archiving

From
Kevin Grittner
Date:
Fujii Masao <masao.fujii@gmail.com> wrote:

> Direct I/O and posix_fadvise are used only for destination file.
> But why not source file? That might be useful especially for
> restore_command case.

That would prevent people from piping the file through a
compression utility.  We should support piped I/O for input (and if
we want to use it on the recovery side, for the output, too), at
least as an option.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [patch] pg_copy - a command for reliable WAL archiving

From
"MauMau"
Date:
From: "Fujii Masao" <masao.fujii@gmail.com>
> What's the main purpose of this tool? If it's for WAL archiving, the tool 
> name
> "pg_copy" sounds too generic. We already have pg_archivecleanup, so maybe
> "pg_archivecopy" or something is better for the consistency?
>
> pg_copy in the patch copies the file to the destination in a
> straightforward way,
> i.e., directly copies the file to the dest file with actual name. This can 
> cause
> the problem which some people reported. The problem is that, when the 
> server
> crashes while WAL file is being archived by cp command, its 
> partially-filled
> WAL file remains at the archival area. This half-baked archive file can 
> cause
> various troubles. To address this, WAL file needs to be copied to the 
> temporary
> file at first, then renamed to the actual name. I think that pg_copy 
> should
> copy the WAL file in that way.

I intended to make pg_copy a straightforward replacement of cp/copy, which 
complements the missing sync.  Direct I/O and posix_fadvice() feature may be 
convenient but not essential for this utility.  cp/copy doesn't copy to a 
temporary file, and the problem can be solved easily by mv/move.  I wanted 
to keep pg_copy as generic as cp/copy, so that it can be used by some 
advanced features in the future, e.g. comprehensive backup/recovery 
management like RMAN (this example may not be best) when it's integrated 
into the core.

With that said, copying to a temporary file like <dest>.tmp and renaming it 
to <dest> sounds worthwhile even as a basic copy utility.  I want to avoid 
copying to a temporary file with a fixed name like _copy.tmp, because some 
advanced utility may want to run multiple instances of pg_copy to copy 
several files into the same directory simultaneously.  However, I'm afraid 
multiple <dest>.tmp files might continue to occupy disk space after 
canceling copy or power failure in some use cases, where the copy of the 
same file won't be retried.  That's also the reason why I chose to not use a 
temporary file like cp/copy.

> Currently pg_copy always syncs the archive file, and there is no way to 
> disable
> that. But I'm sure that not everyone want to sync the archive file. So I 
> think
> that it's better to add the option specifying whether to sync the file
> or not, into
> pg_copy.

pg_copy is for copying a file reliably by syncing.  If sync is not 
necessary, people can use cp/copy.


> Some users might want to specify whether to call posix_fadvise or not 
> because
> they might need to re-read the archvied files just after the archiving.
> For example, network copy of the archived files from the archive area to
> remote site for disaster recovery.

This sounds reasonable.  Do you have an idea on the switch name and the 
default behavior?


> Do you recommend to use pg_copy for restore_command? If yes, it also 
> should
> be documented. And in the WAL restore case, the restored WAL files are 
> re-read
> soon by recovery, so posix_fadvise is not good in that case.
>
> Direct I/O and posix_fadvise are used only for destination file. But why 
> not
> source file? That might be useful especially for restore_command case.

No, I don't think it's necessary to use pg_copy for restore_command.


> At last, the big question is, is there really no OS command which provides
> the same functionality as pg_copy does? If there is, I'd like to avoid 
> duplicate
> work basically.

If there exists such a command available in the standard OS installation, I 
want to use it.

Regards
MauMau




Re: [patch] pg_copy - a command for reliable WAL archiving

From
Peter Eisentraut
Date:
On 8/19/14 9:35 AM, MauMau wrote:
> pg_copy is for copying a file reliably by syncing.  If sync is not
> necessary, people can use cp/copy.

I'm getting mixed messages from this thread.

I think there could be a fair amount of support for a new tool that can
serve as a universal plug-and-play archive_command, with a variety of
options, such as fsync yes/no, fadvise yes/no, direct-io[*] yes/no,
atomic copy yes/no, allow overwrite yes/no, compression yes/no.  That
would reduce the need for users to compose adventurous shell commands,
and it would set out the various options more clearly.

This is not that.  This is cp+fsync with a hardcoded fadvise policy and
optional direct-io.  That is a valid problem to address, but it is one
of many.  On the other hand, I fear that the addition of this
single-and-a-half-purpose tool would make the overall landscape more
complicated than it already is.  Since it's in the examples, people will
probably use it, even if they don't need to or shouldn't.  And not
recommending it for the restore_command is also confusing.

Another example of how confusing all of this is: On Windows, the copy
command by default doesn't overwrite files, which is what we want
(usually).  The patch changes those instances of copy to pg_copy, but it
doesn't have that behavior.  Should the examples by changed to do a test
&& pg_copy on Windows (what's the Windows shell syntax for that?), or
should pg_copy have an option to not overwrite a file?  How do you then
avoid inconsistencies with the Unix behavior?  Or what if you want fsync
but allow overwriting on Windows?

On the technical side, I think if you fsync a new file, you also need to
fsync the directory, to make sure the file is certain to be visible
after a crash.


[*] I keep reading "directio" as a typo of "direction", so please
consider putting a hyphen or underscore in the option and variable
names. ;-)



Re: [patch] pg_copy - a command for reliable WAL archiving

From
Peter Eisentraut
Date:
On 8/15/14 10:46 AM, Fujii Masao wrote:
> At last, the big question is, is there really no OS command which provides
> the same functionality as pg_copy does? If there is, I'd like to avoid duplicate
> work basically.

If you look hard enough, you can maybe find an OS command that can fsync
a file after it was copied.  Some versions of dd can do that, and some
systems have an fsync program.  But it's not clear whether all systems
have that, and it probably won't be simple and consistent.



Re: [patch] pg_copy - a command for reliable WAL archiving

From
Josh Berkus
Date:
On 08/19/2014 12:37 PM, Peter Eisentraut wrote:
> On 8/19/14 9:35 AM, MauMau wrote:
>> pg_copy is for copying a file reliably by syncing.  If sync is not
>> necessary, people can use cp/copy.
> 
> I'm getting mixed messages from this thread.
> 
> I think there could be a fair amount of support for a new tool that can
> serve as a universal plug-and-play archive_command, with a variety of
> options, such as fsync yes/no, fadvise yes/no, direct-io[*] yes/no,
> atomic copy yes/no, allow overwrite yes/no, compression yes/no.  That
> would reduce the need for users to compose adventurous shell commands,
> and it would set out the various options more clearly.

Yes please!

Although I'm not sold on the idea of using DirectIO for this.  Is there
really enough benefit to make it worth the trouble?

> 
> This is not that.  This is cp+fsync with a hardcoded fadvise policy and
> optional direct-io.  That is a valid problem to address, but it is one
> of many.  On the other hand, I fear that the addition of this
> single-and-a-half-purpose tool would make the overall landscape more
> complicated than it already is.  Since it's in the examples, people will
> probably use it, even if they don't need to or shouldn't.  And not
> recommending it for the restore_command is also confusing.

I'm afraid that I agree with Peter here.  pg_copy looks like a nice
foundation for the eventual pg_copy utility we need, but it's not there yet.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [patch] pg_copy - a command for reliable WAL archiving

From
Alvaro Herrera
Date:
MauMau wrote:

> With that said, copying to a temporary file like <dest>.tmp and
> renaming it to <dest> sounds worthwhile even as a basic copy
> utility.  I want to avoid copying to a temporary file with a fixed
> name like _copy.tmp, because some advanced utility may want to run
> multiple instances of pg_copy to copy several files into the same
> directory simultaneously.  However, I'm afraid multiple <dest>.tmp
> files might continue to occupy disk space after canceling copy or
> power failure in some use cases, where the copy of the same file
> won't be retried.  That's also the reason why I chose to not use a
> temporary file like cp/copy.

Is there a way to create a link to a file which only exists as an open
file descriptor?   If there was, you could create a temp file, open an
fd, then delete the file.  That would remove the issue with files being
leaked due to failures of various kinds.

Also, it's been mentioned that this utility might be useful for
restore_command.  That sounds good I guess, but need to keep the
RECOVERYXLOG trick in mind.  I remember a case of stalled replay because
the restore command was writing to RECOVERYXLOG.gz and ungzipping, and
the unlink("RECOVERYXLOG") call failed after a partial copy and so did
the copy from the archive.  (Removing the borked RECOVERYXLOG.gz fixed
it.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [patch] pg_copy - a command for reliable WAL archiving

From
Greg Stark
Date:
On Tue, Aug 19, 2014 at 10:42 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Is there a way to create a link to a file which only exists as an open
> file descriptor?   If there was, you could create a temp file, open an
> fd, then delete the file.  That would remove the issue with files being
> leaked due to failures of various kinds.


Sort of. On recent Linuxen you can create a file with open(O_TMPFILE)
then use linkat(2) to create a link for it only once it's fully
written.

-- 
greg



Re: [patch] pg_copy - a command for reliable WAL archiving

From
Greg Stark
Date:
c.f.:
      O_TMPFILE (since Linux 3.11)             Create an unnamed temporary file.  The pathname argument
specifiesa directory; an unnamed inode will be created in             that directory's filesystem.  Anything written to
the            resulting file will be lost when the last file descriptor is             closed, unless the file is
givena name.
 
             O_TMPFILE must be specified with one of O_RDWR or O_WRONLY             and, optionally, O_EXCL.  If O_EXCL
isnot specified, then             linkat(2) can be used to link the temporary file into the             filesystem,
makingit permanent, using code like the             following:
 
                 char path[PATH_MAX];                 fd = open("/path/to/dir", O_TMPFILE | O_RDWR,
                   S_IRUSR | S_IWUSR);
 
                 /* File I/O on 'fd'... */
                 snprintf(path, PATH_MAX,  "/proc/self/fd/%d", fd);                 linkat(AT_FDCWD, path, AT_FDCWD,
"/path/for/file",                                        AT_SYMLINK_FOLLOW);
 
             In this case, the open() mode argument determines the file             permission mode, as with O_CREAT.
             Specifying O_EXCL in conjunction with O_TMPFILE prevents a             temporary file from being linked
intothe filesystem in the             above manner.  (Note that the meaning of O_EXCL in this case             is
differentfrom the meaning of O_EXCL otherwise.)
 
             There are two main use cases for O_TMPFILE:
             *  Improved tmpfile(3) functionality: race-free creation of                temporary files that (1) are
automaticallydeleted when                closed; (2) can never be reached via any pathname; (3) are                not
subjectto symlink attacks; and (4) do not require the                caller to devise unique names.
 
             *  Creating a file that is initially invisible, which is then                populated with data and
adjustedto have appropriate                filesystem attributes (chown(2), chmod(2), fsetxattr(2),
etc.) before being atomically linked into the filesystem                in a fully formed state (using linkat(2) as
described               above).
 
             O_TMPFILE requires support by the underlying filesystem; only             a subset of Linux filesystems
providethat support.  In the             initial implementation, support was provided in the ext2,             ext3,
ext4,UDF, Minix, and shmem filesystems.  XFS support             was added in Linux 3.15.
 



Re: [patch] pg_copy - a command for reliable WAL archiving

From
Alvaro Herrera
Date:
Greg Stark wrote:

>                   char path[PATH_MAX];
>                   fd = open("/path/to/dir", O_TMPFILE | O_RDWR,
>                                           S_IRUSR | S_IWUSR);
> 
>                   /* File I/O on 'fd'... */
> 
>                   snprintf(path, PATH_MAX,  "/proc/self/fd/%d", fd);
>                   linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file",
>                                           AT_SYMLINK_FOLLOW);

Hmm, the real trick here is linkat(... "/proc/self/foobar"), not the
O_TMPFILE: you can have an open file descriptor to an "invisible" file
simply by creating a normal file and unlinking it.  I looked at linkat()
yesterday but the idea of using /proc/self didn't occur to me.  Nasty
trick :-(  It seems linkat() is quite a bit more portable than
O_TMPFILE, fortunately ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [patch] pg_copy - a command for reliable WAL archiving

From
Andres Freund
Date:
On 2014-08-19 17:42:06 -0400, Alvaro Herrera wrote:
> MauMau wrote:
> 
> > With that said, copying to a temporary file like <dest>.tmp and
> > renaming it to <dest> sounds worthwhile even as a basic copy
> > utility.  I want to avoid copying to a temporary file with a fixed
> > name like _copy.tmp, because some advanced utility may want to run
> > multiple instances of pg_copy to copy several files into the same
> > directory simultaneously.  However, I'm afraid multiple <dest>.tmp
> > files might continue to occupy disk space after canceling copy or
> > power failure in some use cases, where the copy of the same file
> > won't be retried.  That's also the reason why I chose to not use a
> > temporary file like cp/copy.
> 
> Is there a way to create a link to a file which only exists as an open
> file descriptor?   If there was, you could create a temp file, open an
> fd, then delete the file.  That would remove the issue with files being
> leaked due to failures of various kinds.

Isn't this a solution looking for a problem? We're using tempfiles in
dozens of other places and I really don't see why this is the place to
stop doing so. Just copy to <dest>.tmp and move it into place. If things
crash during that, the command will be repeated shortly afterwards again
*anyway*. Let's not get into platform specific games here.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: [patch] pg_copy - a command for reliable WAL archiving

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2014-08-19 17:42:06 -0400, Alvaro Herrera wrote:
> > MauMau wrote:
> > 
> > > With that said, copying to a temporary file like <dest>.tmp and
> > > renaming it to <dest> sounds worthwhile even as a basic copy
> > > utility.  I want to avoid copying to a temporary file with a fixed
> > > name like _copy.tmp, because some advanced utility may want to run
> > > multiple instances of pg_copy to copy several files into the same
> > > directory simultaneously.  However, I'm afraid multiple <dest>.tmp
> > > files might continue to occupy disk space after canceling copy or
> > > power failure in some use cases, where the copy of the same file
> > > won't be retried.  That's also the reason why I chose to not use a
> > > temporary file like cp/copy.
> > 
> > Is there a way to create a link to a file which only exists as an open
> > file descriptor?   If there was, you could create a temp file, open an
> > fd, then delete the file.  That would remove the issue with files being
> > leaked due to failures of various kinds.
> 
> Isn't this a solution looking for a problem? We're using tempfiles in
> dozens of other places and I really don't see why this is the place to
> stop doing so. Just copy to <dest>.tmp and move it into place. If things
> crash during that, the command will be repeated shortly afterwards again
> *anyway*. Let's not get into platform specific games here.

The issue is what happens if there's a crash while the temp file is in
the middle of being filled.  I agree there are bigger problems to solve
in this area though.  Yes, I agree that a fixed name such as _copy.tmp
is a really bad choice for several reasons.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [patch] pg_copy - a command for reliable WAL archiving

From
Greg Stark
Date:
On Wed, Aug 20, 2014 at 2:27 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Hmm, the real trick here is linkat(... "/proc/self/foobar"), not the
> O_TMPFILE: you can have an open file descriptor to an "invisible" file
> simply by creating a normal file and unlinking it.  I looked at linkat()
> yesterday but the idea of using /proc/self didn't occur to me.  Nasty
> trick :-(  It seems linkat() is quite a bit more portable than
> O_TMPFILE, fortunately ...

Supposedly linkat(2) on Linux refuses to create a link to a file that
was opened normally and had its last link removed with unlink(2) due
to concerns that this would create security holes.


-- 
greg



Re: [patch] pg_copy - a command for reliable WAL archiving

From
Andres Freund
Date:
On 2014-08-20 09:50:56 -0400, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2014-08-19 17:42:06 -0400, Alvaro Herrera wrote:
> > > MauMau wrote:
> > > 
> > > > With that said, copying to a temporary file like <dest>.tmp and
> > > > renaming it to <dest> sounds worthwhile even as a basic copy
> > > > utility.  I want to avoid copying to a temporary file with a fixed
> > > > name like _copy.tmp, because some advanced utility may want to run
> > > > multiple instances of pg_copy to copy several files into the same
> > > > directory simultaneously.  However, I'm afraid multiple <dest>.tmp
> > > > files might continue to occupy disk space after canceling copy or
> > > > power failure in some use cases, where the copy of the same file
> > > > won't be retried.  That's also the reason why I chose to not use a
> > > > temporary file like cp/copy.
> > > 
> > > Is there a way to create a link to a file which only exists as an open
> > > file descriptor?   If there was, you could create a temp file, open an
> > > fd, then delete the file.  That would remove the issue with files being
> > > leaked due to failures of various kinds.
> > 
> > Isn't this a solution looking for a problem? We're using tempfiles in
> > dozens of other places and I really don't see why this is the place to
> > stop doing so. Just copy to <dest>.tmp and move it into place. If things
> > crash during that, the command will be repeated shortly afterwards again
> > *anyway*. Let's not get into platform specific games here.
> 
> The issue is what happens if there's a crash while the temp file is in
> the middle of being filled.

The archive command will be be run again a couple seconds and remove the
half-filled temp file.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: [patch] pg_copy - a command for reliable WAL archiving

From
Alvaro Herrera
Date:
Greg Stark wrote:
> On Wed, Aug 20, 2014 at 2:27 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > Hmm, the real trick here is linkat(... "/proc/self/foobar"), not the
> > O_TMPFILE: you can have an open file descriptor to an "invisible" file
> > simply by creating a normal file and unlinking it.  I looked at linkat()
> > yesterday but the idea of using /proc/self didn't occur to me.  Nasty
> > trick :-(  It seems linkat() is quite a bit more portable than
> > O_TMPFILE, fortunately ...
> 
> Supposedly linkat(2) on Linux refuses to create a link to a file that
> was opened normally and had its last link removed with unlink(2) due
> to concerns that this would create security holes.

Sigh.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [patch] pg_copy - a command for reliable WAL archiving

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-08-20 09:50:56 -0400, Alvaro Herrera wrote:
>> Andres Freund wrote:
>>> Isn't this a solution looking for a problem? We're using tempfiles in
>>> dozens of other places and I really don't see why this is the place to
>>> stop doing so. Just copy to <dest>.tmp and move it into place.

>> The issue is what happens if there's a crash while the temp file is in
>> the middle of being filled.

> The archive command will be be run again a couple seconds and remove the
> half-filled temp file.

Alternatively, you could use the process PID as part of the temp file
name; which is probably a good idea anyway.
        regards, tom lane



Re: [patch] pg_copy - a command for reliable WAL archiving

From
Andres Freund
Date:
On 2014-08-20 10:19:33 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-08-20 09:50:56 -0400, Alvaro Herrera wrote:
> >> Andres Freund wrote:
> >>> Isn't this a solution looking for a problem? We're using tempfiles in
> >>> dozens of other places and I really don't see why this is the place to
> >>> stop doing so. Just copy to <dest>.tmp and move it into place.
> 
> >> The issue is what happens if there's a crash while the temp file is in
> >> the middle of being filled.
> 
> > The archive command will be be run again a couple seconds and remove the
> > half-filled temp file.
> 
> Alternatively, you could use the process PID as part of the temp file
> name; which is probably a good idea anyway.

I think that's actually worse, because nothing will clean up those
unless you explicitly scan for all <whatever>.$pid files, and somehow
kill them.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: [patch] pg_copy - a command for reliable WAL archiving

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-08-20 10:19:33 -0400, Tom Lane wrote:
>> Alternatively, you could use the process PID as part of the temp file
>> name; which is probably a good idea anyway.

> I think that's actually worse, because nothing will clean up those
> unless you explicitly scan for all <whatever>.$pid files, and somehow
> kill them.

True.  As long as the copy command is prepared to get rid of a
pre-existing target file, using a fixed .tmp extension should be fine.
        regards, tom lane



Re: [patch] pg_copy - a command for reliable WAL archiving

From
Bruce Momjian
Date:
On Wed, Aug 20, 2014 at 10:36:40AM -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-08-20 10:19:33 -0400, Tom Lane wrote:
> >> Alternatively, you could use the process PID as part of the temp file
> >> name; which is probably a good idea anyway.
> 
> > I think that's actually worse, because nothing will clean up those
> > unless you explicitly scan for all <whatever>.$pid files, and somehow
> > kill them.
> 
> True.  As long as the copy command is prepared to get rid of a
> pre-existing target file, using a fixed .tmp extension should be fine.

Well, then we are back to this comment by MauMau:

> With that said, copying to a temporary file like <dest>.tmp and
> renaming it to <dest> sounds worthwhile even as a basic copy utility.
> I want to avoid copying to a temporary file with a fixed name like
> _copy.tmp, because some advanced utility may want to run multiple
> instances of pg_copy to copy several files into the same directory
> simultaneously.  However, I'm afraid multiple <dest>.tmp files might
> continue to occupy disk space after canceling copy or power failure in
> some use cases, where the copy of the same file won't be retried.
> That's also the reason why I chose to not use a temporary file like
> cp/copy.

Do we want cases where the same directory is used multiple pg_copy
processes?  I can't imagine how that setup would make sense.

I am thinking pg_copy should emit a warning message when it removes an
old temp file.  This might alert people that something odd is happening
if they see the message often.

The pid-extension idea would work as pg_copy can test all pid extension
files to see if the pid is still active.  However, that assumes that the
pid is running on the local machine and not on another machines that has
NFS-mounted this directory, so maybe this is a bad idea, but again, we
are back to the idea that only one process should be writing into this
directory.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: [patch] pg_copy - a command for reliable WAL archiving

From
Andres Freund
Date:
On 2014-08-20 18:58:05 -0400, Bruce Momjian wrote:
> On Wed, Aug 20, 2014 at 10:36:40AM -0400, Tom Lane wrote:
> > Andres Freund <andres@2ndquadrant.com> writes:
> > > On 2014-08-20 10:19:33 -0400, Tom Lane wrote:
> > >> Alternatively, you could use the process PID as part of the temp file
> > >> name; which is probably a good idea anyway.
> > 
> > > I think that's actually worse, because nothing will clean up those
> > > unless you explicitly scan for all <whatever>.$pid files, and somehow
> > > kill them.
> > 
> > True.  As long as the copy command is prepared to get rid of a
> > pre-existing target file, using a fixed .tmp extension should be fine.
> 
> Well, then we are back to this comment by MauMau:

> > With that said, copying to a temporary file like <dest>.tmp and
> > renaming it to <dest> sounds worthwhile even as a basic copy utility.
> > I want to avoid copying to a temporary file with a fixed name like
> > _copy.tmp, because some advanced utility may want to run multiple
> > instances of pg_copy to copy several files into the same directory
> > simultaneously.  However, I'm afraid multiple <dest>.tmp files might
> > continue to occupy disk space after canceling copy or power failure in
> > some use cases, where the copy of the same file won't be retried.
> > That's also the reason why I chose to not use a temporary file like
> > cp/copy.
> 
> Do we want cases where the same directory is used multiple pg_copy
> processes?  I can't imagine how that setup would make sense.

I don't think anybody is proposing the _copy.tmp proposal. We've just
argued about the risk of <dest>.tmp. And I argued - and others seem to
agree - the space usage problem isn't really relevant because archive
commands and such are rerun after failure and can then clean up the temp
file again.

> I am thinking pg_copy should emit a warning message when it removes an
> old temp file.  This might alert people that something odd is happening
> if they see the message often.

Don't really see a point in this. If the archive command or such failed,
that will already have been logged. I'd expect this to be implemented by
passing O_CREAT | O_TRUNC to open(), nothing else.

> The pid-extension idea would work as pg_copy can test all pid extension
> files to see if the pid is still active.  However, that assumes that the
> pid is running on the local machine and not on another machines that has
> NFS-mounted this directory, so maybe this is a bad idea, but again, we
> are back to the idea that only one process should be writing into this
> directory.

I don't actually think we should assume that. There very well could be
one process running an archive command, using differently prefixed file
names or such.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services