Thread: [PATCH] Atomic pgrename on Windows

[PATCH] Atomic pgrename on Windows

From
Alexander Korotkov
Date:
Hi!

It's assumed in PostgreSQL codebase that pgrename atomically replaces target file with source file even if target file is open and being read by another process.  And this assumption is true on Linux, but it's false on Windows.  MoveFileEx() triggers an error when target file is open (and accordingly locked).  Some our customers has been faced such errors while operating heavily loaded PostgreSQL instance on Windows.

LOG could not rename temporary statistics file "pg_stat_tmp/global.tmp" to "pg_stat_tmp/global.stat": Permission denied

I've managed to reproduce this situation.  Reliable reproducing of this issue required patch to PostgreSQL core.  I've written slow-read-statfiles.patch for artificial slowdown of pgstat_read_statsfiles() – sleep 100 ms after each statfile entry.  If you run make-100-dbs.sql on patched version, and then few times execute 

select pg_stat_get_tuples_inserted('t1'::regclass);

in psql, then you would likely get the error above on Windows.

Attached patch atomic-pgrename-windows-1.patch fixes this problem.  It appears to be possible to atomically replace file on Windows – ReplaceFile() does that.  ReplaceFiles() requires target file to exist, this is why we still need to call MoveFileEx() when it doesn't exist.

This patch is based on work of Victor Spirin who was asked by Postgres Pro to research this problem.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment

Re: [PATCH] Atomic pgrename on Windows

From
Michael Paquier
Date:
On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> Attached patch atomic-pgrename-windows-1.patch fixes this problem.  It
> appears to be possible to atomically replace file on Windows – ReplaceFile()
> does that.  ReplaceFiles() requires target file to exist, this is why we
> still need to call MoveFileEx() when it doesn't exist.

Do you think that it could be safer to unlink the target file first
with pgunlink()? This way you make sure that the target file is
removed and not locked. This change makes me worrying about the
introduction of more race conditions.

> This patch is based on work of Victor Spirin who was asked by Postgres Pro
> to research this problem.

Victor has no community account so his name cannot be registered as a
co-author of the patch. I have added your name though.
--
Michael


Re: [PATCH] Atomic pgrename on Windows

From
Andres Freund
Date:
On 2017-11-28 09:47:45 +0900, Michael Paquier wrote:
> On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > Attached patch atomic-pgrename-windows-1.patch fixes this problem.  It
> > appears to be possible to atomically replace file on Windows – ReplaceFile()
> > does that.  ReplaceFiles() requires target file to exist, this is why we
> > still need to call MoveFileEx() when it doesn't exist.
> 
> Do you think that it could be safer to unlink the target file first
> with pgunlink()? This way you make sure that the target file is
> removed and not locked. This change makes me worrying about the
> introduction of more race conditions.

That seems like a *seriously* bad idea. What if we crash inbetween the
unlink and the rename?


I'm confused about the need for this. Shouldn't normally
opening all files FILE_SHARE_DELETE take care of this? See
https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx
"Note  Delete access allows both delete and rename operations."

Is there an external process active that doesn't set that flag? Are we
missing setting it somewhere?

Greetings,

Andres Freund


Re: [PATCH] Atomic pgrename on Windows

From
Craig Ringer
Date:
On 27 November 2017 at 14:28, Alexander Korotkov wrote: > Hi! > > It's assumed in PostgreSQL codebase that pgrename atomically replaces > target file with source file even if target file is open and being read by > another process. And this assumption is true on Linux, but it's false on > Windows. MoveFileEx() triggers an error when target file is open (and > accordingly locked). Some our customers has been faced such errors while > operating heavily loaded PostgreSQL instance on Windows. > > LOG could not rename temporary statistics file "pg_stat_tmp/global.tmp" to > "pg_stat_tmp/global.stat": Permission denied > That would explain a number of intermittent reports Iv'e seen floating around. > Attached patch atomic-pgrename-windows-1.patch fixes this problem. It > appears to be possible to atomically replace file on Windows – > ReplaceFile() does that. ReplaceFiles() requires target file to exist, > this is why we still need to call MoveFileEx() when it doesn't exist. > Look at the error codes for ReplaceFile: https://msdn.microsoft.com/en-us/library/aa365512(VS.85).aspx The docs don't say it's atomic and the error codes suggest it may not be. But there's a Microsoft Research paper claiming it's atomic - http://research.microsoft.com/pubs/64525/tr-2006-45.pdf . It appears that MoveFileTransacted would be what we really want, when we're on NTFS (it won't work on FAT32 or network shares). See https://msdn.microsoft.com/en-us/library/windows/desktop/aa365241(v=vs.85).aspx . But the docs have a preface warning that it's not recommended and may not be available in future Windows versions, so that's not an option. This Go language bug (https://github.com/golang/go/issues/8914) and this cPython discussion (http://bugs.python.org/issue8828)) have discussion. I found this comment particularly illuminating https://bugs.python.org/msg146307 as it quotes what Java does. It uses MoveFileEx. See also: * https://stackoverflow.com/questions/167414/is-an-atomic-file-rename-with-overwrite-possible-on-windows * https://blogs.msdn.microsoft.com/adioltean/2005/12/28/how-to-do-atomic-writes-in-a-file/ * https://msdn.microsoft.com/en-us/library/aa365512(VS.85).aspx * https://msdn.microsoft.com/en-us/library/windows/desktop/hh802690(v=vs.85).aspx (I sincerely hope that that blog post about atomic writes, which is 12 years old, is obsoleted by some new functionality. Because seriously, WTF.) -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Atomic pgrename on Windows

From
Alexander Korotkov
Date:
On Tue, Nov 28, 2017 at 3:59 AM, Andres Freund wrote: > On 2017-11-28 09:47:45 +0900, Michael Paquier wrote: > > On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov > > wrote: > > > Attached patch atomic-pgrename-windows-1.patch fixes this problem. It > > > appears to be possible to atomically replace file on Windows – > ReplaceFile() > > > does that. ReplaceFiles() requires target file to exist, this is why > we > > > still need to call MoveFileEx() when it doesn't exist. > > > > Do you think that it could be safer to unlink the target file first > > with pgunlink()? This way you make sure that the target file is > > removed and not locked. This change makes me worrying about the > > introduction of more race conditions. > > That seems like a *seriously* bad idea. What if we crash inbetween the > unlink and the rename? > > > I'm confused about the need for this. Shouldn't normally > opening all files FILE_SHARE_DELETE take care of this? See > https://msdn.microsoft.com/en-us/library/windows/desktop/ > aa363858(v=vs.85).aspx > "Note Delete access allows both delete and rename operations." > > Is there an external process active that doesn't set that flag? I'm quite sure there was no such processed during my experimentation. No antivirus or other disturbers. Moreover, error reproduces only with artificial delay in pgstat_read_statsfiles(). So, it's clearly related to lock placed by this function. > Are we missing setting it somewhere? > That's curious, but at least pgstat_read_statsfiles() seems to open file with that flag. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

Re: [PATCH] Atomic pgrename on Windows

From
Alexander Korotkov
Date:
On Tue, Nov 28, 2017 at 5:52 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Tue, Nov 28, 2017 at 3:59 AM, Andres Freund <andres@anarazel.de> wrote:
On 2017-11-28 09:47:45 +0900, Michael Paquier wrote:
> On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > Attached patch atomic-pgrename-windows-1.patch fixes this problem.  It
> > appears to be possible to atomically replace file on Windows – ReplaceFile()
> > does that.  ReplaceFiles() requires target file to exist, this is why we
> > still need to call MoveFileEx() when it doesn't exist.
>
> Do you think that it could be safer to unlink the target file first
> with pgunlink()? This way you make sure that the target file is
> removed and not locked. This change makes me worrying about the
> introduction of more race conditions.

That seems like a *seriously* bad idea. What if we crash inbetween the
unlink and the rename?


I'm confused about the need for this. Shouldn't normally
opening all files FILE_SHARE_DELETE take care of this? See
https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx
"Note  Delete access allows both delete and rename operations."

Is there an external process active that doesn't set that flag?

I'm quite sure there was no such processed during my experimentation.  No antivirus or other disturbers.  Moreover, error reproduces only with artificial delay in pgstat_read_statsfiles().  So, it's clearly related to lock placed by this function.
 
Are we missing setting it somewhere?

That's curious, but at least pgstat_read_statsfiles() seems to open file with that flag.

I wrote same console program to verify that windows API behaves so, and it's not something odd inside PostgreSQL.  Source code is attached.

So, with FILE_SHARE_DELETE  flag you really can delete opened file concurrently.

Session #1
rename_test.exe Open 1.txt
Opened
Press any key to continue . . .
    Session #2
    rename_test.exe Delete 1.txt
    DeleteFile success
Closed

And you can rename it concurrently.

Session #1
rename_test.exe Open 1.txt
Opened
Press any key to continue . . .
    Session #2
    rename_test.exe MoveFileEx 1.txt 2.txt
    MoveFileEx successClosed
Closed

But you can't replace it concurrently with another file.  So as msdn states, you can either delete or rename opened file concurrently.  But you can't replace it...

Session #1
rename_test.exe Open 1.txt
Opened
Press any key to continue . . .
    Session #2
    rename_test.exe MoveFileEx 2.txt 1.txt
    MoveFileEx error: 5
Closed

But ReplaceFile works OK.  Temporary file lives until session #1 close the file.

Session #1
rename_test.exe Open 1.txt
Opened
Press any key to continue . . .
    Session #2
    rename_test.exe ReplaceFile 2.txt 1.txt
    ReplaceFile success
Closed

I didn't try MoveFileTransacted, because deprecated function doesn't seem like an option.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Attachment

Re: [PATCH] Atomic pgrename on Windows

From
Alexander Korotkov
Date:
On Tue, Nov 28, 2017 at 5:02 AM, Craig Ringer wrote: > On 27 November 2017 at 14:28, Alexander Korotkov < > a.korotkov@postgrespro.ru> wrote: > >> It's assumed in PostgreSQL codebase that pgrename atomically replaces >> target file with source file even if target file is open and being read by >> another process. And this assumption is true on Linux, but it's false on >> Windows. MoveFileEx() triggers an error when target file is open (and >> accordingly locked). Some our customers has been faced such errors while >> operating heavily loaded PostgreSQL instance on Windows. >> >> LOG could not rename temporary statistics file "pg_stat_tmp/global.tmp" >> to "pg_stat_tmp/global.stat": Permission denied >> > > That would explain a number of intermittent reports Iv'e seen floating > around. > Yeah. Attached patch atomic-pgrename-windows-1.patch fixes this problem. It >> appears to be possible to atomically replace file on Windows – >> ReplaceFile() does that. ReplaceFiles() requires target file to exist, >> this is why we still need to call MoveFileEx() when it doesn't exist. >> > > Look at the error codes for ReplaceFile: > > https://msdn.microsoft.com/en-us/library/aa365512(VS.85).aspx > > The docs don't say it's atomic and the error codes suggest it may not be. > But there's a Microsoft Research paper claiming it's atomic - > http://research.microsoft.com/pubs/64525/tr-2006-45.pdf . > > It appears that MoveFileTransacted would be what we really want, when > we're on NTFS (it won't work on FAT32 or network shares). See > https://msdn.microsoft.com/en-us/library/windows/ > desktop/aa365241(v=vs.85).aspx . But the docs have a preface warning that > it's not recommended and may not be available in future Windows versions, > so that's not an option. > > This Go language bug (https://github.com/golang/go/issues/8914) and this > cPython discussion (http://bugs.python.org/issue8828)) have discussion. > > I found this comment particularly illuminating https://bugs.python.org/ > msg146307 as it quotes what Java does. It uses MoveFileEx. > That's look bad. MoveFileTransacted() looks like best option, but it's not an option since it's deprecated. For now, ReplaceFile() function looks like best choice for renaming statfiles. But it doesn't look good for critical datafiles whose are also renamed using pgrename, because system can crash between rename of destination file and rename of source file. Thus, MoveFileEx() seems still best solution for critical datafiles where safety is more important than concurrency. After reading provided links, I'm very suspicious about its safety too. But it's seems like best available solution and it have already passed some test of time... ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

Re: [PATCH] Atomic pgrename on Windows

From
Craig Ringer
Date:
On 29 November 2017 at 00:16, Alexander Korotkov wrote: > > For now, ReplaceFile() function looks like best choice for renaming > statfiles. But it doesn't look good for critical datafiles whose are also > renamed using pgrename, because system can crash between rename of > destination file and rename of source file. Thus, MoveFileEx() seems still > best solution for critical datafiles where safety is more important than > concurrency. After reading provided links, I'm very suspicious about its > safety too. But it's seems like best available solution and it have > already passed some test of time... > Yeah. Or alternately, we might need to extend WAL to include any temp file names, etc, needed to allow us to recover from an interrupted operation during redo. Frankly that may be the safest option; on Windows presume that there is no atomic rename we can trust for critical files, and do it in multiple steps. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Atomic pgrename on Windows

From
Andres Freund
Date:
On 2017-11-29 10:13:32 +0800, Craig Ringer wrote:
> On 29 November 2017 at 00:16, Alexander Korotkov <a.korotkov@postgrespro.ru>
> wrote:
> 
> >
> > For now, ReplaceFile() function looks like best choice for renaming
> > statfiles.  But it doesn't look good for critical datafiles whose are also
> > renamed using pgrename, because system can crash between rename of
> > destination file and rename of source file.  Thus, MoveFileEx() seems still
> > best solution for critical datafiles where safety is more important than
> > concurrency.  After reading provided links, I'm very suspicious about its
> > safety too.  But it's seems like best available solution and it have
> > already passed some test of time...

Or we can just add some retry logic like we have for opening files...


> Or alternately, we might need to extend WAL to include any temp file names,
> etc, needed to allow us to recover from an interrupted operation during
> redo. Frankly that may be the safest option; on Windows presume that there
> is no atomic rename we can trust for critical files, and do it in multiple
> steps.

I'd rather remove windows support than go there.

Greetings,

Andres Freund


Re: [PATCH] Atomic pgrename on Windows

From
Stephen Frost
Date:
Greetings Alexander, all,

* Alexander Korotkov (a.korotkov@postgrespro.ru) wrote:
> Attached patch atomic-pgrename-windows-1.patch fixes this problem.  It
> appears to be possible to atomically replace file on Windows –
> ReplaceFile() does that.  ReplaceFiles() requires target file to exist,
> this is why we still need to call MoveFileEx() when it doesn't exist.

There's been some discussion on this patch and it seems to, at least,
improve the situation on Windows even if it might not completely address
the issues.  Does anyone else want to take a look and comment,
particularly those familiar with the Windows side of things?  Otherwise,
I'm afraid this patch may end up just sitting in Needs review state fo
the entire CF and not getting moved forward, even though it seems like a
good improvement for our Windows users.

Thanks!

Stephen

Attachment

Re: [PATCH] Atomic pgrename on Windows

From
Magnus Hagander
Date:


On Tue, Nov 28, 2017 at 2:47 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> Attached patch atomic-pgrename-windows-1.patch fixes this problem.  It
> appears to be possible to atomically replace file on Windows – ReplaceFile()
> does that.  ReplaceFiles() requires target file to exist, this is why we
> still need to call MoveFileEx() when it doesn't exist.

Do you think that it could be safer to unlink the target file first
with pgunlink()? This way you make sure that the target file is
removed and not locked. This change makes me worrying about the
introduction of more race conditions.

Unlinking it first seems dangerous, as pointed out by Andres.

What about first trying ReplaceFile() and then if it fails with "target doesn't exist", then call MoveFileEx().

Or the other way around -- try MoveFileEx() first since that seems to work most of the time today (if it mostly broke we'd be in trouble already), and if it fails with a sharing violation, try ReplaceFile()? And perhaps end up doing it something similar to what we do with shared memory which is just to loop over it and try  each a couple of time, before giving up and failing? 

--

Re: Re: [PATCH] Atomic pgrename on Windows

From
David Steele
Date:
Hi Alexander,

On 1/20/18 10:13 AM, Magnus Hagander wrote:
> 
> On Tue, Nov 28, 2017 at 2:47 AM, Michael Paquier
> <michael.paquier@gmail.com <mailto:michael.paquier@gmail.com>> wrote:
> 
>     On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
>     <a.korotkov@postgrespro.ru <mailto:a.korotkov@postgrespro.ru>> wrote:
>     > Attached patch atomic-pgrename-windows-1.patch fixes this problem.  It
>     > appears to be possible to atomically replace file on Windows – ReplaceFile()
>     > does that.  ReplaceFiles() requires target file to exist, this is why we
>     > still need to call MoveFileEx() when it doesn't exist.
> 
>     Do you think that it could be safer to unlink the target file first
>     with pgunlink()? This way you make sure that the target file is
>     removed and not locked. This change makes me worrying about the
>     introduction of more race conditions.
> 
> Unlinking it first seems dangerous, as pointed out by Andres.
> 
> What about first trying ReplaceFile() and then if it fails with "target
> doesn't exist", then call MoveFileEx().
> 
> Or the other way around -- try MoveFileEx() first since that seems to
> work most of the time today (if it mostly broke we'd be in trouble
> already), and if it fails with a sharing violation, try ReplaceFile()?
> And perhaps end up doing it something similar to what we do with shared
> memory which is just to loop over it and try  each a couple of time,
> before giving up and failing? 

This patch was mistakenly left as Needs Review during the last
commitfest but it's pretty clear that a new patch is required.

This certainly sounds like a non-trivial change.  Perhaps it should be
submitted for PG12?

Thanks,
-- 
-David
david@pgmasters.net


Re: Re: [PATCH] Atomic pgrename on Windows

From
Alexander Korotkov
Date:
Hi, David!

On Tue, Mar 6, 2018 at 5:04 PM, David Steele <david@pgmasters.net> wrote:
On 1/20/18 10:13 AM, Magnus Hagander wrote:
>
> On Tue, Nov 28, 2017 at 2:47 AM, Michael Paquier
> <michael.paquier@gmail.com <mailto:michael.paquier@gmail.com>> wrote:
>
>     On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
>     <a.korotkov@postgrespro.ru <mailto:a.korotkov@postgrespro.ru>> wrote:
>     > Attached patch atomic-pgrename-windows-1.patch fixes this problem.  It
>     > appears to be possible to atomically replace file on Windows – ReplaceFile()
>     > does that.  ReplaceFiles() requires target file to exist, this is why we
>     > still need to call MoveFileEx() when it doesn't exist.
>
>     Do you think that it could be safer to unlink the target file first
>     with pgunlink()? This way you make sure that the target file is
>     removed and not locked. This change makes me worrying about the
>     introduction of more race conditions.
>
> Unlinking it first seems dangerous, as pointed out by Andres.
>
> What about first trying ReplaceFile() and then if it fails with "target
> doesn't exist", then call MoveFileEx().
>
> Or the other way around -- try MoveFileEx() first since that seems to
> work most of the time today (if it mostly broke we'd be in trouble
> already), and if it fails with a sharing violation, try ReplaceFile()?
> And perhaps end up doing it something similar to what we do with shared
> memory which is just to loop over it and try  each a couple of time,
> before giving up and failing?

This patch was mistakenly left as Needs Review during the last
commitfest but it's pretty clear that a new patch is required.

OK!  No objections against marking this patch RWF.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [PATCH] Atomic pgrename on Windows

From
David Steele
Date:
On 3/6/18 9:06 AM, Alexander Korotkov wrote:
> 
> On Tue, Mar 6, 2018 at 5:04 PM, David Steele <david@pgmasters.net
> <mailto:david@pgmasters.net>> wrote:
> 
>     On 1/20/18 10:13 AM, Magnus Hagander wrote:
>     >
>     > Unlinking it first seems dangerous, as pointed out by Andres.
>     >
>     > What about first trying ReplaceFile() and then if it fails with "target
>     > doesn't exist", then call MoveFileEx().
>     >
>     > Or the other way around -- try MoveFileEx() first since that seems to
>     > work most of the time today (if it mostly broke we'd be in trouble
>     > already), and if it fails with a sharing violation, try ReplaceFile()?
>     > And perhaps end up doing it something similar to what we do with shared
>     > memory which is just to loop over it and try  each a couple of time,
>     > before giving up and failing?
> 
>     This patch was mistakenly left as Needs Review during the last
>     commitfest but it's pretty clear that a new patch is required.
> 
> OK!  No objections against marking this patch RWF.

Hmmm, I just noticed this categorized as a bug.  I thought it was a
refactor.

Even so, it looks like the approach needs a rethink so better to wait
for that.

Marked Returned with Feedback.

Thanks,
-- 
-David
david@pgmasters.net


Re: [PATCH] Atomic pgrename on Windows

From
Alexander Korotkov
Date:
On Tue, Mar 6, 2018 at 5:11 PM, David Steele <david@pgmasters.net> wrote:
On 3/6/18 9:06 AM, Alexander Korotkov wrote:
>
> On Tue, Mar 6, 2018 at 5:04 PM, David Steele <david@pgmasters.net
> <mailto:david@pgmasters.net>> wrote:
>
>     On 1/20/18 10:13 AM, Magnus Hagander wrote:
>     >
>     > Unlinking it first seems dangerous, as pointed out by Andres.
>     >
>     > What about first trying ReplaceFile() and then if it fails with "target
>     > doesn't exist", then call MoveFileEx().
>     >
>     > Or the other way around -- try MoveFileEx() first since that seems to
>     > work most of the time today (if it mostly broke we'd be in trouble
>     > already), and if it fails with a sharing violation, try ReplaceFile()?
>     > And perhaps end up doing it something similar to what we do with shared
>     > memory which is just to loop over it and try  each a couple of time,
>     > before giving up and failing?
>
>     This patch was mistakenly left as Needs Review during the last
>     commitfest but it's pretty clear that a new patch is required.
>
> OK!  No objections against marking this patch RWF.

Hmmm, I just noticed this categorized as a bug.  I thought it was a
refactor.
 
Yes, that's naturally a bug.  Not very critical though.

Even so, it looks like the approach needs a rethink so better to wait
for that.

In this thread we've found at least two possible approaches to fix this bug.  But both of them need to be implemented and tested.
 
Marked Returned with Feedback.

OK!

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
 

Re: [PATCH] Atomic pgrename on Windows

From
Alexander Korotkov
Date:
Hi!

I'd like to resume the discussion on this subject.  Sorry for so long delay.

On Sat, Jan 20, 2018 at 6:13 PM Magnus Hagander <magnus@hagander.net> wrote:
> On Tue, Nov 28, 2017 at 2:47 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
>>
>> On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
>> <a.korotkov@postgrespro.ru> wrote:
>> > Attached patch atomic-pgrename-windows-1.patch fixes this problem.  It
>> > appears to be possible to atomically replace file on Windows – ReplaceFile()
>> > does that.  ReplaceFiles() requires target file to exist, this is why we
>> > still need to call MoveFileEx() when it doesn't exist.
>>
>> Do you think that it could be safer to unlink the target file first
>> with pgunlink()? This way you make sure that the target file is
>> removed and not locked. This change makes me worrying about the
>> introduction of more race conditions.
>
> Unlinking it first seems dangerous, as pointed out by Andres.
>
> What about first trying ReplaceFile() and then if it fails with "target doesn't exist", then call MoveFileEx().
>
> Or the other way around -- try MoveFileEx() first since that seems to work most of the time today (if it mostly broke
we'dbe in trouble already), and if it fails with a sharing violation, try ReplaceFile()? And perhaps end up doing it
somethingsimilar to what we do with shared memory which is just to loop over it and try  each a couple of time, before
givingup and failing? 

Unfortunately, it seems that none of such strategies would fit all the cases.

Basically, we have two option for renaming a file.
 * MoveFileEx() – safest possible option, less likely loose files, but
takes a lock on target.
 * ReplaceFile() – "lockless" option, but may loose files on OS crash.

Also we have two different cases of files renaming:
1) Renaming durable files.  When target already exists, on OS crash we
expect finding either old or new file in target filename.  We don't
expect to find nothing in target filename.
2) Renaming temporary files.  In this case we don't much care on
loosing files on OS crash.  But locking appears to be an issue in some
cases.

So, in 1st case it doesn't seem like an option to try ReplaceFile()
either in first or second place.  In both ways it causes a risk to
loose a target file, which seems unacceptable.

In the 2nd case, trying MoveFileEx() then ReplaceFile() or vise versa
might work.  But error codes of these functions doesn't tell
explicitly whether target file exists.  So, I prefer checking it
explicitly with GetFileAttributes().

Attached patch implements idea described in [1].  It implements
separate pgrename_temp() function, which is better for concurrent
operation but less safe.

Links
1. https://www.postgresql.org/message-id/CAPpHfds9trA6ipezK3BsuuOSQwEmESiqj8pkOxACFJpoLpcoNw%40mail.gmail.com

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: [PATCH] Atomic pgrename on Windows

From
Tomas Vondra
Date:
On Tue, Aug 06, 2019 at 09:03:08PM +0300, Alexander Korotkov wrote:
>
> ...
>
> Unfortunately, it seems that none of such strategies would fit all the
> cases.
>
> Basically, we have two option for renaming a file.  * MoveFileEx() –
> safest possible option, less likely loose files, but takes a lock on
> target.  * ReplaceFile() – "lockless" option, but may loose files on
> OS crash.
>
> Also we have two different cases of files renaming: 1) Renaming
> durable files.  When target already exists, on OS crash we expect
> finding either old or new file in target filename.  We don't expect to
> find nothing in target filename.  2) Renaming temporary files.  In
> this case we don't much care on loosing files on OS crash.  But
> locking appears to be an issue in some cases.
>
> So, in 1st case it doesn't seem like an option to try ReplaceFile()
> either in first or second place.  In both ways it causes a risk to
> loose a target file, which seems unacceptable.
>
> In the 2nd case, trying MoveFileEx() then ReplaceFile() or vise versa
> might work.  But error codes of these functions doesn't tell explicitly
> whether target file exists.  So, I prefer checking it explicitly with
> GetFileAttributes().
>
> Attached patch implements idea described in [1].  It implements
> separate pgrename_temp() function, which is better for concurrent
> operation but less safe.
>

I don't have access to a Windows machine and my developer experience
with that platform is pretty much nil, but I think this patch makes
sense. It's not an ideal solution, but it's not clear such solution
exists, and an improvement is better than nothing.

I have two minor comments about rename_temp:

1) The name might seem to be hinting it's about files opened using
OpenTemporaryFile, but in practice it's about files that are not
critical. But maybe it's true.

2) I think the rename_temp comment should mention it can only be used in
cases when the renames happen in a single process (non-concurrently). If
we could call rename_temp() concurrently from two different processes,
it won't work as expected. Of course, we only call rename_temp from stat
collector and syslogger, where it obviously works.

Anyway, I'm really nitpicking here ...

Are there any objections to get the current patch committed as is, so
that it does not get pushed yet again to the next commitfest.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Atomic pgrename on Windows

From
Alexander Korotkov
Date:
Hi!

On Tue, Jan 7, 2020 at 7:16 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> I don't have access to a Windows machine and my developer experience
> with that platform is pretty much nil, but I think this patch makes
> sense. It's not an ideal solution, but it's not clear such solution
> exists, and an improvement is better than nothing.

Thank you for your attention to this patch!

> I have two minor comments about rename_temp:
>
> 1) The name might seem to be hinting it's about files opened using
> OpenTemporaryFile, but in practice it's about files that are not
> critical. But maybe it's true.


We may invent another name.  What about rename_fragile()?

> 2) I think the rename_temp comment should mention it can only be used in
> cases when the renames happen in a single process (non-concurrently). If
> we could call rename_temp() concurrently from two different processes,
> it won't work as expected. Of course, we only call rename_temp from stat
> collector and syslogger, where it obviously works.

Good point, this should be reflected in comments.

> Anyway, I'm really nitpicking here ...
>
> Are there any objections to get the current patch committed as is, so
> that it does not get pushed yet again to the next commitfest.

It would be good to commit.  Let's get agreement on function name first.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [PATCH] Atomic pgrename on Windows

From
Tom Lane
Date:
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> On Tue, Jan 7, 2020 at 7:16 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> I don't have access to a Windows machine and my developer experience
>> with that platform is pretty much nil, but I think this patch makes
>> sense. It's not an ideal solution, but it's not clear such solution
>> exists, and an improvement is better than nothing.

> Thank you for your attention to this patch!

FWIW, I don't like this patch much at all.  I too know nothing about
Windows, but I do *not* like inventing a distinction between "rename"
and "rename_temp" and expecting all call sites to have to decide
which one to use.  That's allowing a single platform's implementation
bugs to dictate our APIs globally; plus it's not clear that every
call site can know which is more appropriate.

            regards, tom lane



Re: [PATCH] Atomic pgrename on Windows

From
Alexander Korotkov
Date:
On Tue, Jan 7, 2020 at 9:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> > On Tue, Jan 7, 2020 at 7:16 PM Tomas Vondra
> > <tomas.vondra@2ndquadrant.com> wrote:
> >> I don't have access to a Windows machine and my developer experience
> >> with that platform is pretty much nil, but I think this patch makes
> >> sense. It's not an ideal solution, but it's not clear such solution
> >> exists, and an improvement is better than nothing.
>
> > Thank you for your attention to this patch!
>
> FWIW, I don't like this patch much at all.  I too know nothing about
> Windows, but I do *not* like inventing a distinction between "rename"
> and "rename_temp" and expecting all call sites to have to decide
> which one to use.  That's allowing a single platform's implementation
> bugs to dictate our APIs globally; plus it's not clear that every
> call site can know which is more appropriate.

I'm not sure issue we faced is really about single platform.  TBH, the
assumptions we place to rename function is very strict.  We assume
rename works atomically on system crash.  And we indirectly assume it
can work concurrently as least with file readers.  The both are
probably true for Linux with most popular filesystems.  But we do
support pretty many platforms.  I think the issue we didn't
investigate rename properties well on all of them.  But if we do, it
might happen some assumptions are wrong on multiple platforms.
Windows is just used busy enough to spot the problem.


------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [PATCH] Atomic pgrename on Windows

From
Tom Lane
Date:
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> I'm not sure issue we faced is really about single platform.  TBH, the
> assumptions we place to rename function is very strict.  We assume
> rename works atomically on system crash.  And we indirectly assume it
> can work concurrently as least with file readers.  The both are
> probably true for Linux with most popular filesystems.  But we do
> support pretty many platforms.  I think the issue we didn't
> investigate rename properties well on all of them.  But if we do, it
> might happen some assumptions are wrong on multiple platforms.
> Windows is just used busy enough to spot the problem.

Well, atomic rename is required by POSIX.  True, we have found bugs
related to that in one or two Unix-ish platforms.  But nobody
is going to deny that those are OS bugs that the OS ought to fix,
rather than accepted behaviors that applications have to find
some way to work around.  I'm not pleased with the idea that
Windows' deficiencies in this area should result in kluges all over
our code.  I think we should stick to the autoconf recommendation:

    Autoconf follows a philosophy that was formed over the years by
    those who have struggled for portability: isolate the portability
    issues in specific files, and then program as if you were in a
    Posix environment.

            regards, tom lane



Re: [PATCH] Atomic pgrename on Windows

From
Alexander Korotkov
Date:
On Tue, Jan 7, 2020 at 11:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> > I'm not sure issue we faced is really about single platform.  TBH, the
> > assumptions we place to rename function is very strict.  We assume
> > rename works atomically on system crash.  And we indirectly assume it
> > can work concurrently as least with file readers.  The both are
> > probably true for Linux with most popular filesystems.  But we do
> > support pretty many platforms.  I think the issue we didn't
> > investigate rename properties well on all of them.  But if we do, it
> > might happen some assumptions are wrong on multiple platforms.
> > Windows is just used busy enough to spot the problem.
>
> Well, atomic rename is required by POSIX.  True, we have found bugs
> related to that in one or two Unix-ish platforms.  But nobody
> is going to deny that those are OS bugs that the OS ought to fix,
> rather than accepted behaviors that applications have to find
> some way to work around.  I'm not pleased with the idea that
> Windows' deficiencies in this area should result in kluges all over
> our code.  I think we should stick to the autoconf recommendation:
>
>     Autoconf follows a philosophy that was formed over the years by
>     those who have struggled for portability: isolate the portability
>     issues in specific files, and then program as if you were in a
>     Posix environment.

I get this point: we want to program postgres like it's working in
perfect POSIX environment.

And POSIX standard really requires rename() to work concurrently with
file accesses.

"If the link named by the new argument exists and the file's link
count becomes 0 when it is removed and no process has the file open,
the space occupied by the file shall be freed and the file shall no
longer be accessible. If one or more processes have the file open when
the last link is removed, the link shall be removed before rename()
returns, but the removal of the file contents shall be postponed until
all references to the file are closed."

But issue is that on Windows POSIX rename() is kind of impossible to
implement.  And I suspect other platforms may have issues too.

Regarding "pg_stat_tmp/global.stat", which is a problem in particular
case, we may evade file renaming altogether.  Instead, we may
implement shared-memory counter for filename.  So, instead of
renaming, new reads will just come to new file.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [PATCH] Atomic pgrename on Windows

From
David Steele
Date:
On 1/11/20 5:13 PM, Alexander Korotkov wrote:
> On Tue, Jan 7, 2020 at 11:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
> "If the link named by the new argument exists and the file's link
> count becomes 0 when it is removed and no process has the file open,
> the space occupied by the file shall be freed and the file shall no
> longer be accessible. If one or more processes have the file open when
> the last link is removed, the link shall be removed before rename()
> returns, but the removal of the file contents shall be postponed until
> all references to the file are closed."
> 
> But issue is that on Windows POSIX rename() is kind of impossible to
> implement.  And I suspect other platforms may have issues too.
> 
> Regarding "pg_stat_tmp/global.stat", which is a problem in particular
> case, we may evade file renaming altogether.  Instead, we may
> implement shared-memory counter for filename.  So, instead of
> renaming, new reads will just come to new file.

I tend to agree with Tom on the question of portability. But it seems 
upthread we have determined that this can't be sensibly isolated into a 
Windows-specific rename() function.

Does anyone have any further ideas? If not I feel like this patch is 
going to need to be RWF again.

Regards,
-- 
-David
david@pgmasters.net



Re: [PATCH] Atomic pgrename on Windows

From
Tom Lane
Date:
David Steele <david@pgmasters.net> writes:
> On 1/11/20 5:13 PM, Alexander Korotkov wrote:
>> Regarding "pg_stat_tmp/global.stat", which is a problem in particular
>> case, we may evade file renaming altogether.  Instead, we may
>> implement shared-memory counter for filename.  So, instead of
>> renaming, new reads will just come to new file.

> I tend to agree with Tom on the question of portability. But it seems 
> upthread we have determined that this can't be sensibly isolated into a 
> Windows-specific rename() function.
> Does anyone have any further ideas? If not I feel like this patch is 
> going to need to be RWF again.

So far as pg_stat_tmp is concerned, I think there is reasonable hope
that that problem is just going to go away in the near future.
I've not been paying attention to the shared-memory stats collector
thread so I'm not sure if that's anywhere near committable, but
I think that's clearly something we'll want once it's ready.

So if that's the main argument why we need this, it's a pretty
thin argument ...

            regards, tom lane



Re: [PATCH] Atomic pgrename on Windows

From
Andres Freund
Date:
Hi,

On March 30, 2020 9:17:00 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>So far as pg_stat_tmp is concerned, I think there is reasonable hope
>that that problem is just going to go away in the near future.
>I've not been paying attention to the shared-memory stats collector
>thread so I'm not sure if that's anywhere near committable, but
>I think that's clearly something we'll want once it's ready.

I'd give it a ~30-40%  chance for 13 at this point. The patch improved a lot after the review cycles in the last ~10
days,but still needs a good bit more polish. 

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.