Re: [PATCH] Atomic pgrename on Windows - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: [PATCH] Atomic pgrename on Windows
Date
Msg-id CAPpHfduMCu4rTw7V1vCug61kF1u8U+Jj6_hEdCTCFb5=sTOtcw@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Atomic pgrename on Windows  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
List pgsql-hackers
On Tue, Nov 28, 2017 at 5:52 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Tue, Nov 28, 2017 at 3:59 AM, Andres Freund <andres@anarazel.de> wrote:
On 2017-11-28 09:47:45 +0900, Michael Paquier wrote:
> On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > Attached patch atomic-pgrename-windows-1.patch fixes this problem.  It
> > appears to be possible to atomically replace file on Windows – ReplaceFile()
> > does that.  ReplaceFiles() requires target file to exist, this is why we
> > still need to call MoveFileEx() when it doesn't exist.
>
> Do you think that it could be safer to unlink the target file first
> with pgunlink()? This way you make sure that the target file is
> removed and not locked. This change makes me worrying about the
> introduction of more race conditions.

That seems like a *seriously* bad idea. What if we crash inbetween the
unlink and the rename?


I'm confused about the need for this. Shouldn't normally
opening all files FILE_SHARE_DELETE take care of this? See
https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx
"Note  Delete access allows both delete and rename operations."

Is there an external process active that doesn't set that flag?

I'm quite sure there was no such processed during my experimentation.  No antivirus or other disturbers.  Moreover, error reproduces only with artificial delay in pgstat_read_statsfiles().  So, it's clearly related to lock placed by this function.
 
Are we missing setting it somewhere?

That's curious, but at least pgstat_read_statsfiles() seems to open file with that flag.

I wrote same console program to verify that windows API behaves so, and it's not something odd inside PostgreSQL.  Source code is attached.

So, with FILE_SHARE_DELETE  flag you really can delete opened file concurrently.

Session #1
rename_test.exe Open 1.txt
Opened
Press any key to continue . . .
    Session #2
    rename_test.exe Delete 1.txt
    DeleteFile success
Closed

And you can rename it concurrently.

Session #1
rename_test.exe Open 1.txt
Opened
Press any key to continue . . .
    Session #2
    rename_test.exe MoveFileEx 1.txt 2.txt
    MoveFileEx successClosed
Closed

But you can't replace it concurrently with another file.  So as msdn states, you can either delete or rename opened file concurrently.  But you can't replace it...

Session #1
rename_test.exe Open 1.txt
Opened
Press any key to continue . . .
    Session #2
    rename_test.exe MoveFileEx 2.txt 1.txt
    MoveFileEx error: 5
Closed

But ReplaceFile works OK.  Temporary file lives until session #1 close the file.

Session #1
rename_test.exe Open 1.txt
Opened
Press any key to continue . . .
    Session #2
    rename_test.exe ReplaceFile 2.txt 1.txt
    ReplaceFile success
Closed

I didn't try MoveFileTransacted, because deprecated function doesn't seem like an option.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256
Next
From: Robert Haas
Date:
Subject: Re: explain analyze output with parallel workers - question aboutmeaning of information for explain.depesz.com