Thread: [PATCH] Atomic pgrename on Windows
Hi!
It's assumed in PostgreSQL codebase that pgrename atomically replaces target file with source file even if target file is open and being read by another process. And this assumption is true on Linux, but it's false on Windows. MoveFileEx() triggers an error when target file is open (and accordingly locked). Some our customers has been faced such errors while operating heavily loaded PostgreSQL instance on Windows.
I've managed to reproduce this situation. Reliable reproducing of this issue required patch to PostgreSQL core. I've written slow-read-statfiles.patch for artificial slowdown of pgstat_read_statsfiles() – sleep 100 ms after each statfile entry. If you run make-100-dbs.sql on patched version, and then few times execute
in psql, then you would likely get the error above on Windows.
Attached patch atomic-pgrename-windows-1.patch fixes this problem. It appears to be possible to atomically replace file on Windows – ReplaceFile() does that. ReplaceFiles() requires target file to exist, this is why we still need to call MoveFileEx() when it doesn't exist.
This patch is based on work of Victor Spirin who was asked by Postgres Pro to research this problem.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > Attached patch atomic-pgrename-windows-1.patch fixes this problem. It > appears to be possible to atomically replace file on Windows – ReplaceFile() > does that. ReplaceFiles() requires target file to exist, this is why we > still need to call MoveFileEx() when it doesn't exist. Do you think that it could be safer to unlink the target file first with pgunlink()? This way you make sure that the target file is removed and not locked. This change makes me worrying about the introduction of more race conditions. > This patch is based on work of Victor Spirin who was asked by Postgres Pro > to research this problem. Victor has no community account so his name cannot be registered as a co-author of the patch. I have added your name though. -- Michael
On 2017-11-28 09:47:45 +0900, Michael Paquier wrote: > On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: > > Attached patch atomic-pgrename-windows-1.patch fixes this problem. It > > appears to be possible to atomically replace file on Windows – ReplaceFile() > > does that. ReplaceFiles() requires target file to exist, this is why we > > still need to call MoveFileEx() when it doesn't exist. > > Do you think that it could be safer to unlink the target file first > with pgunlink()? This way you make sure that the target file is > removed and not locked. This change makes me worrying about the > introduction of more race conditions. That seems like a *seriously* bad idea. What if we crash inbetween the unlink and the rename? I'm confused about the need for this. Shouldn't normally opening all files FILE_SHARE_DELETE take care of this? See https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx "Note Delete access allows both delete and rename operations." Is there an external process active that doesn't set that flag? Are we missing setting it somewhere? Greetings, Andres Freund
On 27 November 2017 at 14:28, Alexander Korotkov
wrote:
> Hi!
>
> It's assumed in PostgreSQL codebase that pgrename atomically replaces
> target file with source file even if target file is open and being read by
> another process. And this assumption is true on Linux, but it's false on
> Windows. MoveFileEx() triggers an error when target file is open (and
> accordingly locked). Some our customers has been faced such errors while
> operating heavily loaded PostgreSQL instance on Windows.
>
> LOG could not rename temporary statistics file "pg_stat_tmp/global.tmp" to
> "pg_stat_tmp/global.stat": Permission denied
>
That would explain a number of intermittent reports Iv'e seen floating
around.
> Attached patch atomic-pgrename-windows-1.patch fixes this problem. It
> appears to be possible to atomically replace file on Windows –
> ReplaceFile() does that. ReplaceFiles() requires target file to exist,
> this is why we still need to call MoveFileEx() when it doesn't exist.
>
Look at the error codes for ReplaceFile:
https://msdn.microsoft.com/en-us/library/aa365512(VS.85).aspx
The docs don't say it's atomic and the error codes suggest it may not be.
But there's a Microsoft Research paper claiming it's atomic -
http://research.microsoft.com/pubs/64525/tr-2006-45.pdf .
It appears that MoveFileTransacted would be what we really want, when we're
on NTFS (it won't work on FAT32 or network shares). See
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365241(v=vs.85).aspx
. But the docs have a preface warning that it's not recommended and may not
be available in future Windows versions, so that's not an option.
This Go language bug (https://github.com/golang/go/issues/8914) and this
cPython discussion (http://bugs.python.org/issue8828)) have discussion.
I found this comment particularly illuminating
https://bugs.python.org/msg146307 as it quotes what Java does. It uses
MoveFileEx.
See also:
*
https://stackoverflow.com/questions/167414/is-an-atomic-file-rename-with-overwrite-possible-on-windows
*
https://blogs.msdn.microsoft.com/adioltean/2005/12/28/how-to-do-atomic-writes-in-a-file/
* https://msdn.microsoft.com/en-us/library/aa365512(VS.85).aspx
*
https://msdn.microsoft.com/en-us/library/windows/desktop/hh802690(v=vs.85).aspx
(I sincerely hope that that blog post about atomic writes, which is 12
years old, is obsoleted by some new functionality. Because seriously, WTF.)
-- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Nov 28, 2017 at 3:59 AM, Andres Freund wrote:
> On 2017-11-28 09:47:45 +0900, Michael Paquier wrote:
> > On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
> > wrote:
> > > Attached patch atomic-pgrename-windows-1.patch fixes this problem. It
> > > appears to be possible to atomically replace file on Windows –
> ReplaceFile()
> > > does that. ReplaceFiles() requires target file to exist, this is why
> we
> > > still need to call MoveFileEx() when it doesn't exist.
> >
> > Do you think that it could be safer to unlink the target file first
> > with pgunlink()? This way you make sure that the target file is
> > removed and not locked. This change makes me worrying about the
> > introduction of more race conditions.
>
> That seems like a *seriously* bad idea. What if we crash inbetween the
> unlink and the rename?
>
>
> I'm confused about the need for this. Shouldn't normally
> opening all files FILE_SHARE_DELETE take care of this? See
> https://msdn.microsoft.com/en-us/library/windows/desktop/
> aa363858(v=vs.85).aspx
> "Note Delete access allows both delete and rename operations."
>
> Is there an external process active that doesn't set that flag?
I'm quite sure there was no such processed during my experimentation. No
antivirus or other disturbers. Moreover, error reproduces only with
artificial delay in pgstat_read_statsfiles(). So, it's clearly related to
lock placed by this function.
> Are we missing setting it somewhere?
>
That's curious, but at least pgstat_read_statsfiles() seems to open file
with that flag.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Tue, Nov 28, 2017 at 5:52 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
I didn't try MoveFileTransacted, because deprecated function doesn't seem like an option.
On Tue, Nov 28, 2017 at 3:59 AM, Andres Freund <andres@anarazel.de> wrote:On 2017-11-28 09:47:45 +0900, Michael Paquier wrote:
> On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > Attached patch atomic-pgrename-windows-1.patch fixes this problem. It
> > appears to be possible to atomically replace file on Windows – ReplaceFile()
> > does that. ReplaceFiles() requires target file to exist, this is why we
> > still need to call MoveFileEx() when it doesn't exist.
>
> Do you think that it could be safer to unlink the target file first
> with pgunlink()? This way you make sure that the target file is
> removed and not locked. This change makes me worrying about the
> introduction of more race conditions.
That seems like a *seriously* bad idea. What if we crash inbetween the
unlink and the rename?
I'm confused about the need for this. Shouldn't normally
opening all files FILE_SHARE_DELETE take care of this? See
https://msdn.microsoft.com/en-us/library/windows/desktop/aa3 63858(v=vs.85).aspx
"Note Delete access allows both delete and rename operations."
Is there an external process active that doesn't set that flag?I'm quite sure there was no such processed during my experimentation. No antivirus or other disturbers. Moreover, error reproduces only with artificial delay in pgstat_read_statsfiles(). So, it's clearly related to lock placed by this function.Are we missing setting it somewhere?That's curious, but at least pgstat_read_statsfiles() seems to open file with that flag.
I wrote same console program to verify that windows API behaves so, and it's not something odd inside PostgreSQL. Source code is attached.
So, with FILE_SHARE_DELETE flag you really can delete opened file concurrently.
Session #1
rename_test.exe Open 1.txt
Opened
Press any key to continue . . .
Session #2
rename_test.exe Delete 1.txt
DeleteFile success
Closed
And you can rename it concurrently.
Session #1
rename_test.exe Open 1.txt
Opened
Press any key to continue . . .
Session #2
rename_test.exe MoveFileEx 1.txt 2.txt
MoveFileEx successClosed
Closed
But you can't replace it concurrently with another file. So as msdn states, you can either delete or rename opened file concurrently. But you can't replace it...
Session #1
rename_test.exe Open 1.txt
Opened
Press any key to continue . . .
Session #2
rename_test.exe MoveFileEx 2.txt 1.txt
MoveFileEx error: 5
Closed
But ReplaceFile works OK. Temporary file lives until session #1 close the file.
Session #1
rename_test.exe Open 1.txt
Opened
Press any key to continue . . .
Session #2
rename_test.exe ReplaceFile 2.txt 1.txt
ReplaceFile success
Closed
I didn't try MoveFileTransacted, because deprecated function doesn't seem like an option.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
On Tue, Nov 28, 2017 at 5:02 AM, Craig Ringer wrote:
> On 27 November 2017 at 14:28, Alexander Korotkov <
> a.korotkov@postgrespro.ru> wrote:
>
>> It's assumed in PostgreSQL codebase that pgrename atomically replaces
>> target file with source file even if target file is open and being read by
>> another process. And this assumption is true on Linux, but it's false on
>> Windows. MoveFileEx() triggers an error when target file is open (and
>> accordingly locked). Some our customers has been faced such errors while
>> operating heavily loaded PostgreSQL instance on Windows.
>>
>> LOG could not rename temporary statistics file "pg_stat_tmp/global.tmp"
>> to "pg_stat_tmp/global.stat": Permission denied
>>
>
> That would explain a number of intermittent reports Iv'e seen floating
> around.
>
Yeah.
Attached patch atomic-pgrename-windows-1.patch fixes this problem. It
>> appears to be possible to atomically replace file on Windows –
>> ReplaceFile() does that. ReplaceFiles() requires target file to exist,
>> this is why we still need to call MoveFileEx() when it doesn't exist.
>>
>
> Look at the error codes for ReplaceFile:
>
> https://msdn.microsoft.com/en-us/library/aa365512(VS.85).aspx
>
> The docs don't say it's atomic and the error codes suggest it may not be.
> But there's a Microsoft Research paper claiming it's atomic -
> http://research.microsoft.com/pubs/64525/tr-2006-45.pdf .
>
> It appears that MoveFileTransacted would be what we really want, when
> we're on NTFS (it won't work on FAT32 or network shares). See
> https://msdn.microsoft.com/en-us/library/windows/
> desktop/aa365241(v=vs.85).aspx . But the docs have a preface warning that
> it's not recommended and may not be available in future Windows versions,
> so that's not an option.
>
> This Go language bug (https://github.com/golang/go/issues/8914) and this
> cPython discussion (http://bugs.python.org/issue8828)) have discussion.
>
> I found this comment particularly illuminating https://bugs.python.org/
> msg146307 as it quotes what Java does. It uses MoveFileEx.
>
That's look bad. MoveFileTransacted() looks like best option, but it's not
an option since it's deprecated.
For now, ReplaceFile() function looks like best choice for renaming
statfiles. But it doesn't look good for critical datafiles whose are also
renamed using pgrename, because system can crash between rename of
destination file and rename of source file. Thus, MoveFileEx() seems still
best solution for critical datafiles where safety is more important than
concurrency. After reading provided links, I'm very suspicious about its
safety too. But it's seems like best available solution and it have
already passed some test of time...
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 29 November 2017 at 00:16, Alexander Korotkov
wrote:
>
> For now, ReplaceFile() function looks like best choice for renaming
> statfiles. But it doesn't look good for critical datafiles whose are also
> renamed using pgrename, because system can crash between rename of
> destination file and rename of source file. Thus, MoveFileEx() seems still
> best solution for critical datafiles where safety is more important than
> concurrency. After reading provided links, I'm very suspicious about its
> safety too. But it's seems like best available solution and it have
> already passed some test of time...
>
Yeah.
Or alternately, we might need to extend WAL to include any temp file names,
etc, needed to allow us to recover from an interrupted operation during
redo. Frankly that may be the safest option; on Windows presume that there
is no atomic rename we can trust for critical files, and do it in multiple
steps.
-- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2017-11-29 10:13:32 +0800, Craig Ringer wrote: > On 29 November 2017 at 00:16, Alexander Korotkov <a.korotkov@postgrespro.ru> > wrote: > > > > > For now, ReplaceFile() function looks like best choice for renaming > > statfiles. But it doesn't look good for critical datafiles whose are also > > renamed using pgrename, because system can crash between rename of > > destination file and rename of source file. Thus, MoveFileEx() seems still > > best solution for critical datafiles where safety is more important than > > concurrency. After reading provided links, I'm very suspicious about its > > safety too. But it's seems like best available solution and it have > > already passed some test of time... Or we can just add some retry logic like we have for opening files... > Or alternately, we might need to extend WAL to include any temp file names, > etc, needed to allow us to recover from an interrupted operation during > redo. Frankly that may be the safest option; on Windows presume that there > is no atomic rename we can trust for critical files, and do it in multiple > steps. I'd rather remove windows support than go there. Greetings, Andres Freund
Greetings Alexander, all, * Alexander Korotkov (a.korotkov@postgrespro.ru) wrote: > Attached patch atomic-pgrename-windows-1.patch fixes this problem. It > appears to be possible to atomically replace file on Windows – > ReplaceFile() does that. ReplaceFiles() requires target file to exist, > this is why we still need to call MoveFileEx() when it doesn't exist. There's been some discussion on this patch and it seems to, at least, improve the situation on Windows even if it might not completely address the issues. Does anyone else want to take a look and comment, particularly those familiar with the Windows side of things? Otherwise, I'm afraid this patch may end up just sitting in Needs review state fo the entire CF and not getting moved forward, even though it seems like a good improvement for our Windows users. Thanks! Stephen
Attachment
On Tue, Nov 28, 2017 at 2:47 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
Or the other way around -- try MoveFileEx() first since that seems to work most of the time today (if it mostly broke we'd be in trouble already), and if it fails with a sharing violation, try ReplaceFile()? And perhaps end up doing it something similar to what we do with shared memory which is just to loop over it and try each a couple of time, before giving up and failing?
On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> Attached patch atomic-pgrename-windows-1.patch fixes this problem. It
> appears to be possible to atomically replace file on Windows – ReplaceFile()
> does that. ReplaceFiles() requires target file to exist, this is why we
> still need to call MoveFileEx() when it doesn't exist.
Do you think that it could be safer to unlink the target file first
with pgunlink()? This way you make sure that the target file is
removed and not locked. This change makes me worrying about the
introduction of more race conditions.
Unlinking it first seems dangerous, as pointed out by Andres.
What about first trying ReplaceFile() and then if it fails with "target doesn't exist", then call MoveFileEx().
Or the other way around -- try MoveFileEx() first since that seems to work most of the time today (if it mostly broke we'd be in trouble already), and if it fails with a sharing violation, try ReplaceFile()? And perhaps end up doing it something similar to what we do with shared memory which is just to loop over it and try each a couple of time, before giving up and failing?
Hi Alexander, On 1/20/18 10:13 AM, Magnus Hagander wrote: > > On Tue, Nov 28, 2017 at 2:47 AM, Michael Paquier > <michael.paquier@gmail.com <mailto:michael.paquier@gmail.com>> wrote: > > On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov > <a.korotkov@postgrespro.ru <mailto:a.korotkov@postgrespro.ru>> wrote: > > Attached patch atomic-pgrename-windows-1.patch fixes this problem. It > > appears to be possible to atomically replace file on Windows – ReplaceFile() > > does that. ReplaceFiles() requires target file to exist, this is why we > > still need to call MoveFileEx() when it doesn't exist. > > Do you think that it could be safer to unlink the target file first > with pgunlink()? This way you make sure that the target file is > removed and not locked. This change makes me worrying about the > introduction of more race conditions. > > Unlinking it first seems dangerous, as pointed out by Andres. > > What about first trying ReplaceFile() and then if it fails with "target > doesn't exist", then call MoveFileEx(). > > Or the other way around -- try MoveFileEx() first since that seems to > work most of the time today (if it mostly broke we'd be in trouble > already), and if it fails with a sharing violation, try ReplaceFile()? > And perhaps end up doing it something similar to what we do with shared > memory which is just to loop over it and try each a couple of time, > before giving up and failing? This patch was mistakenly left as Needs Review during the last commitfest but it's pretty clear that a new patch is required. This certainly sounds like a non-trivial change. Perhaps it should be submitted for PG12? Thanks, -- -David david@pgmasters.net
On 1/20/18 10:13 AM, Magnus Hagander wrote:
>
> On Tue, Nov 28, 2017 at 2:47 AM, Michael Paquier
> <michael.paquier@gmail.com <mailto:michael.paquier@gmail.com>> wrote:
>
> On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
> <a.korotkov@postgrespro.ru <mailto:a.korotkov@postgrespro.ru>> wrote:
> > Attached patch atomic-pgrename-windows-1.patch fixes this problem. It
> > appears to be possible to atomically replace file on Windows – ReplaceFile()
> > does that. ReplaceFiles() requires target file to exist, this is why we
> > still need to call MoveFileEx() when it doesn't exist.
>
> Do you think that it could be safer to unlink the target file first
> with pgunlink()? This way you make sure that the target file is
> removed and not locked. This change makes me worrying about the
> introduction of more race conditions.
>
> Unlinking it first seems dangerous, as pointed out by Andres.
>
> What about first trying ReplaceFile() and then if it fails with "target
> doesn't exist", then call MoveFileEx().
>
> Or the other way around -- try MoveFileEx() first since that seems to
> work most of the time today (if it mostly broke we'd be in trouble
> already), and if it fails with a sharing violation, try ReplaceFile()?
> And perhaps end up doing it something similar to what we do with shared
> memory which is just to loop over it and try each a couple of time,
> before giving up and failing?
This patch was mistakenly left as Needs Review during the last
commitfest but it's pretty clear that a new patch is required.
OK! No objections against marking this patch RWF.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 3/6/18 9:06 AM, Alexander Korotkov wrote: > > On Tue, Mar 6, 2018 at 5:04 PM, David Steele <david@pgmasters.net > <mailto:david@pgmasters.net>> wrote: > > On 1/20/18 10:13 AM, Magnus Hagander wrote: > > > > Unlinking it first seems dangerous, as pointed out by Andres. > > > > What about first trying ReplaceFile() and then if it fails with "target > > doesn't exist", then call MoveFileEx(). > > > > Or the other way around -- try MoveFileEx() first since that seems to > > work most of the time today (if it mostly broke we'd be in trouble > > already), and if it fails with a sharing violation, try ReplaceFile()? > > And perhaps end up doing it something similar to what we do with shared > > memory which is just to loop over it and try each a couple of time, > > before giving up and failing? > > This patch was mistakenly left as Needs Review during the last > commitfest but it's pretty clear that a new patch is required. > > OK! No objections against marking this patch RWF. Hmmm, I just noticed this categorized as a bug. I thought it was a refactor. Even so, it looks like the approach needs a rethink so better to wait for that. Marked Returned with Feedback. Thanks, -- -David david@pgmasters.net
On Tue, Mar 6, 2018 at 5:11 PM, David Steele <david@pgmasters.net> wrote:
On 3/6/18 9:06 AM, Alexander Korotkov wrote:
>
> On Tue, Mar 6, 2018 at 5:04 PM, David Steele <david@pgmasters.net
> <mailto:david@pgmasters.net>> wrote:
>
> On 1/20/18 10:13 AM, Magnus Hagander wrote:
> >
> > Unlinking it first seems dangerous, as pointed out by Andres.
> >
> > What about first trying ReplaceFile() and then if it fails with "target
> > doesn't exist", then call MoveFileEx().
> >
> > Or the other way around -- try MoveFileEx() first since that seems to
> > work most of the time today (if it mostly broke we'd be in trouble
> > already), and if it fails with a sharing violation, try ReplaceFile()?
> > And perhaps end up doing it something similar to what we do with shared
> > memory which is just to loop over it and try each a couple of time,
> > before giving up and failing?
>
> This patch was mistakenly left as Needs Review during the last
> commitfest but it's pretty clear that a new patch is required.
>
> OK! No objections against marking this patch RWF.
Hmmm, I just noticed this categorized as a bug. I thought it was a
refactor.
Yes, that's naturally a bug. Not very critical though.
Even so, it looks like the approach needs a rethink so better to wait
for that.
In this thread we've found at least two possible approaches to fix this bug. But both of them need to be implemented and tested.
Marked Returned with Feedback.
OK!
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi! I'd like to resume the discussion on this subject. Sorry for so long delay. On Sat, Jan 20, 2018 at 6:13 PM Magnus Hagander <magnus@hagander.net> wrote: > On Tue, Nov 28, 2017 at 2:47 AM, Michael Paquier <michael.paquier@gmail.com> wrote: >> >> On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov >> <a.korotkov@postgrespro.ru> wrote: >> > Attached patch atomic-pgrename-windows-1.patch fixes this problem. It >> > appears to be possible to atomically replace file on Windows – ReplaceFile() >> > does that. ReplaceFiles() requires target file to exist, this is why we >> > still need to call MoveFileEx() when it doesn't exist. >> >> Do you think that it could be safer to unlink the target file first >> with pgunlink()? This way you make sure that the target file is >> removed and not locked. This change makes me worrying about the >> introduction of more race conditions. > > Unlinking it first seems dangerous, as pointed out by Andres. > > What about first trying ReplaceFile() and then if it fails with "target doesn't exist", then call MoveFileEx(). > > Or the other way around -- try MoveFileEx() first since that seems to work most of the time today (if it mostly broke we'dbe in trouble already), and if it fails with a sharing violation, try ReplaceFile()? And perhaps end up doing it somethingsimilar to what we do with shared memory which is just to loop over it and try each a couple of time, before givingup and failing? Unfortunately, it seems that none of such strategies would fit all the cases. Basically, we have two option for renaming a file. * MoveFileEx() – safest possible option, less likely loose files, but takes a lock on target. * ReplaceFile() – "lockless" option, but may loose files on OS crash. Also we have two different cases of files renaming: 1) Renaming durable files. When target already exists, on OS crash we expect finding either old or new file in target filename. We don't expect to find nothing in target filename. 2) Renaming temporary files. In this case we don't much care on loosing files on OS crash. But locking appears to be an issue in some cases. So, in 1st case it doesn't seem like an option to try ReplaceFile() either in first or second place. In both ways it causes a risk to loose a target file, which seems unacceptable. In the 2nd case, trying MoveFileEx() then ReplaceFile() or vise versa might work. But error codes of these functions doesn't tell explicitly whether target file exists. So, I prefer checking it explicitly with GetFileAttributes(). Attached patch implements idea described in [1]. It implements separate pgrename_temp() function, which is better for concurrent operation but less safe. Links 1. https://www.postgresql.org/message-id/CAPpHfds9trA6ipezK3BsuuOSQwEmESiqj8pkOxACFJpoLpcoNw%40mail.gmail.com ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Tue, Aug 06, 2019 at 09:03:08PM +0300, Alexander Korotkov wrote: > > ... > > Unfortunately, it seems that none of such strategies would fit all the > cases. > > Basically, we have two option for renaming a file. * MoveFileEx() – > safest possible option, less likely loose files, but takes a lock on > target. * ReplaceFile() – "lockless" option, but may loose files on > OS crash. > > Also we have two different cases of files renaming: 1) Renaming > durable files. When target already exists, on OS crash we expect > finding either old or new file in target filename. We don't expect to > find nothing in target filename. 2) Renaming temporary files. In > this case we don't much care on loosing files on OS crash. But > locking appears to be an issue in some cases. > > So, in 1st case it doesn't seem like an option to try ReplaceFile() > either in first or second place. In both ways it causes a risk to > loose a target file, which seems unacceptable. > > In the 2nd case, trying MoveFileEx() then ReplaceFile() or vise versa > might work. But error codes of these functions doesn't tell explicitly > whether target file exists. So, I prefer checking it explicitly with > GetFileAttributes(). > > Attached patch implements idea described in [1]. It implements > separate pgrename_temp() function, which is better for concurrent > operation but less safe. > I don't have access to a Windows machine and my developer experience with that platform is pretty much nil, but I think this patch makes sense. It's not an ideal solution, but it's not clear such solution exists, and an improvement is better than nothing. I have two minor comments about rename_temp: 1) The name might seem to be hinting it's about files opened using OpenTemporaryFile, but in practice it's about files that are not critical. But maybe it's true. 2) I think the rename_temp comment should mention it can only be used in cases when the renames happen in a single process (non-concurrently). If we could call rename_temp() concurrently from two different processes, it won't work as expected. Of course, we only call rename_temp from stat collector and syslogger, where it obviously works. Anyway, I'm really nitpicking here ... Are there any objections to get the current patch committed as is, so that it does not get pushed yet again to the next commitfest. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi! On Tue, Jan 7, 2020 at 7:16 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > I don't have access to a Windows machine and my developer experience > with that platform is pretty much nil, but I think this patch makes > sense. It's not an ideal solution, but it's not clear such solution > exists, and an improvement is better than nothing. Thank you for your attention to this patch! > I have two minor comments about rename_temp: > > 1) The name might seem to be hinting it's about files opened using > OpenTemporaryFile, but in practice it's about files that are not > critical. But maybe it's true. We may invent another name. What about rename_fragile()? > 2) I think the rename_temp comment should mention it can only be used in > cases when the renames happen in a single process (non-concurrently). If > we could call rename_temp() concurrently from two different processes, > it won't work as expected. Of course, we only call rename_temp from stat > collector and syslogger, where it obviously works. Good point, this should be reflected in comments. > Anyway, I'm really nitpicking here ... > > Are there any objections to get the current patch committed as is, so > that it does not get pushed yet again to the next commitfest. It would be good to commit. Let's get agreement on function name first. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > On Tue, Jan 7, 2020 at 7:16 PM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >> I don't have access to a Windows machine and my developer experience >> with that platform is pretty much nil, but I think this patch makes >> sense. It's not an ideal solution, but it's not clear such solution >> exists, and an improvement is better than nothing. > Thank you for your attention to this patch! FWIW, I don't like this patch much at all. I too know nothing about Windows, but I do *not* like inventing a distinction between "rename" and "rename_temp" and expecting all call sites to have to decide which one to use. That's allowing a single platform's implementation bugs to dictate our APIs globally; plus it's not clear that every call site can know which is more appropriate. regards, tom lane
On Tue, Jan 7, 2020 at 9:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > > On Tue, Jan 7, 2020 at 7:16 PM Tomas Vondra > > <tomas.vondra@2ndquadrant.com> wrote: > >> I don't have access to a Windows machine and my developer experience > >> with that platform is pretty much nil, but I think this patch makes > >> sense. It's not an ideal solution, but it's not clear such solution > >> exists, and an improvement is better than nothing. > > > Thank you for your attention to this patch! > > FWIW, I don't like this patch much at all. I too know nothing about > Windows, but I do *not* like inventing a distinction between "rename" > and "rename_temp" and expecting all call sites to have to decide > which one to use. That's allowing a single platform's implementation > bugs to dictate our APIs globally; plus it's not clear that every > call site can know which is more appropriate. I'm not sure issue we faced is really about single platform. TBH, the assumptions we place to rename function is very strict. We assume rename works atomically on system crash. And we indirectly assume it can work concurrently as least with file readers. The both are probably true for Linux with most popular filesystems. But we do support pretty many platforms. I think the issue we didn't investigate rename properties well on all of them. But if we do, it might happen some assumptions are wrong on multiple platforms. Windows is just used busy enough to spot the problem. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > I'm not sure issue we faced is really about single platform. TBH, the > assumptions we place to rename function is very strict. We assume > rename works atomically on system crash. And we indirectly assume it > can work concurrently as least with file readers. The both are > probably true for Linux with most popular filesystems. But we do > support pretty many platforms. I think the issue we didn't > investigate rename properties well on all of them. But if we do, it > might happen some assumptions are wrong on multiple platforms. > Windows is just used busy enough to spot the problem. Well, atomic rename is required by POSIX. True, we have found bugs related to that in one or two Unix-ish platforms. But nobody is going to deny that those are OS bugs that the OS ought to fix, rather than accepted behaviors that applications have to find some way to work around. I'm not pleased with the idea that Windows' deficiencies in this area should result in kluges all over our code. I think we should stick to the autoconf recommendation: Autoconf follows a philosophy that was formed over the years by those who have struggled for portability: isolate the portability issues in specific files, and then program as if you were in a Posix environment. regards, tom lane
On Tue, Jan 7, 2020 at 11:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > > I'm not sure issue we faced is really about single platform. TBH, the > > assumptions we place to rename function is very strict. We assume > > rename works atomically on system crash. And we indirectly assume it > > can work concurrently as least with file readers. The both are > > probably true for Linux with most popular filesystems. But we do > > support pretty many platforms. I think the issue we didn't > > investigate rename properties well on all of them. But if we do, it > > might happen some assumptions are wrong on multiple platforms. > > Windows is just used busy enough to spot the problem. > > Well, atomic rename is required by POSIX. True, we have found bugs > related to that in one or two Unix-ish platforms. But nobody > is going to deny that those are OS bugs that the OS ought to fix, > rather than accepted behaviors that applications have to find > some way to work around. I'm not pleased with the idea that > Windows' deficiencies in this area should result in kluges all over > our code. I think we should stick to the autoconf recommendation: > > Autoconf follows a philosophy that was formed over the years by > those who have struggled for portability: isolate the portability > issues in specific files, and then program as if you were in a > Posix environment. I get this point: we want to program postgres like it's working in perfect POSIX environment. And POSIX standard really requires rename() to work concurrently with file accesses. "If the link named by the new argument exists and the file's link count becomes 0 when it is removed and no process has the file open, the space occupied by the file shall be freed and the file shall no longer be accessible. If one or more processes have the file open when the last link is removed, the link shall be removed before rename() returns, but the removal of the file contents shall be postponed until all references to the file are closed." But issue is that on Windows POSIX rename() is kind of impossible to implement. And I suspect other platforms may have issues too. Regarding "pg_stat_tmp/global.stat", which is a problem in particular case, we may evade file renaming altogether. Instead, we may implement shared-memory counter for filename. So, instead of renaming, new reads will just come to new file. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 1/11/20 5:13 PM, Alexander Korotkov wrote: > On Tue, Jan 7, 2020 at 11:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > "If the link named by the new argument exists and the file's link > count becomes 0 when it is removed and no process has the file open, > the space occupied by the file shall be freed and the file shall no > longer be accessible. If one or more processes have the file open when > the last link is removed, the link shall be removed before rename() > returns, but the removal of the file contents shall be postponed until > all references to the file are closed." > > But issue is that on Windows POSIX rename() is kind of impossible to > implement. And I suspect other platforms may have issues too. > > Regarding "pg_stat_tmp/global.stat", which is a problem in particular > case, we may evade file renaming altogether. Instead, we may > implement shared-memory counter for filename. So, instead of > renaming, new reads will just come to new file. I tend to agree with Tom on the question of portability. But it seems upthread we have determined that this can't be sensibly isolated into a Windows-specific rename() function. Does anyone have any further ideas? If not I feel like this patch is going to need to be RWF again. Regards, -- -David david@pgmasters.net
David Steele <david@pgmasters.net> writes: > On 1/11/20 5:13 PM, Alexander Korotkov wrote: >> Regarding "pg_stat_tmp/global.stat", which is a problem in particular >> case, we may evade file renaming altogether. Instead, we may >> implement shared-memory counter for filename. So, instead of >> renaming, new reads will just come to new file. > I tend to agree with Tom on the question of portability. But it seems > upthread we have determined that this can't be sensibly isolated into a > Windows-specific rename() function. > Does anyone have any further ideas? If not I feel like this patch is > going to need to be RWF again. So far as pg_stat_tmp is concerned, I think there is reasonable hope that that problem is just going to go away in the near future. I've not been paying attention to the shared-memory stats collector thread so I'm not sure if that's anywhere near committable, but I think that's clearly something we'll want once it's ready. So if that's the main argument why we need this, it's a pretty thin argument ... regards, tom lane
Hi, On March 30, 2020 9:17:00 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: >So far as pg_stat_tmp is concerned, I think there is reasonable hope >that that problem is just going to go away in the near future. >I've not been paying attention to the shared-memory stats collector >thread so I'm not sure if that's anywhere near committable, but >I think that's clearly something we'll want once it's ready. I'd give it a ~30-40% chance for 13 at this point. The patch improved a lot after the review cycles in the last ~10 days,but still needs a good bit more polish. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.