Thread: stat() vs ERROR_DELETE_PENDING, round N + 1
Hi, Back 2017, Michael and Magus apparently fixed a bug report[1] about failing basebackups on Windows due to its concurrent file access semantics: commit 9951741bbeb3ec37ca50e9ce3df1808c931ff6a6 Author: Magnus Hagander <magnus@hagander.net> Date: Wed Jan 4 10:48:30 2017 +0100 Attempt to handle pending-delete files on Windows I think this has been re-broken by: commit bed90759fcbcd72d4d06969eebab81e47326f9a2 Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Fri Oct 9 16:20:12 2020 -0400 Fix our Windows stat() emulation to handle file sizes > 4GB. There's code in there that appears to understand the ERROR_PENDING_DELETE stuff, but it seems to be too late, as this bit will fail with ERROR_ACCESS_DENIED first: /* fast not-exists check */ if (GetFileAttributes(name) == INVALID_FILE_ATTRIBUTES) { _dosmaperr(GetLastError()); return -1; } ... and if you comment that out, then the CreateFile() call will fail and we'll return before we get to the code that purports to grok pending deletes. I don't really understand that code, but I can report that it's not reached. This came up because in our work on AIO, we have extra io worker processes that might have file handles open even in a single session scenario like 010_pg_basebackup.pl, so we make these types of problems more likely to hit (hence also my CF entry to fix a related problem in DROP TABLESPACE). But that's just chance: I assume basebackup could fail for anyone in 14 for the same reason due to any other backend that hasn't processed a sinval to close the file yet. Perhaps we need some combination of the old way (that apparently knew how to detect pending deletes) and the new way (that knows about large files)? [1] https://www.postgresql.org/message-id/flat/20160712083220.1426.58667%40wrigleys.postgresql.org
On Thu, Sep 2, 2021 at 10:10 AM Thomas Munro <thomas.munro@gmail.com> wrote: > Perhaps we need some combination of the old way (that apparently knew > how to detect pending deletes) and the new way (that knows about large > files)? I tried that, but as far as I can tell, the old approach didn't really work either :-( A disruptive solution that works in my tests: we could reuse the global barrier proposed in CF #2962. If you see EACCES, ask every backend to close all vfds at their next CFI() and wait for them all to finish, and then retry. If you get EACCES again it really means EACCES, but you'll very probably get ENOENT. The cheapest solution would be to assume EACCES really means ENOENT, but that seems unacceptably incorrect. I suspect it might be possible to use underdocumented/unstable NtXXX() interfaces to get at the information, but I don't know much about that. Is there another way that is cheap, correct and documented?
Thomas Munro <thomas.munro@gmail.com> writes: > A disruptive solution that works in my tests: we could reuse the > global barrier proposed in CF #2962. If you see EACCES, ask every > backend to close all vfds at their next CFI() and wait for them all to > finish, and then retry. If you get EACCES again it really means > EACCES, but you'll very probably get ENOENT. That seems quite horrid :-(. But if it works, doesn't that mean that somewhere we are opening a problematic file without the correct sharing flags? regards, tom lane
On Thu, Sep 2, 2021 at 10:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > A disruptive solution that works in my tests: we could reuse the > > global barrier proposed in CF #2962. If you see EACCES, ask every > > backend to close all vfds at their next CFI() and wait for them all to > > finish, and then retry. If you get EACCES again it really means > > EACCES, but you'll very probably get ENOENT. > > That seems quite horrid :-(. But if it works, doesn't that mean that > somewhere we are opening a problematic file without the correct > sharing flags? I'm no expert, but not AFAICS. We managed to delete the file while some other backend had it open, which FILE_SHARE_DELETE allowed. We just can't open it or create a new file with the same name until it's really gone (all handles closed).
Thomas Munro <thomas.munro@gmail.com> writes: > On Thu, Sep 2, 2021 at 10:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> That seems quite horrid :-(. But if it works, doesn't that mean that >> somewhere we are opening a problematic file without the correct >> sharing flags? > I'm no expert, but not AFAICS. We managed to delete the file while > some other backend had it open, which FILE_SHARE_DELETE allowed. We > just can't open it or create a new file with the same name until it's > really gone (all handles closed). Right, but we shouldn't ever need to access such a file --- we couldn't do so on Unix, certainly. So for the open() case, it's sufficient to return ENOENT, and the problem is only to make sure that that's what we return if the underlying error is ERROR_DELETE_PENDING. It's harder if the desire is to create a new file of the same name. I'm inclined to think that the best answer might be "if it hurts, don't do that". We should not have such a case for ordinary relation files or WAL files, but maybe there are individual other cases where some redesign is indicated? regards, tom lane
On Thu, Sep 2, 2021 at 11:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > I'm no expert, but not AFAICS. We managed to delete the file while > > some other backend had it open, which FILE_SHARE_DELETE allowed. We > > just can't open it or create a new file with the same name until it's > > really gone (all handles closed). > > Right, but we shouldn't ever need to access such a file --- we couldn't do > so on Unix, certainly. So for the open() case, it's sufficient to return > ENOENT, and the problem is only to make sure that that's what we return if > the underlying error is ERROR_DELETE_PENDING. Yeah. The problem is that it still shows up in directory listings AFAIK, so something like basebackup.c sees it, and even if it didn't, it reads the directory, and then stats the files, and then opens the files at different times. The non-public API way to ask for the real reason after such a failure would apparently be to call NtFileCreate(), which can return STATUS_DELETE_PENDING. I don't know what complications that might involve, but I see now that we have code that digs such non-public APIs out of ntdll.dll already (for long dead OS versions only). Hmm. (Another thing you can't do is delete the directory that contains such a file, which is a problem for DROP TABLESPACE and the reason I developed the global barrier thing.) > It's harder if the desire is to create a new file of the same name. > I'm inclined to think that the best answer might be "if it hurts, > don't do that". We should not have such a case for ordinary relation > files or WAL files, but maybe there are individual other cases where > some redesign is indicated? I guess GetNewRelFileNode()’s dilemma branch is an example; it'd probably be nicer to users to treat a pending-deleted file as a collision. if (access(rpath, F_OK) == 0) { /* definite collision */ collides = true; } else { /* * Here we have a little bit of a dilemma: if errno is something * other than ENOENT, should we declare a collision and loop? In * practice it seems best to go ahead regardless of the errno. If * there is a colliding file we will get an smgr failure when we * attempt to create the new relation file. */ collides = false; }
On Fri, Sep 3, 2021 at 12:44 AM Thomas Munro <thomas.munro@gmail.com> wrote: > NtFileCreate() Erm, that's spelled NtCreateFile. I see Michael mentioned this before[1]; I don't think it's only available in kernel mode though, the docs[2] say "This function is the user-mode equivalent to the ZwCreateFile function", and other open source user space stuff is using it. It's explicitly internal and subject to change though, hence my desire to avoid it. [1] https://www.postgresql.org/message-id/flat/a9c76882-27c7-9c92-7843-21d5521b70a9%40postgrespro.ru [2] https://docs.microsoft.com/en-us/windows/win32/api/winternl/nf-winternl-ntcreatefile
At Fri, 3 Sep 2021 01:01:43 +1200, Thomas Munro <thomas.munro@gmail.com> wrote in > On Fri, Sep 3, 2021 at 12:44 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > NtFileCreate() > > Erm, that's spelled NtCreateFile. I see Michael mentioned this > before[1]; I don't think it's only available in kernel mode though, > the docs[2] say "This function is the user-mode equivalent to the > ZwCreateFile function", and other open source user space stuff is > using it. It's explicitly internal and subject to change though, > hence my desire to avoid it. > > [1] https://www.postgresql.org/message-id/flat/a9c76882-27c7-9c92-7843-21d5521b70a9%40postgrespro.ru > [2] https://docs.microsoft.com/en-us/windows/win32/api/winternl/nf-winternl-ntcreatefile Might be stupid, if a delete-pending'ed file can obstruct something, couldn't we change unlink on Windows to rename to a temporary random name then remove it? We do something like it explicitly while WAL file removal. (It may cause degradation on bulk file deletion, and we may need further fix so that such being-deleted files are excluded while running a directory scan, though..) However, looking [1], with that strategy there may be a case where such "deleted" files may be left alone forever, though. [1] https://www.postgresql.org/message-id/002101d79fc2%24c96dff60%245c49fe20%24%40gmail.com regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Fri, Sep 3, 2021 at 2:01 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > Might be stupid, if a delete-pending'ed file can obstruct something, > couldn't we change unlink on Windows to rename to a temporary random > name then remove it? We do something like it explicitly while WAL > file removal. (It may cause degradation on bulk file deletion, and we > may need further fix so that such being-deleted files are excluded > while running a directory scan, though..) > > However, looking [1], with that strategy there may be a case where > such "deleted" files may be left alone forever, though. It's a good idea. I tested it and it certainly does fix the basebackup problem I've seen (experimental patch attached). But, yeah, I'm also a bit worried that that path could be fragile and need special handling in lots of places. I also tried writing a new open() wrapper using the lower level NtCreateFile() interface, and then an updated stat() wrapper built on top of that. As a non-Windows person, getting that to (mostly) work involved a fair amount of suffering. I can share that if someone is interested, but while learning about that family of interfaces, I realised we could keep the existing Win32-based code, but also retrieve the NT status, leading to a very small change (experimental patch attached). The best idea is probably to set FILE_DISPOSITION_DELETE | FILE_DISPOSITION_POSIX_SEMANTICS before unlinking. This appears to be a silver bullet, but isn't available on ancient Windows releases that we support, or file systems other than local NTFS. So maybe we need a combination of that + STATUS_DELETE_PENDING as shown in the attached. I'll look into that next.
Attachment
Hi, On 2021-09-06 01:32:55 +1200, Thomas Munro wrote: > It's a good idea. I tested it and it certainly does fix the > basebackup problem I've seen (experimental patch attached). But, > yeah, I'm also a bit worried that that path could be fragile and need > special handling in lots of places. It's also expensive-ish. > I also tried writing a new open() wrapper using the lower level > NtCreateFile() interface, and then an updated stat() wrapper built on > top of that. As a non-Windows person, getting that to (mostly) work > involved a fair amount of suffering. I can share that if someone is > interested, but while learning about that family of interfaces, I > realised we could keep the existing Win32-based code, but also > retrieve the NT status, leading to a very small change (experimental > patch attached). Is it guaranteed, or at least reliable, that the status we fetch with RtlGetLastNtStatus is actually from the underlying filesystem operation, rather than some other work that happens during the win32->nt translation? E.g. a memory allocation or such? Presumably most of such work happens before the actual nt "syscall", but ... > The best idea is probably to set FILE_DISPOSITION_DELETE | > FILE_DISPOSITION_POSIX_SEMANTICS before unlinking. This appears to be > a silver bullet, but isn't available on ancient Windows releases that > we support, or file systems other than local NTFS. So maybe we need a > combination of that + STATUS_DELETE_PENDING as shown in the attached. > I'll look into that next. When was that introduced? I'd be ok to only fix these bugs on e.g. Win10, Win Server 2019, Win Server 2016 or such. I don't think we need to support OSs that the vendor doesn't support - and I wouldn't count "only security fixes" as support in this context. main extended Windows 10 Oct 14, 2025 Windows Server 2019 Jan 9, 2024 Jan 9, 2029 Windows Server 2016 Jan 11, 2022 Jan 12, 2027 Windows 7 Jan 13, 2015 Jan 14, 2020 Windows Vista Apr 10, 2012 Apr 11, 2017 One absurd detail here is that the deault behaviour changed sometime in Windows 10's lifetime: https://stackoverflow.com/questions/60424732/did-the-behaviour-of-deleted-files-open-with-fileshare-delete-change-on-windows "The behavior changed in recent releases of Windows 10 -- without notice AFAIK. DeleteFileW now tries to use POSIX semantics if the filesystem supports it. NTFS does." > #ifndef FRONTEND > - Assert(pgwin32_signal_event != NULL); /* small chance of pg_usleep() */ > + /* XXX When called by stat very early on, this fails! */ > + //Assert(pgwin32_signal_event != NULL); /* small chance of pg_usleep() */ > #endif Perhaps we should move the win32 signal initialization into StartupHacks()? There's some tension around it using ereport(), and MemoryContextInit() only being called a tad later, but that seems resolvable. > + * Our open wrapper will report STATUS_DELETE_PENDING as ENOENT. We pass > + * in a special private flag to say that it's _pgstat64() calling, to > + * activate a mode that allows directories to be opened for limited > + * purposes. > + * > + * XXX Think about fd pressure, since we're opening an fd? > */ If I understand https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/getmaxstdio?view=msvc-160 etc correctly, it looks like there is. But only at the point we do _open_osfhandle(). So perhaps we should a pgwin32_open() version returning a handle and make pgwin32_open() a wrapper around that? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2021-09-06 01:32:55 +1200, Thomas Munro wrote: >> The best idea is probably to set FILE_DISPOSITION_DELETE | >> FILE_DISPOSITION_POSIX_SEMANTICS before unlinking. This appears to be >> a silver bullet, but isn't available on ancient Windows releases that >> we support, or file systems other than local NTFS. So maybe we need a >> combination of that + STATUS_DELETE_PENDING as shown in the attached. >> I'll look into that next. > When was that introduced? Googling says that it was introduced in Win10, although in RS2 (version 1703, general availability in 4/2017) not the initial release. > I'd be ok to only fix these bugs on e.g. Win10, Win Server 2019, Win Server > 2016 or such. Yeah, particularly if the fix is trivial on the newer systems and incredibly complicated otherwise. Between the effort needed and the risk of introducing new bugs, I'm really not excited about an invasive fix for this. regards, tom lane
On Mon, Sep 6, 2021 at 9:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andres Freund <andres@anarazel.de> writes: > > I'd be ok to only fix these bugs on e.g. Win10, Win Server 2019, Win Server > > 2016 or such. > > Yeah, particularly if the fix is trivial on the newer systems and > incredibly complicated otherwise. Between the effort needed and > the risk of introducing new bugs, I'm really not excited about > an invasive fix for this. If DeleteFile() is automatically using FILE_DISPOSITION_POSIX_SEMANTICS by default when possible on recent releases as per the SO link that Andres posted above ("18363.657 definitely has the new behavior"), then that's great news and maybe we shouldn't even bother to try to request that mode ourselves explicitly (eg in some kind of unlink wrapper). Then we'd need just one accomodation for older systems and non-NTFS systems, not two, and I currently think that should be the short and sweet approach shown in 0001-Handle-STATUS_DELETE_PENDING-on-Windows.patch, with some tidying and adjustments per feedback.
> If DeleteFile() is automatically using > FILE_DISPOSITION_POSIX_SEMANTICS by default when possible on recent > releases as per the SO link that Andres posted above ("18363.657 > definitely has the new behavior"), then that's great news and maybe we > shouldn't even bother to try to request that mode ourselves explicitly > (eg in some kind of unlink wrapper). Then we'd need just one > accomodation for older systems and non-NTFS systems, not two, and I > currently think that should be the short and sweet approach shown in > 0001-Handle-STATUS_DELETE_PENDING-on-Windows.patch, with some tidying > and adjustments per feedback. Having a non-invasive fix for this long-standing issue would be really great, even if that means reducing the scope of systems where this can be fixed. The last time I poked at the bear (54fb8c7d), there was a test posted by Alexander Lakhin that was really useful in making sure that concurrency is correctly handled when a file is unlinked: https://www.postgresql.org/message-id/c3427edf-d7c0-ff57-90f6-b5de3bb62709@gmail.com It worked with VS but not on MinGW. How does your patch react to this test? -- Michael
Attachment
On Mon, Sep 6, 2021 at 9:44 AM Andres Freund <andres@anarazel.de> wrote: > Is it guaranteed, or at least reliable, that the status we fetch with > RtlGetLastNtStatus is actually from the underlying filesystem operation, > rather than some other work that happens during the win32->nt translation? > E.g. a memory allocation or such? Presumably most of such work happens before > the actual nt "syscall", but ... I don't know. I know at least that it's thread-local, so that's something. I guess it's plausible that CreateFile() might want to free a temporary buffer that it used for conversion to NT pathname format, and whatever code it uses to do that might clobber the NT status. Nothing like that seems to happen in common cases though, and I guess it would also be clobbered on success. Frustrating. Alright then, here also is the version that bypasses CreateFile() and goes straight to NtCreateFile(). This way, the status can't possibly be clobbered before we see it, but maybe there are other risks due to using a much wider set of unstable ntdll interfaces... Both versions pass all tests on CI, including the basebackup one in a scenario where an unlinked file has an open descriptor, but still need a bit more tidying. > "The behavior changed in recent releases of Windows 10 -- without notice > AFAIK. DeleteFileW now tries to use POSIX semantics if the filesystem supports > it. NTFS does." Nice find. I wonder if this applies also to rename()... > > #ifndef FRONTEND > > - Assert(pgwin32_signal_event != NULL); /* small chance of pg_usleep() */ > > + /* XXX When called by stat very early on, this fails! */ > > + //Assert(pgwin32_signal_event != NULL); /* small chance of pg_usleep() */ > > #endif > > Perhaps we should move the win32 signal initialization into StartupHacks()? > There's some tension around it using ereport(), and MemoryContextInit() only > being called a tad later, but that seems resolvable. The dependencies among open(), pg_usleep(), pgwin32_signal_initialize() and read_backend_variables() are not very nice. I don't have a fix for that yet. > > + * XXX Think about fd pressure, since we're opening an fd? > If I understand https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/getmaxstdio?view=msvc-160 > etc correctly, it looks like there is. But only at the point we do > _open_osfhandle(). So perhaps we should a pgwin32_open() version returning a > handle and make pgwin32_open() a wrapper around that? Yeah. Done, in both variants. I haven't tried it, but I suspect the difference between stat() and lstat() could be handled with FILE_OPEN_REPARSE_POINT (as NtCreateFile() calls it) or FILE_FLAG_OPEN_REPARSE_POINT (as CreateFile() calls it).
Attachment
On Mon, Sep 6, 2021 at 6:36 PM Michael Paquier <michael@paquier.xyz> wrote: > Having a non-invasive fix for this long-standing issue would be really > great, even if that means reducing the scope of systems where this can > be fixed. I hope those patches fix at least the basebackup problem on all relevant OS versions, until the better POSIX thing is everywhere (they can't fix all related problems though, since zombie files still stop you creating new ones with the same name or deleting the containing directory). I didn't try to find out how far back those APIs go, but they look ancient/fundamental and widely used by other software... But do they qualify as non-invasive? > The last time I poked at the bear (54fb8c7d), there was a test posted > by Alexander Lakhin that was really useful in making sure that > concurrency is correctly handled when a file is unlinked: > https://www.postgresql.org/message-id/c3427edf-d7c0-ff57-90f6-b5de3bb62709@gmail.com Thanks. It's a confusing topic with many inconclusive threads. > It worked with VS but not on MinGW. How does your patch react to this > test? Thanks. Adding Alexander in CC in case he has ideas/feedback.
06.09.2021 16:04, Thomas Munro wrote: > On Mon, Sep 6, 2021 at 6:36 PM Michael Paquier <michael@paquier.xyz> wrote: >> The last time I poked at the bear (54fb8c7d), there was a test posted >> by Alexander Lakhin that was really useful in making sure that >> concurrency is correctly handled when a file is unlinked: >> https://www.postgresql.org/message-id/c3427edf-d7c0-ff57-90f6-b5de3bb62709@gmail.com The new approach looks very promising. Knowing that the file is really in the DELETE_PENDING state simplifies a lot. I've tested the patch v2_0001_Check... with my demo tests [1] and [2], and it definitely works. [1] https://www.postgresql.org/message-id/e5179494-715e-f8a3-266b-0cf52adac8f4%40gmail.com [2] https://www.postgresql.org/message-id/c3427edf-d7c0-ff57-90f6-b5de3bb62709@gmail.com Best regards, Alexander
On Tue, Sep 07, 2021 at 09:00:01AM +0300, Alexander Lakhin wrote: > The new approach looks very promising. Knowing that the file is really > in the DELETE_PENDING state simplifies a lot. > I've tested the patch v2_0001_Check... with my demo tests [1] and [2], > and it definitely works. Oho, nice. Just to be sure. You are referring to v2-0001-Check*.patch posted here, right? https://www.postgresql.org/message-id/CA+hUKGKj3p+2AciBGacCf_cXE0JLCYevWHexvOpK6uL1+V-zag@mail.gmail.com -- Michael
Attachment
Hello Michael, 07.09.2021 09:05, Michael Paquier wrote: > On Tue, Sep 07, 2021 at 09:00:01AM +0300, Alexander Lakhin wrote: >> The new approach looks very promising. Knowing that the file is really >> in the DELETE_PENDING state simplifies a lot. >> I've tested the patch v2_0001_Check... with my demo tests [1] and [2], >> and it definitely works. > Oho, nice. Just to be sure. You are referring to > v2-0001-Check*.patch posted here, right? > https://www.postgresql.org/message-id/CA+hUKGKj3p+2AciBGacCf_cXE0JLCYevWHexvOpK6uL1+V-zag@mail.gmail.com Yes, i've tested that one, on the master branch (my tests needed a minor modification due to PostgresNode changes). Best regards, Alexander
On Tue, Sep 7, 2021 at 7:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > 07.09.2021 09:05, Michael Paquier wrote: > > On Tue, Sep 07, 2021 at 09:00:01AM +0300, Alexander Lakhin wrote: > >> The new approach looks very promising. Knowing that the file is really > >> in the DELETE_PENDING state simplifies a lot. > >> I've tested the patch v2_0001_Check... with my demo tests [1] and [2], > >> and it definitely works. > > Oho, nice. Just to be sure. You are referring to > > v2-0001-Check*.patch posted here, right? > > https://www.postgresql.org/message-id/CA+hUKGKj3p+2AciBGacCf_cXE0JLCYevWHexvOpK6uL1+V-zag@mail.gmail.com > Yes, i've tested that one, on the master branch (my tests needed a minor > modification due to PostgresNode changes). Thanks very much! Time to tidy up some loose ends. There are a couple of judgement calls involved. Here's what Andres and I came up with in an off-list chat. Any different suggestions? 1. I abandoned the "direct NtCreateFile()" version for now. I guess using more and wider unstable interfaces might expose us to greater risk of silent API/behavior changes or have subtle bugs. If we ever have a concrete reason to believe that RtlGetLastNtStatus() is not reliable here, we could reconsider. 2. I dropped the assertion that the signal event has been created before the first call to the open() wrapper. Instead, I taught pg_usleep() to fall back to plain old SleepEx() if the signal stuff isn't up yet. Other solutions are possible of course, but it struck me as a bad idea to place initialisation ordering constraints on very basic facilities like open() and stat(). I should point out explicitly that with this patch, stat() benefits from open()'s tolerance for sharing violations, as a side effect. That is, it'll retry for a short time in the hope that whoever opened our file without allowing sharing will soon go away. I don't know how useful that bandaid loop really is in practice, but I don't see why we'd want that for open() and not stat(), so this change seems good to me on consistency grounds at the very least. 3. We fixed the warnings about macro redefinition with #define UMDF_USING_NTSTATUS and #include <ntstatus.h> in win32_port.h. (Other ideas considered: (1) Andres reported that it also works to move the #include to ~12 files that need things from it, ie things that were suppressed from windows.h by that macro and must now be had from ntstatus.h, but the files you have to change are probably different in back branches if we decide to do that, (2) I tried defining that macro locally in files that need it, *before* including c.h/postgres.h, and then locally include ntstatus.h afterwards, but that seems to violate project style and generally seems weird.) Another thing to point out explicitly is that I added a new file src/port/win32ntdll.c, which is responsible for fishing out the NT function pointers. It was useful to be able to do that in the abandoned NtCreateFile() variant because it needed three of them and I could reduce boiler-plate noise with a static array of function names to loop over. In this version the array has just one element, but I'd still rather centralise this stuff in one place and make it easy to add any more of these that we eventually find a need for. BTW, I also plan to help Victor get his "POSIX semantics" patch[1] into the tree (and extend it to cover more ops). That should make these problems go away in a more complete way IIUC, but doesn't work everywhere (not sure if we have any build farm animals where it doesn't work, if so it might be nice to change that), so it's complementary to this patch. (My earlier idea that that stuff would magically start happening for free on all relevant systems some time soon has faded.) [1] https://www.postgresql.org/message-id/flat/a529b660-da15-5b62-21a0-9936768210fd%40postgrespro.ru
Attachment
On Fri, Sep 10, 2021 at 5:04 PM Thomas Munro <thomas.munro@gmail.com> wrote: > On Tue, Sep 7, 2021 at 7:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > > 07.09.2021 09:05, Michael Paquier wrote: > > > On Tue, Sep 07, 2021 at 09:00:01AM +0300, Alexander Lakhin wrote: > > >> The new approach looks very promising. Knowing that the file is really > > >> in the DELETE_PENDING state simplifies a lot. > > >> I've tested the patch v2_0001_Check... with my demo tests [1] and [2], > > >> and it definitely works. Since our handling of that stuff never really worked the way we wanted (or if it did, then Windows' behaviour changed, possibly well over a decade ago, from what I could dig up), this isn't an open item candidate for 14 after all, it's a pre-existing condition. So I propose to push this fix to master only soon, and then let it stew there for a little while to see how the buildfarm Windows variants and the Windows hacker community testing on master react. If it looks good, we can back-patch it a bit later, perhaps some more convenient time WRT the release. I added a CF entry to see if anyone else wants to review it and get CI. One small detail I'd like to draw attention to is this bit, which differs slightly from the [non-working] previous attempts by mapping to two different errors: + * If there's no O_CREAT flag, then we'll pretend the file is + * invisible. With O_CREAT, we have no choice but to report that + * there's a file in the way (which wouldn't happen on Unix). ... + if (fileFlags & O_CREAT) + err = ERROR_FILE_EXISTS; + else + err = ERROR_FILE_NOT_FOUND;
On Thu, Sep 23, 2021 at 4:58 AM Thomas Munro <thomas.munro@gmail.com> wrote:
One small detail I'd like to draw attention to is this bit, which
differs slightly from the [non-working] previous attempts by mapping
to two different errors:
+ * If there's no O_CREAT flag, then we'll pretend the file is
+ * invisible. With O_CREAT, we have no choice but to report that
+ * there's a file in the way (which wouldn't happen on Unix).
...
+ if (fileFlags & O_CREAT)
+ err = ERROR_FILE_EXISTS;
+ else
+ err = ERROR_FILE_NOT_FOUND;
When GetTempFileName() finds a duplicated file name and the file is pending for deletion, it fails with an "ERROR_ACCESS_DENIED" error code. That may describe the situation better than "ERROR_FILE_EXISTS".
Regards,
Juan José Santamaría Flecha
On Thu, Sep 23, 2021 at 9:13 PM Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> wrote: > On Thu, Sep 23, 2021 at 4:58 AM Thomas Munro <thomas.munro@gmail.com> wrote: >> One small detail I'd like to draw attention to is this bit, which >> differs slightly from the [non-working] previous attempts by mapping >> to two different errors: >> >> + * If there's no O_CREAT flag, then we'll pretend the file is >> + * invisible. With O_CREAT, we have no choice but to report that >> + * there's a file in the way (which wouldn't happen on Unix). >> >> ... >> >> + if (fileFlags & O_CREAT) >> + err = ERROR_FILE_EXISTS; >> + else >> + err = ERROR_FILE_NOT_FOUND; > > > When GetTempFileName() finds a duplicated file name and the file is pending for deletion, it fails with an "ERROR_ACCESS_DENIED"error code. That may describe the situation better than "ERROR_FILE_EXISTS". Thanks for looking. Why do you think that's better? I assume that's just the usual NT->Win32 error conversion at work. The only case I can think of so far in our tree where you'd notice this change of errno for the O_CREAT case is relfilenode creation[1], and there it's just a case of printing a different message. Trying to create a relfilenode that exists already in delete-pending state will fail, but with this change we'll log the %m string for EEXIST instead of EACCES (what you see today) or ENOENT (which seems nonsensical, "I can't create your file because it doesn't exist", and what you'd get with this patch if I didn't have the special case for O_CREAT). So I think that's pretty arguably an improvement. As for how likely you are to reach that case... hmm, I don't know what access() returns for a file in delete-pending state. The docs say "The function returns -1 if the named file does not exist or does not have the given mode", so perhaps it returns 0 for such a case, in which case we'll consider it a collision and keep searching for another free relfilenode. If that's the case, it's probably really really unlikely you'll reach the case described in the previous paragraph, so it probably doesn't matter much. Do we have any other code paths where this finer point could cause problems? Looking around at code that handles EEXIST, most of it is directory creation (unaffected by this patch), and then src/port/mkdtemp.c for which this change is appropriate (it implements POSIX mkdtemp(), which shouldn't report EACCES to its caller if something it tries bumps into a delete-pending file, it should see EEXIST and try a new name, which I think it will do with this patch, through its call to open(O_CREAT | O_EXCL)). [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJz_pZTF9mckn6XgSv69%2BjGwdgLkxZ6b3NWGLBCVjqUZA%40mail.gmail.com
On Tue, Sep 28, 2021 at 2:50 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Thu, Sep 23, 2021 at 9:13 PM Juan José Santamaría Flecha
<juanjo.santamaria@gmail.com> wrote:
> When GetTempFileName() finds a duplicated file name and the file is pending for deletion, it fails with an "ERROR_ACCESS_DENIED" error code. That may describe the situation better than "ERROR_FILE_EXISTS".
Thanks for looking. Why do you think that's better? I assume that's
just the usual NT->Win32 error conversion at work.
When a function returns an error caused by accessing a file in DELETE_PENDING you should expect an EACCES. Nonetheless, if we can emulate a POSIX behaviour by mapping it to EEXIST, that works for me. I also consider that having the logic for DELETE_PENDING in a single function is an improvement.
Regards,
Juan José Santamaría Flecha
This patch doesn't compile on Windows according to Appveyor, seemingly because of a syntax error in the new win32ntdll.h file, but the MSVC logs are hard on the eye so it might be unrelated. -- Daniel Gustafsson https://vmware.com/
> On 3 Nov 2021, at 12:02, Daniel Gustafsson <daniel@yesql.se> wrote: > > This patch doesn't compile on Windows according to Appveyor, seemingly because > of a syntax error in the new win32ntdll.h file, but the MSVC logs are hard on > the eye so it might be unrelated. As the thread has stalled with a patch that doesn't apply, I'm marking this patch Returned with Feedback. Please feel free to resubmit when a new patch is ready. -- Daniel Gustafsson https://vmware.com/
On Thu, Dec 2, 2021 at 3:11 AM Daniel Gustafsson <daniel@yesql.se> wrote: > > On 3 Nov 2021, at 12:02, Daniel Gustafsson <daniel@yesql.se> wrote: > > > > This patch doesn't compile on Windows according to Appveyor, seemingly because > > of a syntax error in the new win32ntdll.h file, but the MSVC logs are hard on > > the eye so it might be unrelated. > > As the thread has stalled with a patch that doesn't apply, I'm marking this > patch Returned with Feedback. Please feel free to resubmit when a new patch is > ready. I think this was broken by WIN32_LEAN_AND_MEAN (and since gained a merge conflict, but that's easy to fix). I'll try to figure out the right system header hacks to unbreak it...
On Sat, Dec 4, 2021 at 6:18 PM Thomas Munro <thomas.munro@gmail.com> wrote: > On Thu, Dec 2, 2021 at 3:11 AM Daniel Gustafsson <daniel@yesql.se> wrote: > > > This patch doesn't compile on Windows according to Appveyor, seemingly because > > > of a syntax error in the new win32ntdll.h file, but the MSVC logs are hard on > > > the eye so it might be unrelated. > I think this was broken by WIN32_LEAN_AND_MEAN (and since gained a > merge conflict, but that's easy to fix). I'll try to figure out the > right system header hacks to unbreak it... Short version: It needed <winternl.h>. Long version: Where Unix shares headers between user space and kernel with #ifdef _KERNEL, today I learned that Windows seems to have two universes of headers, with some stuff defined in both places. You can't cross the streams. I had already defined UMDF_USING_NTSTATUS, which tells <windows.h> that you're planning to include <ntstatus.h>, to avoid a bunch of double-definitions (the other approach I'd found on the 'net was to #define and #undef WIN32_NO_STATUS in the right places), but when WIN32_LEAN_AND_MEAN was added, that combination lost the definition of NTSTATUS, which is needed by various macros like WAIT_OBJECT_0 (it's used in casts). It's supposed to come from <ntdef.h>, but if you include that directly you get more double definitions of other random stuff. Eventually I learned that <winternl.h> fixes that. No doubt this is eroding the gains made by WIN32_LEAN_AND_MEAN, but I don't see how to avoid it until we do the work to stop including <windows.h> in win32_port.h. Well, I do know one way... I noticed that <bcrypt.h> just defines NTSTATUS itself if it sees that <ntdef.h> hasn't been included (by testing its include guard). I tried that and it worked, but it seems pretty ugly and not something that we should be doing.
Attachment
On Mon, Dec 6, 2021 at 9:17 PM Thomas Munro <thomas.munro@gmail.com> wrote: > On Sat, Dec 4, 2021 at 6:18 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > I think this was broken by WIN32_LEAN_AND_MEAN (and since gained a > > merge conflict, but that's easy to fix). I'll try to figure out the > > right system header hacks to unbreak it... > > Short version: It needed <winternl.h>. Slightly improvement: now I include <winternl.h> only from src/port/open.c and src/port/win32ntdll.c, so I avoid the extra include for the other ~1500 translation units. That requires a small extra step to work, see comment in win32ntdll.h. I checked that this still cross-compiles OK under mingw on Linux. This is the version that I'm planning to push to master only tomorrow if there are no objections.
Attachment
On Thu, Dec 9, 2021 at 9:16 PM Thomas Munro <thomas.munro@gmail.com> wrote: > Slightly improvement: now I include <winternl.h> only from > src/port/open.c and src/port/win32ntdll.c, so I avoid the extra > include for the other ~1500 translation units. That requires a small > extra step to work, see comment in win32ntdll.h. I checked that this > still cross-compiles OK under mingw on Linux. This is the version > that I'm planning to push to master only tomorrow if there are no > objections. Done. I'll keep an eye on the build farm.