Thread: Understanding, testing and improving our Windows filesystem code
Hi, As a card-carrying Unix hacker, I think it'd be great to remove the differences between platforms if possible using newer Windows facilities, so everything just works the way we expect. Two things that stopped progress on that front in the past were (1) uncertainty about OS versioning, fixed in v16 with an EOL purge, and (2) uncertainty about what the new interfaces really do, due to lack of good ways to test, which I'd like to address here. My goals in this thread: * introduce a pattern/idiom for TAP-testing of low level C code without a database server * demonstrate the behaviour of our filesystem porting code with full coverage * fix reported bugs in my recent symlink changes with coverage * understand the new "POSIX semantics" changes in Windows 10 * figure out what our policy should be on "POSIX semantics" For context, we have a bunch of stuff under src/port to provide POSIX-like implementations of: open()* fstat(), stat()*, lstat()* link(), unlink()*, rename()* symlink(), readlink() opendir(), readdir(), closedir() pg_pwrite(), pg_pread() These call equivalent Windows system interfaces so we can mostly just write code that assumes the whole world is a POSIX. Some of them also deal with three special aspects of Windows file handles, which occasionally cause trouble: 1. errno == EACCES && GetLastError() == ERROR_SHARING_VIOLATION: This happens if you try to access a file that has been opened by without FILE_SHARE_ flags to allow concurrent access. While our own open() wrapper uses those flags, other programs might not. The wrapper functions marked with an asterix above deal with this condition by sleeping or retrying for 10 or 30 seconds, in the hope that the external program goes away. AFAIK this problem will always be with us. 2. errno == EACCES && GetLastNtStatus() == STATUS_DELETE_PENDING: This happens if you try to access a directory entry that is scheduled for asynchronous unlink, but is still present until all handles to the file are closed. The wrapper functions above deal with this in various different ways: open() without O_CREAT: -> ENOENT, so we can pretend that unlink() calls are synchronous open() with O_CREAT: -> EEXIST, the zombie dirent wins stat(), lstat(): -> ENOENT unlink(), rename(): retry, same as we do for ERROR_SHARING_VIOLATION, until timeout or asynchonous unlink completes (this may have been unintentional due to same errno?) 3. errno == EACCESS && <not sure>: You can't MoveFileEx() on top of a file that someone has open. In Windows 10, a new "POSIX semantics" mode was added. Yippee! Victor Spirin proposed[1] that we use it several commitfests ago. Interestingly, on some systems it is already partially activated without any change on our part. That is, on some systems, unlink() works synchronously (when the call returns, the dirent is gone, even if someone else has the file open, just like Unix). Sounds great, but in testing different Windows systems I have access to using the attached test suite I found three different sets of behaviour: A) Using Windows unlink() and MoveFileEx() on Server 2019 (CI) I get traditional STATUS_DELETE_PENDING problems B) Using Windows unlink()/MoveFileEx() on Windows 10 Home (local VM) I get mostly POSIX behaviour, except problem (3) above, which you can see in my test suite C) Using Windows new SetFileInformationByHandle() calls with explicit request for POSIX semantics (this syscall is something like fcntl(), a kitchen sink kernel interface, and is what unlink() and MoveFileEx() and others things are built on, but if you do it yourself you can reach more flags) I get full POSIX behaviour according to my test suite, i.e. agreement with FreeBSD and Linux for the dirent-related cases I've though about so far, on both of those Windows systems It sounds like we want option C, as Victor proposed, but I'm not sure what happens if you try to use it on a non-NTFS filesystem (does it quietly fall back to non-POSIX semantics, or fail, or do all file systems now support this?). I'm also not sure if we really support running on a non-NTFS filesystem, not being a Windows user myself. So the questions I have are: * any thoughts on this C TAP testing system? * has anyone got a non-EOL'd OS version where this test suite fails? * has anyone got a relevant filesystem where this fails? which way do ReFS and SMB go? do the new calls in 0010 just fail, and if so with which code (ie could we add our own fallback path)? * which filesystems do we even claim to support? * if we switched to explicitly using POSIX-semantics like in the 0010 patch, I assume there would be nothing left in the build farm or CI that tests the non-POSIX code paths (either in these tests or in the real server code), and the non-POSIX support would decay into non-working form pretty quickly * if there are any filesystems that don't support POSIX-semantics, would we want to either (1) get such a thing into the build farm so it's tested or (2) de-support non-POSIX-semantics filesystems by edict, and drop a lot of code and problems that everyone hates? Thanks to Andres for the 0002 meson support. I have not yet written autoconf support; I guess I'd have to do that. You can run this with eg "meson test --suite=port -v" on any OS. The first test result tells you whether it detected POSIX semantics or not, which affects later testing. Unix systems are always detected as POSIX, Windows 10+ systems are always POSIX if you apply the final patch, but could be either depending on your Windows version if you don't, except they still have the quirk about problem (3) above for some reason, which is why the relevant test changes in the final patch. (Note: The 0010 patch fails on the CI CompilerCheck cross build, which I think has to do with wanting _WIN32_WINNT >= 0xA02 to see some definitions, not looked into yet, and I haven't thought much about Cygwin, but I expect they turn on all the POSIX things under the covers too.) [1] https://commitfest.postgresql.org/40/3347/
Attachment
- 0001-Add-suite-of-macros-for-writing-TAP-tests-in-C.patch
- 0002-meson-Add-infrastructure-for-TAP-tests-written-in-C.patch
- 0003-Fix-symlink-errno-in-Windows-replacement-code.patch
- 0004-Fix-readlink-return-value-on-Windows.patch
- 0005-Add-tests-for-Windows-filesystem-code-in-src-port.patch
- 0006-Fix-lstat-on-broken-junction-points.patch
- 0007-Fix-readlink-for-non-PostgreSQL-created-junction-poi.patch
- 0008-Fix-stat-for-recursive-junction-points-on-Windows.patch
- 0009-Fix-unlink-for-STATUS_DELETE_PENDING-on-Windows.patch
- 0010-Use-POSIX-semantics-for-unlink-and-rename-on-Windows.patch
On Tue, Oct 18, 2022 at 10:00 PM Thomas Munro <thomas.munro@gmail.com> wrote: > * has anyone got a relevant filesystem where this fails? which way > do ReFS and SMB go? do the new calls in 0010 just fail, and if so > with which code (ie could we add our own fallback path)? Andres kindly ran these tests on some Win 10 and Win 11 VMs he had with non-NTFS filesystems, so I can report: NTFS: have_posix_unlink_semantics == true, tests passing ReFS: have_posix_unlink_semantics == false, tests passing SMB: have_posix_unlink_semantics == false, symlink related tests failing (our junction points are rejected) + one readdir() test failing (semantic difference introduced by SMB, it can't see STATUS_DELETE_PENDING zombies). I think this means that PostgreSQL probably mostly works on SMB today, except you can't create tablespaces, and therefore our regression tests etc already can't pass there, and there may be a few extra ENOTEMPTY race conditions due to readdir()'s different behaviour. > * if there are any filesystems that don't support POSIX-semantics, > would we want to either (1) get such a thing into the build farm so > it's tested or (2) de-support non-POSIX-semantics filesystems by > edict, and drop a lot of code and problems that everyone hates? Yes, yes there are, so this question comes up. Put another way: I guess that almost all users of PostgreSQL on Windows are using NTFS. Some are getting partial POSIX semantics already, and some are not, depending on the Windows variant. If we commit the 0010 patch, all supported OSes will get full POSIX unlink semantics on NTFS. That'd leave just ReFS and SMB users (are there any other relevant filesystems?) in the cold with non-POSIX semantics. Do we want to claim that we support those filesystems? If so, I guess we'd need an animal and perhaps also optional CI with ReFS. (Though ReFS may eventually get POSIX semantics too, I have no idea about that.) If not, we could in theory rip out various code we have to cope with the non-POSIX unlink semantics, and completely forget about that whole category of problem. Changes in this version: * try to avoid tests that do bad things that crash if earlier tests failed (I learned that close(-1) aborts in debug builds) * add fallback paths in 0010 (I learned what errors are raised on lack of POSIX support) * fix MinGW build problems As far as I could tell, MinGW doesn't have a struct definition we need, and it seems to want _WIN32_WINNT >= 0x0A000002 to see FileRenameInfoEx, which looks weird to me... (I'm not sure about that, but I think that was perhaps supposed to be 0x0A02, but even that isn't necessary with MSVC SDK headers). I gave up researching that and put the definitions I needed into the code.
Attachment
- v2-0001-Add-suite-of-macros-for-writing-TAP-tests-in-C.patch
- v2-0002-meson-Add-infrastructure-for-TAP-tests-written-in.patch
- v2-0003-Fix-symlink-errno-in-Windows-replacement-code.patch
- v2-0004-Fix-readlink-return-value-on-Windows.patch
- v2-0005-Add-tests-for-Windows-filesystem-code-in-src-port.patch
- v2-0006-Fix-lstat-on-broken-junction-points.patch
- v2-0007-Fix-readlink-for-non-PostgreSQL-created-junction-.patch
- v2-0008-Fix-stat-for-recursive-junction-points-on-Windows.patch
- v2-0009-Fix-unlink-for-STATUS_DELETE_PENDING-on-Windows.patch
- v2-0010-Use-POSIX-semantics-for-unlink-and-rename-on-Wind.patch
I pushed the bug fixes from this series, without their accompanying tests. Here's a rebase of the test suite, with all those tests now squashed into the main test patch, and also the tell-Windows-to-be-more-like-Unix patch. Registered in the commitfest.
Attachment
Re: Understanding, testing and improving our Windows filesystem code
From
Ian Lawrence Barwick
Date:
2022年10月25日(火) 13:12 Thomas Munro <thomas.munro@gmail.com>: > > I pushed the bug fixes from this series, without their accompanying > tests. Here's a rebase of the test suite, with all those tests now > squashed into the main test patch, and also the > tell-Windows-to-be-more-like-Unix patch. Registered in the > commitfest.
Re: Understanding, testing and improving our Windows filesystem code
From
Ian Lawrence Barwick
Date:
2022年10月25日(火) 13:12 Thomas Munro <thomas.munro@gmail.com>: > > I pushed the bug fixes from this series, without their accompanying > tests. Here's a rebase of the test suite, with all those tests now > squashed into the main test patch, and also the > tell-Windows-to-be-more-like-Unix patch. Registered in the > commitfest. For reference: https://commitfest.postgresql.org/40/3951/ For my understanding, does this entry supersede the proposal in https://commitfest.postgresql.org/40/3347/ ? (Apologies for the previous mail with no additional content). Regards Ian Barwick
On Mon, Nov 28, 2022 at 8:58 PM Ian Lawrence Barwick <barwick@gmail.com> wrote: > For my understanding, does this entry supersede the proposal in > https://commitfest.postgresql.org/40/3347/ ? I think so (Victor hasn't commented). Patch 0004 derives from Victor's patch and has him as primary author still, but I made some changes: * remove obsolete version check code * provide fallback code for systems where it doesn't work (after some research to determine that there are such systems, and what they do) * test that it's really more POSIX-like and demonstrate what that means (building on 0003) Patch 0003 is a set of file system semantics tests that work on Unix, but also exercise those src/port/*.c wrappers on Windows and show differences from Unix semantics. Some of these tests also verify various bugfixes already committed, so they've been pretty useful to me already even though they aren't in the tree yet. Patches 0001 and 0002 are generic, unrelated to this Windows stuff, and provide a simple way to write unit tests for small bits of C code without a whole PostgreSQL server. That's something that has been proposed in the abstract many times before by many people. Here I've tried to be minimalist about it, just what I needed for the higher-numbered patches, building on existing technologies (TAP).
On Tue, 25 Oct 2022 at 09:42, Thomas Munro <thomas.munro@gmail.com> wrote: > > I pushed the bug fixes from this series, without their accompanying > tests. Here's a rebase of the test suite, with all those tests now > squashed into the main test patch, and also the > tell-Windows-to-be-more-like-Unix patch. Registered in the > commitfest. The patch does not apply on top of HEAD as in [1], please post a rebased patch: === Applying patches on top of PostgreSQL commit ID 5212d447fa53518458cbe609092b347803a667c5 === === applying patch ./v3-0002-meson-Add-infrastructure-for-TAP-tests-written-in.patch patching file meson.build Hunk #5 FAILED at 3000. Hunk #6 FAILED at 3035. 2 out of 6 hunks FAILED -- saving rejects to file meson.build.rej [1] - http://cfbot.cputube.org/patch_41_3951.log Regards, Vignesh
On Thu, Jan 5, 2023 at 1:06 AM vignesh C <vignesh21@gmail.com> wrote: > On Tue, 25 Oct 2022 at 09:42, Thomas Munro <thomas.munro@gmail.com> wrote: > > I pushed the bug fixes from this series, without their accompanying > > tests. Here's a rebase of the test suite, with all those tests now > > squashed into the main test patch, and also the > > tell-Windows-to-be-more-like-Unix patch. Registered in the > > commitfest. > > The patch does not apply ... I think this exercise was (if I say so myself) quite useful, to understand the Windows file system landscape. Maybe the things we figured out by testing are common knowledge to real Windows programmers, I dunno, but they were certainly all news to me and not documented anywhere I could find, and the knowledge and tests will probably help in future battles against Windows. The most important things discovered were: 1. If you're testing on a Windows VM or laptop running 10 or 11 *you aren't seeing the same behaviour as Windows Server*. So the semantics don't match real production PostgreSQL deployments. 2. If we decided to turn on the new POSIX unlink semantics explicitly as originally proposed by Victor, we'd get the behaviour we really want on NTFS on all known Windows versions. But that would move the traditional behaviour into a blind spot that we have no testing for: ReFS and SMB. Our tree would probably gain more stuff that doesn't work on them, so that would be tantamount to dropping support. Therefore, with regret, I'm going to withdraw this for now. We'd need to get CI testing for ReFS and/or SMB first, which could be arranged, but even then, what is the point of POSIX semantics if you don't have them everywhere? You can't even remove any code! Unless we could reach consensus that "PostgreSQL is not supported on SMB or ReFS until they gain POSIX semantics" [which may never happen for all we know], and then commit this patch and forget about non-POSIX unlink semantics forever. I don't see us doing that in a hurry. So there's not much hope for this idea in this commitfest. The little C TAP framework could definitely be useful as a starting point for something else, and the FS semantics test will definitely come in handy if this topic is reopened by some of those potential actions or needed to debug existing behaviour, and then I might even re-propose parts of it, but it's all here in the archives anyway.