Thread: Understanding, testing and improving our Windows filesystem code

Understanding, testing and improving our Windows filesystem code

From
Thomas Munro
Date:
Hi,

As a card-carrying Unix hacker, I think it'd be great to remove the
differences between platforms if possible using newer Windows
facilities, so everything just works the way we expect.  Two things
that stopped progress on that front in the past were (1) uncertainty
about OS versioning, fixed in v16 with an EOL purge, and (2)
uncertainty about what the new interfaces really do, due to lack of
good ways to test, which I'd like to address here.

My goals in this thread:

 * introduce a pattern/idiom for TAP-testing of low level C code
without a database server
 * demonstrate the behaviour of our filesystem porting code with full coverage
 * fix reported bugs in my recent symlink changes with coverage
 * understand the new "POSIX semantics" changes in Windows 10
 * figure out what our policy should be on "POSIX semantics"

For context, we have a bunch of stuff under src/port to provide
POSIX-like implementations of:

  open()*
  fstat(), stat()*, lstat()*
  link(), unlink()*, rename()*
  symlink(), readlink()
  opendir(), readdir(), closedir()
  pg_pwrite(), pg_pread()

These call equivalent Windows system interfaces so we can mostly just
write code that assumes the whole world is a POSIX.  Some of them also
deal with three special aspects of Windows file handles, which
occasionally cause trouble:

1.  errno == EACCES && GetLastError() == ERROR_SHARING_VIOLATION:
This happens if you try to access a file that has been opened by
without FILE_SHARE_ flags to allow concurrent access.  While our own
open() wrapper uses those flags, other programs might not.  The
wrapper functions marked with an asterix above deal with this
condition by sleeping or retrying for 10 or 30 seconds, in the hope
that the external program goes away.  AFAIK this problem will always
be with us.

2.  errno == EACCES && GetLastNtStatus() == STATUS_DELETE_PENDING:
This happens if you try to access a directory entry that is scheduled
for asynchronous unlink, but is still present until all handles to the
file are closed.  The wrapper functions above deal with this in
various different ways:

 open() without O_CREAT: -> ENOENT, so we can pretend that unlink()
calls are synchronous
 open() with O_CREAT: -> EEXIST, the zombie dirent wins
 stat(), lstat(): -> ENOENT
 unlink(), rename(): retry, same as we do for ERROR_SHARING_VIOLATION,
until timeout or asynchonous unlink completes (this may have been
unintentional due to same errno?)

3.  errno == EACCESS && <not sure>:  You can't MoveFileEx() on top of
a file that someone has open.

In Windows 10, a new "POSIX semantics" mode was added.  Yippee!
Victor Spirin proposed[1] that we use it several commitfests ago.
Interestingly, on some systems it is already partially activated
without any change on our part.  That is, on some systems, unlink()
works synchronously (when the call returns, the dirent is gone, even
if someone else has the file open, just like Unix).  Sounds great, but
in testing different Windows systems I have access to using the
attached test suite I found three different sets of behaviour:

 A) Using Windows unlink() and MoveFileEx() on Server 2019 (CI) I get
traditional STATUS_DELETE_PENDING problems
 B) Using Windows unlink()/MoveFileEx() on Windows 10 Home (local VM)
I get mostly POSIX behaviour, except problem (3) above, which you can
see in my test suite
 C) Using Windows new SetFileInformationByHandle() calls with explicit
request for POSIX semantics (this syscall is something like fcntl(), a
kitchen sink kernel interface, and is what unlink() and MoveFileEx()
and others things are built on, but if you do it yourself you can
reach more flags) I get full POSIX behaviour according to my test
suite, i.e. agreement with FreeBSD and Linux for the dirent-related
cases I've though about so far, on both of those Windows systems

It sounds like we want option C, as Victor proposed, but I'm not sure
what happens if you try to use it on a non-NTFS filesystem (does it
quietly fall back to non-POSIX semantics, or fail, or do all file
systems now support this?).  I'm also not sure if we really support
running on a non-NTFS filesystem, not being a Windows user myself.

So the questions I have are:

 * any thoughts on this C TAP testing system?
 * has anyone got a non-EOL'd OS version where this test suite fails?
 * has anyone got a relevant filesystem where this fails?  which way
do ReFS and SMB go?  do the new calls in 0010 just fail, and if so
with which code (ie could we add our own fallback path)?
 * which filesystems do we even claim to support?
 * if we switched to explicitly using POSIX-semantics like in the 0010
patch, I assume there would be nothing left in the build farm or CI
that tests the non-POSIX code paths (either in these tests or in the
real server code), and the non-POSIX support would decay into
non-working form pretty quickly
 * if there are any filesystems that don't support POSIX-semantics,
would we want to either (1) get such a thing into the build farm so
it's tested or (2) de-support non-POSIX-semantics filesystems by
edict, and drop a lot of code and problems that everyone hates?

Thanks to Andres for the 0002 meson support.  I have not yet written
autoconf support; I guess I'd have to do that.

You can run this with eg "meson test --suite=port -v" on any OS.  The
first test result tells you whether it detected POSIX semantics or
not, which affects later testing.  Unix systems are always detected as
POSIX, Windows 10+ systems are always POSIX if you apply the final
patch, but could be either depending on your Windows version if you
don't, except they still have the quirk about problem (3) above for
some reason, which is why the relevant test changes in the final
patch.

(Note: The 0010 patch fails on the CI CompilerCheck cross build, which
I think has to do with wanting _WIN32_WINNT >= 0xA02 to see some
definitions, not looked into yet, and I haven't thought much about
Cygwin, but I expect they turn on all the POSIX things under the
covers too.)

[1] https://commitfest.postgresql.org/40/3347/

Attachment

Re: Understanding, testing and improving our Windows filesystem code

From
Thomas Munro
Date:
On Tue, Oct 18, 2022 at 10:00 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>  * has anyone got a relevant filesystem where this fails?  which way
> do ReFS and SMB go?  do the new calls in 0010 just fail, and if so
> with which code (ie could we add our own fallback path)?

Andres kindly ran these tests on some Win 10 and Win 11 VMs he had
with non-NTFS filesystems, so I can report:

NTFS: have_posix_unlink_semantics == true, tests passing

ReFS: have_posix_unlink_semantics == false, tests passing

SMB: have_posix_unlink_semantics == false, symlink related tests
failing (our junction points are rejected) + one readdir() test
failing (semantic difference introduced by SMB, it can't see
STATUS_DELETE_PENDING zombies).

I think this means that PostgreSQL probably mostly works on SMB today,
except you can't create tablespaces, and therefore our regression
tests etc already can't pass there, and there may be a few extra
ENOTEMPTY race conditions due to readdir()'s different behaviour.

>  * if there are any filesystems that don't support POSIX-semantics,
> would we want to either (1) get such a thing into the build farm so
> it's tested or (2) de-support non-POSIX-semantics filesystems by
> edict, and drop a lot of code and problems that everyone hates?

Yes, yes there are, so this question comes up.  Put another way:

I guess that almost all users of PostgreSQL on Windows are using NTFS.
Some are getting partial POSIX semantics already, and some are not,
depending on the Windows variant.  If we commit the 0010 patch, all
supported OSes will get full POSIX unlink semantics on NTFS.  That'd
leave just ReFS and SMB users (are there any other relevant
filesystems?) in the cold with non-POSIX semantics.  Do we want to
claim that we support those filesystems?  If so, I guess we'd need an
animal and perhaps also optional CI with ReFS.  (Though ReFS may
eventually get POSIX semantics too, I have no idea about that.)  If
not, we could in theory rip out various code we have to cope with the
non-POSIX unlink semantics, and completely forget about that whole
category of problem.

Changes in this version:
* try to avoid tests that do bad things that crash if earlier tests
failed (I learned that close(-1) aborts in debug builds)
* add fallback paths in 0010 (I learned what errors are raised on lack
of POSIX support)
* fix MinGW build problems

As far as I could tell, MinGW doesn't have a struct definition we
need, and it seems to want _WIN32_WINNT >= 0x0A000002 to see
FileRenameInfoEx, which looks weird to me... (I'm not sure about that,
but I think that was perhaps supposed to be 0x0A02, but even that
isn't necessary with MSVC SDK headers).  I gave up researching that
and put the definitions I needed into the code.

Attachment

Re: Understanding, testing and improving our Windows filesystem code

From
Thomas Munro
Date:
I pushed the bug fixes from this series, without their accompanying
tests.  Here's a rebase of the test suite, with all those tests now
squashed into the main test patch, and also the
tell-Windows-to-be-more-like-Unix patch.  Registered in the
commitfest.

Attachment

Re: Understanding, testing and improving our Windows filesystem code

From
Ian Lawrence Barwick
Date:
2022年10月25日(火) 13:12 Thomas Munro <thomas.munro@gmail.com>:
>
> I pushed the bug fixes from this series, without their accompanying
> tests.  Here's a rebase of the test suite, with all those tests now
> squashed into the main test patch, and also the
> tell-Windows-to-be-more-like-Unix patch.  Registered in the
> commitfest.



Re: Understanding, testing and improving our Windows filesystem code

From
Ian Lawrence Barwick
Date:
2022年10月25日(火) 13:12 Thomas Munro <thomas.munro@gmail.com>:
>
> I pushed the bug fixes from this series, without their accompanying
> tests.  Here's a rebase of the test suite, with all those tests now
> squashed into the main test patch, and also the
> tell-Windows-to-be-more-like-Unix patch.  Registered in the
> commitfest.

For reference: https://commitfest.postgresql.org/40/3951/

For my understanding, does this entry supersede the proposal in
https://commitfest.postgresql.org/40/3347/ ?

(Apologies for the previous mail with no additional content).

Regards

Ian Barwick



Re: Understanding, testing and improving our Windows filesystem code

From
Thomas Munro
Date:
On Mon, Nov 28, 2022 at 8:58 PM Ian Lawrence Barwick <barwick@gmail.com> wrote:
> For my understanding, does this entry supersede the proposal in
> https://commitfest.postgresql.org/40/3347/ ?

I think so (Victor hasn't commented).  Patch 0004 derives from
Victor's patch and has him as primary author still, but I made some
changes:

* remove obsolete version check code
* provide fallback code for systems where it doesn't work (after some
research to determine that there are such systems, and what they do)
* test that it's really more POSIX-like and demonstrate what that
means (building on 0003)

Patch 0003 is a set of file system semantics tests that work on Unix,
but also exercise those src/port/*.c wrappers on Windows and show
differences from Unix semantics.  Some of these tests also verify
various bugfixes already committed, so they've been pretty useful to
me already even though they aren't in the tree yet.

Patches 0001 and 0002 are generic, unrelated to this Windows stuff,
and  provide a simple way to write unit tests for small bits of C code
without a whole PostgreSQL server.  That's something that has been
proposed in the abstract many times before by many people.  Here I've
tried to be minimalist about it, just what I needed for the
higher-numbered patches, building on existing technologies (TAP).



Re: Understanding, testing and improving our Windows filesystem code

From
vignesh C
Date:
On Tue, 25 Oct 2022 at 09:42, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> I pushed the bug fixes from this series, without their accompanying
> tests.  Here's a rebase of the test suite, with all those tests now
> squashed into the main test patch, and also the
> tell-Windows-to-be-more-like-Unix patch.  Registered in the
> commitfest.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
5212d447fa53518458cbe609092b347803a667c5 ===
=== applying patch
./v3-0002-meson-Add-infrastructure-for-TAP-tests-written-in.patch
patching file meson.build
Hunk #5 FAILED at 3000.
Hunk #6 FAILED at 3035.
2 out of 6 hunks FAILED -- saving rejects to file meson.build.rej

[1] - http://cfbot.cputube.org/patch_41_3951.log

Regards,
Vignesh



Re: Understanding, testing and improving our Windows filesystem code

From
Thomas Munro
Date:
On Thu, Jan 5, 2023 at 1:06 AM vignesh C <vignesh21@gmail.com> wrote:
> On Tue, 25 Oct 2022 at 09:42, Thomas Munro <thomas.munro@gmail.com> wrote:
> > I pushed the bug fixes from this series, without their accompanying
> > tests.  Here's a rebase of the test suite, with all those tests now
> > squashed into the main test patch, and also the
> > tell-Windows-to-be-more-like-Unix patch.  Registered in the
> > commitfest.
>
> The patch does not apply ...

I think this exercise was (if I say so myself) quite useful, to
understand the Windows file system landscape.  Maybe the things we
figured out by testing are common knowledge to real Windows
programmers, I dunno, but they were certainly all news to me and not
documented anywhere I could find, and the knowledge and tests will
probably help in future battles against Windows.  The most important
things discovered were:

 1.  If you're testing on a Windows VM or laptop running 10 or 11 *you
aren't seeing the same behaviour as Windows Server*.  So the semantics
don't match real production PostgreSQL deployments.

 2.  If we decided to turn on the new POSIX unlink semantics
explicitly as originally proposed by Victor, we'd get the behaviour we
really want on NTFS on all known Windows versions.  But that would
move the traditional behaviour into a blind spot that we have no
testing for: ReFS and SMB.  Our tree would probably gain more stuff
that doesn't work on them, so that would be tantamount to dropping
support.

Therefore, with regret, I'm going to withdraw this for now.  We'd need
to get CI testing for ReFS and/or SMB first, which could be arranged,
but even then, what is the point of POSIX semantics if you don't have
them everywhere?  You can't even remove any code!  Unless we could
reach consensus that "PostgreSQL is not supported on SMB or ReFS until
they gain POSIX semantics" [which may never happen for all we know],
and then commit this patch and forget about non-POSIX unlink semantics
forever.  I don't see us doing that in a hurry.  So there's not much
hope for this idea in this commitfest.

The little C TAP framework could definitely be useful as a starting
point for something else, and the FS semantics test will definitely
come in handy if this topic is reopened by some of those potential
actions or needed to debug existing behaviour, and then I might even
re-propose parts of it, but it's all here in the archives anyway.