Understanding, testing and improving our Windows filesystem code - Mailing list pgsql-hackers

From Thomas Munro
Subject Understanding, testing and improving our Windows filesystem code
Date
Msg-id CA+hUKG+ajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN+icg@mail.gmail.com
Whole thread Raw
Responses Re: Understanding, testing and improving our Windows filesystem code  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
Hi,

As a card-carrying Unix hacker, I think it'd be great to remove the
differences between platforms if possible using newer Windows
facilities, so everything just works the way we expect.  Two things
that stopped progress on that front in the past were (1) uncertainty
about OS versioning, fixed in v16 with an EOL purge, and (2)
uncertainty about what the new interfaces really do, due to lack of
good ways to test, which I'd like to address here.

My goals in this thread:

 * introduce a pattern/idiom for TAP-testing of low level C code
without a database server
 * demonstrate the behaviour of our filesystem porting code with full coverage
 * fix reported bugs in my recent symlink changes with coverage
 * understand the new "POSIX semantics" changes in Windows 10
 * figure out what our policy should be on "POSIX semantics"

For context, we have a bunch of stuff under src/port to provide
POSIX-like implementations of:

  open()*
  fstat(), stat()*, lstat()*
  link(), unlink()*, rename()*
  symlink(), readlink()
  opendir(), readdir(), closedir()
  pg_pwrite(), pg_pread()

These call equivalent Windows system interfaces so we can mostly just
write code that assumes the whole world is a POSIX.  Some of them also
deal with three special aspects of Windows file handles, which
occasionally cause trouble:

1.  errno == EACCES && GetLastError() == ERROR_SHARING_VIOLATION:
This happens if you try to access a file that has been opened by
without FILE_SHARE_ flags to allow concurrent access.  While our own
open() wrapper uses those flags, other programs might not.  The
wrapper functions marked with an asterix above deal with this
condition by sleeping or retrying for 10 or 30 seconds, in the hope
that the external program goes away.  AFAIK this problem will always
be with us.

2.  errno == EACCES && GetLastNtStatus() == STATUS_DELETE_PENDING:
This happens if you try to access a directory entry that is scheduled
for asynchronous unlink, but is still present until all handles to the
file are closed.  The wrapper functions above deal with this in
various different ways:

 open() without O_CREAT: -> ENOENT, so we can pretend that unlink()
calls are synchronous
 open() with O_CREAT: -> EEXIST, the zombie dirent wins
 stat(), lstat(): -> ENOENT
 unlink(), rename(): retry, same as we do for ERROR_SHARING_VIOLATION,
until timeout or asynchonous unlink completes (this may have been
unintentional due to same errno?)

3.  errno == EACCESS && <not sure>:  You can't MoveFileEx() on top of
a file that someone has open.

In Windows 10, a new "POSIX semantics" mode was added.  Yippee!
Victor Spirin proposed[1] that we use it several commitfests ago.
Interestingly, on some systems it is already partially activated
without any change on our part.  That is, on some systems, unlink()
works synchronously (when the call returns, the dirent is gone, even
if someone else has the file open, just like Unix).  Sounds great, but
in testing different Windows systems I have access to using the
attached test suite I found three different sets of behaviour:

 A) Using Windows unlink() and MoveFileEx() on Server 2019 (CI) I get
traditional STATUS_DELETE_PENDING problems
 B) Using Windows unlink()/MoveFileEx() on Windows 10 Home (local VM)
I get mostly POSIX behaviour, except problem (3) above, which you can
see in my test suite
 C) Using Windows new SetFileInformationByHandle() calls with explicit
request for POSIX semantics (this syscall is something like fcntl(), a
kitchen sink kernel interface, and is what unlink() and MoveFileEx()
and others things are built on, but if you do it yourself you can
reach more flags) I get full POSIX behaviour according to my test
suite, i.e. agreement with FreeBSD and Linux for the dirent-related
cases I've though about so far, on both of those Windows systems

It sounds like we want option C, as Victor proposed, but I'm not sure
what happens if you try to use it on a non-NTFS filesystem (does it
quietly fall back to non-POSIX semantics, or fail, or do all file
systems now support this?).  I'm also not sure if we really support
running on a non-NTFS filesystem, not being a Windows user myself.

So the questions I have are:

 * any thoughts on this C TAP testing system?
 * has anyone got a non-EOL'd OS version where this test suite fails?
 * has anyone got a relevant filesystem where this fails?  which way
do ReFS and SMB go?  do the new calls in 0010 just fail, and if so
with which code (ie could we add our own fallback path)?
 * which filesystems do we even claim to support?
 * if we switched to explicitly using POSIX-semantics like in the 0010
patch, I assume there would be nothing left in the build farm or CI
that tests the non-POSIX code paths (either in these tests or in the
real server code), and the non-POSIX support would decay into
non-working form pretty quickly
 * if there are any filesystems that don't support POSIX-semantics,
would we want to either (1) get such a thing into the build farm so
it's tested or (2) de-support non-POSIX-semantics filesystems by
edict, and drop a lot of code and problems that everyone hates?

Thanks to Andres for the 0002 meson support.  I have not yet written
autoconf support; I guess I'd have to do that.

You can run this with eg "meson test --suite=port -v" on any OS.  The
first test result tells you whether it detected POSIX semantics or
not, which affects later testing.  Unix systems are always detected as
POSIX, Windows 10+ systems are always POSIX if you apply the final
patch, but could be either depending on your Windows version if you
don't, except they still have the quirk about problem (3) above for
some reason, which is why the relevant test changes in the final
patch.

(Note: The 0010 patch fails on the CI CompilerCheck cross build, which
I think has to do with wanting _WIN32_WINNT >= 0xA02 to see some
definitions, not looked into yet, and I haven't thought much about
Cygwin, but I expect they turn on all the POSIX things under the
covers too.)

[1] https://commitfest.postgresql.org/40/3347/

Attachment

pgsql-hackers by date:

Previous
From: Yugo NAGATA
Date:
Subject: Re: make_ctags: use -I option to ignore pg_node_attr macro
Next
From: Devrim Gündüz
Date:
Subject: Re: [PATCH] Fix build with LLVM 15 or above