Re: Understanding, testing and improving our Windows filesystem code - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Understanding, testing and improving our Windows filesystem code
Msg-id CA+hUKG+r07Zy6AHroUDZm9k743a_y3r-puTnhKEM2zuFQymYkw@mail.gmail.com
In response to Understanding, testing and improving our Windows filesystem code  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Understanding, testing and improving our Windows filesystem code  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On Tue, Oct 18, 2022 at 10:00 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>  * has anyone got a relevant filesystem where this fails?  which way
> do ReFS and SMB go?  do the new calls in 0010 just fail, and if so
> with which code (ie could we add our own fallback path)?

Andres kindly ran these tests on some Win 10 and Win 11 VMs he had
with non-NTFS filesystems, so I can report:

NTFS: have_posix_unlink_semantics == true, tests passing

ReFS: have_posix_unlink_semantics == false, tests passing

SMB: have_posix_unlink_semantics == false, symlink-related tests
failing (our junction points are rejected) + one readdir() test
failing (a semantic difference introduced by SMB: it can't see
STATUS_DELETE_PENDING zombies).

I think this means that PostgreSQL probably mostly works on SMB today,
except that you can't create tablespaces, so our regression tests
etc. already can't pass there, and there may be a few extra
ENOTEMPTY race conditions due to readdir()'s different behaviour.

>  * if there are any filesystems that don't support POSIX-semantics,
> would we want to either (1) get such a thing into the build farm so
> it's tested or (2) de-support non-POSIX-semantics filesystems by
> edict, and drop a lot of code and problems that everyone hates?

Yes, yes there are, so this question comes up.  Put another way:

I guess that almost all users of PostgreSQL on Windows are using NTFS.
Some are getting partial POSIX semantics already, and some are not,
depending on the Windows variant.  If we commit the 0010 patch, all
supported OSes will get full POSIX unlink semantics on NTFS.  That'd
leave just ReFS and SMB users (are there any other relevant
filesystems?) in the cold with non-POSIX semantics.  Do we want to
claim that we support those filesystems?  If so, I guess we'd need an
animal and perhaps also optional CI with ReFS.  (Though ReFS may
eventually get POSIX semantics too, I have no idea about that.)  If
not, we could in theory rip out various code we have to cope with the
non-POSIX unlink semantics, and completely forget about that whole
category of problem.
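
For readers unfamiliar with the term, here is a minimal sketch of what
"POSIX unlink semantics" means in practice, written for a POSIX system.
It is not the probe used by the tests in this thread and not code from
the 0010 patch; the function names and the /tmp location are
assumptions made for illustration.  The name of an open file
disappears as soon as unlink() returns, readdir() no longer reports
it, the parent directory can be removed without ENOTEMPTY, and the
open descriptor stays usable.

```c
#define _POSIX_C_SOURCE 200809L

#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Count directory entries other than "." and "..".  Returns -1 on error. */
static int
count_entries(const char *dirpath)
{
	DIR		   *dir = opendir(dirpath);
	struct dirent *de;
	int			n = 0;

	if (dir == NULL)
		return -1;
	while ((de = readdir(dir)) != NULL)
	{
		if (strcmp(de->d_name, ".") != 0 && strcmp(de->d_name, "..") != 0)
			n++;
	}
	closedir(dir);
	return n;
}

/*
 * Hypothetical demonstration: returns 0 if every step behaves as POSIX
 * unlink semantics promise, -1 otherwise.
 */
int
demo_posix_unlink(void)
{
	char		dirpath[] = "/tmp/unlink_demo_XXXXXX";
	char		filepath[256];
	char		buf[6] = {0};
	int			fd;
	int			result = -1;

	if (mkdtemp(dirpath) == NULL)
		return -1;
	snprintf(filepath, sizeof(filepath), "%s/file", dirpath);

	fd = open(filepath, O_CREAT | O_EXCL | O_RDWR, 0600);
	if (fd < 0)
	{
		rmdir(dirpath);
		return -1;
	}

	if (write(fd, "hello", 5) == 5 &&
		unlink(filepath) == 0 &&		/* name goes away while fd is open */
		count_entries(dirpath) == 0 &&	/* readdir() sees no leftover entry */
		rmdir(dirpath) == 0 &&			/* so no ENOTEMPTY here */
		lseek(fd, 0, SEEK_SET) == 0 &&
		read(fd, buf, 5) == 5 &&		/* the descriptor is still usable */
		strcmp(buf, "hello") == 0)
		result = 0;

	close(fd);

	/* Best-effort cleanup; both calls fail harmlessly if already done. */
	unlink(filepath);
	rmdir(dirpath);
	return result;
}
```

On a filesystem without these semantics, the unlink or rmdir step is
where a sequence like this would be expected to go wrong, which is the
category of problem discussed above.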

Changes in this version:
* try to avoid tests that do bad things that crash if earlier tests
failed (I learned that close(-1) aborts in debug builds)
* add fallback paths in 0010 (I learned what errors are raised on lack
of POSIX support)
* fix MinGW build problems
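
To illustrate the first item, here is a minimal sketch of a test step
that stops before touching a descriptor it never obtained.  It is not
taken from the patch set; the function name and the message format are
hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical test step: returns 1 if "path" could be opened and closed
 * cleanly, 0 otherwise.
 */
int
test_open_close(const char *path)
{
	int			fd = open(path, O_RDONLY);

	if (fd < 0)
	{
		/*
		 * Report the failure and skip everything that depends on fd, so
		 * that we never reach close(-1).
		 */
		fprintf(stderr, "not ok: could not open \"%s\"\n", path);
		return 0;
	}

	return close(fd) == 0;
}
```

The same early-return pattern applies to any later step that relies on
the result of an earlier one.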

As far as I could tell, MinGW doesn't have a struct definition we
need, and it seems to want _WIN32_WINNT >= 0x0A000002 to see
FileRenameInfoEx, which looks weird to me.  (I'm not sure, but I think
that was perhaps supposed to be 0x0A02, and even that isn't necessary
with the MSVC SDK headers.)  I gave up researching that and put the
definitions I needed into the code.
