Re: stat() vs ERROR_DELETE_PENDING, round N + 1 - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: stat() vs ERROR_DELETE_PENDING, round N + 1
Date
Msg-id CA+hUKGKj3p+2AciBGacCf_cXE0JLCYevWHexvOpK6uL1+V-zag@mail.gmail.com
Whole thread Raw
In response to Re: stat() vs ERROR_DELETE_PENDING, round N + 1  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Mon, Sep 6, 2021 at 9:44 AM Andres Freund <andres@anarazel.de> wrote:
> Is it guaranteed, or at least reliable, that the status we fetch with
> RtlGetLastNtStatus is actually from the underlying filesystem operation,
> rather than some other work that happens during the win32->nt translation?
> E.g. a memory allocation or such? Presumably most of such work happens before
> the actual nt "syscall", but ...

I don't know.  I know at least that it's thread-local, so that's
something.  I guess it's plausible that CreateFile() might want to
free a temporary buffer that it used for conversion to NT pathname
format, and whatever code it uses to do that might clobber the NT
status.  Nothing like that seems to happen in common cases though, and
I guess it would also be clobbered on success.  Frustrating.

Alright then, here also is the version that bypasses CreateFile() and
goes straight to NtCreateFile().  This way, the status can't possibly
be clobbered before we see it, but maybe there are other risks due to
using a much wider set of unstable ntdll interfaces...

Both versions pass all tests on CI, including the basebackup one in a
scenario where an unlinked file has an open descriptor, but still need
a bit more tidying.

> "The behavior changed in recent releases of Windows 10 -- without notice
> AFAIK. DeleteFileW now tries to use POSIX semantics if the filesystem supports
> it. NTFS does."

Nice find.  I wonder if this applies also to rename()...

> >  #ifndef FRONTEND
> > -     Assert(pgwin32_signal_event != NULL);   /* small chance of pg_usleep() */
> > +     /* XXX When called by stat very early on, this fails! */
> > +     //Assert(pgwin32_signal_event != NULL); /* small chance of pg_usleep() */
> >  #endif
>
> Perhaps we should move the win32 signal initialization into StartupHacks()?
> There's some tension around it using ereport(), and MemoryContextInit() only
> being called a tad later, but that seems resolvable.

The dependencies among open(), pg_usleep(),
pgwin32_signal_initialize() and read_backend_variables() are not very
nice.  I don't have a fix for that yet.

> > +      * XXX Think about fd pressure, since we're opening an fd?

> If I understand https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/getmaxstdio?view=msvc-160
> etc correctly, it looks like there is. But only at the point we do
> _open_osfhandle(). So perhaps we should a pgwin32_open() version returning a
> handle and make pgwin32_open() a wrapper around that?

Yeah.  Done, in both variants.

I haven't tried it, but I suspect the difference between stat() and
lstat() could be handled with FILE_OPEN_REPARSE_POINT (as
NtCreateFile() calls it) or FILE_FLAG_OPEN_REPARSE_POINT (as
CreateFile() calls it).

Attachment

pgsql-hackers by date:

Previous
From: gkokolatos@pm.me
Date:
Subject: Re: psql: \dl+ to list large objects privileges
Next
From: Fujii Masao
Date:
Subject: Re: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)