Re: Cleaning up historical portability baggage - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Cleaning up historical portability baggage
Date
Msg-id CA+hUKGJa3c7YSy8kM7HMOFEjV5sTQS7bXUjuQYYGa_bfmtp46g@mail.gmail.com
Whole thread Raw
In response to Re: Cleaning up historical portability baggage  (Andres Freund <andres@anarazel.de>)
Responses Re: Cleaning up historical portability baggage
List pgsql-hackers
On Fri, Aug 12, 2022 at 5:14 AM Andres Freund <andres@anarazel.de> wrote:
> On 2022-08-11 10:52:51 -0400, Tom Lane wrote:
> > Thomas Munro <thomas.munro@gmail.com> writes:
> > > The most interesting things to say about these ones are:
> > >  * The concept of a no-Unix-socket build is removed.  We should be
> > > able to do that now, right?  Peter E seemed to say approximately that
> > > in the commit message for 797129e5.  Or is there a thought that a new
> > > operating system might show up that doesn't have 'em and we'd wish
> > > we'd kept this stuff well marked out?
> >
> > I'm kind of down on removing that.  I certainly think it's premature
> > to do so today, when we haven't even yet shipped a release that
> > assumes we can always define it on Windows

We already do assume that on Windows though.  I assume it just bombs
out with an address-family-not-recognized or similar error if you try
to use it on old Windows?  It compiles fine because there aren't
any new APIs, and winsock.h has always had the AF_UNIX macro (and many
others probably copied-and-pasted from BSD sys/socket.h, the numbers
seem to match for the first 16 AF_XXXs), and we supply our own
sockaddr_un.

About that struct, some versions of Visual Studio and MinGW didn't
have the header for it when this went in[1].  MingGW 10, released in
April 2022, has gained it[2].  I think it's reasonable to support only
the "current" MinGW (it's just too niche to have a support window
wider than "right now" IMHO), so it'd be OK to rely on that for PG16?
That would leave Visual Studio.  Does anyone know if our recent
de-cluttering efforts have fixed that problem yet?  If that turns out
to be true, I'd propose to keep the <sys/un.h> header I added here,
but change it to simply #include <afunix.h>.  But that's probably too
hopeful... If we could find out which MSVC
version/SDK/whatever-you-call-it is the first not to need it, it'd be
great to put that in a comment at least, for future garbage collection
work...

(As for why we ever had a configure check for something as basic as
Unix sockets, it looks like QNX and very old Cygwin didn't have them?
Perhaps one or both also didn't even define AF_UNIX, explaining the
IS_AF_UNIX() macro we used to have?)

(Thinking about the standard... I wonder if Windows would have got
this facility sooner if POSIX's attempt to rename AF_UNIX to the more
portable-sounding AF_LOCAL had succeeded...  I noticed the Stevens
networking book has a note that POSIX had just decided on AF_LOCAL,
but current POSIX standards make no mention of it, so something went
wrong somewhere along the way there...)

> I think what might be good next step is to have tests default to using unix
> sockets on windows, rather than requiring PG_TEST_USE_UNIX_SOCKETS. The
> pg_regress.c and Utils.pm hunks disabling it by default on windows don't
> really make sense anymore.

+1, obviously separate from this cleanup stuff, but, yeah, from PG16
onwards it seems to be legit to assume that they're available AFAIK.

> I don't really know what to do about the warnings around remove_temp() and
> trapsig(). I think we actually may be overreading the restrictions. To me the
> documented restrictions read more like a high-level-ish explanation of what's
> safe in a signal handler and what not. And it seems to not have caused a
> problem on windows on several thousand CI cycles, including plenty failures.
>
> Alternatively we could just default to putting the socketdir inside the data
> directory on windows - I *think* windows doesn't have strict path length
> limits for the socket location. If the socket dir isn't in some global temp
> directory, we don't really need signal_remove_temp, given that we're ok with
> leaving the much bigger data directory around.  The socket directory
> determination doesn't really work on windows right now anyway, one manually
> has to set a temp directory as TMPDIR isn't normally set on windows, and /tmp
> doesn't exist.

Re: length: No Windows here but I spent some time today playing ping
pong with little stand-alone test programs on CI, and determined that
it chomps paths at 108 despite succeeding with longer paths.  I could
successfully bind(), but directory listings showed the truncation, and
if I tried to make and bind two sockets with the same 108-character
prefix, the second would fail with address-in-use.  Is that enough to
be practical?

One thing I noticed is that it is implemented as reparse points.  We
might need to tweak some of our stat/dirent stuff to be more careful
if we start putting sockets in places that might be encountered by
that stuff.  (The old pgwin32_is_junction() I recently removed would
have returned true for a socket; the newer get_dirent_type() would
return PGFILETYPE_REG right now because it examines reparse points
slightly more carefully, but that's still the wrong answer; for
comparison, on Unix you'd get PGFILETYPE_UNKNOWN for a DT_SOCK because
we didn't handle it).

There seems to be conflicting information out there about whether
"abstract" sockets work (the Linux extension to AF_UNIX where sun_path
starts with a NUL byte and they don't appear in the filesystem, and
they automatically go away when all descriptors are closed).  I
couldn't get it to work but I might be doing it wrong... information
is scant.

[1] https://www.postgresql.org/message-id/88ae9594-6177-fa3c-0061-5bf8f8044b21%402ndquadrant.com
[2] https://github.com/mingw-w64/mingw-w64/blob/v10.0.0/mingw-w64-headers/include/afunix.h



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Expand palloc/pg_malloc API
Next
From: Thomas Munro
Date:
Subject: Re: Cleaning up historical portability baggage