Thread: Cleaning up historical portability baggage

Cleaning up historical portability baggage

From
Thomas Munro
Date:
Hello,

I wonder how much dead code for ancient operating systems we could now
drop.  Here are some easier cases, I think, and one tricky one that
might take some debate.  I think it makes a lot of sense to say that
we expect at least POSIX-1:2001, because that corresponds to C99, with
the thread option because every targeted system has that.

0001-Remove-dead-pg_pread-and-pg_pwrite-replacement-code.patch
0002-Remove-dead-getrusage-replacement-code.patch
0003-Remove-dead-setenv-unsetenv-replacement-code.patch
0004-Remove-dead-handling-for-pre-POSIX-sigwait.patch
0005-Remove-dead-getpwuid_r-replacement-code.patch
0006-Remove-disable-thread-safety.patch

Clearly there is more stuff like this (eg more _r functions, they're
just a touch more complicated), but this is a start.  I mention these
now in case it's helpful for the Meson work, and just generally
because I wanted to clean up after the retirement of ancient HP-UX.
The threads patch probably needs more polish and is extracted from
another series I'll propose in a later CF to do some more constructive
work on threads where it'd be helpful not to have to deal with 'no
threads' builds, but I figured it could also pitch this part along
with the other basic POSIX modernisation stuff.

I pulled the configure output from the oldest releases of each
supported target OS, namely:

  * hornet, AIX 7.1
  * wrasse, Solaris 11.3
  * pollock, illumos rolling
  * loach, FreeBSD 12.2
  * conchuela, DragonflyBSD 6.0
  * morepork, OpenBSD 6.9
  * sidewinder, NetBSD 9.2
  * prairiedog, macOS 10.4 (vintage system most likely to cause problems)
  * clam, Linux 3.10/RHEL 7
  * fairiewren, Windows/Mingw with configure

I checked for HAVE_PREAD HAVE_PWRITE HAVE_GETRUSAGE HAVE_SETENV
HAVE_UNSETENV HAVE_GETPWUID_R, and the only missing ones were:

HAVE_PREAD is missing on windows
HAVE_PWRITE is missing on windows
HAVE_GETRUSAGE is missing on windows
HAVE_GETPWUID_R is missing on windows

We either have completely separate code paths or replacement functions
for these.

The pwritev/preadv functions are unfortunately not standardised by
POSIX (I dunno why, it's the obvious combination of the p* and *v
functions) despite every OS in the list having them except for Solaris
and old macOS.  Oh well.

Attachment

Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> I wonder how much dead code for ancient operating systems we could now
> drop.

+1, it seems like this is the cycle for some housecleaning.

>   * prairiedog, macOS 10.4 (vintage system most likely to cause problems)

FWIW, I am expecting to retire prairiedog once the meson stuff drops.
macOS 10.4 is incapable of running ninja (for lack of <spawn.h>).
While I could keep it working for awhile with the autoconf build system,
I'm not sure I see the point.  The hardware is still chugging along,
but I think it'd be more useful to run up-to-date NetBSD or the like
on it.

Having said that, I'll be happy to try out this patch series on
that platform and see if it burps.

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
I wrote:
> Having said that, I'll be happy to try out this patch series on
> that platform and see if it burps.

HEAD + patches 0001-0006 seems fine on prairiedog's host.
Builds clean (or as clean as HEAD does anyway), passes make check.
I did not trouble with check-world.

(I haven't actually read the patches, so this isn't a review,
just a quick smoke-test.)

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Greg Stark
Date:
On Sat, 9 Jul 2022 at 21:46, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> Hello,
>
> I wonder how much dead code for ancient operating systems we could now
> drop.

> 0002-Remove-dead-getrusage-replacement-code.patch

I thought the getrusage replacement code was for Windows. Does
getrusage on Windows actually do anything useful?

More generally I think there is a question about whether some of these
things are "supported" in only a minimal way to satisfy standards but
maybe not in a way that we actually want to use. Getrusage might exist
on Windows but not actually report the metrics we need, reentrant
library functions may be implemented by simply locking instead of
actually avoiding static storage, etc.

-- 
greg



Re: Cleaning up historical portability baggage

From
Greg Stark
Date:
(Reading the patch it seems both those points are already addressed)



Re: Cleaning up historical portability baggage

From
Robert Haas
Date:
On Sat, Jul 9, 2022 at 9:46 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> The pwritev/preadv functions are unfortunately not standardised by
> POSIX (I dunno why, it's the obvious combination of the p* and *v
> functions) despite every OS in the list having them except for Solaris
> and old macOS.  Oh well.

I don't think that 0001 is buying us a whole lot, really. I prefer the
style where we have PG-specific functions that behave differently on
different platforms to the one where we call something that looks like
a native OS function call on all platforms but on some of them it is
secretly invoking a replacement implementation in src/port. The
problem with the latter is it looks like you're using something that's
universally supported and works the same way everywhere, but you're
really not. If it were up to me, we'd have more pg_whatever() that
calls whatever() on non-Windows and something else on Windows, rather
than going in the direction that this patch takes us.

I like all of the other patches. Reducing the number of configure
tests that we need seems like a really good idea.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Tue, Jul 12, 2022 at 4:46 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I don't think that 0001 is buying us a whole lot, really. I prefer the
> style where we have PG-specific functions that behave differently on
> different platforms to the one where we call something that looks like
> a native OS function call on all platforms but on some of them it is
> secretly invoking a replacement implementation in src/port. The
> problem with the latter is it looks like you're using something that's
> universally supported and works the same way everywhere, but you're
> really not. If it were up to me, we'd have more pg_whatever() that
> calls whatever() on non-Windows and something else on Windows, rather
> than going in the direction that this patch takes us.

Hmm, but that's not what we're doing in general.  For example, on
Windows we're redirecting open() to a replacement function of our own,
we're not using "pg_open()" in our code.  That's not an example based
on AC_REPLACE_FUNCS, but there are plenty of those too.  Isn't this
quite well established?

AFAIK we generally only use pg_whatever() when there's a good reason,
such as an incompatibility, a complication or a different abstraction
that you want to highlight to a reader.  The reason here was
temporary: we couldn't implement standard pread/pwrite perfectly on
ancient HP-UX, but we *can* implement it on Windows, so the reason is
gone.

These particular pg_ prefixes have only been in our tree for a few
years and I was hoping to boot them out again before they stick, like
"Size".  I like using standard interfaces where possible for the very
basic stuff, to de-weird our stuff.

> I like all of the other patches. Reducing the number of configure
> tests that we need seems like a really good idea.

Thanks for looking.  Yeah, we could also be a little more aggressive
about removing configure tests, in the cases where it's just Windows
vs !Windows.  "HAVE_XXX" tests that are always true on POSIX systems
at the level we require would then be unnecessary.



Re: Cleaning up historical portability baggage

From
Peter Eisentraut
Date:
On 12.07.22 03:10, Thomas Munro wrote:
> AFAIK we generally only use pg_whatever() when there's a good reason,
> such as an incompatibility, a complication or a different abstraction
> that you want to highlight to a reader.  The reason here was
> temporary: we couldn't implement standard pread/pwrite perfectly on
> ancient HP-UX, but we*can*  implement it on Windows, so the reason is
> gone.
> 
> These particular pg_ prefixes have only been in our tree for a few
> years and I was hoping to boot them out again before they stick, like
> "Size".  I like using standard interfaces where possible for the very
> basic stuff, to de-weird our stuff.

I agree.  That's been the established approach.



Re: Cleaning up historical portability baggage

From
Robert Haas
Date:
On Mon, Jul 11, 2022 at 9:11 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> Hmm, but that's not what we're doing in general.  For example, on
> Windows we're redirecting open() to a replacement function of our own,
> we're not using "pg_open()" in our code.  That's not an example based
> on AC_REPLACE_FUNCS, but there are plenty of those too.  Isn't this
> quite well established?

Yes. I just don't care for it.

Sounds like I'm in the minority, though.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jul 11, 2022 at 9:11 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> Hmm, but that's not what we're doing in general.  For example, on
>> Windows we're redirecting open() to a replacement function of our own,
>> we're not using "pg_open()" in our code.  That's not an example based
>> on AC_REPLACE_FUNCS, but there are plenty of those too.  Isn't this
>> quite well established?

> Yes. I just don't care for it.
> Sounds like I'm in the minority, though.

I concur with your point that it's not great to use the standard name
for a function that doesn't have exactly the standard semantics.
But if it does, using a nonstandard name is not better.  It's just one
more thing that readers of our code have to learn about.

Note that "exactly" only needs to mean "satisfies all the promises
made by POSIX".  If some caller is depending on behavior details not
specified in the standard, that's the caller's bug not the wrapper
function's.  Otherwise, yeah, we couldn't ever be sure whether a
wrapper function is close enough.

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Andres Freund
Date:
Hi,

On 2022-07-12 08:01:40 -0400, Robert Haas wrote:
> On Mon, Jul 11, 2022 at 9:11 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > Hmm, but that's not what we're doing in general.  For example, on
> > Windows we're redirecting open() to a replacement function of our own,
> > we're not using "pg_open()" in our code.  That's not an example based
> > on AC_REPLACE_FUNCS, but there are plenty of those too.  Isn't this
> > quite well established?
> 
> Yes. I just don't care for it.
> 
> Sounds like I'm in the minority, though.

I agree with you, at least largely.

Redefining functions, be it by linking in something or by redefining function
names via macros, is a mess. There's places where we then have to undefine
some of these things to be able to include external headers etc. Some
functions are only replaced in backends, others in frontend too. It makes it
hard to know what exactly the assumed set of platform primitives is. Etc.

Greetings,

Andres Freund



Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> Redefining functions, be it by linking in something or by redefining function
> names via macros, is a mess. There's places where we then have to undefine
> some of these things to be able to include external headers etc. Some
> functions are only replaced in backends, others in frontend too. It makes it
> hard to know what exactly the assumed set of platform primitives is. Etc.

In the cases at hand, we aren't doing that, are we?  The replacement
function is only used on platforms that lack the relevant POSIX function,
so it's hard to argue that we're replacing anything.

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
I have committed the first few:

 * "Remove dead getrusage replacement code."
 * "Remove dead handling for pre-POSIX sigwait()."
 * "Remove dead getpwuid_r replacement code."

Here are some more, a couple of which I posted before but I've now
gone a bit further with them in terms of removing configure checks
etc:

 * "Remove dlopen configure probe."
 * "Remove configure probe and extra tests for getrlimit."
 * "Remove configure probe for shm_open."
 * "Remove configure probe for setsid."
 * "Remove configure probes for readlink, and dead code and docs."
 * "Remove configure probe for symlink, and dead code."
 * "Remove configure probe for link."
 * "Remove dead replacement code for clock_gettime()."
 * "Remove configure probes for poll and poll.h."
 * "Remove dead setenv, unsetenv replacement code."
 * "Remove dead pread and pwrite replacement code."
 * "Simplify replacement code for preadv and pwritev."
 * "Remove --disable-thread-safety."

Some of these depend on SUSv2 options (not just "base"), but we
already do that (fsync, ...) and they're all features that are by now
ubiquitous, which means the fallback code is untested and the probes
are pointless.

<archeology-mode>I'd guess the last system we ran on that didn't have
symlinks would have been SVr3-based SCO, HP-UX, DG/UX etc from the
1980s, since they were invented in 4.2BSD in 1983 and adopted by SVr4
in 1988.  The RLIMIT_OFILE stuff seems to be referring to 1BSD or 2BSD
on a PDP, whereas RLIMIT_NOFILE was already used in 4.3BSD in 1986,
which I'd have guessed would be the oldest OS POSTGRES ever actually
ran on, so that must have been cargo culting from something older?</>

The clock_gettime() one only becomes committable once prairiedog's
host switched to NetBSD, so that'll be committed at the same time as
the fdatasync one from a nearby thread.

The setenv/unsetenv one levels up to SUSv3 (= POSIX issue 6, 2001).
That'd be the first time we don't point at SUSv2 (= POSIX issue 5,
199x) to justify a change like this.

I expect there to be further clean-up after the removal of
--disable-thread-safety.

Attachment

Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> Some of these depend on SUSv2 options (not just "base"), but we
> already do that (fsync, ...) and they're all features that are by now
> ubiquitous, which means the fallback code is untested and the probes
> are pointless.

Reading this, it occurred to me that it'd be interesting to scrape
all of the latest configure results from the buildfarm, and see which
tests actually produce more than one answer among the set of tested
platforms.  Those that don't could be targets for further simplification,
or else an indicator that we'd better go find some more animals.

Before I go off and do that, though, I wonder if you already did.

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Sun, Jul 24, 2022 at 11:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Some of these depend on SUSv2 options (not just "base"), but we
> > already do that (fsync, ...) and they're all features that are by now
> > ubiquitous, which means the fallback code is untested and the probes
> > are pointless.
>
> Reading this, it occurred to me that it'd be interesting to scrape
> all of the latest configure results from the buildfarm, and see which
> tests actually produce more than one answer among the set of tested
> platforms.  Those that don't could be targets for further simplification,
> or else an indicator that we'd better go find some more animals.
>
> Before I go off and do that, though, I wonder if you already did.

Yeah, here are the macros I scraped yesterday, considering the latest
results from machines that did something in the past week.

Attachment

Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> Here are some more, a couple of which I posted before but I've now
> gone a bit further with them in terms of removing configure checks
> etc:

After looking through these briefly, I'm pretty concerned about
whether this won't break our Cygwin build in significant ways.
For example, lorikeet reports "HAVE_SETSID 1", a condition that
you want to replace with !WIN32.  The question here is whether
or not WIN32 is defined in a Cygwin build.  I see some places
in our code that believe it is not, but others that believe that
it is --- and the former ones are mostly like
    #if defined(__CYGWIN__) || defined(WIN32)
which means they wouldn't actually fail if they are wrong about that.

More generally, I'm not exactly convinced that changes like
this are a readability improvement:

-#ifdef HAVE_SETSID
+#ifndef WIN32

I'd rather not have the code cluttered with a sea of
indistinguishable "#ifndef WIN32" tests when some of them could be
more specific and more mnemonic.  So I think we'd be better off
leaving that as-is.  I don't mind nuking the configure-time test
and hard-wiring "#define HAVE_SETSID 1" somewhere, but changing
the code's #if tests doesn't seem to bring any advantage.

Specific to 0001, I don't especially like what you did to
src/port/dlopen.c.  The original intent (and reality until
not so long ago) was that that would be a container for
various dlopen replacements.  Well, okay, maybe there will
never be any more besides Windows, but I think then we should
either rename the file to (say) win32dlopen.c or move it to
src/backend/port/win32.  Likewise for link.c in 0007 and
pread.c et al in 0011.  (But 0010 is fine, because the
replacement code is already handled that way.)

OTOH, 0012 seems to immediately change pread.c et al back
to NOT being Windows-only, though it's hard to tell for
sure because the configure support seems all wrong.
I'm quite confused by those two patches ... are they really
correct?

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Sun, Jul 24, 2022 at 12:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> After looking through these briefly, I'm pretty concerned about
> whether this won't break our Cygwin build in significant ways.
> For example, lorikeet reports "HAVE_SETSID 1", a condition that
> you want to replace with !WIN32.  The question here is whether
> or not WIN32 is defined in a Cygwin build.  I see some places
> in our code that believe it is not, but others that believe that
> it is --- and the former ones are mostly like
>         #if defined(__CYGWIN__) || defined(WIN32)
> which means they wouldn't actually fail if they are wrong about that.

I spent a large chunk of today figuring out how to build PostgreSQL
under Cygwin/GCC on CI.  My method for answering this question was to
put the following on the end of 192 .c files that contain the pattern
/#if.*WIN32/:

+
+#if defined(WIN32) && defined(__CYGWIN__)
+#pragma message "contradiction"
+#endif

Only one of them printed that message: dirmod.c.  The reason is that
it goes out of its way to include Windows headers:

#if defined(WIN32) || defined(__CYGWIN__)
#ifndef __CYGWIN__
#include <winioctl.h>
#else
#include <windows.h>
#include <w32api/winioctl.h>
#endif
#endif

The chain <windows.h> -> <windef.h> -> <minwindef.h> leads to WIN32 here:

https://github.com/mirror/mingw-w64/blob/master/mingw-w64-headers/include/minwindef.h#L15

I'm left wondering if we should de-confuse matters by ripping out all
the checks and comments that assume that this problem is more
widespread, and then stick a big notice about it in dirmod.c, to
contain this Jekyll/Hide situation safely inside about 8 feet of
concrete.

I'll respond to your other complaints with new patches tomorrow.



Re: Cleaning up historical portability baggage

From
Andrew Dunstan
Date:
On 2022-07-25 Mo 10:35, Thomas Munro wrote:
> On Sun, Jul 24, 2022 at 12:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> After looking through these briefly, I'm pretty concerned about
>> whether this won't break our Cygwin build in significant ways.
>> For example, lorikeet reports "HAVE_SETSID 1", a condition that
>> you want to replace with !WIN32.  The question here is whether
>> or not WIN32 is defined in a Cygwin build.  I see some places
>> in our code that believe it is not, but others that believe that
>> it is --- and the former ones are mostly like
>>         #if defined(__CYGWIN__) || defined(WIN32)
>> which means they wouldn't actually fail if they are wrong about that.
> I spent a large chunk of today figuring out how to build PostgreSQL
> under Cygwin/GCC on CI.  My method for answering this question was to
> put the following on the end of 192 .c files that contain the pattern
> /#if.*WIN32/:
>
> +
> +#if defined(WIN32) && defined(__CYGWIN__)
> +#pragma message "contradiction"
> +#endif
>
> Only one of them printed that message: dirmod.c.  The reason is that
> it goes out of its way to include Windows headers:
>
> #if defined(WIN32) || defined(__CYGWIN__)
> #ifndef __CYGWIN__
> #include <winioctl.h>
> #else
> #include <windows.h>
> #include <w32api/winioctl.h>
> #endif
> #endif
>
> The chain <windows.h> -> <windef.h> -> <minwindef.h> leads to WIN32 here:
>
> https://github.com/mirror/mingw-w64/blob/master/mingw-w64-headers/include/minwindef.h#L15
>
> I'm left wondering if we should de-confuse matters by ripping out all
> the checks and comments that assume that this problem is more
> widespread, and then stick a big notice about it in dirmod.c, to
> contain this Jekyll/Hide situation safely inside about 8 feet of
> concrete.


Clearly it's something we've been aware of before, port.h has:


* Note: Some CYGWIN includes might #define WIN32.
 */
#if defined(WIN32) && !defined(__CYGWIN__)
#include "port/win32_port.h"
#endif


I can test any patches you want on lorikeet.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Sun, Jul 24, 2022 at 12:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Here are some more, a couple of which I posted before but I've now
> > gone a bit further with them in terms of removing configure checks
> > etc:
>
> After looking through these briefly, I'm pretty concerned about
> whether this won't break our Cygwin build in significant ways.
> For example, lorikeet reports "HAVE_SETSID 1", a condition that
> you want to replace with !WIN32.  The question here is whether
> or not WIN32 is defined in a Cygwin build. ...

No, it should not be unless someone screws up and leaks <windows.h>
into a header when WIN32 isn't already defined.  I've done some
analysis and testing of that, and proposed to nail it down a bit and
remove the confusion created by the inconsistent macro tests, over at
[1].

> More generally, I'm not exactly convinced that changes like
> this are a readability improvement:
>
> -#ifdef HAVE_SETSID
> +#ifndef WIN32
>
> I'd rather not have the code cluttered with a sea of
> indistinguishable "#ifndef WIN32" tests when some of them could be
> more specific and more mnemonic.  So I think we'd be better off
> leaving that as-is.  I don't mind nuking the configure-time test
> and hard-wiring "#define HAVE_SETSID 1" somewhere, but changing
> the code's #if tests doesn't seem to bring any advantage.

OK, in this version of the patch series I did this:

1.  If it's something that only Unix has, and for Windows we do
nothing or skip a feature, then I've now hard-wired the macro as you
suggested.  I put that in port.h.  I agree that's a little easier to
grok than no-context !defined(WIN32).  Examples: HAVE_SETSID,
HAVE_SHM_OPEN.

2.  If it's something that Unix has and we supply a Windows
replacements, and we just can't cope without that function, then I
didn't bother with a vestigial macro.  There generally weren't tests
for such things already (mostly stuff auto-generated by
AC_REPLACE_FUNCS).  Example: HAVE_LINK.

3.  In the special case of symlink() and readlink(), I defined the
macros on Unix even though we also have replacements on Windows.
(Previously we effectively did that for one but not the other...)  My
idea here is that, wherever we are OK using our pseudo-symlinks made
from junction points (ie for tablespaces), then we should just go
ahead and use them without testing.  But in just a couple of places
where fully compliant symlinks are clearly expected (ie non-directory
or relative path, eg tz file or executable symlinks), then the tests
can still be used.  See also commit message.  Does this make sense?

(I also propose to supply S_ISLNK and lstat() for Windows and make
usage of that stuff unconditional, but I put that in another
thread[2], as that's new code, and this thread is just about ripping
old dead stuff out.)

4.  If it's something that already had very obvious Windows and Unix
code paths, then I didn't bother with a HAVE_XXX macro, because I
think it'd be more confusing than just #ifdef WIN32 ...windows stuff
... #else ...unix stuff... #endif.  Example: HAVE_CLOCK_GETTIME.

> Specific to 0001, I don't especially like what you did to
> src/port/dlopen.c.  The original intent (and reality until
> not so long ago) was that that would be a container for
> various dlopen replacements.  Well, okay, maybe there will
> never be any more besides Windows, but I think then we should
> either rename the file to (say) win32dlopen.c or move it to
> src/backend/port/win32.  Likewise for link.c in 0007 and
> pread.c et al in 0011.  (But 0010 is fine, because the
> replacement code is already handled that way.)

Agreed on the file names win32dlopen.c, win32link.c, win32pread.c,
win32pwrite.c, and done.

Another characteristic of other Windows-only replacement code is that
it's called pgwin32_THING and then a macro replaces THING() with
pgwin32_THING().  I guess I should do that too, for consistency, and
move relevant declarations into win32_port.h?  Done.

There are clearly many other candidates for X.c -> win32X.c renamings
by the same only-for-Windows argument, but I haven't touched those (at
least dirent.c, dirmod.c, gettimeofday.c, kill.c, open.c, system.c).

I'll also include the fdatasync configure change here (moved from
another thread).  Now it also renames fdatasync.c -> win32datasync.c.
Erm, but I didn't add the pgwin32_ prefix to the function name,
because it shares a function declaration with macOS in c.h.

> OTOH, 0012 seems to immediately change pread.c et al back
> to NOT being Windows-only, though it's hard to tell for
> sure because the configure support seems all wrong.
> I'm quite confused by those two patches ... are they really
> correct?

The 0012 patch (now 0011 in v2) is about the variants with -v on the
end.  The patches are as I intended.  I've now put a longer
explanation into the commit message, but here's a short recap:

pread()/pwrite() replacements (without 'v' for vector) are now needed
only by Windows.  HP-UX < 11 was the last Unix system I knew of
without these functions.  That makes sense, as I think they were
related to the final POSIX threading push (multi-threaded programs
want to be able to skip file position side-effects), which HP-UX 10.x
predated slightly.  Gaur's retirement unblocked this cleanup.

preadv()/pwritev() replacements (with 'v' for vector) are needed by
Solaris, macOS < 11 and Windows, and will likely be required for a
long time, because these functions still haven't been standardised.
My goal is to make our replacement code side-effect free, thread-safe,
in line with the de facto standard/convention seen on Linux, *BSD,
macOS, AIX, illumos.

Note that I have some better vector I/O code for Windows to propose a
bit later, so the real effect of this choice will be to drop true
vector I/O on Solaris, until such time as they get around to providing
the modern interface that almost every other Unix managed to agree on.

[1]
https://www.postgresql.org/message-id/flat/CA%2BhUKG%2Be13wK0PBX5Z63CCwWm7MfRQuwBRabM_3aKWSko2AUww%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/CA+hUKGLfOOeyZpm5ByVcAt7x5Pn-=xGRNCvgiUPVVzjFLtnY0w@mail.gmail.com

Attachment

Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
Sorry, I know this is really tedious stuff, but I found a couple more
cleanup opportunities nearby:

For dlopen(), we don't have to worry about old Unix systems without
RTLD_NOW and RTLD_GLOBAL macros anymore.  They're in SUSv2 and present
on all our BF Unixes, so that's some more configure probes that we can
remove.  Also, we might as well move the declarations for everything
relating to dlopen into win32_port.h.

(Why some WIN32 things are done in port.h while others are done in
win32_port.h is something I don't grok; probably depends whether there
were ever non-Windows systems that needed something... I might propose
to tidy that up some more, later...)

For setenv()/unsetenv(), I removed the declarations from port.h.  Only
the ones in win32_port.h are needed now.

I fixed a couple of places where I'd renamed a file but forgotten to
update that IDENTIFICATION section from the CVS days, and a stray 2021
copyright year.

It'd be good to find a new home for pg_get_user_name() and
pg_get_user_home_dir(), which really shouldn't be left in the now
bogusly named src/port/thread.c.  Any suggestions?

Was it a mistake to add pgwin32_ prefixes to some Windows replacement
functions?  It seems a little arbitrary that we do that sometimes even
though we don't need to.  Perhaps we should only do that kind of thing
when we want to avoid name-clash with a real Windows function that
we're replacing?

I'd like to push these soon, if there are no further objections.  If
you prefer, I could hold back on the two that will break prairiedog
until you give the word, namely clock_gettime() and fdatasync().

Attachment

Re: Cleaning up historical portability baggage

From
Andres Freund
Date:
Hi,

Another potential cleanup is the fallback for strtoll/strtoull. Some of the
spellings were introduced because of "ancient HPUX":

commit 06f66cff9e0b93db81db1595156b2aff8ba1786e
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   2018-05-19 14:22:18 -0400

    Support platforms where strtoll/strtoull are spelled __strtoll/__strtoull.

    Ancient HPUX, for one, does this.  We hadn't noticed due to the lack
    of regression tests that required a working strtoll.

    (I was slightly tempted to remove the other historical spelling,
    strto[u]q, since it seems we have no buildfarm members testing that case.
    But I refrained.)

    Discussion: https://postgr.es/m/151935568942.1461.14623890240535309745@wrigleys.postgresql.org

and some much longer ago:

commit 9394d391b803c55281879721ea393a50df4a0be6
Author: Peter Eisentraut <peter_e@gmx.net>
Date:   2000-11-20 15:56:14 +0000

    Add configure checks for strtoll, strtoull (or strto[u]q).  Disable
    'long long int' portions of ecpg if the type or these functions don't
    exist.

since strtoq, strtouq apparently were already obsolete in 2018, and hpux is
now obsolete...


I only noticed this because I'd ported the configure check a bit naively,
without the break in the if-found case, and was looking into why HAVE_STRTOQ,
HAVE_STRTOUQ were defined with meson, but not autoconf...
AC_CHECK_FUNCS([strtoll __strtoll strtoq], [break])
AC_CHECK_FUNCS([strtoull __strtoull strtouq], [break])


Greetings,

Andres Freund



Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> Another potential cleanup is the fallback for strtoll/strtoull.

+1, I suspect the alternate spellings are dead.

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Andres Freund
Date:
Hi,

On 2022-08-03 21:52:04 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Another potential cleanup is the fallback for strtoll/strtoull.
>
> +1, I suspect the alternate spellings are dead.

Looks like that includes systems where there's no declaration for strtoll,
strtoull. The test was introduced in

commit a6228128fc48c222953dfd41fd438522a184054c
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   2018-05-18 22:42:10 -0400

    Arrange to supply declarations for strtoll/strtoull if needed.

The check was introduced for animal dromedary, afaics. Looks like that stopped
reporting 2019-09-27 and transformed into florican.

A query on the bf database didn't see any runs in the last 30 days that didn't
have strtoll declared.

See attached patch.

Greetings,

Andres Freund

Attachment

Re: Cleaning up historical portability baggage

From
Andres Freund
Date:
Hi,

Can we get a few more of these committed soon? It's all tests that I need to
sync with the meson stuff, and I'd rather get it over with :). And it reduces
the set of tests that need to be compared...  Or is there a blocker (leaving
the prairedog one aside)?


On 2022-08-03 14:25:01 +1200, Thomas Munro wrote:
> Subject: [PATCH v3 01/13] Remove configure probe for dlopen.
> Subject: [PATCH v3 02/13] Remove configure probe and extra tests for
>  getrlimit.

LGTM.


> From 96a4935ff9480c2786634e9892b1f44782b403fb Mon Sep 17 00:00:00 2001
> From: Thomas Munro <thomas.munro@gmail.com>
> Date: Sat, 23 Jul 2022 23:49:27 +1200
> Subject: [PATCH v3 03/13] Remove configure probe for shm_open.
>
> shm_open() is in SUSv2 (realtime) and all targeted Unix systems have it.
>
> We retain a HAVE_SHM_OPEN macro, because it's clearer to readers than
> something like !defined(WIN32).

I don't like these. I don't find them clearer - if we really just assume this
to be the case on windows, it's easier to understand the checks if they talk
about windows rather than having to know whether this specific check just
applies to windows or potentially an unspecified separate set of systems.

But I guess I should complain upthread...


> Subject: [PATCH v3 04/13] Remove configure probe for setsid.

LGTM.



> Subject: [PATCH v3 05/13] Remove configure probes for symlink/readlink, and
>  dead code.

Nice win.

>

> From 143f6917bbc7d8f457d52d02a5fbc79d849744e1 Mon Sep 17 00:00:00 2001
> From: Thomas Munro <thomas.munro@gmail.com>
> Date: Sun, 24 Jul 2022 01:19:05 +1200

> Subject: [PATCH v3 06/13] Remove configure probe for link.
> --- a/src/include/port.h
> +++ b/src/include/port.h
> @@ -402,7 +402,8 @@ extern float pg_strtof(const char *nptr, char **endptr);
>  #define strtof(a,b) (pg_strtof((a),(b)))
>  #endif
>
> -#ifndef HAVE_LINK
> +#ifdef WIN32
> +/* src/port/win32link.c */
>  extern int    link(const char *src, const char *dst);
>  #endif

It bothers me that we have all this windows crap in port.h instead of
win32_port.h. But that's not this patch's fault.


> Subject: [PATCH v3 07/13] Remove dead replacement code for clock_gettime().

Nice.


> XXX This can only be committed once prairedog is decommissioned, because
> macOS 10.4 didn't have clock_gettime().

Maybe put it later in the queue?


> Subject: [PATCH v3 08/13] Remove configure probes for poll and poll.h.
>
> poll() and <poll.h> are in SUSv2 and all targeted Unix systems have
> them.
>
> Retain HAVE_POLL and HAVE_POLL_H macros for readability.  There's an
> error in latch.c that is now unreachable (since logically always have
> one of WIN32 or HAVE_POLL defined), but that falls out of a decision to
> keep using defined(HAVE_POLL) instead of !defined(WIN32) to guard the
> poll() code.

Wonder if we instead should add an empty poll.h to src/include/port/win32?


> Subject: [PATCH v3 09/13] Remove dead setenv, unsetenv replacement code.
> Subject: [PATCH v3 10/13] Remove dead pread and pwrite replacement code.

LGTM.


> Subject: [PATCH v3 11/13] Simplify replacement code for preadv and pwritev.
>
> preadv() and pwritev() are not standardized by POSIX.  Most targeted
> Unix systems have had them for more than a decade, since they are
> obvious combinations of standard p- and -v functions.
>
> In 15, we had two replacement implementations: one based on lseek() + -v
> function if available, and the other based on a loop over p- function.
> They aren't used for much yet, but are heavily used in a current
> proposal.
>
> Supporting two ways of falling back, at the cost of having a
> pg_preadv/pg_pwritev that could never be used in a multi-threaded
> program accessing the same file descriptor from two threads without
> unpleasant locking does not sound like a good trade.
>
> Therefore, drop the lseek()-based variant, and also the pg_ prefix, now
> that the file position portability hazard is gone.  Previously, both
> fallbacks had the file position portability hazard, because our
> pread()/pwrite() replacement had the same hazard, but that problem has
> been fixed for pread()/pwrite() by an earlier commit.  Now the way is
> clear to expunge the file position portability hazard of the
> lseek()-based variants too.
>
> At the time of writing, the following systems in our build farm lack
> native preadv/pwritev and thus use fallback code:
>
>  * Solaris (but not illumos)
>  * macOS before release 11.0
>  * Windows with Cygwin
>  * Windows native
>
> With this commit, all of the above systems will now use the *same*
> fallback code, the one that loops over pread()/pwrite() (which is
> translated to equivalent calls in Windows).  Previously, all but Windows
> native would use the readv()/writev()-based fallback that this commit
> removes.

Given that it's just solaris and old macOS that "benefited" from writev, just
using the "full" fallback there makes sense.


> Subject: [PATCH v3 12/13] Remove fdatasync configure probe.

> @@ -1928,7 +1925,6 @@ if test "$PORTNAME" = "win32"; then
>    AC_CHECK_FUNCS(_configthreadlocale)
>    AC_REPLACE_FUNCS(gettimeofday)
>    AC_LIBOBJ(dirmod)
> -  AC_LIBOBJ(fdatasync)
>    AC_LIBOBJ(getrusage)
>    AC_LIBOBJ(kill)
>    AC_LIBOBJ(open)
> @@ -1936,6 +1932,7 @@ if test "$PORTNAME" = "win32"; then
>    AC_LIBOBJ(win32dlopen)
>    AC_LIBOBJ(win32env)
>    AC_LIBOBJ(win32error)
> +  AC_LIBOBJ(win32fdatasync)
>    AC_LIBOBJ(win32link)
>    AC_LIBOBJ(win32ntdll)
>    AC_LIBOBJ(win32pread)

I like that we might get away from all those "generically" named libobjs that
are hardcoded to be used only on windows...



> From 5bda430998d502a7f8a6fe662a56b63ac374c925 Mon Sep 17 00:00:00 2001
> From: Thomas Munro <thomas.munro@gmail.com>
> Date: Fri, 26 Mar 2021 22:58:06 +1300
> Subject: [PATCH v3 13/13] Remove --disable-thread-safety.
>
> Threads are in SUSv2 and all targeted Unix systems have the option.
> There are no known Unix systems that don't choose to implement the
> threading option, and we're no longer testing such builds.
>
> Future work to improve our use of threads will be simplified by not
> having to cope with a no-threads build option.

Yeha.


>  AC_CHECK_HEADER(pthread.h, [], [AC_MSG_ERROR([
> -pthread.h not found;  use --disable-thread-safety to disable thread safety])])
> +pthread.h not found])])

Is this really needed after these changes?  We probably can't get away from
AX_PTHREAD just yet, but we should be able to rely on pthread.h?


> diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
> index 2a0d08d10f..a4ef016c06 100644
> --- a/src/include/pg_config.h.in
> +++ b/src/include/pg_config.h.in
> @@ -51,10 +51,6 @@
>  /* Define to 1 if you want National Language Support. (--enable-nls) */
>  #undef ENABLE_NLS
>
> -/* Define to 1 to build client libraries as thread-safe code.
> -   (--enable-thread-safety) */
> -#undef ENABLE_THREAD_SAFETY
> -

Might be worth grepping around whether there are extensions that reference
ENABLE_THREAD_SAFETY (e.g. to then error out if not available). If common
enough we could decide to keep it, given that it's pretty reasonable for code
to want to know that across versions?


>  # Add libraries that libpq depends (or might depend) on into the
> diff --git a/src/interfaces/libpq/fe-auth.c b/src/interfaces/libpq/fe-auth.c
> index 49a1c626f6..3504ab2c34 100644
> --- a/src/interfaces/libpq/fe-auth.c
> +++ b/src/interfaces/libpq/fe-auth.c
> @@ -1116,11 +1116,10 @@ pg_fe_getusername(uid_t user_id, PQExpBuffer errorMessage)
>  #endif
>
>      /*
> -     * Some users are using configure --enable-thread-safety-force, so we
> -     * might as well do the locking within our library to protect getpwuid().
> -     * In fact, application developers can use getpwuid() in their application
> -     * if they use the locking call we provide, or install their own locking
> -     * function using PQregisterThreadLock().
> +     * We do the locking within our library to protect getpwuid().  Application
> +     * developers can use getpwuid() in their application if they use the
> +     * locking call we provide, or install their own locking function using
> +     * PQregisterThreadLock().
>       */
>      pglock_thread();

Probably worth using getpwuid_r where available - might even be everywhere
(except windows of course, but GetUserName() is threadsafe).

> --- a/src/port/getaddrinfo.c
> +++ b/src/port/getaddrinfo.c
> @@ -414,7 +414,7 @@ pqGethostbyname(const char *name,
>                  struct hostent **result,
>                  int *herrno)
>  {
> -#if defined(ENABLE_THREAD_SAFETY) && defined(HAVE_GETHOSTBYNAME_R)
> +#if defined(HAVE_GETHOSTBYNAME_R)
>      /*
>       * broken (well early POSIX draft) gethostbyname_r() which returns 'struct

Depressingly there's still plenty systems without gethostbyname_r() :(

curculio, gombessa, longfin, lorikeet, morepork, pollock, sifaka, wrasse ...



Greetings,

Andres Freund



Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
>> XXX This can only be committed once prairedog is decommissioned, because
>> macOS 10.4 didn't have clock_gettime().

> Maybe put it later in the queue?

clock_gettime is required by SUSv2 (1997), so I have to admit that
macOS 10.4 doesn't have a lot of excuse not to have it.  In any case,
prairiedog is just sitting there doing its thing until I find cycles
to install a newer OS.  If you want to move ahead with this, don't
let prairiedog block you.

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Thu, Aug 4, 2022 at 4:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
> >> XXX This can only be committed once prairedog is decommissioned, because
> >> macOS 10.4 didn't have clock_gettime().
>
> > Maybe put it later in the queue?
>
> clock_gettime is required by SUSv2 (1997), so I have to admit that
> macOS 10.4 doesn't have a lot of excuse not to have it.  In any case,
> prairiedog is just sitting there doing its thing until I find cycles
> to install a newer OS.  If you want to move ahead with this, don't
> let prairiedog block you.

Thanks, will do.  Just having an argument with MSYS about something I
seem to have messed up in the most recent version, and then I'll start
pushing these...



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Thu, Aug 4, 2022 at 3:43 PM Andres Freund <andres@anarazel.de> wrote:
> > We retain a HAVE_SHM_OPEN macro, because it's clearer to readers than
> > something like !defined(WIN32).
>
> I don't like these. I don't find them clearer - if we really just assume this
> to be the case on windows, it's easier to understand the checks if they talk
> about windows rather than having to know whether this specific check just
> applies to windows or potentially an unspecified separate set of systems.
>
> But I guess I should complain upthread...

Thanks for reviewing.

For this point, I'm planning to commit with those "vestigial" macros
that Tom asked for, and then we can argue about removing them
separately later.



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
I've now pushed all of these except the --disable-thread-safety one,
which I'm still contemplating.  So far all green on the farm (except
known unrelated breakage).  But that's just the same-day animals...



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Thu, Aug 4, 2022 at 2:30 PM Andres Freund <andres@anarazel.de> wrote:
> On 2022-08-03 21:52:04 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > Another potential cleanup is the fallback for strtoll/strtoull.
> >
> > +1, I suspect the alternate spellings are dead.
>
> Looks like that includes systems where there's no declaration for strtoll,
> strtoull. The test was introduced in
>
> commit a6228128fc48c222953dfd41fd438522a184054c
> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Date:   2018-05-18 22:42:10 -0400
>
>     Arrange to supply declarations for strtoll/strtoull if needed.
>
> The check was introduced for animal dromedary, afaics. Looks like that stopped
> reporting 2019-09-27 and transformed into florican.
>
> A query on the bf database didn't see any runs in the last 30 days that didn't
> have strtoll declared.
>
> See attached patch.

LGTM.  This is just C99 <stdlib.h> stuff, and my scraped config data
set agrees with your observation.



Re: Cleaning up historical portability baggage

From
Robert Haas
Date:
On Sat, Jul 23, 2022 at 8:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> More generally, I'm not exactly convinced that changes like
> this are a readability improvement:
>
> -#ifdef HAVE_SETSID
> +#ifndef WIN32
>
> I'd rather not have the code cluttered with a sea of
> indistinguishable "#ifndef WIN32" tests when some of them could be
> more specific and more mnemonic.

I can see both sides of this issue. On the one hand, if there's a
large chunk of code that's surrounded by #ifndef WIN32, then it might
not be clear to the casual observer that the block of code in question
is working around the lack of setsid() rather than some other
Windows-specific weirdness. Comments can help with that, though. On
the other hand, to me, seeing HAVE_SETSID makes me think that we're
dealing with a configure probe, and that I might have to worry about
UNIX-like systems not having support for that primitive. If it says
WIN32, then I know that's all we're talking about, and that's clearer.

Looking at a HAVE_SETSID specifically, they seem to fall into three
different categories. First, there's a couple of places that look
roughly like this:

#ifdef HAVE_SETSID
        if (setsid() < 0)
                elog(FATAL, "setsid() failed: %m");
#endif

I don't think that changing this to WIN32 would confuse anybody.
Surely it's obvious that the WIN32 test is about setsid(), because
that's the only code in the block.

Then there are a couple of places that look like this:

                /*
                 * If we have setsid(), signal the backend's whole process
                 * group
                 */
#ifdef HAVE_SETSID
                (void) kill(-pid, SIGTERM);
#else
                (void) kill(pid, SIGTERM);
#endif

I think it would be clear enough to adopt a WIN32 test here if we also
adjusted the comment, e.g.  "All non-Windows systems supported process
groups, and on such systems, we want to signal the entire group." But
I think there might be an even better option. On Windows, kill is
anyway getting defined to pgkill, and our version of pgkill is defined
to return EINVAL if you pass it a negative number. Why don't we just
have it change a negative value into a positive one? Then we can drop
all this conditional logic in the callers, who can just do (void)
kill(-pid, SIGWHATEVER) and we should be fine. On a quick look, it
appears to me that every call site that passes non-constant second
argument to kill() would be happy with this change to pgkill().

Finally, there's this sort of thing:

static void
signal_child(pid_t pid, int signal)
{
    if (kill(pid, signal) < 0)
        elog(DEBUG3, "kill(%ld,%d) failed: %m", (long) pid, signal);
#ifdef HAVE_SETSID
    switch (signal)
    {
        case SIGINT:
        case SIGTERM:
        case SIGQUIT:
        case SIGSTOP:
        case SIGKILL:
            if (kill(-pid, signal) < 0)
                elog(DEBUG3, "kill(%ld,%d) failed: %m", (long) (-pid), signal);
            break;
        default:
            break;
    }
#endif
}

So the logic here says that we should send the signal to the child and
then if the signal is in a certain list and HAVE_SETSID is defined,
also send the same signal to the whole process group. Here HAVE_SETSID
is really just a proxy for whether the operating system has a notion
of process groups and, again, it seems OK to change this to a WIN32
test with proper comments. However, here again, I think there might be
a better option. The comment above this function notes that signalling
the process itself (as opposed to other process group members) is
assumed to not cause any problems, which implies that it's not a
desired behavior. So with the above redefinition of pgkill(), we could
rewrite this function like this:

static void
signal_child(pid_t pid, int signal)
{
    switch (signal)
    {
        case SIGINT:
        case SIGTERM:
        case SIGQUIT:
        case SIGSTOP:
        case SIGKILL:
            pid = -pid;
            break;
        default:
            break;
    }
    if (kill(pid, signal) < 0)
        elog(DEBUG3, "kill(%ld,%d) failed: %m", (long) pid, signal);
}

Overall, I don't think it's a great idea to keep all of these
HAVE_WHATEVER macros around if the configure tests are gone. It might
be necessary in the short term to make sure we don't regress the
readability of the code, but I think it would be better to come up
with other techniques for keeping the code readable rather than
relying on the names of these vestigial macros as documentation.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Overall, I don't think it's a great idea to keep all of these
> HAVE_WHATEVER macros around if the configure tests are gone. It might
> be necessary in the short term to make sure we don't regress the
> readability of the code, but I think it would be better to come up
> with other techniques for keeping the code readable rather than
> relying on the names of these vestigial macros as documentation.

Hmm ... I agree with you that the end result could be nicer code,
but what's making it nicer is a pretty substantial amount of human
effort for each and every call site.  Is anybody stepping forward
to put in that amount of work?

My proposal is to leave the call sites alone until someone feels
like doing that sort of detail work.

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Robert Haas
Date:
On Fri, Aug 5, 2022 at 10:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm ... I agree with you that the end result could be nicer code,
> but what's making it nicer is a pretty substantial amount of human
> effort for each and every call site.  Is anybody stepping forward
> to put in that amount of work?
>
> My proposal is to leave the call sites alone until someone feels
> like doing that sort of detail work.

My plan was to nerd-snipe Thomas Munro into doing it.[1]

-- 
Robert Haas
EDB: http://www.enterprisedb.com

[1] https://xkcd.com/356/



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Sat, Aug 6, 2022 at 12:01 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Thu, Aug 4, 2022 at 2:30 PM Andres Freund <andres@anarazel.de> wrote:
>> [strtoll cleanup patch]
>
> LGTM.  This is just C99 <stdlib.h> stuff, and my scraped config data
> set agrees with your observation.

I found a couple of explicit references to these macros left in
src/interfaces/ecpg/ecpglib/data.c and src/timezone/private.h.
Removed in the attached, which I'll push a bit later if no objections.

Attachment

Re: Cleaning up historical portability baggage

From
Andres Freund
Date:
Hi,


On 2022-08-06 09:02:32 +1200, Thomas Munro wrote:
> On Sat, Aug 6, 2022 at 12:01 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > On Thu, Aug 4, 2022 at 2:30 PM Andres Freund <andres@anarazel.de> wrote:
> >> [strtoll cleanup patch]
> >
> > LGTM.  This is just C99 <stdlib.h> stuff, and my scraped config data
> > set agrees with your observation.
> 
> I found a couple of explicit references to these macros left in
> src/interfaces/ecpg/ecpglib/data.c and src/timezone/private.h.
> Removed in the attached, which I'll push a bit later if no objections.

Hah, I was about to push it. Thanks for catching these. Happy for you to push
this soon!

Greetings,

Andres Freund



Re: Cleaning up historical portability baggage

From
Andres Freund
Date:
Hi,

On 2022-08-05 14:08:23 -0700, Andres Freund wrote:
> Hah, I was about to push it. Thanks for catching these. Happy for you to push
> this soon!

Thanks. Next in my quest for reducing autoconf vs meson pg_config.h
differences is GETTIMEOFDAY stuff.

HAVE_GETTIMEOFDAY currently is only defined for mingw as the configure test is
gated to windows - that's somewhat weird imo. mingw has had it since at least
2007.  The attached patch makes the gettimeofday() fallback specific to msvc.

I've renamed the file to win32gettimeofday now. I wonder if we should rename
files that are specific to msvc to indicate that? But that's for later.

1-arg gettimeofday() hasn't been around in a *long* while from what I can
see. So I've removed that configure test.

Greetings,

Andres Freund

Attachment

Re: Cleaning up historical portability baggage

From
Andres Freund
Date:
Hi,

On 2022-08-03 14:25:01 +1200, Thomas Munro wrote:
> It'd be good to find a new home for pg_get_user_name() and
> pg_get_user_home_dir(), which really shouldn't be left in the now
> bogusly named src/port/thread.c.  Any suggestions?

Leaving the name aside, the win32 handling of these functions is
embarassing. Both are inside an #ifndef WIN32.

The only caller (in fe-auth.c) of pg_get_user_name() has:
#ifdef WIN32
    if (GetUserName(username, &namesize))
        name = username;
    else if (errorMessage)
        appendPQExpBuffer(errorMessage,
                          libpq_gettext("user name lookup failure: error code %lu\n"),
                          GetLastError());
#else
    if (pg_get_user_name(user_id, pwdbuf, sizeof(pwdbuf)))
        name = pwdbuf;
    else if (errorMessage)
        appendPQExpBuffer(errorMessage, "%s\n", pwdbuf);

the only caller of pg_get_user_home_dir() (path.c) has:

bool
get_home_path(char *ret_path)
{
#ifndef WIN32
    /*
     * We first consult $HOME.  If that's unset, try to get the info from
     * <pwd.h>.
     */
    const char *home;

    home = getenv("HOME");
    if (home == NULL || home[0] == '\0')
        return pg_get_user_home_dir(geteuid(), ret_path, MAXPGPATH);
    strlcpy(ret_path, home, MAXPGPATH);
    return true;
#else
    char       *tmppath;

    /*
     * Note: We use getenv() here because the more modern SHGetFolderPath()
     * would force the backend to link with shell32.lib, which eats valuable
     * desktop heap.  XXX This function is used only in psql, which already
     * brings in shell32 via libpq.  Moving this function to its own file
     * would keep it out of the backend, freeing it from this concern.
     */
    tmppath = getenv("APPDATA");
    if (!tmppath)
        return false;
    snprintf(ret_path, MAXPGPATH, "%s/postgresql", tmppath);
    return true;
#endif
}

How does this make any sort of sense?

Greetings,

Andres Freund



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Sat, Aug 6, 2022 at 12:03 PM Andres Freund <andres@anarazel.de> wrote:
> HAVE_GETTIMEOFDAY currently is only defined for mingw as the configure test is
> gated to windows - that's somewhat weird imo. mingw has had it since at least
> 2007.  The attached patch makes the gettimeofday() fallback specific to msvc.

+1

> I've renamed the file to win32gettimeofday now. I wonder if we should rename
> files that are specific to msvc to indicate that? But that's for later.

+1, I was thinking the same.

> 1-arg gettimeofday() hasn't been around in a *long* while from what I can
> see. So I've removed that configure test.

+1

LGTM.



Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> I've renamed the file to win32gettimeofday now. I wonder if we should rename
> files that are specific to msvc to indicate that? But that's for later.

+1, but you didn't change the file's own comments containing its name.

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Sat, Aug 6, 2022 at 2:54 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Aug 5, 2022 at 10:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Hmm ... I agree with you that the end result could be nicer code,
> > but what's making it nicer is a pretty substantial amount of human
> > effort for each and every call site.  Is anybody stepping forward
> > to put in that amount of work?
> >
> > My proposal is to leave the call sites alone until someone feels
> > like doing that sort of detail work.
>
> My plan was to nerd-snipe Thomas Munro into doing it.[1]

Alright, well here's a patch for the setsid() stuff following Robert's
plan, which I think is a pretty good plan.

Did I understand correctly that the places that do kill(-pid) followed
by kill(pid) really only need the kill(-pid)?

I checked that user processes should never have pid 0 (that's a
special system idle process) or 1 (because they're always even,
actually it looks like they're pointers in kernel space or something
like that), since those wouldn't play nice with the coding I used
here.

I note that Windows actually *does* have process groups (in one of the
CI threads, we learned that there were at least two concepts like
that).  Some of our fake signals turn into messages to pipes, and
others turn into process termination, and in theory we could probably
also take advantage of Windows' support for its native "control C" and
"control break" signals here.  It's possible that someone could do
something to make all but the pipe ones propagate to real Windows
process groups.  That might be good for things like nuking archiver
subprocesses and the like when taking out the backend, or something
like that, but I'm not planning to look into that myself.

Attachment

Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> Did I understand correctly that the places that do kill(-pid) followed
> by kill(pid) really only need the kill(-pid)?

Uh ... did you read the comment right above signal_child?

 * There is a race condition for recently-forked children: they might not
 * have executed setsid() yet.  So we signal the child directly as well as
 * the group.  We assume such a child will handle the signal before trying
 * to spawn any grandchild processes.  We also assume that signaling the
 * child twice will not cause any problems.

It might be that this is wrong and signaling -pid will work even if
the child hasn't yet done setsid(), but I doubt it: the kill(2) man
page is pretty clear that it'll fail if "the process group doesn't
exist".

Perhaps we could finesse that by signaling -pid first, and then
signaling pid if that fails, but offhand it seems like that has
the described race condition w.r.t. grandchild processes.

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> Thanks. Next in my quest for reducing autoconf vs meson pg_config.h
> differences is GETTIMEOFDAY stuff.

I just noticed that this could be simplified:

    #ifdef _MSC_VER
    struct timezone;
    /* Last parameter not used */
    extern int    gettimeofday(struct timeval *tp, struct timezone *tzp);
    #endif

because what POSIX actually says is

    int gettimeofday(struct timeval *restrict tp, void *restrict tzp);

and

    If tzp is not a null pointer, the behavior is unspecified.

Indeed, we never call it with anything but a null tzp value,
nor does our Windows fallback implementation do anything with tzp.

So ISTM we should drop the bogus "struct timezone;" and declare
this parameter as "void *tzp".

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Sun, Aug 7, 2022 at 11:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Thanks. Next in my quest for reducing autoconf vs meson pg_config.h
> > differences is GETTIMEOFDAY stuff.
>
> I just noticed that this could be simplified:
>
>     #ifdef _MSC_VER
>     struct timezone;
>     /* Last parameter not used */
>     extern int  gettimeofday(struct timeval *tp, struct timezone *tzp);
>     #endif
>
> because what POSIX actually says is
>
>     int gettimeofday(struct timeval *restrict tp, void *restrict tzp);
>
> and
>
>     If tzp is not a null pointer, the behavior is unspecified.
>
> Indeed, we never call it with anything but a null tzp value,
> nor does our Windows fallback implementation do anything with tzp.
>
> So ISTM we should drop the bogus "struct timezone;" and declare
> this parameter as "void *tzp".

I also wonder if half the stuff in win32gettimeofday.c can be deleted.
From some light googling, it looks like
GetSystemTimePreciseAsFileTime() can just be called directly on
Windows 8+ (and we now require 10+), and that kernel32.dll malarky was
for older systems?



Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> I also wonder if half the stuff in win32gettimeofday.c can be deleted.
> From some light googling, it looks like
> GetSystemTimePreciseAsFileTime() can just be called directly on
> Windows 8+ (and we now require 10+), and that kernel32.dll malarky was
> for older systems?

Yeah, Microsoft's man page for it just says to include sysinfoapi.h
(which we aren't) and then it should work on supported versions.

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Sun, Aug 7, 2022 at 10:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Did I understand correctly that the places that do kill(-pid) followed
> > by kill(pid) really only need the kill(-pid)?
>
> Uh ... did you read the comment right above signal_child?
>
>  * There is a race condition for recently-forked children: they might not
>  * have executed setsid() yet.  So we signal the child directly as well as
>  * the group.  We assume such a child will handle the signal before trying
>  * to spawn any grandchild processes.  We also assume that signaling the
>  * child twice will not cause any problems.

Oof.  Fixed.

Attachment

Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Sat, Aug 6, 2022 at 9:08 AM Andres Freund <andres@anarazel.de> wrote:
> [stuff about strtoll, strtoull]

So what about strtof?  That's gotta be dead code too.  I gather we
still need commit 72880ac1's HAVE_BUGGY_STRTOF.  From a cursory glance
at MinGW's implementation, it still has the complained-about
behaviour, if I've understood the complaint, and if I'm looking at the
right C runtime[1].  But then our code says:

 * Test results on Mingw suggest that it has the same problem, though looking
 * at the code I can't figure out why.

... so which code was that referring to then?  I'm not up to speed on
how many C runtime libraries there are and how they are selected on
MSYS (I mean, the closest I've ever got to this system is flinging
patches at it on CI using Melih's patch, which, incidentally, I just
tested the attached with and it passed[2]).

[1] https://github.com/mirror/mingw-w64/blob/master/mingw-w64-crt/stdio/strtof.c
[2] https://github.com/macdice/postgres/runs/7708082971

Attachment

Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> On Sun, Aug 7, 2022 at 10:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> * There is a race condition for recently-forked children: they might not
>> * have executed setsid() yet.  So we signal the child directly as well as
>> * the group.  We assume such a child will handle the signal before trying
>> * to spawn any grandchild processes.  We also assume that signaling the
>> * child twice will not cause any problems.

> Oof.  Fixed.

Hmm ... it seems like these other callers have the same race condition.
StatementTimeoutHandler and LockTimeoutHandler account for that
correctly by issuing two kill()s, so how is it OK for pg_signal_backend
and TerminateOtherDBBackends not to?

It would likely be a good idea for all these places to mention that
they're doing that to avoid a race condition, and cross-reference
signal_child for details.  Or maybe we should promote signal_child
into a more widely used function?

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> So what about strtof?  That's gotta be dead code too.  I gather we
> still need commit 72880ac1's HAVE_BUGGY_STRTOF.  From a cursory glance
> at MinGW's implementation, it still has the complained-about
> behaviour, if I've understood the complaint, and if I'm looking at the
> right C runtime[1].

Looks plausible from here.

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Sun, Aug 7, 2022 at 11:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > I also wonder if half the stuff in win32gettimeofday.c can be deleted.
> > From some light googling, it looks like
> > GetSystemTimePreciseAsFileTime() can just be called directly on
> > Windows 8+ (and we now require 10+), and that kernel32.dll malarky was
> > for older systems?
>
> Yeah, Microsoft's man page for it just says to include sysinfoapi.h
> (which we aren't) and then it should work on supported versions.

This looks good on CI (well I haven't waited for it to finish yet, but
MSVC compiles it without warning and we're most of the way through the
tests...).

Attachment

Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> On Sun, Aug 7, 2022 at 11:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Thomas Munro <thomas.munro@gmail.com> writes:
>>> I also wonder if half the stuff in win32gettimeofday.c can be deleted.

> This looks good on CI (well I haven't waited for it to finish yet, but
> MSVC compiles it without warning and we're most of the way through the
> tests...).

Looks plausible from here.  A couple other thoughts:

* While you're at it you could fix the "MinW" typo just above
the extern for gettimeofday.

* I'm half tempted to add something like this to gettimeofday:

    /*
     * POSIX declines to define what tzp points to, saying
     * "If tzp is not a null pointer, the behavior is unspecified".
     * Let's take this opportunity to verify that noplace in
     * Postgres tries to use any unportable behavior.
     */
    Assert(tzp == NULL);

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Andres Freund
Date:
Hi,

Here's another set patches for cruft I discovered going line-by-line through
the autoconf vs meson test differences. They'd all be simple to port to meson
too, but I think it's better to clean them up.

0001: __func__ is C99, so we don't need to support fallbacks

0002: windows: We've unconditionally defined HAVE_MINIDUMP_TYPE for msvc forever, we
  can rely on it for mingw too

0003: aix: aix3.2.5, aix4.1 are not even of historical interest at this point
  - 4.1 was released before the first commit in our commit history

0004: solaris: these gcc & gnu ld vs sun stuff differences seem unnecessary or
  outdated

  I started this because I wanted to get rid of with_gnu_ld, but there's still
  a necessary reference left unfortunately. But it still seems worth doing?

  I checked and the relevant options (-shared, -Wl,-Bsymbolic, -Wl,-soname)
  work even on solaris 10 with developerstudio12.5 (not the latest)

0005: those broken system headers look to have been repaired a good while ago,
 or, in the case of irix, we don't support the platform anymore

Greetings,

Andres Freund

Attachment

Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Sun, Aug 7, 2022 at 1:29 PM Andres Freund <andres@anarazel.de> wrote:
> 0001: __func__ is C99, so we don't need to support fallbacks

+1, and my scraped data agrees.

I believe our minimum MSVC is current 2015, and this says it has it
(it doesn't let you select older versions in the version drop-down,
but we don't care about older versions):

https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros?view=msvc-140

> 0002: windows: We've unconditionally defined HAVE_MINIDUMP_TYPE for msvc forever, we
>   can rely on it for mingw too

      * If supported on the current platform, set up a handler to be called if
      * the backend/postmaster crashes with a fatal signal or exception.
      */
-#if defined(WIN32) && defined(HAVE_MINIDUMP_TYPE)
+#if defined(WIN32)

Personally I'd remove "If supported on the current platform, " and
shove the rest of the comment inside the #if defined(WIN32), but
that's just me...

> 0003: aix: aix3.2.5, aix4.1 are not even of historical interest at this point
>   - 4.1 was released before the first commit in our commit history

Wow.

> 0004: solaris: these gcc & gnu ld vs sun stuff differences seem unnecessary or
>   outdated

LGTM from a look at the current man page.

>   I checked and the relevant options (-shared, -Wl,-Bsymbolic, -Wl,-soname)
>   work even on solaris 10 with developerstudio12.5 (not the latest)

FWIW I'd call Solaris 10 EOL'd (it's in some
sure-pay-us-but-we-aren't-really-going-to-fix-it phase with a lifetime
similar to the actual sun).

> 0005: those broken system headers look to have been repaired a good while ago,
>  or, in the case of irix, we don't support the platform anymore

Nice archeology.



Re: Cleaning up historical portability baggage

From
Andres Freund
Date:
Hi,

On 2022-08-07 11:47:31 +1200, Thomas Munro wrote:
> So what about strtof?  That's gotta be dead code too.  I gather we
> still need commit 72880ac1's HAVE_BUGGY_STRTOF.

> From a cursory glance at MinGW's implementation, it still has the
> complained-about behaviour, if I've understood the complaint, and if I'm
> looking at the right C runtime[1].

Well, right now we don't refuse to build against the "wrong" runtimes, so it's
hard to say whether you're looking at the right runtime. I don't think we need
this if we're (as we should imo) only using the ucrt - that's microsoft's,
which IIUC is ok?


> -/*
> - * strtof() is part of C99; this version is only for the benefit of obsolete
> - * platforms. As such, it is known to return incorrect values for edge cases,
> - * which have to be allowed for in variant files for regression test results
> - * for any such platform.
> - */

We can't remove the result files referenced here yet, due to the double
rounding behaviour?

Greetings,

Andres Freund



Re: Cleaning up historical portability baggage

From
Andres Freund
Date:
Hi,

On 2022-08-07 14:29:20 +1200, Thomas Munro wrote:
> On Sun, Aug 7, 2022 at 1:29 PM Andres Freund <andres@anarazel.de> wrote:
> > 0001: __func__ is C99, so we don't need to support fallbacks
> 
> +1, and my scraped data agrees.
> 
> I believe our minimum MSVC is current 2015, and this says it has it
> (it doesn't let you select older versions in the version drop-down,
> but we don't care about older versions):

Thanks for checking.


> > 0002: windows: We've unconditionally defined HAVE_MINIDUMP_TYPE for msvc forever, we
> >   can rely on it for mingw too
> 
>       * If supported on the current platform, set up a handler to be called if
>       * the backend/postmaster crashes with a fatal signal or exception.
>       */
> -#if defined(WIN32) && defined(HAVE_MINIDUMP_TYPE)
> +#if defined(WIN32)
> 
> Personally I'd remove "If supported on the current platform, " and
> shove the rest of the comment inside the #if defined(WIN32), but
> that's just me...

I started out that way as well, but it'd actually be nice to do this on other
platforms too, and we just don't support it yet :)


> >   I checked and the relevant options (-shared, -Wl,-Bsymbolic, -Wl,-soname)
> >   work even on solaris 10 with developerstudio12.5 (not the latest)
> 
> FWIW I'd call Solaris 10 EOL'd (it's in some
> sure-pay-us-but-we-aren't-really-going-to-fix-it phase with a lifetime
> similar to the actual sun).

I'd agree - but checked so that couldn't even be an argument against :)


Greetings,

Andres Freund



Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2022-08-07 11:47:31 +1200, Thomas Munro wrote:
>> So what about strtof?  That's gotta be dead code too.  I gather we
>> still need commit 72880ac1's HAVE_BUGGY_STRTOF.

> Well, right now we don't refuse to build against the "wrong" runtimes, so it's
> hard to say whether you're looking at the right runtime. I don't think we need
> this if we're (as we should imo) only using the ucrt - that's microsoft's,
> which IIUC is ok?

You could pull it out and see if the buildfarm breaks, but my money
is on it breaking.  That HAVE_BUGGY_STRTOF stuff isn't very old.

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Andres Freund
Date:
Hi,

On 2022-08-06 22:58:12 -0400, Tom Lane wrote:
> You could pull it out and see if the buildfarm breaks, but my money
> is on it breaking.  That HAVE_BUGGY_STRTOF stuff isn't very old.

We only recently figured out that we should use the ucrt runtime (and that it
exists, I guess).
fairywren and jacan's first runs with ucrt are from mid February:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2022-02-13%2007%3A11%3A46
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2022-02-17%2016%3A15%3A24

We probably should just throw an error if msvcrt is used. That's the old, pre
C99, microsoft C runtime, with some mingw replacement functions ontop.  I
think our tests already don't pass when it's used. See [1] for more info.

Not entirely sure how to best detect ucrt use - we could just check MSYSTEM,
but that's not determinative because one also can specify the compiler via the
prefix...


That'd still leave us with the alternative output files due to cygwin, I
think.

Greetings,

Andres Freund


[1] https://www.msys2.org/docs/environments/



Re: Cleaning up historical portability baggage

From
Andres Freund
Date:
Hi,

On 2022-08-06 20:39:48 -0700, Andres Freund wrote:
> On 2022-08-06 22:58:12 -0400, Tom Lane wrote:
> > You could pull it out and see if the buildfarm breaks, but my money
> > is on it breaking.  That HAVE_BUGGY_STRTOF stuff isn't very old.
> 
> We only recently figured out that we should use the ucrt runtime (and that it
> exists, I guess).
> fairywren and jacan's first runs with ucrt are from mid February:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2022-02-13%2007%3A11%3A46
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2022-02-17%2016%3A15%3A24

Well, bad news and good news. The bad: We get the wrong results when just
removing HAVE_BUGGY_STRTOF. The good: That is just because we haven't applied
enough, or the right, magic. To have mingw to not interfere with things one
also has to pass -D_UCRT and -lucrt - then the tests pass, even without
HAVE_BUGGY_STRTOF.

I think this might also explain (and fix) some other oddity we had with mingw
that I was getting confused about a while back, but I forgot too much of the
details...

Greetings,

Andres Freund



Re: Cleaning up historical portability baggage

From
Andres Freund
Date:
Hi,

On 2022-08-06 23:20:26 -0700, Andres Freund wrote:
> The good: That is just because we haven't applied enough, or the right,
> magic. To have mingw to not interfere with things one also has to pass
> -D_UCRT and -lucrt - then the tests pass, even without HAVE_BUGGY_STRTOF.

Looks like only -lucrt is required, not -D_UCRT.

Asked on the mingw irc channel - it's not expected that such magic is
required. Opened https://github.com/msys2/MINGW-packages/issues/12472

Greetings,

Andres Freund



Re: Cleaning up historical portability baggage

From
Andres Freund
Date:
Hi,

On 2022-08-06 18:29:14 -0700, Andres Freund wrote:
> 0003: aix: aix3.2.5, aix4.1 are not even of historical interest at this point
>   - 4.1 was released before the first commit in our commit history

hoverfly clearly doesn't like this:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2022-08-07%2017%3A06%3A15

Oh, I may see the problem I think I misread the comma in the ifneq

--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -101,15 +101,7 @@ postgres: $(POSTGRES_IMP)
 
 $(POSTGRES_IMP): $(OBJS)
     $(LD) $(LDREL) $(LDOUT) SUBSYS.o $(call expand_subsys,$^)
-ifeq ($(host_os), aix3.2.5)
     $(MKLDEXPORT) SUBSYS.o $(bindir)/postgres > $@
-else
-ifneq (,$(findstring aix4.1, $(host_os)))
-    $(MKLDEXPORT) SUBSYS.o $(bindir)/postgres > $@
-else
-    $(MKLDEXPORT) SUBSYS.o . > $@
-endif
-endif
     @rm -f SUBSYS.o
 
i.e. it should be "$(MKLDEXPORT) SUBSYS.o . > $@" after removing the
conditionals rather than "$(MKLDEXPORT) SUBSYS.o $(bindir)/postgres > $@".

Greetings,

Andres Freund



Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> On Thu, Aug 4, 2022 at 4:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> clock_gettime is required by SUSv2 (1997), so I have to admit that
>> macOS 10.4 doesn't have a lot of excuse not to have it.  In any case,
>> prairiedog is just sitting there doing its thing until I find cycles
>> to install a newer OS.  If you want to move ahead with this, don't
>> let prairiedog block you.

> Thanks, will do.

BTW, that commit really should have updated the explanation at the top of
instr_time.h:

 * This file provides an abstraction layer to hide portability issues in
 * interval timing.  On Unix we use clock_gettime() if available, else
 * gettimeofday().  On Windows, gettimeofday() gives a low-precision result
 * so we must use QueryPerformanceCounter() instead.  These macros also give
 * some breathing room to use other high-precision-timing APIs.

Updating the second sentence is easy enough, but as for the third,
I wonder if it's still true in view of 24c3ce8f1.  Should we revisit
whether to use gettimeofday vs. QueryPerformanceCounter?  At the very
least I suspect it's no longer about "low precision", but about which
API is faster.

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Mon, Aug 8, 2022 at 12:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> BTW, that commit really should have updated the explanation at the top of
> instr_time.h:
>
>  * This file provides an abstraction layer to hide portability issues in
>  * interval timing.  On Unix we use clock_gettime() if available, else
>  * gettimeofday().  On Windows, gettimeofday() gives a low-precision result
>  * so we must use QueryPerformanceCounter() instead.  These macros also give
>  * some breathing room to use other high-precision-timing APIs.
>
> Updating the second sentence is easy enough, but as for the third,
> I wonder if it's still true in view of 24c3ce8f1.  Should we revisit
> whether to use gettimeofday vs. QueryPerformanceCounter?  At the very
> least I suspect it's no longer about "low precision", but about which
> API is faster.

Yeah, that's not true anymore, and QueryPerformanceCounter() is faster
than Get­System­Time­Precise­As­File­Time()[1], but there doesn't
really seem to be any point in mentioning that or gettimeofday() at
all here.  I propose to cut it down to just:

  * This file provides an abstraction layer to hide portability issues in
- * interval timing.  On Unix we use clock_gettime() if available, else
- * gettimeofday().  On Windows, gettimeofday() gives a low-precision result
- * so we must use QueryPerformanceCounter() instead.  These macros also give
- * some breathing room to use other high-precision-timing APIs.
+ * interval timing.  On Unix we use clock_gettime(), and on Windows we use
+ * QueryPerformanceCounter().  These macros also give some breathing room to
+ * use other high-precision-timing APIs.

FWIW I expect this stuff to get whacked around some more for v16[2].

[1] https://devblogs.microsoft.com/oldnewthing/20170921-00/?p=97057
[2] https://commitfest.postgresql.org/39/3751/



Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> Yeah, that's not true anymore, and QueryPerformanceCounter() is faster
> than Get­System­Time­Precise­As­File­Time()[1], but there doesn't
> really seem to be any point in mentioning that or gettimeofday() at
> all here.  I propose to cut it down to just:

>   * This file provides an abstraction layer to hide portability issues in
> - * interval timing.  On Unix we use clock_gettime() if available, else
> - * gettimeofday().  On Windows, gettimeofday() gives a low-precision result
> - * so we must use QueryPerformanceCounter() instead.  These macros also give
> - * some breathing room to use other high-precision-timing APIs.
> + * interval timing.  On Unix we use clock_gettime(), and on Windows we use
> + * QueryPerformanceCounter().  These macros also give some breathing room to
> + * use other high-precision-timing APIs.

WFM.

> FWIW I expect this stuff to get whacked around some more for v16[2].
> [2] https://commitfest.postgresql.org/39/3751/

Meh.  I think trying to use rdtsc is a fool's errand; you'll be fighting
CPU quirks forever.

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
Here's a new batch of these:

Remove configure probe for sys/uio.h.
Remove configure probes for sys/un.h and struct sockaddr_un.
Remove configure probe for sys/select.h.
Remove configure probes for sys/ipc.h, sys/sem.h, sys/shm.h.
Remove configure probe for sys/resource.h and refactor.
Remove configure probe for shl_load library.

The most interesting things to say about these ones are:
 * The concept of a no-Unix-socket build is removed.  We should be
able to do that now, right?  Peter E seemed to say approximately that
in the commit message for 797129e5.  Or is there a thought that a new
operating system might show up that doesn't have 'em and we'd wish
we'd kept this stuff well marked out?
 * Instead of eg HAVE_SYS_RESOURCE_H I supplied <sys/resource.h> and
moved the relevant declarations there from the big random-win2-stuff
header, and likewise for <sys/un.h>

(I still plan to get rid of no-threads-builds soon, but I got stuck
figuring out how to make sure I don't break the recent magic ldap
library detection... more soon.)

Attachment

Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> The most interesting things to say about these ones are:
>  * The concept of a no-Unix-socket build is removed.  We should be
> able to do that now, right?  Peter E seemed to say approximately that
> in the commit message for 797129e5.  Or is there a thought that a new
> operating system might show up that doesn't have 'em and we'd wish
> we'd kept this stuff well marked out?

I'm kind of down on removing that.  I certainly think it's premature
to do so today, when we haven't even yet shipped a release that
assumes we can always define it on Windows -- I won't be too surprised
if we get pushback on that after 15.0 is out.  But in general, Unix
sockets seem like kind of an obvious thing that might not be there
on some future platform.

Instead of what you did in 0002, I propose putting
"#define HAVE_UNIX_SOCKETS 1" in pg_config_manual.h, and keeping
the #if's that reference it as-is.

All the rest of this seems fine on a cursory readthrough.  Even if
we discover that header foo.h is less universal than we thought,
putting back one or two configure tests won't be hard.

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Andres Freund
Date:
Hi,

On 2022-08-11 10:52:51 -0400, Tom Lane wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > The most interesting things to say about these ones are:
> >  * The concept of a no-Unix-socket build is removed.  We should be
> > able to do that now, right?  Peter E seemed to say approximately that
> > in the commit message for 797129e5.  Or is there a thought that a new
> > operating system might show up that doesn't have 'em and we'd wish
> > we'd kept this stuff well marked out?
> 
> I'm kind of down on removing that.  I certainly think it's premature
> to do so today, when we haven't even yet shipped a release that
> assumes we can always define it on Windows

I think what might be good next step is to have tests default to using unix
sockets on windows, rather than requiring PG_TEST_USE_UNIX_SOCKETS. The
pg_regress.c and Utils.pm hunks disabling it by default on windows don't
really make sense anymore.

#if !defined(HAVE_UNIX_SOCKETS)
    use_unix_sockets = false;
#elif defined(WIN32)

    /*
     * We don't use Unix-domain sockets on Windows by default, even if the
     * build supports them.  (See comment at remove_temp() for a reason.)
     * Override at your own risk.
     */
    use_unix_sockets = getenv("PG_TEST_USE_UNIX_SOCKETS") ? true : false;
#else
    use_unix_sockets = true;
#endif

and

    # Specifies whether to use Unix sockets for test setups.  On
    # Windows we don't use them by default since it's not universally
    # supported, but it can be overridden if desired.
    $use_unix_sockets =
      (!$windows_os || defined $ENV{PG_TEST_USE_UNIX_SOCKETS});


Tests don't reliably run on windows without PG_TEST_USE_UNIX_SOCKETS, due to
the port conflict detection being incredibly racy. I see occasional failures
even without test concurrency, and with test concurrency it reliably fails.


I don't really know what to do about the warnings around remove_temp() and
trapsig(). I think we actually may be overreading the restrictions. To me the
documented restrictions read more like a high-level-ish explanation of what's
safe in a signal handler and what not. And it seems to not have caused a
problem on windows on several thousand CI cycles, including plenty failures.


Alternatively we could just default to putting the socketdir inside the data
directory on windows - I *think* windows doesn't have strict path length
limits for the socket location. If the socket dir isn't in some global temp
directory, we don't really need signal_remove_temp, given that we're ok with
leaving the much bigger data directory around.  The socket directory
determination doesn't really work on windows right now anyway, one manually
has to set a temp directory as TMPDIR isn't normally set on windows, and /tmp
doesn't exist.

    char       *template = psprintf("%s/pg_regress-XXXXXX",
                                    getenv("TMPDIR") ? getenv("TMPDIR") : "/tmp");

both TEMP and TMP would exist though...



> -- I won't be too surprised if we get pushback on that after 15.0 is out.

From what angle? I think our default behaviour doesn't change because
DEFAULT_PGSOCKET_DIR is "". And OS compatibility wise we apparently are good
as well?


> But in general, Unix sockets seem like kind of an obvious thing that might
> not be there on some future platform.

Maybe not with the precise API, but I can't imagine a new platform not having
something very similar. It serves a pretty obvious need, and I think the
security benefits have become more important over time...

Greetings,

Andres Freund



Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2022-08-11 10:52:51 -0400, Tom Lane wrote:
>> -- I won't be too surprised if we get pushback on that after 15.0 is out.

> From what angle?

If I knew that, it'd be because we'd already received the pushback.
I'm just suspicious that very little beta testing happens on Windows,
and what does is probably mostly people running up-to-date Windows.
So I think there's plenty of chance for "hey, this no longer works"
complaints later.  Maybe we'll be able to reject it all with "sorry,
we desupported that version of Windows", but I dunno.

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Peter Eisentraut
Date:
On 11.08.22 19:19, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2022-08-11 10:52:51 -0400, Tom Lane wrote:
>>> -- I won't be too surprised if we get pushback on that after 15.0 is out.
> 
>>  From what angle?
> 
> If I knew that, it'd be because we'd already received the pushback.
> I'm just suspicious that very little beta testing happens on Windows,
> and what does is probably mostly people running up-to-date Windows.
> So I think there's plenty of chance for "hey, this no longer works"
> complaints later.  Maybe we'll be able to reject it all with "sorry,
> we desupported that version of Windows", but I dunno.

What is changing in PG15 about this?  Unix-domain sockets on Windows are 
supported as of PG13.



Re: Cleaning up historical portability baggage

From
John Naylor
Date:
On Thu, Aug 4, 2022 at 11:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> clock_gettime is required by SUSv2 (1997), so I have to admit that
> macOS 10.4 doesn't have a lot of excuse not to have it.  In any case,
> prairiedog is just sitting there doing its thing until I find cycles
> to install a newer OS.  If you want to move ahead with this, don't
> let prairiedog block you.

Now that prairiedog is offline, it seems the oldest Bison in the
buildfarm is 2.3, as the vendor-supplied version for MacOS. Since
earlier versions are now untested, it seems we should now make that
the minimum, as in the attached. I took a quick look to see if that
would enable anything useful, but nothing stuck out.

Aside: The MSVC builds don't report the Bison version that I can see,
but otherwise it seems now the only non-Apple pre-3.0 animals are
prion (2.7) and the three Sparc animals on Debian 7 (2.5).

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment

Re: Cleaning up historical portability baggage

From
Peter Eisentraut
Date:
On 11.08.22 12:02, Thomas Munro wrote:
>   * The concept of a no-Unix-socket build is removed.  We should be
> able to do that now, right?  Peter E seemed to say approximately that
> in the commit message for 797129e5.  Or is there a thought that a new
> operating system might show up that doesn't have 'em and we'd wish
> we'd kept this stuff well marked out?

Most uses of HAVE_UNIX_SOCKETS are not useful independent of that 
question.  For example, you patch has

@@ -348,7 +343,6 @@ StreamServerPort(int family, const char *hostName, 
unsigned short portNumber,
      hint.ai_flags = AI_PASSIVE;
      hint.ai_socktype = SOCK_STREAM;

-#ifdef HAVE_UNIX_SOCKETS
      if (family == AF_UNIX)
      {
          /*

But on a platform without support for Unix sockets, family just won't be 
AF_UNIX at run time, so there is no need to hide that if branch.

Note that we already require that AF_UNIX is defined on all platforms, 
even if the kernel doesn't support Unix sockets.

But maybe it would be better to make that a separate patch from the 
sys/un.h configure changes, just so there is more clarity around it.



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Fri, Aug 12, 2022 at 5:14 AM Andres Freund <andres@anarazel.de> wrote:
> On 2022-08-11 10:52:51 -0400, Tom Lane wrote:
> > Thomas Munro <thomas.munro@gmail.com> writes:
> > > The most interesting things to say about these ones are:
> > >  * The concept of a no-Unix-socket build is removed.  We should be
> > > able to do that now, right?  Peter E seemed to say approximately that
> > > in the commit message for 797129e5.  Or is there a thought that a new
> > > operating system might show up that doesn't have 'em and we'd wish
> > > we'd kept this stuff well marked out?
> >
> > I'm kind of down on removing that.  I certainly think it's premature
> > to do so today, when we haven't even yet shipped a release that
> > assumes we can always define it on Windows

We already do assume that on Windows though.  I assume it just bombs
out with an address-family-not-recognized or similar error if you try
to use it on old Windows?  It compiles fine because there aren't
any new APIs, and winsock.h has always had the AF_UNIX macro (and many
others probably copied-and-pasted from BSD sys/socket.h, the numbers
seem to match for the first 16 AF_XXXs), and we supply our own
sockaddr_un.

About that struct, some versions of Visual Studio and MinGW didn't
have the header for it when this went in[1].  MingGW 10, released in
April 2022, has gained it[2].  I think it's reasonable to support only
the "current" MinGW (it's just too niche to have a support window
wider than "right now" IMHO), so it'd be OK to rely on that for PG16?
That would leave Visual Studio.  Does anyone know if our recent
de-cluttering efforts have fixed that problem yet?  If that turns out
to be true, I'd propose to keep the <sys/un.h> header I added here,
but change it to simply #include <afunix.h>.  But that's probably too
hopeful... If we could find out which MSVC
version/SDK/whatever-you-call-it is the first not to need it, it'd be
great to put that in a comment at least, for future garbage collection
work...

(As for why we ever had a configure check for something as basic as
Unix sockets, it looks like QNX and very old Cygwin didn't have them?
Perhaps one or both also didn't even define AF_UNIX, explaining the
IS_AF_UNIX() macro we used to have?)

(Thinking about the standard... I wonder if Windows would have got
this facility sooner if POSIX's attempt to rename AF_UNIX to the more
portable-sounding AF_LOCAL had succeeded...  I noticed the Stevens
networking book has a note that POSIX had just decided on AF_LOCAL,
but current POSIX standards make no mention of it, so something went
wrong somewhere along the way there...)

> I think what might be good next step is to have tests default to using unix
> sockets on windows, rather than requiring PG_TEST_USE_UNIX_SOCKETS. The
> pg_regress.c and Utils.pm hunks disabling it by default on windows don't
> really make sense anymore.

+1, obviously separate from this cleanup stuff, but, yeah, from PG16
onwards it seems to be legit to assume that they're available AFAIK.

> I don't really know what to do about the warnings around remove_temp() and
> trapsig(). I think we actually may be overreading the restrictions. To me the
> documented restrictions read more like a high-level-ish explanation of what's
> safe in a signal handler and what not. And it seems to not have caused a
> problem on windows on several thousand CI cycles, including plenty failures.
>
> Alternatively we could just default to putting the socketdir inside the data
> directory on windows - I *think* windows doesn't have strict path length
> limits for the socket location. If the socket dir isn't in some global temp
> directory, we don't really need signal_remove_temp, given that we're ok with
> leaving the much bigger data directory around.  The socket directory
> determination doesn't really work on windows right now anyway, one manually
> has to set a temp directory as TMPDIR isn't normally set on windows, and /tmp
> doesn't exist.

Re: length: No Windows here but I spent some time today playing ping
pong with little stand-alone test programs on CI, and determined that
it chomps paths at 108 despite succeeding with longer paths.  I could
successfully bind(), but directory listings showed the truncation, and
if I tried to make and bind two sockets with the same 108-character
prefix, the second would fail with address-in-use.  Is that enough to
be practical?

One thing I noticed is that it is implemented as reparse points.  We
might need to tweak some of our stat/dirent stuff to be more careful
if we start putting sockets in places that might be encountered by
that stuff.  (The old pgwin32_is_junction() I recently removed would
have returned true for a socket; the newer get_dirent_type() would
return PGFILETYPE_REG right now because it examines reparse points
slightly more carefully, but that's still the wrong answer; for
comparison, on Unix you'd get PGFILETYPE_UNKNOWN for a DT_SOCK because
we didn't handle it).

There seems to be conflicting information out there about whether
"abstract" sockets work (the Linux extension to AF_UNIX where sun_path
starts with a NUL byte and they don't appear in the filesystem, and
they automatically go away when all descriptors are closed).  I
couldn't get it to work but I might be doing it wrong... information
is scant.

[1] https://www.postgresql.org/message-id/88ae9594-6177-fa3c-0061-5bf8f8044b21%402ndquadrant.com
[2] https://github.com/mingw-w64/mingw-w64/blob/v10.0.0/mingw-w64-headers/include/afunix.h



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Fri, Aug 12, 2022 at 7:15 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> On 11.08.22 12:02, Thomas Munro wrote:
> >   * The concept of a no-Unix-socket build is removed.  We should be
> > able to do that now, right?  Peter E seemed to say approximately that
> > in the commit message for 797129e5.  Or is there a thought that a new
> > operating system might show up that doesn't have 'em and we'd wish
> > we'd kept this stuff well marked out?
>
> Most uses of HAVE_UNIX_SOCKETS are not useful independent of that
> question.  For example, you patch has
>
> @@ -348,7 +343,6 @@ StreamServerPort(int family, const char *hostName,
> unsigned short portNumber,
>         hint.ai_flags = AI_PASSIVE;
>         hint.ai_socktype = SOCK_STREAM;
>
> -#ifdef HAVE_UNIX_SOCKETS
>         if (family == AF_UNIX)
>         {
>                 /*
>
> But on a platform without support for Unix sockets, family just won't be
> AF_UNIX at run time, so there is no need to hide that if branch.

Good point.

> Note that we already require that AF_UNIX is defined on all platforms,
> even if the kernel doesn't support Unix sockets.

POSIX requires the macro too.  I think this would count as SUSv3 (AKA
issue 6?).  (IIUC it existed in much older POSIX form as 1g, it's all
very confusing...)

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_socket.h.html

> But maybe it would be better to make that a separate patch from the
> sys/un.h configure changes, just so there is more clarity around it.

Cool, I'll do that.



Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Continuing the project of removing dead configure tests ...
I see that prairiedog was the only buildfarm animal failing the
HAVE_PPC_LWARX_MUTEX_HINT test, and it seems pretty unlikely that
there are any assemblers remaining in the wild that can't parse that.
(I've confirmed that son-of-prairiedog, the NetBSD 9.3 installation
I'm cranking up on that hardware, is okay with it.)

So, PFA a little patch to remove that test.

It doesn't look like we can remove the adjacent test about "i"
syntax, unfortunately, because the AIX animals still fail that.

            regards, tom lane

diff --git a/configure b/configure
index cf2c4b85fe..35dcfbb709 100755
--- a/configure
+++ b/configure
@@ -15423,39 +15423,7 @@ $as_echo "#define HAVE_X86_64_POPCNTQ 1" >>confdefs.h
     fi
   ;;
   ppc*|powerpc*)
-    # On PPC, check if assembler supports LWARX instruction's mutex hint bit
-    { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether assembler supports lwarx hint bit" >&5
-$as_echo_n "checking whether assembler supports lwarx hint bit... " >&6; }
-if ${pgac_cv_have_ppc_mutex_hint+:} false; then :
-  $as_echo_n "(cached) " >&6
-else
-  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-
-int
-main ()
-{
-int a = 0; int *p = &a; int r;
-     __asm__ __volatile__ (" lwarx %0,0,%1,1\n" : "=&r"(r) : "r"(p));
-  ;
-  return 0;
-}
-_ACEOF
-if ac_fn_c_try_compile "$LINENO"; then :
-  pgac_cv_have_ppc_mutex_hint=yes
-else
-  pgac_cv_have_ppc_mutex_hint=no
-fi
-rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
-fi
-{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_have_ppc_mutex_hint" >&5
-$as_echo "$pgac_cv_have_ppc_mutex_hint" >&6; }
-    if test x"$pgac_cv_have_ppc_mutex_hint" = xyes ; then
-
-$as_echo "#define HAVE_PPC_LWARX_MUTEX_HINT 1" >>confdefs.h
-
-    fi
-    # Check if compiler accepts "i"(x) when __builtin_constant_p(x).
+    # On PPC, check if compiler accepts "i"(x) when __builtin_constant_p(x).
     { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether __builtin_constant_p(x) implies \"i\"(x) acceptance" >&5
 $as_echo_n "checking whether __builtin_constant_p(x) implies \"i\"(x) acceptance... " >&6; }
 if ${pgac_cv_have_i_constraint__builtin_constant_p+:} false; then :
diff --git a/configure.ac b/configure.ac
index b5798bcb0a..a2aa725455 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1687,18 +1687,7 @@ case $host_cpu in
     fi
   ;;
   ppc*|powerpc*)
-    # On PPC, check if assembler supports LWARX instruction's mutex hint bit
-    AC_CACHE_CHECK([whether assembler supports lwarx hint bit],
-                   [pgac_cv_have_ppc_mutex_hint],
-    [AC_COMPILE_IFELSE([AC_LANG_PROGRAM([],
-    [int a = 0; int *p = &a; int r;
-     __asm__ __volatile__ (" lwarx %0,0,%1,1\n" : "=&r"(r) : "r"(p));])],
-    [pgac_cv_have_ppc_mutex_hint=yes],
-    [pgac_cv_have_ppc_mutex_hint=no])])
-    if test x"$pgac_cv_have_ppc_mutex_hint" = xyes ; then
-    AC_DEFINE(HAVE_PPC_LWARX_MUTEX_HINT, 1, [Define to 1 if the assembler supports PPC's LWARX mutex hint bit.])
-    fi
-    # Check if compiler accepts "i"(x) when __builtin_constant_p(x).
+    # On PPC, check if compiler accepts "i"(x) when __builtin_constant_p(x).
     AC_CACHE_CHECK([whether __builtin_constant_p(x) implies "i"(x) acceptance],
                    [pgac_cv_have_i_constraint__builtin_constant_p],
     [AC_COMPILE_IFELSE([AC_LANG_PROGRAM(
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index fc5ad5fd65..049c2d204f 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -349,9 +349,6 @@
 /* Define to 1 if you have the `posix_fallocate' function. */
 #undef HAVE_POSIX_FALLOCATE

-/* Define to 1 if the assembler supports PPC's LWARX mutex hint bit. */
-#undef HAVE_PPC_LWARX_MUTEX_HINT
-
 /* Define to 1 if you have the `ppoll' function. */
 #undef HAVE_PPOLL

diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index 5ee2c46267..ffd385765c 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -236,11 +236,9 @@
  * which should be safe in nearly all cases.  You might want to override
  * this if you are building 32-bit code for a known-recent PPC machine.
  */
-#ifdef HAVE_PPC_LWARX_MUTEX_HINT    /* must have assembler support in any case */
 #if defined(__ppc64__) || defined(__powerpc64__)
 #define USE_PPC_LWARX_MUTEX_HINT
 #endif
-#endif

 /*
  * On PPC machines, decide whether to use LWSYNC instructions in place of
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index cc82668457..243c5fa824 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -313,7 +313,6 @@ sub GenerateFiles
         HAVE_PAM_PAM_APPL_H         => undef,
         HAVE_POSIX_FADVISE          => undef,
         HAVE_POSIX_FALLOCATE        => undef,
-        HAVE_PPC_LWARX_MUTEX_HINT   => undef,
         HAVE_PPOLL                  => undef,
         HAVE_PS_STRINGS             => undef,
         HAVE_PTHREAD                => undef,

Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
I wrote:
> I see that prairiedog was the only buildfarm animal failing the
> HAVE_PPC_LWARX_MUTEX_HINT test, and it seems pretty unlikely that
> there are any assemblers remaining in the wild that can't parse that.

Actually, after further investigation and testing, I think we could
drop the conditionality around PPC spinlock sequences altogether.
The commentary in pg_config_manual.h claims that "some pre-POWER4
machines" will fail on LWSYNC or LWARX with hint, but I've now
confirmed that the oldest PPC chips in my possession (prairiedog's
ppc7400, as well as a couple of ppc7450 machines) are all fine with
both.  Indeed, prairiedog would have been failing for some time now
if it didn't like LWSYNC, because port/atomics/arch-ppc.h is using
that unconditionally in some places :-(.  I think we can safely
assume that such machines no longer exist in the wild, or at least
are not going to be used to run Postgres v16.

The attached, expanded patch hard-wires USE_PPC_LWARX_MUTEX_HINT
and USE_PPC_LWSYNC as true, and syncs the assembler sequences in
arch-ppc.h with that decision.  I've checked this lightly on
tern's host as well as my own machines.

            regards, tom lane

diff --git a/configure b/configure
index cf2c4b85fe..35dcfbb709 100755
--- a/configure
+++ b/configure
@@ -15423,39 +15423,7 @@ $as_echo "#define HAVE_X86_64_POPCNTQ 1" >>confdefs.h
     fi
   ;;
   ppc*|powerpc*)
-    # On PPC, check if assembler supports LWARX instruction's mutex hint bit
-    { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether assembler supports lwarx hint bit" >&5
-$as_echo_n "checking whether assembler supports lwarx hint bit... " >&6; }
-if ${pgac_cv_have_ppc_mutex_hint+:} false; then :
-  $as_echo_n "(cached) " >&6
-else
-  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-
-int
-main ()
-{
-int a = 0; int *p = &a; int r;
-     __asm__ __volatile__ (" lwarx %0,0,%1,1\n" : "=&r"(r) : "r"(p));
-  ;
-  return 0;
-}
-_ACEOF
-if ac_fn_c_try_compile "$LINENO"; then :
-  pgac_cv_have_ppc_mutex_hint=yes
-else
-  pgac_cv_have_ppc_mutex_hint=no
-fi
-rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
-fi
-{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_have_ppc_mutex_hint" >&5
-$as_echo "$pgac_cv_have_ppc_mutex_hint" >&6; }
-    if test x"$pgac_cv_have_ppc_mutex_hint" = xyes ; then
-
-$as_echo "#define HAVE_PPC_LWARX_MUTEX_HINT 1" >>confdefs.h
-
-    fi
-    # Check if compiler accepts "i"(x) when __builtin_constant_p(x).
+    # On PPC, check if compiler accepts "i"(x) when __builtin_constant_p(x).
     { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether __builtin_constant_p(x) implies \"i\"(x) acceptance" >&5
 $as_echo_n "checking whether __builtin_constant_p(x) implies \"i\"(x) acceptance... " >&6; }
 if ${pgac_cv_have_i_constraint__builtin_constant_p+:} false; then :
diff --git a/configure.ac b/configure.ac
index b5798bcb0a..a2aa725455 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1687,18 +1687,7 @@ case $host_cpu in
     fi
   ;;
   ppc*|powerpc*)
-    # On PPC, check if assembler supports LWARX instruction's mutex hint bit
-    AC_CACHE_CHECK([whether assembler supports lwarx hint bit],
-                   [pgac_cv_have_ppc_mutex_hint],
-    [AC_COMPILE_IFELSE([AC_LANG_PROGRAM([],
-    [int a = 0; int *p = &a; int r;
-     __asm__ __volatile__ (" lwarx %0,0,%1,1\n" : "=&r"(r) : "r"(p));])],
-    [pgac_cv_have_ppc_mutex_hint=yes],
-    [pgac_cv_have_ppc_mutex_hint=no])])
-    if test x"$pgac_cv_have_ppc_mutex_hint" = xyes ; then
-    AC_DEFINE(HAVE_PPC_LWARX_MUTEX_HINT, 1, [Define to 1 if the assembler supports PPC's LWARX mutex hint bit.])
-    fi
-    # Check if compiler accepts "i"(x) when __builtin_constant_p(x).
+    # On PPC, check if compiler accepts "i"(x) when __builtin_constant_p(x).
     AC_CACHE_CHECK([whether __builtin_constant_p(x) implies "i"(x) acceptance],
                    [pgac_cv_have_i_constraint__builtin_constant_p],
     [AC_COMPILE_IFELSE([AC_LANG_PROGRAM(
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index fc5ad5fd65..049c2d204f 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -349,9 +349,6 @@
 /* Define to 1 if you have the `posix_fallocate' function. */
 #undef HAVE_POSIX_FALLOCATE

-/* Define to 1 if the assembler supports PPC's LWARX mutex hint bit. */
-#undef HAVE_PPC_LWARX_MUTEX_HINT
-
 /* Define to 1 if you have the `ppoll' function. */
 #undef HAVE_PPOLL

diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index 5ee2c46267..844c3e0f09 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -227,32 +227,6 @@
  */
 #define DEFAULT_EVENT_SOURCE  "PostgreSQL"

-/*
- * On PPC machines, decide whether to use the mutex hint bit in LWARX
- * instructions.  Setting the hint bit will slightly improve spinlock
- * performance on POWER6 and later machines, but does nothing before that,
- * and will result in illegal-instruction failures on some pre-POWER4
- * machines.  By default we use the hint bit when building for 64-bit PPC,
- * which should be safe in nearly all cases.  You might want to override
- * this if you are building 32-bit code for a known-recent PPC machine.
- */
-#ifdef HAVE_PPC_LWARX_MUTEX_HINT    /* must have assembler support in any case */
-#if defined(__ppc64__) || defined(__powerpc64__)
-#define USE_PPC_LWARX_MUTEX_HINT
-#endif
-#endif
-
-/*
- * On PPC machines, decide whether to use LWSYNC instructions in place of
- * ISYNC and SYNC.  This provides slightly better performance, but will
- * result in illegal-instruction failures on some pre-POWER4 machines.
- * By default we use LWSYNC when building for 64-bit PPC, which should be
- * safe in nearly all cases.
- */
-#if defined(__ppc64__) || defined(__powerpc64__)
-#define USE_PPC_LWSYNC
-#endif
-
 /*
  * Assumed cache line size. This doesn't affect correctness, but can be used
  * for low-level optimizations. Currently, this is used to pad some data
diff --git a/src/include/port/atomics/arch-ppc.h b/src/include/port/atomics/arch-ppc.h
index eb64513626..35a79042c0 100644
--- a/src/include/port/atomics/arch-ppc.h
+++ b/src/include/port/atomics/arch-ppc.h
@@ -90,12 +90,12 @@ pg_atomic_compare_exchange_u32_impl(volatile pg_atomic_uint32 *ptr,
         (int32) *expected >= PG_INT16_MIN)
         __asm__ __volatile__(
             "    sync                \n"
-            "    lwarx   %0,0,%5        \n"
+            "    lwarx   %0,0,%5,1    \n"
             "    cmpwi   %0,%3        \n"
-            "    bne     $+12        \n"        /* branch to isync */
+            "    bne     $+12        \n"        /* branch to lwsync */
             "    stwcx.  %4,0,%5        \n"
             "    bne     $-16        \n"        /* branch to lwarx */
-            "    isync                \n"
+            "    lwsync                \n"
             "    mfcr    %1          \n"
 :            "=&r"(found), "=r"(condition_register), "+m"(ptr->value)
 :            "i"(*expected), "r"(newval), "r"(&ptr->value)
@@ -104,12 +104,12 @@ pg_atomic_compare_exchange_u32_impl(volatile pg_atomic_uint32 *ptr,
 #endif
         __asm__ __volatile__(
             "    sync                \n"
-            "    lwarx   %0,0,%5        \n"
+            "    lwarx   %0,0,%5,1    \n"
             "    cmpw    %0,%3        \n"
-            "    bne     $+12        \n"        /* branch to isync */
+            "    bne     $+12        \n"        /* branch to lwsync */
             "    stwcx.  %4,0,%5        \n"
             "    bne     $-16        \n"        /* branch to lwarx */
-            "    isync                \n"
+            "    lwsync                \n"
             "    mfcr    %1          \n"
 :            "=&r"(found), "=r"(condition_register), "+m"(ptr->value)
 :            "r"(*expected), "r"(newval), "r"(&ptr->value)
@@ -138,11 +138,11 @@ pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_)
         add_ <= PG_INT16_MAX && add_ >= PG_INT16_MIN)
         __asm__ __volatile__(
             "    sync                \n"
-            "    lwarx   %1,0,%4        \n"
+            "    lwarx   %1,0,%4,1    \n"
             "    addi    %0,%1,%3    \n"
             "    stwcx.  %0,0,%4        \n"
             "    bne     $-12        \n"        /* branch to lwarx */
-            "    isync                \n"
+            "    lwsync                \n"
 :            "=&r"(_t), "=&b"(res), "+m"(ptr->value)
 :            "i"(add_), "r"(&ptr->value)
 :            "memory", "cc");
@@ -150,11 +150,11 @@ pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_)
 #endif
         __asm__ __volatile__(
             "    sync                \n"
-            "    lwarx   %1,0,%4        \n"
+            "    lwarx   %1,0,%4,1    \n"
             "    add     %0,%1,%3    \n"
             "    stwcx.  %0,0,%4        \n"
             "    bne     $-12        \n"        /* branch to lwarx */
-            "    isync                \n"
+            "    lwsync                \n"
 :            "=&r"(_t), "=&r"(res), "+m"(ptr->value)
 :            "r"(add_), "r"(&ptr->value)
 :            "memory", "cc");
@@ -180,12 +180,12 @@ pg_atomic_compare_exchange_u64_impl(volatile pg_atomic_uint64 *ptr,
         (int64) *expected >= PG_INT16_MIN)
         __asm__ __volatile__(
             "    sync                \n"
-            "    ldarx   %0,0,%5        \n"
+            "    ldarx   %0,0,%5,1    \n"
             "    cmpdi   %0,%3        \n"
-            "    bne     $+12        \n"        /* branch to isync */
+            "    bne     $+12        \n"        /* branch to lwsync */
             "    stdcx.  %4,0,%5        \n"
             "    bne     $-16        \n"        /* branch to ldarx */
-            "    isync                \n"
+            "    lwsync                \n"
             "    mfcr    %1          \n"
 :            "=&r"(found), "=r"(condition_register), "+m"(ptr->value)
 :            "i"(*expected), "r"(newval), "r"(&ptr->value)
@@ -194,12 +194,12 @@ pg_atomic_compare_exchange_u64_impl(volatile pg_atomic_uint64 *ptr,
 #endif
         __asm__ __volatile__(
             "    sync                \n"
-            "    ldarx   %0,0,%5        \n"
+            "    ldarx   %0,0,%5,1    \n"
             "    cmpd    %0,%3        \n"
-            "    bne     $+12        \n"        /* branch to isync */
+            "    bne     $+12        \n"        /* branch to lwsync */
             "    stdcx.  %4,0,%5        \n"
             "    bne     $-16        \n"        /* branch to ldarx */
-            "    isync                \n"
+            "    lwsync                \n"
             "    mfcr    %1          \n"
 :            "=&r"(found), "=r"(condition_register), "+m"(ptr->value)
 :            "r"(*expected), "r"(newval), "r"(&ptr->value)
@@ -224,11 +224,11 @@ pg_atomic_fetch_add_u64_impl(volatile pg_atomic_uint64 *ptr, int64 add_)
         add_ <= PG_INT16_MAX && add_ >= PG_INT16_MIN)
         __asm__ __volatile__(
             "    sync                \n"
-            "    ldarx   %1,0,%4        \n"
+            "    ldarx   %1,0,%4,1    \n"
             "    addi    %0,%1,%3    \n"
             "    stdcx.  %0,0,%4        \n"
             "    bne     $-12        \n"        /* branch to ldarx */
-            "    isync                \n"
+            "    lwsync                \n"
 :            "=&r"(_t), "=&b"(res), "+m"(ptr->value)
 :            "i"(add_), "r"(&ptr->value)
 :            "memory", "cc");
@@ -236,11 +236,11 @@ pg_atomic_fetch_add_u64_impl(volatile pg_atomic_uint64 *ptr, int64 add_)
 #endif
         __asm__ __volatile__(
             "    sync                \n"
-            "    ldarx   %1,0,%4        \n"
+            "    ldarx   %1,0,%4,1    \n"
             "    add     %0,%1,%3    \n"
             "    stdcx.  %0,0,%4        \n"
             "    bne     $-12        \n"        /* branch to ldarx */
-            "    isync                \n"
+            "    lwsync                \n"
 :            "=&r"(_t), "=&r"(res), "+m"(ptr->value)
 :            "r"(add_), "r"(&ptr->value)
 :            "memory", "cc");
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index 0877cf65b0..cc83d561b2 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -435,7 +435,8 @@ typedef unsigned int slock_t;
  *
  * NOTE: per the Enhanced PowerPC Architecture manual, v1.0 dated 7-May-2002,
  * an isync is a sufficient synchronization barrier after a lwarx/stwcx loop.
- * On newer machines, we can use lwsync instead for better performance.
+ * But if the spinlock is in ordinary memory, we can use lwsync instead for
+ * better performance.
  *
  * Ordinarily, we'd code the branches here using GNU-style local symbols, that
  * is "1f" referencing "1:" and so on.  But some people run gcc on AIX with
@@ -450,23 +451,15 @@ tas(volatile slock_t *lock)
     int _res;

     __asm__ __volatile__(
-#ifdef USE_PPC_LWARX_MUTEX_HINT
 "    lwarx   %0,0,%3,1    \n"
-#else
-"    lwarx   %0,0,%3        \n"
-#endif
 "    cmpwi   %0,0        \n"
 "    bne     $+16        \n"        /* branch to li %1,1 */
 "    addi    %0,%0,1        \n"
 "    stwcx.  %0,0,%3        \n"
-"    beq     $+12        \n"        /* branch to lwsync/isync */
+"    beq     $+12        \n"        /* branch to lwsync */
 "    li      %1,1        \n"
 "    b       $+12        \n"        /* branch to end of asm sequence */
-#ifdef USE_PPC_LWSYNC
 "    lwsync                \n"
-#else
-"    isync                \n"
-#endif
 "    li      %1,0        \n"

 :    "=&b"(_t), "=r"(_res), "+m"(*lock)
@@ -477,23 +470,14 @@ tas(volatile slock_t *lock)

 /*
  * PowerPC S_UNLOCK is almost standard but requires a "sync" instruction.
- * On newer machines, we can use lwsync instead for better performance.
+ * But we can use lwsync instead for better performance.
  */
-#ifdef USE_PPC_LWSYNC
 #define S_UNLOCK(lock)    \
 do \
 { \
     __asm__ __volatile__ ("    lwsync \n" ::: "memory"); \
     *((volatile slock_t *) (lock)) = 0; \
 } while (0)
-#else
-#define S_UNLOCK(lock)    \
-do \
-{ \
-    __asm__ __volatile__ ("    sync \n" ::: "memory"); \
-    *((volatile slock_t *) (lock)) = 0; \
-} while (0)
-#endif /* USE_PPC_LWSYNC */

 #endif /* powerpc */

diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index cc82668457..243c5fa824 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -313,7 +313,6 @@ sub GenerateFiles
         HAVE_PAM_PAM_APPL_H         => undef,
         HAVE_POSIX_FADVISE          => undef,
         HAVE_POSIX_FALLOCATE        => undef,
-        HAVE_PPC_LWARX_MUTEX_HINT   => undef,
         HAVE_PPOLL                  => undef,
         HAVE_PS_STRINGS             => undef,
         HAVE_PTHREAD                => undef,

Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Sat, Aug 13, 2022 at 8:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > I see that prairiedog was the only buildfarm animal failing the
> > HAVE_PPC_LWARX_MUTEX_HINT test, and it seems pretty unlikely that
> > there are any assemblers remaining in the wild that can't parse that.
>
> Actually, after further investigation and testing, I think we could
> drop the conditionality around PPC spinlock sequences altogether.
> The commentary in pg_config_manual.h claims that "some pre-POWER4
> machines" will fail on LWSYNC or LWARX with hint, but I've now
> confirmed that the oldest PPC chips in my possession (prairiedog's
> ppc7400, as well as a couple of ppc7450 machines) are all fine with
> both.  Indeed, prairiedog would have been failing for some time now
> if it didn't like LWSYNC, because port/atomics/arch-ppc.h is using
> that unconditionally in some places :-(.  I think we can safely
> assume that such machines no longer exist in the wild, or at least
> are not going to be used to run Postgres v16.

Yeah, POWER3 was superseded in 2001.  Looking around, Linux distros
that someone might seriously run a database on already require much
more recent POWER generations to even boot up, and although there is
(for example) a separate unofficial powerpc port for Debian and of
course NetBSD and others that can target older Macs and IBM servers
etc, apparently no one has run our test suite on 21+ year old hardware
and reported that it (presumably) SIGILLs already, per that
observation (I think ports/packages that fail to work probably just
get marked broken).  As for vintage Apple hardware, clearly the G4
cube was far more collectable than those weird slightly inflated
looking G3 machines :-)

I'm curious, though... if we used compiler builtins, would
-march/-mcpu etc know about this kind of thing, for people who wanted
to compile on ancient hardware, or, I guess more interestingly, newer
tricks that we haven't got around to learning about yet?



Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> I'm curious, though... if we used compiler builtins, would
> -march/-mcpu etc know about this kind of thing, for people who wanted
> to compile on ancient hardware, or, I guess more interestingly, newer
> tricks that we haven't got around to learning about yet?

No idea, but I could run some tests if you have something
specific in mind.

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Fri, Aug 12, 2022 at 8:03 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Fri, Aug 12, 2022 at 7:15 PM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
> > But maybe it would be better to make that a separate patch from the
> > sys/un.h configure changes, just so there is more clarity around it.
>
> Cool, I'll do that.

I pushed these, except I chopped out the HAVE_UNIX_SOCKETS part as
requested.  Here it is in a separate patch, with a commit message that
explains the rationale (essentially, what you said, it's basically a
runtime matter for a hypothetical AF_UNIX-less system to complain
about).  Tom, does this argument persuade you?  Also, a couple of
nearby things:

  Remove HAVE_UNIX_SOCKETS.
  Remove configure probe for struct sockaddr_storage.
  Remove configure probe for getaddrinfo, and replacement code.

If I'm reading comments and scraped configure data right, it looks
like those last two things were there only for HP-UX 10 and Windows <
8.1.

I tried to figure out how to get rid of
PGAC_STRUCT_SOCKADDR_STORAGE_MEMBERS, but there we're into genuine
non-standard cross-platform differences.  At best maybe you could
maybe skip the test for ss_family (POSIX says you have to have that,
but I haven't read RFC 2553 to see why it claims someone should spell
it differently).  Didn't seem worth changing.

bfbot=> select name, value, count(*) from macro where name like
'%SOCKADDR_%' group by 1, 2;
                  name                  | value | count
----------------------------------------+-------+-------
 HAVE_STRUCT_SOCKADDR_STORAGE_SS_LEN    | 1     |    13 <-- BSDish
 HAVE_STRUCT_SOCKADDR_STORAGE           | 1     |   108 <-- everyone
 HAVE_STRUCT_SOCKADDR_SA_LEN            | 1     |    18 <-- BSDish + AIX
 HAVE_STRUCT_SOCKADDR_STORAGE_SS_FAMILY | 1     |   108 <-- everyone
 HAVE_STRUCT_SOCKADDR_STORAGE___SS_LEN  | 1     |     5 <-- AIX
 HAVE_STRUCT_SOCKADDR_UN                | 1     |   106 <-- everyone
except mingw
(6 rows)

Attachment

Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Sun, Aug 14, 2022 at 12:23 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>   Remove HAVE_UNIX_SOCKETS.
>   Remove configure probe for struct sockaddr_storage.
>   Remove configure probe for getaddrinfo, and replacement code.

Plus one more that falls out of the above (it was only used by
src/port/getaddrinfo.c):

  Remove configure probe for gethostbyname_r.

Attachment

Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> I pushed these, except I chopped out the HAVE_UNIX_SOCKETS part as
> requested.  Here it is in a separate patch, with a commit message that
> explains the rationale (essentially, what you said, it's basically a
> runtime matter for a hypothetical AF_UNIX-less system to complain
> about).  Tom, does this argument persuade you?

I looked more closely and saw that basically what HAVE_UNIX_SOCKETS
is guarding is code that assumes the existence of AF_UNIX and
struct sockaddr_un.  As Peter said, we already rely on AF_UNIX
in some other places; and I see that sys/un.h is required to exist
and to define struct sockaddr_un as far back as SUSv2.  So it
does seem like the worst consequence is that we'd be compiling
some code that would be unreachable on platforms lacking support.
Objection withdrawn.

As for the other two, they look like nice cleanup if we can actually
get away with it.  I agree that the business about nonstandard libbind
is not of interest anymore, but I have no idea about the state of
play on Windows.  I guess we can push 'em and see what the buildfarm
thinks.

> I tried to figure out how to get rid of
> PGAC_STRUCT_SOCKADDR_STORAGE_MEMBERS, but there we're into genuine
> non-standard cross-platform differences.

Right.  I don't think it's worth sweating over.

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Sun, Aug 14, 2022 at 6:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > I pushed these, except I chopped out the HAVE_UNIX_SOCKETS part as
> > requested.  Here it is in a separate patch, with a commit message that
> > explains the rationale (essentially, what you said, it's basically a
> > runtime matter for a hypothetical AF_UNIX-less system to complain
> > about).  Tom, does this argument persuade you?
>
> I looked more closely and saw that basically what HAVE_UNIX_SOCKETS
> is guarding is code that assumes the existence of AF_UNIX and
> struct sockaddr_un.  As Peter said, we already rely on AF_UNIX
> in some other places; and I see that sys/un.h is required to exist
> and to define struct sockaddr_un as far back as SUSv2.  So it
> does seem like the worst consequence is that we'd be compiling
> some code that would be unreachable on platforms lacking support.
> Objection withdrawn.

Thanks, and pushed with a couple of minor doc tweaks.

I hadn't paid attention to our existing abstract Unix socket support
before and now I'm curious: do we have a confirmed sighting of that
working on Windows?  The thread didn't say so[1], and I'm suspicious
because I couldn't get simple standalone programs that bind() to
"\000c:\\xxx" to work sanely (but my method for investigating Windows
is to send the punch cards over to the CI system and wait for the
results to arrive by carrier pigeon which is cumbersome enough that I
haven't tried very hard).  Naively shoving a @ into PostreSQL's
PG_REGRESS_SOCK_DIR also breaks CI.

> As for the other two, they look like nice cleanup if we can actually
> get away with it.  I agree that the business about nonstandard libbind
> is not of interest anymore, but I have no idea about the state of
> play on Windows.  I guess we can push 'em and see what the buildfarm
> thinks.

All green on CI...  Next stop, build farm.

I'm a bit confused about why I had to #define gai_sterror
gai_strerrorA myself to get this working (my non-Windows-guy
understanding is that the A-for-ANSI [sic] variants of system
functions should be selected automatically unless you #define UNICODE
to get W-for-wide variants).  If anyone has any clues about that, I'd
be glad to clean it up.

I *guess* the main risk here is that different error messages might
show up in some scenarios on Windows (everything else was already
going directly to OS functions on Windows 8.1+ if I'm reading the code
right), but surely that'd be progress -- messages from the real netdb
implementation are surely preferable to our fallback stuff.

[1] https://www.postgresql.org/message-id/flat/6dee8574-b0ad-fc49-9c8c-2edc796f0033%402ndquadrant.com



Re: Cleaning up historical portability baggage

From
Andres Freund
Date:
Hi,

On 2022-08-14 10:03:19 +1200, Thomas Munro wrote:
> I hadn't paid attention to our existing abstract Unix socket support
> before and now I'm curious: do we have a confirmed sighting of that
> working on Windows?

I vaguely remember successfully trying it in the past. But I just tried it
unsuccessfully in a VM and there's a bunch of other places saying it's not
working...
https://github.com/microsoft/WSL/issues/4240

Greetings,

Andres Freund



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Sun, Aug 14, 2022 at 10:36 AM Andres Freund <andres@anarazel.de> wrote:
> On 2022-08-14 10:03:19 +1200, Thomas Munro wrote:
> > I hadn't paid attention to our existing abstract Unix socket support
> > before and now I'm curious: do we have a confirmed sighting of that
> > working on Windows?
>
> I vaguely remember successfully trying it in the past. But I just tried it
> unsuccessfully in a VM and there's a bunch of other places saying it's not
> working...
> https://github.com/microsoft/WSL/issues/4240

I think we'd better remove our claim that it works then.  Patch attached.

We could also reject it, I guess, but it doesn't immediately seem
harmful so I'm on the fence.  On the Windows version that Cirrus is
running, we happily start up with:

2022-08-13 20:44:35.174 GMT [4760][postmaster] LOG:  listening on Unix
socket "@c:/cirrus/.s.PGSQL.61696"

... and then client processes apparently can't see it, which is
confusing but, I guess, defensible if we're only claiming it works on
Linux.  We don't go out of our way to avoid this feature on a per-OS
basis in general, though at least on a typical Unix system it fails
fast.  For example, my FreeBSD system here barfs:

2022-08-15 13:26:13.483 NZST [29956] LOG:  could not bind Unix address
"@/tmp/.s.PGSQL.5432": No such file or directory

... because the kernel just sees an empty string and can't locate the
parent directory.

Attachment

Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Sun, Aug 14, 2022 at 10:03 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> All green on CI...  Next stop, build farm.

All good so far (except for an admonishment from crake, for which my
penance was to fix headerscheck, see separate thread...).  I did
figure out one thing that I mentioned I was confused by before: the
reason Windows didn't like my direct calls to gai_strerror() is
because another header of ours clobbered one of Windows' own macros.
This new batch includes a fix for that.

  Remove configure probe for IPv6.
  Remove dead ifaddrs.c fallback code.
  Remove configure probe for net/if.h.
  Fix macro problem with gai_strerror on Windows.
  Remove configure probe for netinet/tcp.h.
  mstcpip.h is not missing on MinGW.

The interesting one is a continuation of my "all computers have X"
series.  This episode: IPv6.

Attachment

Re: Cleaning up historical portability baggage

From
Peter Eisentraut
Date:
On 15.08.22 03:48, Thomas Munro wrote:
>> I vaguely remember successfully trying it in the past. But I just tried it
>> unsuccessfully in a VM and there's a bunch of other places saying it's not
>> working...
>> https://github.com/microsoft/WSL/issues/4240
> I think we'd better remove our claim that it works then.  Patch attached.

When I developed support for abstract unix sockets, I did test them on 
Windows.  The lack of support on WSL appears to be an unrelated fact. 
See for example how [0] talks about them separately.

[0]: https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Mon, Aug 15, 2022 at 8:36 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> On 15.08.22 03:48, Thomas Munro wrote:
> >> I vaguely remember successfully trying it in the past. But I just tried it
> >> unsuccessfully in a VM and there's a bunch of other places saying it's not
> >> working...
> >> https://github.com/microsoft/WSL/issues/4240
> > I think we'd better remove our claim that it works then.  Patch attached.
>
> When I developed support for abstract unix sockets, I did test them on
> Windows.  The lack of support on WSL appears to be an unrelated fact.
> See for example how [0] talks about them separately.

User amoldeshpande's complaint was posted to the WSL project's issue
tracker but it's about native Windows/winsock code and s/he says so
explicitly (though other people pile in with various other complaints
including WSL interop).  User sunilmut's comment says it's not
working, and [0] is now just confusing everybody :-(



Re: Cleaning up historical portability baggage

From
Andres Freund
Date:
Hi,

On 2022-08-15 13:48:22 +1200, Thomas Munro wrote:
> On Sun, Aug 14, 2022 at 10:36 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2022-08-14 10:03:19 +1200, Thomas Munro wrote:
> > > I hadn't paid attention to our existing abstract Unix socket support
> > > before and now I'm curious: do we have a confirmed sighting of that
> > > working on Windows?
> >
> > I vaguely remember successfully trying it in the past. But I just tried it
> > unsuccessfully in a VM and there's a bunch of other places saying it's not
> > working...
> > https://github.com/microsoft/WSL/issues/4240
> 
> I think we'd better remove our claim that it works then.  Patch attached.
> 
> We could also reject it, I guess, but it doesn't immediately seem
> harmful so I'm on the fence.  On the Windows version that Cirrus is
> running, we happily start up with:
> 
> 2022-08-13 20:44:35.174 GMT [4760][postmaster] LOG:  listening on Unix
> socket "@c:/cirrus/.s.PGSQL.61696"

What I find odd is that you said your naive program rejected this...


FWIW, in an up-to-date windows 10 VM the client side fails with:

psql: error: connection to server on socket "@frak/.s.PGSQL.5432" failed: Invalid argument (0x00002726/10022)
        Is the server running locally and accepting connections on that socket?

That's with most security things disabled and developer mode turned on.

Greetings,

Andres Freund



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Tue, Aug 16, 2022 at 7:25 AM Andres Freund <andres@anarazel.de> wrote:
> On 2022-08-15 13:48:22 +1200, Thomas Munro wrote:
> > 2022-08-13 20:44:35.174 GMT [4760][postmaster] LOG:  listening on Unix
> > socket "@c:/cirrus/.s.PGSQL.61696"
>
> What I find odd is that you said your naive program rejected this...

No, I said it wasn't behaving sanely.  It allowed me to create two
sockets and bind them both to "\000C:\\xxxxxxxxxx", but I expected the
second to fail with EADDRINUSE/10048[1].  I was messing around with
things like that because my original aim was to check if the names are
silently truncated through EADDRINUSE errors, an approach that worked
for regular Unix sockets.

[1] https://cirrus-ci.com/task/4643322672185344?logs=main#L16



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Tue, Aug 16, 2022 at 7:51 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> [1] https://cirrus-ci.com/task/4643322672185344?logs=main#L16

Derp, I noticed that that particular horrendous quick and dirty test
code was invalidated by a closesocket() call, but in another version I
commented that out and it didn't help.  Of course it's possible that
I'm still doing something wrong in the test, I didn't spend long on
this once I saw the bigger picture...



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Fri, Aug 12, 2022 at 7:42 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Fri, Aug 12, 2022 at 5:14 AM Andres Freund <andres@anarazel.de> wrote:
> > I don't really know what to do about the warnings around remove_temp() and
> > trapsig(). I think we actually may be overreading the restrictions. To me the
> > documented restrictions read more like a high-level-ish explanation of what's
> > safe in a signal handler and what not. And it seems to not have caused a
> > problem on windows on several thousand CI cycles, including plenty failures.

So the question there is whether we can run this stuff safely in
Windows signal handler context, considering the rather vaguely defined
conditions[1]:

       unlink(sockself);
       unlink(socklock);
       rmdir(temp_sockdir);

You'd think that basic stuff like DeleteFile() that just enters the
kernel would be async-signal-safe, like on Unix; the danger surely
comes from stepping on the user context's toes with state mutations,
locks etc.  But let's suppose we want to play by a timid
interpretation of that page's "do not issue low-level or STDIO.H I/O
routines".  It also says that SIGINT is special and runs the handler
in a new thread (in a big warning box because that has other hazards
that would break other kinds of code).  Well, we *know* it's safe to
unlink files in another thread... so... how cheesy would it be if we
just did raise(SIGINT) in the real handlers?

[1] https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/signal?view=msvc-170



Re: Cleaning up historical portability baggage

From
Andres Freund
Date:
Hi,

On 2022-08-16 13:02:55 +1200, Thomas Munro wrote:
> On Fri, Aug 12, 2022 at 7:42 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > On Fri, Aug 12, 2022 at 5:14 AM Andres Freund <andres@anarazel.de> wrote:
> > > I don't really know what to do about the warnings around remove_temp() and
> > > trapsig(). I think we actually may be overreading the restrictions. To me the
> > > documented restrictions read more like a high-level-ish explanation of what's
> > > safe in a signal handler and what not. And it seems to not have caused a
> > > problem on windows on several thousand CI cycles, including plenty failures.
> 
> So the question there is whether we can run this stuff safely in
> Windows signal handler context, considering the rather vaguely defined
> conditions[1]:
> 
>        unlink(sockself);
>        unlink(socklock);
>        rmdir(temp_sockdir);
> 
> You'd think that basic stuff like DeleteFile() that just enters the
> kernel would be async-signal-safe, like on Unix; the danger surely
> comes from stepping on the user context's toes with state mutations,
> locks etc.

Yea.

I guess it could be different because things like file descriptors are just a
userland concept.


> But let's suppose we want to play by a timid interpretation of that page's
> "do not issue low-level or STDIO.H I/O routines".  It also says that SIGINT
> is special and runs the handler in a new thread (in a big warning box
> because that has other hazards that would break other kinds of code).  Well,
> we *know* it's safe to unlink files in another thread... so... how cheesy
> would it be if we just did raise(SIGINT) in the real handlers?

Not quite sure I understand. You're proposing to raise(SIGINT) for all other
handlers, so that signal_remove_temp() gets called in another thread, because
we assume that'd be safe because doing file IO in other threads is safe? That
assumes the signal handler invocation infrastructure isn't the problem...

Looks like we could register a "native" ctrl-c handler:
https://docs.microsoft.com/en-us/windows/console/setconsolectrlhandler
they're documented to run in a different thread, but without any of the
file-io warnings.
https://docs.microsoft.com/en-us/windows/console/handlerroutine

Greetings,

Andres Freund



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Tue, Aug 16, 2022 at 1:16 PM Andres Freund <andres@anarazel.de> wrote:
> > But let's suppose we want to play by a timid interpretation of that page's
> > "do not issue low-level or STDIO.H I/O routines".  It also says that SIGINT
> > is special and runs the handler in a new thread (in a big warning box
> > because that has other hazards that would break other kinds of code).  Well,
> > we *know* it's safe to unlink files in another thread... so... how cheesy
> > would it be if we just did raise(SIGINT) in the real handlers?
>
> Not quite sure I understand. You're proposing to raise(SIGINT) for all other
> handlers, so that signal_remove_temp() gets called in another thread, because
> we assume that'd be safe because doing file IO in other threads is safe? That
> assumes the signal handler invocation infrastructure isn't the problem...

That's what I was thinking about, yeah.  But after some more reading,
now I'm wondering if we'd even need to do that, or what I'm missing.
The 6 listed signals in the manual are SIGABRT, SIGFPE, SIGILL,
SIGINT, SIGSEGV and SIGTERM (the 6 required by C).  We want to run
signal_remove_temp() on SIGHUP (doesn't exist, we made it up), SIGINT,
SIGPIPE (doesn't exist, we made it up), and SIGTERM (exists for C spec
compliance but will never be raised by the system according to the
manual, and we don't raise it ourselves IIUC).  So the only case we
actually have to consider is SIGINT, and SIGINT handlers run in a
thread, so if we assume it is therefore exempt from those
very-hard-to-comply-with rules, aren't we good already?  What am I
missing?

> Looks like we could register a "native" ctrl-c handler:
> https://docs.microsoft.com/en-us/windows/console/setconsolectrlhandler
> they're documented to run in a different thread, but without any of the
> file-io warnings.
> https://docs.microsoft.com/en-us/windows/console/handlerroutine

Sounds better in general, considering the extreme constraints of the
signal system, but it'd be nice to see if the current system is truly
unsafe before writing more alien code.

Someone who wants to handle more than one SIGINT would certainly need
to consider that, because there doesn't seem to be a race-free way to
reinstall the signal handler when you receive it[1].  Races aside, for
any signal except SIGINT (assuming the above-mentioned exemption),
you're probably also not even allowed to try because raise() might be
a system call and they're banned.  Fortunately we don't care, we
wanted SIG_DFL next anyway.

[1]
https://wiki.sei.cmu.edu/confluence/display/c/SIG01-C.+Understand+implementation-specific+details+regarding+signal+handler+persistence



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Mon, Aug 15, 2022 at 5:53 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>   Remove configure probe for IPv6.
>   Remove dead ifaddrs.c fallback code.
>   Remove configure probe for net/if.h.
>   Fix macro problem with gai_strerror on Windows.
>   Remove configure probe for netinet/tcp.h.
>   mstcpip.h is not missing on MinGW.

I pushed these except one, plus one more about <sys/sockio.h> which
turned out to be not needed after a bit of archeology.

Here's a slightly better AF_INET6 one.  I'm planning to push it
tomorrow if there are no objections.  It does something a little more
aggressive than the preceding stuff, because SUSv3 says that IPv6 is
an "option".  I don't see that as an issue: it also says that various
other ubiquitous stuff we're using is optional.  Of course, it would
be absurd for a new socket implementation to appear today that can't
talk to a decent chunk of the internet, and all we require here is the
headers.  That optionality was relevant for the transition period a
couple of decades ago.

Attachment

Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Tue, Aug 16, 2022 at 4:14 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Tue, Aug 16, 2022 at 1:16 PM Andres Freund <andres@anarazel.de> wrote:
> > > But let's suppose we want to play by a timid interpretation of that page's
> > > "do not issue low-level or STDIO.H I/O routines".  It also says that SIGINT
> > > is special and runs the handler in a new thread (in a big warning box
> > > because that has other hazards that would break other kinds of code).  Well,
> > > we *know* it's safe to unlink files in another thread... so... how cheesy
> > > would it be if we just did raise(SIGINT) in the real handlers?
> >
> > Not quite sure I understand. You're proposing to raise(SIGINT) for all other
> > handlers, so that signal_remove_temp() gets called in another thread, because
> > we assume that'd be safe because doing file IO in other threads is safe? That
> > assumes the signal handler invocation infrastructure isn't the problem...
>
> That's what I was thinking about, yeah.  But after some more reading,
> now I'm wondering if we'd even need to do that, or what I'm missing.
> The 6 listed signals in the manual are SIGABRT, SIGFPE, SIGILL,
> SIGINT, SIGSEGV and SIGTERM (the 6 required by C).  We want to run
> signal_remove_temp() on SIGHUP (doesn't exist, we made it up), SIGINT,
> SIGPIPE (doesn't exist, we made it up), and SIGTERM (exists for C spec
> compliance but will never be raised by the system according to the
> manual, and we don't raise it ourselves IIUC).  So the only case we
> actually have to consider is SIGINT, and SIGINT handlers run in a
> thread, so if we assume it is therefore exempt from those
> very-hard-to-comply-with rules, aren't we good already?  What am I
> missing?

I converted that analysis into a WIP patch, and tried to make the
Windows test setup as similar to Unix as possible.  I put in the
explanation and an assertion that it's running in another thread.
This is blind coded as I don't have Windows, but it passes CI.  I'd
probably need some help from a Windows-enabled hacker to go further
with this, though.  Does the assertion hold if you control-C the
regression test, and is there any other way to get it to fail?

The next thing is that the security infrastructure added by commit
f6dc6dd5 for CVE-2014-0067 is ripped out (because unreachable) by the
attached, but the security infrastructure added by commit be76a6d3
probably doesn't work on Windows yet.  Where src/port/mkdtemp.c does
mkdir(name, 0700), I believe Windows throws away the mode and makes a
default ACL directory, probably due to the mismatch between the
permissions models.  I haven't studied the Windows security model, but
reading tells me that AF_UNIX will obey filesystem ACLs, so I think we
should be able to make it exactly as secure as Unix if we use native
APIs.  Perhaps we just need to replace the mkdir() call in mkdtemp.c
with CreateDirectory(), passing in a locked-down owner-only
SECURITY_DESCRIPTOR, or something like that?

Attachment

Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Sun, Aug 14, 2022 at 6:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > I tried to figure out how to get rid of
> > PGAC_STRUCT_SOCKADDR_STORAGE_MEMBERS, but there we're into genuine
> > non-standard cross-platform differences.
>
> Right.  I don't think it's worth sweating over.

I managed to get rid of four of these probes.  Some were unused, and
one could be consolidated into another leaving just one probe of this
ilk.

1.  src/common/ip.c already made a leap by assuming that if you have
ss_len then you must have sun_len.  We might as well change that to be
driven by the presence of sa_len instead.  That leap is fine: if you
have one, you have them all, and sa_len has the advantage of a stable
name across systems that have it (unlike ss_len, which AIX calls
__ss_len, requiring more configure gloop).

2.  src/backend/libpq/ifaddr.c only needs to know if you have sa_len.
This code is only used on AIX, so we could hard-wire it in theory, but
it's good to keep it general so you can still compile and test it on
systems without sa_len (mostly Linux).

Attachment

Re: Cleaning up historical portability baggage

From
Andres Freund
Date:
Hi,

On 2022-08-18 18:13:38 +1200, Thomas Munro wrote:
> Here's a slightly better AF_INET6 one.  I'm planning to push it
> tomorrow if there are no objections.

You didn't yet, I think. Any chance you could? The HAVE_IPV6 stuff is
wrong/ugly in the meson build, right now, and I'd rather not spend time fixing
it up ;)


> It does something a little more aggressive than the preceding stuff, because
> SUSv3 says that IPv6 is an "option".  I don't see that as an issue: it also
> says that various other ubiquitous stuff we're using is optional.  Of
> course, it would be absurd for a new socket implementation to appear today
> that can't talk to a decent chunk of the internet, and all we require here
> is the headers.  That optionality was relevant for the transition period a
> couple of decades ago.

> From f162a15a6d723f8c94d9daa6236149e1f39b0d9a Mon Sep 17 00:00:00 2001
> From: Thomas Munro <tmunro@postgresql.org>
> Date: Thu, 18 Aug 2022 11:55:10 +1200
> Subject: [PATCH] Remove configure probe for sockaddr_in6 and require AF_INET6.
> 
> SUSv3 <netinet/in.h> defines struct sockaddr_in6, and all targeted Unix
> systems have it.  Windows has it in <ws2ipdef.h>.  Remove the configure
> probe, the macro and a small amount of dead code.
> 
> Also remove a mention of IPv6-less builds from the documentation, since
> there aren't any.
> 
> This is similar to commits f5580882 and 077bf2f2 for Unix sockets.
> Even though AF_INET6 is an "optional" component of SUSv3, there are no
> known modern operating system without it, and it seems even less likely
> to be omitted from future systems than AF_UNIX.
> 
> Discussion: https://postgr.es/m/CA+hUKGKErNfhmvb_H0UprEmp4LPzGN06yR2_0tYikjzB-2ECMw@mail.gmail.com

Looks good to me.


I'm idly wondering whether it's worth at some point to introduce a configure
test of just compiling a file referencing all the headers and symbols we exist
to be there...

Greetings,

Andres Freund



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Fri, Aug 26, 2022 at 7:47 AM Andres Freund <andres@anarazel.de> wrote:
> On 2022-08-18 18:13:38 +1200, Thomas Munro wrote:
> > Here's a slightly better AF_INET6 one.  I'm planning to push it
> > tomorrow if there are no objections.
>
> You didn't yet, I think. Any chance you could? The HAVE_IPV6 stuff is
> wrong/ugly in the meson build, right now, and I'd rather not spend time fixing
> it up ;)

Done, and thanks for looking.

Remaining things from this thread:
 * removing --disable-thread-safety
 * removing those vestigial HAVE_XXX macros (one by one analysis and patches)
 * making Unix sockets secure for Windows in tests



Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
Here's another bit of baggage handling: fixing up the places that
were afraid to use fflush(NULL).  We could doubtless have done
this years ago (indeed, I found several places already using it)
but as long as we're making a push to get rid of obsolete code,
doing it now seems appropriate.

One thing that's not clear to me is what the appropriate rules
should be for popen().  POSIX makes clear that you shouldn't
expect popen() to include an fflush() itself, but we seem quite
haphazard about whether to do one or not before popen().  In
the case of popen(..., "r") we can expect that the child can't
write on our stdout, but stderr could be a problem anyway.

Likewise, there are some places that fflush before system(),
but they are a minority.  Again it seems like the main risk
is duplicated or mis-ordered stderr output.

I'm inclined to add fflush(NULL) before any popen() or system()
that hasn't got one already, but did not do that in the attached.

Thoughts?

            regards, tom lane

diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index c75be03d2c..ec67761487 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -37,13 +37,8 @@ fork_process(void)

     /*
      * Flush stdio channels just before fork, to avoid double-output problems.
-     * Ideally we'd use fflush(NULL) here, but there are still a few non-ANSI
-     * stdio libraries out there (like SunOS 4.1.x) that coredump if we do.
-     * Presently stdout and stderr are the only stdio output channels used by
-     * the postmaster, so fflush'ing them should be sufficient.
      */
-    fflush(stdout);
-    fflush(stderr);
+    fflush(NULL);

 #ifdef LINUX_PROFILE

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index e3b19ca1ed..1aaab6c554 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -2503,8 +2503,7 @@ OpenPipeStream(const char *command, const char *mode)
     ReleaseLruFiles();

 TryAgain:
-    fflush(stdout);
-    fflush(stderr);
+    fflush(NULL);
     pqsignal(SIGPIPE, SIG_DFL);
     errno = 0;
     file = popen(command, mode);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 95f32de4e2..cb3c289889 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -643,8 +643,7 @@ errfinish(const char *filename, int lineno, const char *funcname)
          * Any other code you might be tempted to add here should probably be
          * in an on_proc_exit or on_shmem_exit callback instead.
          */
-        fflush(stdout);
-        fflush(stderr);
+        fflush(NULL);

         /*
          * Let the cumulative stats system know. Only mark the session as
@@ -670,8 +669,7 @@ errfinish(const char *filename, int lineno, const char *funcname)
          * XXX: what if we are *in* the postmaster?  abort() won't kill our
          * children...
          */
-        fflush(stdout);
-        fflush(stderr);
+        fflush(NULL);
         abort();
     }

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 29c28b7315..8567c875fe 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -489,8 +489,7 @@ popen_check(const char *command, const char *mode)
 {
     FILE       *cmdfd;

-    fflush(stdout);
-    fflush(stderr);
+    fflush(NULL);
     errno = 0;
     cmdfd = popen(command, mode);
     if (cmdfd == NULL)
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 73e20081d1..be2af9f261 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -448,8 +448,7 @@ start_postmaster(void)
     pgpid_t        pm_pid;

     /* Flush stdio channels just before fork, to avoid double-output problems */
-    fflush(stdout);
-    fflush(stderr);
+    fflush(NULL);

 #ifdef EXEC_BACKEND
     pg_disable_aslr();
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
index 68c455f84b..d665b257c9 100644
--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -1578,8 +1578,7 @@ runPgDump(const char *dbname, const char *create_opts)

     pg_log_info("running \"%s\"", cmd->data);

-    fflush(stdout);
-    fflush(stderr);
+    fflush(NULL);

     ret = system(cmd->data);

diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index e2086a07de..018cd310f7 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -123,8 +123,7 @@ get_control_data(ClusterInfo *cluster, bool live_check)
         /* only pg_controldata outputs the cluster state */
         snprintf(cmd, sizeof(cmd), "\"%s/pg_controldata\" \"%s\"",
                  cluster->bindir, cluster->pgdata);
-        fflush(stdout);
-        fflush(stderr);
+        fflush(NULL);

         if ((output = popen(cmd, "r")) == NULL)
             pg_fatal("could not get control data using %s: %s",
@@ -191,8 +190,7 @@ get_control_data(ClusterInfo *cluster, bool live_check)
              cluster->bindir,
              live_check ? "pg_controldata\"" : resetwal_bin,
              cluster->pgdata);
-    fflush(stdout);
-    fflush(stderr);
+    fflush(NULL);

     if ((output = popen(cmd, "r")) == NULL)
         pg_fatal("could not get control data using %s: %s",
diff --git a/src/bin/psql/copy.c b/src/bin/psql/copy.c
index c181682a13..0f66ebc2ed 100644
--- a/src/bin/psql/copy.c
+++ b/src/bin/psql/copy.c
@@ -288,8 +288,7 @@ do_copy(const char *args)
         {
             if (options->program)
             {
-                fflush(stdout);
-                fflush(stderr);
+                fflush(NULL);
                 errno = 0;
                 copystream = popen(options->file, PG_BINARY_R);
             }
@@ -307,10 +306,9 @@ do_copy(const char *args)
         {
             if (options->program)
             {
-                fflush(stdout);
-                fflush(stderr);
-                errno = 0;
+                fflush(NULL);
                 disable_sigpipe_trap();
+                errno = 0;
                 copystream = popen(options->file, PG_BINARY_W);
             }
             else
diff --git a/src/common/exec.c b/src/common/exec.c
index fdcad0ee8c..81ff86dce6 100644
--- a/src/common/exec.c
+++ b/src/common/exec.c
@@ -389,8 +389,7 @@ pipe_read_line(char *cmd, char *line, int maxsize)
     FILE       *pgver;

     /* flush output buffers in case popen does not... */
-    fflush(stdout);
-    fflush(stderr);
+    fflush(NULL);

     errno = 0;
     if ((pgver = popen(cmd, "r")) == NULL)
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index a803355f8e..3c92aa8836 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -264,8 +264,7 @@ stop_postmaster(void)
         int            r;

         /* On Windows, system() seems not to force fflush, so... */
-        fflush(stdout);
-        fflush(stderr);
+        fflush(NULL);

         snprintf(buf, sizeof(buf),
                  "\"%s%spg_ctl\" stop -D \"%s/data\" -s",
@@ -1063,13 +1062,9 @@ spawn_process(const char *cmdline)
     pid_t        pid;

     /*
-     * Must flush I/O buffers before fork.  Ideally we'd use fflush(NULL) here
-     * ... does anyone still care about systems where that doesn't work?
+     * Must flush I/O buffers before fork.
      */
-    fflush(stdout);
-    fflush(stderr);
-    if (logfile)
-        fflush(logfile);
+    fflush(NULL);

 #ifdef EXEC_BACKEND
     pg_disable_aslr();

Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Mon, Aug 29, 2022 at 9:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Here's another bit of baggage handling: fixing up the places that
> were afraid to use fflush(NULL).  We could doubtless have done
> this years ago (indeed, I found several places already using it)
> but as long as we're making a push to get rid of obsolete code,
> doing it now seems appropriate.

+1, must be OK (pg_dump and pg_upgrade).

Archeology:

commit 79fcde48b229534fd4a5e07834e5e0e84dd38bee
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Sun Nov 29 01:51:56 1998 +0000

    Portability fix for old SunOS releases: fflush(NULL)

https://www.postgresql.org/message-id/199811241847.NAA04690%40tuna.uimage.com

SunOS 4.x was still in some kind of support phase for a few more
years, but I guess they weren't working too hard on conformance and
features, given that SunOS 5 (the big BSD -> System V rewrite) had
come out in '92...

> One thing that's not clear to me is what the appropriate rules
> should be for popen().  POSIX makes clear that you shouldn't
> expect popen() to include an fflush() itself, but we seem quite
> haphazard about whether to do one or not before popen().  In
> the case of popen(..., "r") we can expect that the child can't
> write on our stdout, but stderr could be a problem anyway.
>
> Likewise, there are some places that fflush before system(),
> but they are a minority.  Again it seems like the main risk
> is duplicated or mis-ordered stderr output.
>
> I'm inclined to add fflush(NULL) before any popen() or system()
> that hasn't got one already, but did not do that in the attached.

Couldn't hurt.  (Looking around at our setvbuf() setup to check the
expected stream state in various places... and huh, I hadn't
previously noticed the thing about Windows interpreting line buffering
to mean full buffering.  Pfnghghl...)



Re: Cleaning up historical portability baggage

From
Ibrar Ahmed
Date:


On Mon, Aug 29, 2022 at 3:13 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Mon, Aug 29, 2022 at 9:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Here's another bit of baggage handling: fixing up the places that
> were afraid to use fflush(NULL).  We could doubtless have done
> this years ago (indeed, I found several places already using it)
> but as long as we're making a push to get rid of obsolete code,
> doing it now seems appropriate.

+1, must be OK (pg_dump and pg_upgrade).

Archeology:

commit 79fcde48b229534fd4a5e07834e5e0e84dd38bee
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Sun Nov 29 01:51:56 1998 +0000

    Portability fix for old SunOS releases: fflush(NULL)

https://www.postgresql.org/message-id/199811241847.NAA04690%40tuna.uimage.com

SunOS 4.x was still in some kind of support phase for a few more
years, but I guess they weren't working too hard on conformance and
features, given that SunOS 5 (the big BSD -> System V rewrite) had
come out in '92...

> One thing that's not clear to me is what the appropriate rules
> should be for popen().  POSIX makes clear that you shouldn't
> expect popen() to include an fflush() itself, but we seem quite
> haphazard about whether to do one or not before popen().  In
> the case of popen(..., "r") we can expect that the child can't
> write on our stdout, but stderr could be a problem anyway.
>
> Likewise, there are some places that fflush before system(),
> but they are a minority.  Again it seems like the main risk
> is duplicated or mis-ordered stderr output.
>
> I'm inclined to add fflush(NULL) before any popen() or system()
> that hasn't got one already, but did not do that in the attached.

Couldn't hurt.  (Looking around at our setvbuf() setup to check the
expected stream state in various places... and huh, I hadn't
previously noticed the thing about Windows interpreting line buffering
to mean full buffering.  Pfnghghl...)


The patch does not apply successfully; please rebase the patch.
patching file src/backend/postmaster/fork_process.c
Hunk #1 FAILED at 37.
1 out of 1 hunk FAILED -- saving rejects to file src/backend/postmaster/fork_process.c.rej
patching file src/backend/storage/file/fd.c
Hunk #1 FAILED at 2503.
1 out of 1 hunk FAILED -- saving rejects to file src/backend/storage/file/fd.c.rej
patching file src/backend/utils/error/elog.c
Hunk #1 FAILED at 643.
Hunk #2 FAILED at 670.


--
Ibrar Ahmed

Re: Cleaning up historical portability baggage

From
John Naylor
Date:
On Thu, Sep 15, 2022 at 3:11 PM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
> The patch does not apply successfully; please rebase the patch.

There's a good reason for that -- the latest one was committed two
weeks ago. The status should still be waiting on author, though,
namely for:

On Fri, Aug 26, 2022 at 5:28 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Remaining things from this thread:
>  * removing --disable-thread-safety
>  * removing those vestigial HAVE_XXX macros (one by one analysis and patches)
>  * making Unix sockets secure for Windows in tests

-- 
John Naylor
EDB: http://www.enterprisedb.com



Re: Cleaning up historical portability baggage

From
Tom Lane
Date:
John Naylor <john.naylor@enterprisedb.com> writes:
> On Thu, Sep 15, 2022 at 3:11 PM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
>> The patch does not apply successfully; please rebase the patch.

> There's a good reason for that -- the latest one was committed two
> weeks ago. The status should still be waiting on author, though,
> namely for:

> On Fri, Aug 26, 2022 at 5:28 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> Remaining things from this thread:
>> * removing --disable-thread-safety
>> * removing those vestigial HAVE_XXX macros (one by one analysis and patches)
>> * making Unix sockets secure for Windows in tests

I imagine we should just close the current CF entry as committed.
There's no patch in existence for any of those TODO items, and
I didn't think one was imminent.

            regards, tom lane



Re: Cleaning up historical portability baggage

From
Thomas Munro
Date:
On Fri, Sep 16, 2022 at 1:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> John Naylor <john.naylor@enterprisedb.com> writes:
> > On Thu, Sep 15, 2022 at 3:11 PM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
> >> The patch does not apply successfully; please rebase the patch.
>
> > There's a good reason for that -- the latest one was committed two
> > weeks ago. The status should still be waiting on author, though,
> > namely for:
>
> > On Fri, Aug 26, 2022 at 5:28 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> >> Remaining things from this thread:
> >> * removing --disable-thread-safety
> >> * removing those vestigial HAVE_XXX macros (one by one analysis and patches)
> >> * making Unix sockets secure for Windows in tests
>
> I imagine we should just close the current CF entry as committed.
> There's no patch in existence for any of those TODO items, and
> I didn't think one was imminent.

I have patches for these, but not quite ready to post.  I'll mark this
entry closed, and make a new one or two when ready, instead of this
one-gigantic-CF-entry-that-goes-on-forever format.