Thread: Using AF_UNIX sockets always for tests on Windows

Using AF_UNIX sockets always for tests on Windows

From
Thomas Munro
Date:
Hi,

Commit f5580882 established that all supported computers have AF_UNIX.
One of the follow-up consequences that was left unfinished is that we
could simplify our test harness code to make it the same on all
platforms.  Currently we have hundreds of lines of C and perl to use
secure TCP connections instead for the benefit of defunct Windows
versions.  Here's a patch set for that.  These patches and some
discussion of them were buried in the recent
clean-up-after-recently-dropped-obsolete-systems thread[1], and I
didn't want to lose track of them.  I think they would need review and
testing from a Windows-based hacker to make progress.  The patches
are:

1.  Teach mkdtemp() to make a non-world-accessible directory.  This is
required to be able to make a socket that other processes can't
connect to, to match the paranoia level used on Unix.  This was
written just by reading documentation, because I am not a Windows
user, so I would be grateful for a second opinion and/or testing from
a Windows hacker, which would involve testing with two different
users.  The idea is that Windows' mkdir() is completely ignoring the
permissions (we can see in the mingw headers that it literally throws
away the mode argument), so we shouldn't use that, but native
CreateDirectory() when given a pointer to a SECURITY_ATTRIBUTES with
lpSecurityDesciptor set to NULL should only allow the current user to
access the object (directory).  Does this really work, and would it be
better to create some more explicit private-keep-out
SECURITY_ATTRIBUTE, and how would that look?

I'm fairly sure that filesystem permissions must be enough to stop
another OS user from connecting, because it's clear from documentation
that AF_UNIX works on Windows by opening the file and reading some
magic "reparse" data from inside it, so if you can't see into the
containing directory, you can't do it.  This patch is the one the rest
are standing on, because the tests should match Unix in their level of
security.

2.  Always use AF_UNIX for pg_regress.  Remove a bunch of
no-longer-needed sspi auth stuff.  Remove comments that worried about
signal handler safety (referring here to real Windows signals, not
fake PostgreSQL signals that are a backend-only concept).  By my
reading of the Windows documentation and our code, there is no real
concern here, so the remove_temp() stuff should be fine, as I have
explained in a new comment.  But I have not tested this signal safety
claim, not being a Windows user.  I added an assertion that should
hold if I am right.  If you run this on Windows and interrupt
pg_regress with ^C, does it hold?

3.  Use AF_UNIX for TAP tests too.

4.  In passing, remove our documentation's claim that Linux's
"abstract" AF_UNIX namespace is available on Windows.  It does not
work at all, according to all reports (IMHO it seems like an
inherently insecure interface that other OSes would be unlikely to
adopt).

Note that this thread is not about libpq, which differs from Unix by
defaulting to host=localhost rather than AF_UNIX IIRC.  That's a
user-facing policy decision I'm not touching; this thread is just
about cleaning up old test infrastructure of interest to hackers.

[1]
https://www.postgresql.org/message-id/flat/CA%2BhUKGJ3LHeP9w5Fgzdr4G8AnEtJ%3Dz%3Dp6hGDEm4qYGEUX5B6fQ%40mail.gmail.com

Attachment

Re: Using AF_UNIX sockets always for tests on Windows

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> Commit f5580882 established that all supported computers have AF_UNIX.
> One of the follow-up consequences that was left unfinished is that we
> could simplify our test harness code to make it the same on all
> platforms.  Currently we have hundreds of lines of C and perl to use
> secure TCP connections instead for the benefit of defunct Windows
> versions.  Here's a patch set for that.

If we remove that, won't we have a whole lot of code that's not
tested at all on any platform, ie all the TCP-socket code?

            regards, tom lane



Re: Using AF_UNIX sockets always for tests on Windows

From
Andres Freund
Date:
Hi,

On 2022-12-01 20:30:36 -0500, Tom Lane wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Commit f5580882 established that all supported computers have AF_UNIX.
> > One of the follow-up consequences that was left unfinished is that we
> > could simplify our test harness code to make it the same on all
> > platforms.  Currently we have hundreds of lines of C and perl to use
> > secure TCP connections instead for the benefit of defunct Windows
> > versions.  Here's a patch set for that.
> 
> If we remove that, won't we have a whole lot of code that's not
> tested at all on any platform, ie all the TCP-socket code?

There's some coverage via the auth and ssl tests. But I agree it's an
issue. But to me the fix for that seems to be to add a dedicated test for
that, rather than relying on windows to test our socket code - that's quite a
few separate code paths from the tcp support of other platforms.

Greetings,

Andres Freund



Re: Using AF_UNIX sockets always for tests on Windows

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2022-12-01 20:30:36 -0500, Tom Lane wrote:
>> If we remove that, won't we have a whole lot of code that's not
>> tested at all on any platform, ie all the TCP-socket code?

> There's some coverage via the auth and ssl tests. But I agree it's an
> issue. But to me the fix for that seems to be to add a dedicated test for
> that, rather than relying on windows to test our socket code - that's quite a
> few separate code paths from the tcp support of other platforms.

IMO that's not the best way forward, because you'll always have
nagging questions about whether a single-purpose test covers
everything that needs coverage.  I think the best place to be in
would be to be able to run the whole test suite using either TCP or
UNIX sockets, on any platform (with stuff like the SSL test
overriding the choice as needed).  I doubt ripping out the existing
Windows-only support for TCP-based testing is a good step in that
direction.

            regards, tom lane



Re: Using AF_UNIX sockets always for tests on Windows

From
Andres Freund
Date:
Hi,

On 2022-12-01 20:56:18 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2022-12-01 20:30:36 -0500, Tom Lane wrote:
> >> If we remove that, won't we have a whole lot of code that's not
> >> tested at all on any platform, ie all the TCP-socket code?
> 
> > There's some coverage via the auth and ssl tests. But I agree it's an
> > issue. But to me the fix for that seems to be to add a dedicated test for
> > that, rather than relying on windows to test our socket code - that's quite a
> > few separate code paths from the tcp support of other platforms.
> 
> IMO that's not the best way forward, because you'll always have
> nagging questions about whether a single-purpose test covers
> everything that needs coverage.

Still seems better than not having any coverage in our development
environments...


> I think the best place to be in would be to be able to run the whole test
> suite using either TCP or UNIX sockets, on any platform (with stuff like the
> SSL test overriding the choice as needed).

I agree that that's useful. But it seems somewhat independent from the
majority of the proposed changes. To be able to test force-tcp-everywhere we
don't need e.g.  code for setting sspi auth in pg_regress etc - it's afaik
just needed so there's a secure way of running tests at all on windows.

I think 0003 should be "trimmed" to only change the default for
$use_unix_sockets on windows and to remove PG_TEST_USE_UNIX_SOCKETS. Whoever
wants to, can then add a new environment variable to force tap tests to use
tcp.

Greetings,

Andres Freund



Re: Using AF_UNIX sockets always for tests on Windows

From
Andrew Dunstan
Date:
On 2022-12-01 Th 21:10, Andres Freund wrote:
> Hi,
>
> On 2022-12-01 20:56:18 -0500, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> On 2022-12-01 20:30:36 -0500, Tom Lane wrote:
>>>> If we remove that, won't we have a whole lot of code that's not
>>>> tested at all on any platform, ie all the TCP-socket code?
>>> There's some coverage via the auth and ssl tests. But I agree it's an
>>> issue. But to me the fix for that seems to be to add a dedicated test for
>>> that, rather than relying on windows to test our socket code - that's quite a
>>> few separate code paths from the tcp support of other platforms.
>> IMO that's not the best way forward, because you'll always have
>> nagging questions about whether a single-purpose test covers
>> everything that needs coverage.
> Still seems better than not having any coverage in our development
> environments...
>
>
>> I think the best place to be in would be to be able to run the whole test
>> suite using either TCP or UNIX sockets, on any platform (with stuff like the
>> SSL test overriding the choice as needed).
> I agree that that's useful. But it seems somewhat independent from the
> majority of the proposed changes. To be able to test force-tcp-everywhere we
> don't need e.g.  code for setting sspi auth in pg_regress etc - it's afaik
> just needed so there's a secure way of running tests at all on windows.
>
> I think 0003 should be "trimmed" to only change the default for
> $use_unix_sockets on windows and to remove PG_TEST_USE_UNIX_SOCKETS. Whoever
> wants to, can then add a new environment variable to force tap tests to use
> tcp.
>

Not sure if it's useful here, but a few months ago I prepared patches to
remove the config-auth option of pg_regress, which struck me as more
than odd, and replace it with a perl module. I didn't get around to
finishing them, but the patches as of then are attached.

I agree that having some switch that says "run everything with TCP" or
"run (almost) everything with Unix sockets" would be good.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Attachment

Re: Using AF_UNIX sockets always for tests on Windows

From
vignesh C
Date:
On Fri, 2 Dec 2022 at 18:08, Andrew Dunstan <andrew@dunslane.net> wrote:
>
>
> On 2022-12-01 Th 21:10, Andres Freund wrote:
> > Hi,
> >
> > On 2022-12-01 20:56:18 -0500, Tom Lane wrote:
> >> Andres Freund <andres@anarazel.de> writes:
> >>> On 2022-12-01 20:30:36 -0500, Tom Lane wrote:
> >>>> If we remove that, won't we have a whole lot of code that's not
> >>>> tested at all on any platform, ie all the TCP-socket code?
> >>> There's some coverage via the auth and ssl tests. But I agree it's an
> >>> issue. But to me the fix for that seems to be to add a dedicated test for
> >>> that, rather than relying on windows to test our socket code - that's quite a
> >>> few separate code paths from the tcp support of other platforms.
> >> IMO that's not the best way forward, because you'll always have
> >> nagging questions about whether a single-purpose test covers
> >> everything that needs coverage.
> > Still seems better than not having any coverage in our development
> > environments...
> >
> >
> >> I think the best place to be in would be to be able to run the whole test
> >> suite using either TCP or UNIX sockets, on any platform (with stuff like the
> >> SSL test overriding the choice as needed).
> > I agree that that's useful. But it seems somewhat independent from the
> > majority of the proposed changes. To be able to test force-tcp-everywhere we
> > don't need e.g.  code for setting sspi auth in pg_regress etc - it's afaik
> > just needed so there's a secure way of running tests at all on windows.
> >
> > I think 0003 should be "trimmed" to only change the default for
> > $use_unix_sockets on windows and to remove PG_TEST_USE_UNIX_SOCKETS. Whoever
> > wants to, can then add a new environment variable to force tap tests to use
> > tcp.
> >
>
> Not sure if it's useful here, but a few months ago I prepared patches to
> remove the config-auth option of pg_regress, which struck me as more
> than odd, and replace it with a perl module. I didn't get around to
> finishing them, but the patches as of then are attached.
>
> I agree that having some switch that says "run everything with TCP" or
> "run (almost) everything with Unix sockets" would be good.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
bf03cfd162176d543da79f9398131abc251ddbb9 ===
=== applying patch
./0001-Do-config_auth-in-perl-code-for-TAP-tests-and-vcregr.patch
patching file contrib/basebackup_to_shell/t/001_basic.pl
Hunk #1 FAILED at 21.
1 out of 1 hunk FAILED -- saving rejects to file
contrib/basebackup_to_shell/t/001_basic.pl.rej
patching file src/bin/pg_basebackup/t/010_pg_basebackup.pl
Hunk #1 FAILED at 29.
1 out of 1 hunk FAILED -- saving rejects to file
src/bin/pg_basebackup/t/010_pg_basebackup.pl.rej
Hunk #3 FAILED at 461.
1 out of 3 hunks FAILED -- saving rejects to file
src/test/perl/PostgreSQL/Test/Cluster.pm.rej

[1] - http://cfbot.cputube.org/patch_41_4033.log

Regards,
Vignesh



Re: Using AF_UNIX sockets always for tests on Windows

From
Andrew Dunstan
Date:
On 2023-01-04 We 07:13, vignesh C wrote:
> On Fri, 2 Dec 2022 at 18:08, Andrew Dunstan <andrew@dunslane.net> wrote:
>>
>> On 2022-12-01 Th 21:10, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2022-12-01 20:56:18 -0500, Tom Lane wrote:
>>>> Andres Freund <andres@anarazel.de> writes:
>>>>> On 2022-12-01 20:30:36 -0500, Tom Lane wrote:
>>>>>> If we remove that, won't we have a whole lot of code that's not
>>>>>> tested at all on any platform, ie all the TCP-socket code?
>>>>> There's some coverage via the auth and ssl tests. But I agree it's an
>>>>> issue. But to me the fix for that seems to be to add a dedicated test for
>>>>> that, rather than relying on windows to test our socket code - that's quite a
>>>>> few separate code paths from the tcp support of other platforms.
>>>> IMO that's not the best way forward, because you'll always have
>>>> nagging questions about whether a single-purpose test covers
>>>> everything that needs coverage.
>>> Still seems better than not having any coverage in our development
>>> environments...
>>>
>>>
>>>> I think the best place to be in would be to be able to run the whole test
>>>> suite using either TCP or UNIX sockets, on any platform (with stuff like the
>>>> SSL test overriding the choice as needed).
>>> I agree that that's useful. But it seems somewhat independent from the
>>> majority of the proposed changes. To be able to test force-tcp-everywhere we
>>> don't need e.g.  code for setting sspi auth in pg_regress etc - it's afaik
>>> just needed so there's a secure way of running tests at all on windows.
>>>
>>> I think 0003 should be "trimmed" to only change the default for
>>> $use_unix_sockets on windows and to remove PG_TEST_USE_UNIX_SOCKETS. Whoever
>>> wants to, can then add a new environment variable to force tap tests to use
>>> tcp.
>>>
>> Not sure if it's useful here, but a few months ago I prepared patches to
>> remove the config-auth option of pg_regress, which struck me as more
>> than odd, and replace it with a perl module. I didn't get around to
>> finishing them, but the patches as of then are attached.
>>
>> I agree that having some switch that says "run everything with TCP" or
>> "run (almost) everything with Unix sockets" would be good.
> The patch does not apply on top of HEAD as in [1], please post a rebased patch:
> === Applying patches on top of PostgreSQL commit ID
> bf03cfd162176d543da79f9398131abc251ddbb9 ===
> === applying patch
> ./0001-Do-config_auth-in-perl-code-for-TAP-tests-and-vcregr.patch
> patching file contrib/basebackup_to_shell/t/001_basic.pl
> Hunk #1 FAILED at 21.
> 1 out of 1 hunk FAILED -- saving rejects to file
> contrib/basebackup_to_shell/t/001_basic.pl.rej
> patching file src/bin/pg_basebackup/t/010_pg_basebackup.pl
> Hunk #1 FAILED at 29.
> 1 out of 1 hunk FAILED -- saving rejects to file
> src/bin/pg_basebackup/t/010_pg_basebackup.pl.rej
> Hunk #3 FAILED at 461.
> 1 out of 3 hunks FAILED -- saving rejects to file
> src/test/perl/PostgreSQL/Test/Cluster.pm.rej
>
> [1] - http://cfbot.cputube.org/patch_41_4033.log
>

What I posted was not intended as a replacement for Thomas' patches, or
indeed meant as a CF item at all.

So really we're waiting on Thomas to post a response to Tom's and
Andres' comments upthread.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Using AF_UNIX sockets always for tests on Windows

From
Juan José Santamaría Flecha
Date:
Hello,

On Fri, Dec 2, 2022 at 1:03 AM Thomas Munro <thomas.munro@gmail.com> wrote:

1.  Teach mkdtemp() to make a non-world-accessible directory.  This is
required to be able to make a socket that other processes can't
connect to, to match the paranoia level used on Unix.  This was
written just by reading documentation, because I am not a Windows
user, so I would be grateful for a second opinion and/or testing from
a Windows hacker, which would involve testing with two different
users.  The idea is that Windows' mkdir() is completely ignoring the
permissions (we can see in the mingw headers that it literally throws
away the mode argument), so we shouldn't use that, but native
CreateDirectory() when given a pointer to a SECURITY_ATTRIBUTES with
lpSecurityDesciptor set to NULL should only allow the current user to
access the object (directory).  Does this really work, and would it be
better to create some more explicit private-keep-out
SECURITY_ATTRIBUTE, and how would that look?

A directory created with a NULL SECURITY_ATTRIBUTES inherits the ACL from its parent directory [1]. In this case, its parent is the designated temporary location, which already should have a limited access.

You can create an explicit DACL for that directory, PFA a patch for so. This is just an example, not something that I'm proposing as a committable alternative.

I'm fairly sure that filesystem permissions must be enough to stop
another OS user from connecting, because it's clear from documentation
that AF_UNIX works on Windows by opening the file and reading some
magic "reparse" data from inside it, so if you can't see into the
containing directory, you can't do it.  This patch is the one the rest
are standing on, because the tests should match Unix in their level of
security.
 
Yes, this is correct.

Only the first patch is modified, but I'm including all of them so they go through the cfbot.


Regards,

Juan José Santamaría Flecha
Attachment

Re: Using AF_UNIX sockets always for tests on Windows

From
Thomas Munro
Date:
Thanks Tom, Andres, Juan José, Andrew for your feedback.  I agree that
it's a better "OS harmonisation" outcome if we can choose both ways on
both OSes.  I will leave the 0003 patch aside for now due to lack of
time, but here's a rebase of the first two patches.  Since this is
really just more cleanup-obsolete-stuff background work, I'm going to
move it to the next CF.

0001 -- Teach mkdtemp() to care about permissions on Windows.
Reviewed by Juan José, who confirmed my blind-coded understanding and
showed an alternative version but didn't suggest doing it that way.
It's a little confusing that NULL "attributes" means something
different from NULL "descriptor", so I figured I should highlight that
difference more clearly with some new comments.  I guess one question
is "why should we expect the calling process to have the default
access token?"

0002 -- Use AF_UNIX for pg_regress.  This one removes a couple of
hundred Windows-only lines that set up SSPI stuff, and some comments
about a signal handling hazard that -- as far as I can tell by reading
manuals -- was never really true.

0003 -- TAP testing adjustments needs some more work based on feedback
already given, not included in this revision.

Attachment

Re: Using AF_UNIX sockets always for tests on Windows

From
vignesh C
Date:
On Wed, 22 Mar 2023 at 09:59, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> Thanks Tom, Andres, Juan José, Andrew for your feedback.  I agree that
> it's a better "OS harmonisation" outcome if we can choose both ways on
> both OSes.  I will leave the 0003 patch aside for now due to lack of
> time, but here's a rebase of the first two patches.  Since this is
> really just more cleanup-obsolete-stuff background work, I'm going to
> move it to the next CF.
>
> 0001 -- Teach mkdtemp() to care about permissions on Windows.
> Reviewed by Juan José, who confirmed my blind-coded understanding and
> showed an alternative version but didn't suggest doing it that way.
> It's a little confusing that NULL "attributes" means something
> different from NULL "descriptor", so I figured I should highlight that
> difference more clearly with some new comments.  I guess one question
> is "why should we expect the calling process to have the default
> access token?"
>
> 0002 -- Use AF_UNIX for pg_regress.  This one removes a couple of
> hundred Windows-only lines that set up SSPI stuff, and some comments
> about a signal handling hazard that -- as far as I can tell by reading
> manuals -- was never really true.
>
> 0003 -- TAP testing adjustments needs some more work based on feedback
> already given, not included in this revision.

The patch does not apply anymore:
patch -p1 < v3-0002-Always-use-AF_UNIX-sockets-in-pg_regress-on-Windo.patch
patching file src/test/regress/pg_regress.c
...
Hunk #6 FAILED at 781.
...
1 out of 10 hunks FAILED -- saving rejects to file
src/test/regress/pg_regress.c.rej

Regards,
Vignesh



Re: Using AF_UNIX sockets always for tests on Windows

From
vignesh C
Date:
On Sun, 21 Jan 2024 at 18:01, vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 22 Mar 2023 at 09:59, Thomas Munro <thomas.munro@gmail.com> wrote:
> >
> > Thanks Tom, Andres, Juan José, Andrew for your feedback.  I agree that
> > it's a better "OS harmonisation" outcome if we can choose both ways on
> > both OSes.  I will leave the 0003 patch aside for now due to lack of
> > time, but here's a rebase of the first two patches.  Since this is
> > really just more cleanup-obsolete-stuff background work, I'm going to
> > move it to the next CF.
> >
> > 0001 -- Teach mkdtemp() to care about permissions on Windows.
> > Reviewed by Juan José, who confirmed my blind-coded understanding and
> > showed an alternative version but didn't suggest doing it that way.
> > It's a little confusing that NULL "attributes" means something
> > different from NULL "descriptor", so I figured I should highlight that
> > difference more clearly with some new comments.  I guess one question
> > is "why should we expect the calling process to have the default
> > access token?"
> >
> > 0002 -- Use AF_UNIX for pg_regress.  This one removes a couple of
> > hundred Windows-only lines that set up SSPI stuff, and some comments
> > about a signal handling hazard that -- as far as I can tell by reading
> > manuals -- was never really true.
> >
> > 0003 -- TAP testing adjustments needs some more work based on feedback
> > already given, not included in this revision.
>
> The patch does not apply anymore:
> patch -p1 < v3-0002-Always-use-AF_UNIX-sockets-in-pg_regress-on-Windo.patch
> patching file src/test/regress/pg_regress.c
> ...
> Hunk #6 FAILED at 781.
> ...
> 1 out of 10 hunks FAILED -- saving rejects to file
> src/test/regress/pg_regress.c.rej

With no update to the thread and the patch not applying I'm marking
this as returned with feedback. Please feel free to resubmit to the
next CF when there is a new version of the patch.

Regards,
Vignesh