Re: Why doesn't src/backend/port/win32/socket.c implement bind()? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Why doesn't src/backend/port/win32/socket.c implement bind()?
Date
Msg-id 10970.1461249983@sss.pgh.pa.us
In response to Re: Why doesn't src/backend/port/win32/socket.c implement bind()?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Why doesn't src/backend/port/win32/socket.c implement bind()?  (Michael Paquier <michael.paquier@gmail.com>)
Re: Why doesn't src/backend/port/win32/socket.c implement bind()?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> And this gives the patch attached, just took the time to hack it.

> I think this is a good idea, but (1) I'm inclined not to restrict it to
> Windows, and (2) I think we should hold off applying it until we've seen
> a failure or two more, and can confirm whether d1b7d4877 does anything
> useful for the error messages.

OK, we now have failures from both bowerbird and jacana with the error
reporting patch applied:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2016-04-21%2012%3A03%3A02
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2016-04-19%2021%3A00%3A39

and they both boil down to this:

pg_ctl: could not start server
Examine the log output.
# pg_ctl failed; logfile:
LOG:  could not bind IPv4 socket: Permission denied
HINT:  Is another postmaster already running on port 60200? If not, wait a few seconds and retry.
WARNING:  could not create listen socket for "127.0.0.1"
FATAL:  could not create any TCP/IP sockets
LOG:  database system is shut down

So "permission denied" is certainly more useful than "no error", which
makes me feel that d1b7d4877+22989a8e3 are doing what they intended to
and should get back-patched --- any objections?

However, it's still not entirely clear what is the root cause of the
failure and whether a patch along the discussed lines would prevent its
recurrence.  Looking at TranslateSocketError, it seems we must be seeing
an underlying error code of WSAEACCES.  A little googling says that
Windows might indeed return that, rather than the more expected
WSAEADDRINUSE, if someone else has the port open with SO_EXCLUSIVEADDRUSE:

    Another possible reason for the WSAEACCES error is that when the
    bind function is called (on Windows NT 4.0 with SP4 and later),
    another application, service, or kernel mode driver is bound to
    the same address with exclusive access. Such exclusive access is
    a new feature of Windows NT 4.0 with SP4 and later, and is
    implemented by using the SO_EXCLUSIVEADDRUSE option.
 

So theory A is that some other program is binding random high port numbers
with SO_EXCLUSIVEADDRUSE.  Theory B is that this is the handiwork of
Windows antivirus software doing what Windows antivirus software typically
does, i.e., inject random permissions failures depending on the phase of the
moon.  It's not very clear that a test along the lines described (that is,
attempt to connect to, not bind to, the target port) would pre-detect
either type of error.  Under theory A, a connect() test would recognize
the problem only if the other program were using the port to listen rather
than make an outbound connection; and the latter seems much more likely.
(Possibly we could detect the latter case by checking the error code
returned by connect(), but Michael's proposed patch does no such thing.)
Under theory B, we're pretty much screwed: we don't know what will happen.
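
To make the connect()-versus-bind() distinction concrete, here is a minimal sketch using POSIX sockets rather than Winsock. It is not Michael's patch and not PostgreSQL code; the function names are hypothetical. It holds a loopback port that is bound but not listening, as a stand-in for a port some other program owns without accepting connections on it, and then probes that port both ways. On a POSIX system the bind() probe should report EADDRINUSE; whether Windows reports WSAEACCES in the SO_EXCLUSIVEADDRUSE case is exactly what is uncertain above.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill in a 127.0.0.1 address for the given port (host byte order). */
static void
loopback_addr(struct sockaddr_in *addr, unsigned short port)
{
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = htons(port);
}

/* Hypothetical probe: try to connect(); returns 0 or the errno seen. */
int
probe_by_connect(unsigned short port)
{
    struct sockaddr_in addr;
    int         err = 0;
    int         fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return errno;
    loopback_addr(&addr, port);
    if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0)
        err = errno;            /* save before close() can clobber it */
    close(fd);
    return err;
}

/* Hypothetical probe: try to bind(); returns 0 or the errno seen. */
int
probe_by_bind(unsigned short port)
{
    struct sockaddr_in addr;
    int         err = 0;
    int         fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return errno;
    loopback_addr(&addr, port);
    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0)
        err = errno;
    close(fd);
    return err;
}

/*
 * Bind a kernel-chosen loopback port without ever calling listen().
 * Returns the socket, or -1 on failure; the port is stored in *port.
 */
int
hold_port_without_listening(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t   len = sizeof(addr);
    int         fd = socket(AF_INET, SOCK_STREAM, 0);

    assert(port != NULL);
    if (fd < 0)
        return -1;
    loopback_addr(&addr, 0);    /* port 0: let the kernel pick one */
    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *) &addr, &len) < 0)
    {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}
```

With a port held by hold_port_without_listening(), probe_by_connect() fails (typically with ECONNREFUSED) just as it would for a completely free port, so it cannot tell the two cases apart, while probe_by_bind() fails until the holder closes its socket.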

I wonder what Andrew can tell us about what else is running on that
machine and whether either theory has any credibility.

BTW, if Windows *had* returned WSAEADDRINUSE, TranslateSocketError would
have failed to translate it --- surely that's an oversight?
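
For illustration only, a table-driven translation covering both codes might look like the sketch below. This is not the actual TranslateSocketError source; the struct, table, and function names are hypothetical, and the Winsock values (10013, 10048, 10061) are written out as plain numbers so the fragment builds without the Windows headers.

```c
#include <errno.h>
#include <stddef.h>

/* Winsock error values, spelled out so no Windows headers are needed. */
#define SKETCH_WSAEACCES        10013
#define SKETCH_WSAEADDRINUSE    10048
#define SKETCH_WSAECONNREFUSED  10061

/* Hypothetical mapping entry: Winsock code -> POSIX errno. */
struct sketch_errmap
{
    int         wsa_code;
    int         posix_errno;
};

static const struct sketch_errmap sketch_table[] = {
    {SKETCH_WSAEACCES, EACCES},
    {SKETCH_WSAEADDRINUSE, EADDRINUSE},    /* the entry noted as missing */
    {SKETCH_WSAECONNREFUSED, ECONNREFUSED},
};

/* Returns the mapped errno, or -1 if the code has no table entry. */
int
sketch_translate(int wsa_code)
{
    size_t      n = sizeof(sketch_table) / sizeof(sketch_table[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (sketch_table[i].wsa_code == wsa_code)
            return sketch_table[i].posix_errno;
    }
    return -1;
}
```

In this sketch an unmapped code returns -1 so the caller can tell "no translation" apart from a real errno; what the real function should do in that case is a separate question.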
        regards, tom lane


