Re: BUG #18168: Parallel worker failed to initialize: could not create inherited socket: error code 10106 - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: BUG #18168: Parallel worker failed to initialize: could not create inherited socket: error code 10106
Date
Msg-id CA+hUKGJQrzNrXn1us_sYC9Djh9p7AQ1uPHWAjWhLzhX5YV-35w@mail.gmail.com
In response to RE: BUG #18168: Parallel worker failed to initialize: could not create inherited socket: error code 10106  ("Boyer, Maxime (he/him | il/lui)" <Maxime.Boyer@cra-arc.gc.ca>)
List pgsql-bugs
On Thu, Oct 26, 2023 at 3:44 AM Boyer, Maxime (he/him | il/lui)
<Maxime.Boyer@cra-arc.gc.ca> wrote:
> > FWIW, the PG code that throws that error message is old enough to vote;
> > it's not something we changed in a recent minor release.
>
> Yeah, that's what I thought :'D
>
> > I am guessing you saw the impact of some external event, but I don't know what.
>
> Fair enough. This happened the day after reverting to 11, because of the memory error on 14, but I also doubt it's
> related. I was stopping one of the application nodes at the time. Maybe a Windows thing, or something related to the
> firmware updates.

Re-bonjour Maxime,

FWIW that comes from WSASocket() trying to inherit/duplicate a socket
used for communication with the pgstat process (a process and a socket
that don't exist in PostgreSQL 15, where that mechanism was replaced
with a new shared memory system; but given you were trying to upgrade
to 14 you probably don't want to hear about 15 today...).
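
For anyone wanting to see the duplicate-then-recreate step in isolation, here is a minimal sketch in Python rather than PostgreSQL's C. It assumes that Python's Windows-only socket.share() and socket.fromshare() are thin wrappers over the same Winsock duplication mechanism; that is my understanding, not something taken from the PostgreSQL source. The function name is hypothetical, and on non-Windows platforms it simply returns None.

```python
import os
import socket


def duplicate_udp_socket_roundtrip():
    """Duplicate a UDP socket via share()/fromshare() and compare addresses.

    Returns None where socket.share() is unavailable (anything but Windows).
    """
    if not hasattr(socket.socket, "share"):
        return None  # share()/fromshare() only exist on Windows builds

    original = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        original.bind(("127.0.0.1", 0))
        # Serialized duplication info for a target process. The target here
        # is this same process, purely for illustration.
        info = original.share(os.getpid())
        # Recreating the socket from that blob is the analogue of the step
        # that raised error 10106 in the report (an assumption on my part).
        duplicate = socket.fromshare(info)
        try:
            return duplicate.getsockname() == original.getsockname()
        finally:
            duplicate.close()
    finally:
        original.close()
```

If this raised an OSError with winerror 10106 on the affected machine, that would point at the Winsock provider rather than at PostgreSQL.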

I have no idea why that would happen, but for the record the manual[1] says:

"WSAEPROVIDERFAILEDINIT
10106
Service provider failed to initialize. The requested service provider
could not be loaded or initialized. This error is returned if either a
service provider's DLL could not be loaded (LoadLibrary failed) or the
provider's WSPStartup or NSPStartup function failed."

That seems pretty low level.  If this were PostgreSQL's fault I
suppose it would have to come from corruption of the WSAPROTOCOL_INFO
struct (a sort of cookie we need to duplicate the socket), but I doubt
it.  I see there were a few reports years ago about this error message
from pre-parallel-query times.  It's interesting that you see this
specifically with parallel workers (which inherit only a pgstat
socket, not the client connection socket).  The pgstat socket is
different in that it is a UDP socket.  I wonder if there is something
special about UDP that is upsetting your network stack, perhaps a
firewall thing somewhere that is upset specifically by some limit on
UDP activity or something.  But I'm not a Windows guy so I have no
real clue.
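
One cheap way to test the UDP theory outside PostgreSQL would be a loopback self-test along these lines. This is only a sketch: the function name and the one-second timeout are my own choices, and it is not a description of what PostgreSQL's pgstat code does.

```python
import socket


def udp_loopback_selftest(timeout=1.0):
    """Send one datagram to ourselves over 127.0.0.1; True if it arrives."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.settimeout(timeout)
        sock.bind(("127.0.0.1", 0))       # let the OS pick a free port
        sock.connect(sock.getsockname())  # "connect" the socket to itself
        sock.send(b"x")
        return sock.recv(1) == b"x"
    except OSError:
        # Covers timeouts as well as socket creation or send failures.
        return False
    finally:
        sock.close()
```

A False result on the database host, especially an intermittent one, would suggest that something in the network stack is interfering with local UDP traffic.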

[1] https://learn.microsoft.com/en-us/windows/win32/winsock/windows-sockets-error-codes-2
