Thread: BUG #18168: Parallel worker failed to initialize: could not create inherited socket: error code 10106

The following bug has been logged on the website:

Bug reference:      18168
Logged by:          Maxime Boyer
Email address:      maxime.boyer@cra-arc.gc.ca
PostgreSQL version: 11.17
Operating system:   Windows Server 2019
Description:

This error happened in PostgreSQL 11.17, slightly more than 12 hours after
startup. I believe it's the first time this has happened.

New firmware was installed on the server; this happened after restarting
the server. Since then, the database runs on a secondary server without
issue. That server received the same updates. We haven't tried moving it
back as this is our production database. Was this a fluke?

Looking online, the error code seems to refer to Windows sockets, with most
pages from the Windows XP era.

Startup: Oct 17 @ 11:40:20 PM
Shutdown: Oct 18 @ 11:43:15 PM

83 identical errors were logged in the system Event Viewer:
could not create inherited socket: error code 10106

Normal log until then. The application was running normally.

Server:
Windows Server 2019 Standard 64-bit
Microsoft Visual C++ 2015-2022 Redistributable (x64) - 14.36.32532
HP ProLiant BL460c Gen9
Intel Xeon CPU E5-2620 v4 @ 2.10 GHz (32 CPUs)
128 GB RAM

Logs:
2023-10-18 11:43:15.658 EDT,,,8164,,652f53a2.1fe4,3,,2023-10-17 23:40:18 EDT,,0,LOG,00000,"background worker ""parallel worker"" (PID 9740) exited with exit code 1",,,,,,,,"LogChildExit, d:\pginstaller.auto\postgres.windows-x64\src\backend\postmaster\postmaster.c:3590",""
2023-10-18 11:43:15.680 EDT,,,8164,,652f53a2.1fe4,4,,2023-10-17 23:40:18 EDT,,0,LOG,00000,"background worker ""parallel worker"" (PID 9516) exited with exit code 1",,,,,,,,"LogChildExit, d:\pginstaller.auto\postgres.windows-x64\src\backend\postmaster\postmaster.c:3590",""
2023-10-18 11:43:16.129 EDT,"dbuser","database",808,"app_node_ip:53838",652ff955.328,3,"",2023-10-18 11:27:17 EDT,43/1242715,0,ERROR,55000,"parallel worker failed to initialize",,"More details may be available in the server log.",,,,"SELECT COUNT(I.ID) FROM public.jiraissue I WHERE (I.ARCHIVED =  $1 ) OR (I.ARCHIVED IS NULL )",,"WaitForParallelWorkersToFinish, d:\pginstaller.auto\postgres.windows-x64\src\backend\access\transam\parallel.c:799","PostgreSQL JDBC Driver"
2023-10-18 11:43:21.476 EDT,,,8164,,652f53a2.1fe4,5,,2023-10-17 23:40:18 EDT,,0,LOG,00000,"received fast shutdown request",,,,,,,,"pmdie, d:\pginstaller.auto\postgres.windows-x64\src\backend\postmaster\postmaster.c:2722",""
2023-10-18 11:43:21.478 EDT,,,8164,,652f53a2.1fe4,6,,2023-10-17 23:40:18 EDT,,0,LOG,00000,"aborting any active transactions",,,,,,,,"pmdie, d:\pginstaller.auto\postgres.windows-x64\src\backend\postmaster\postmaster.c:2740",""

Thank you,
Maxime


RE: BUG #18168: Parallel worker failed to initialize: could not create inherited socket: error code 10106

From: "Boyer, Maxime (he/him | il/lui)"

Correction, shutdown time was Oct 18 @ 11:43:15 AM

-----Original Message-----
From: PG Bug reporting form <noreply@postgresql.org> 
Sent: October 24, 2023 3:41 PM
To: pgsql-bugs@lists.postgresql.org
Cc: Boyer, Maxime (he/him | il/lui) <Maxime.Boyer@cra-arc.gc.ca>
Subject: BUG #18168: Parallel worker failed to initialize: could not create inherited socket: error code 10106



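Re: BUG #18168: Parallel worker failed to initialize: could not create inherited socket: error code 10106

From: Tom Lane <tgl@sss.pgh.pa.us>

PG Bug reporting form <noreply@postgresql.org> writes:
> New firmware was installed on the server; this happened after restarting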
> the server. Since then, the database runs on a secondary server without
> issue. That server received the same updates. We haven't tried moving it
> back as this is our production database. Was this a fluke?

> 83 identical errors were logged in the system Event Viewer:
> could not create inherited socket: error code 10106

FWIW, the PG code that throws that error message is old enough
to vote; it's not something we changed in a recent minor release.
So I doubt the PG upgrade was the triggering factor.

I am guessing you saw the impact of some external event,
but I don't know what.

            regards, tom lane



RE: BUG #18168: Parallel worker failed to initialize: could not create inherited socket: error code 10106

From: "Boyer, Maxime (he/him | il/lui)"

> FWIW, the PG code that throws that error message is old enough to vote;
> it's not something we changed in a recent minor release.

Yeah, that's what I thought :'D

> I am guessing you saw the impact of some external event, but I don't know what.

Fair enough. This happened the day after reverting to 11, because of the memory error on 14, but I also doubt it's
related. I was stopping one of the application nodes at the time. Maybe a Windows thing, or something related to the
firmware updates.

We can leave it there for now then. We might try to fall back to see if it happens again.

Thanks,
Maxime

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us> 
Sent: October 25, 2023 10:21 AM
To: Boyer, Maxime (he/him | il/lui) <Maxime.Boyer@cra-arc.gc.ca>
Cc: pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #18168: Parallel worker failed to initialize: could not create inherited socket: error code 10106


On Thu, Oct 26, 2023 at 3:44 AM Boyer, Maxime (he/him | il/lui)
<Maxime.Boyer@cra-arc.gc.ca> wrote:
> > FWIW, the PG code that throws that error message is old enough to vote;
> > it's not something we changed in a recent minor release.
>
> Yeah, that's what I thought :'D
>
> > I am guessing you saw the impact of some external event, but I don't know what.
>
> Fair enough. This happened the day after reverting to 11, because of the memory error on 14, but I also doubt it's
> related. I was stopping one of the application nodes at the time. Maybe a Windows thing, or something related to the
> firmware updates.

Re-bonjour Maxime,

FWIW that comes from WSASocket() trying to inherit/duplicate a socket
used for communication with the pgstat process (a process and a socket
that don't exist in PostgreSQL 15, where that mechanism was replaced
with a new shared memory system; but given you were trying to upgrade
to 14 you probably don't want to hear about 15 today...).

I have no idea why that would happen, but for the record the manual[1] says:

"WSAEPROVIDERFAILEDINIT
10106
Service provider failed to initialize. The requested service provider
could not be loaded or initialized. This error is returned if either a
service provider's DLL could not be loaded (LoadLibrary failed) or the
provider's WSPStartup or NSPStartup function failed."

That seems pretty low level.  If this were PostgreSQL's fault I
suppose it would have to come from corruption of the WSAPROTOCOL_INFO
struct (a sort of cookie we need to duplicate the socket), but I doubt
it.  I see there were a few reports years ago about this error message
from pre-parallel-query times.  It's interesting that you see this
specifically with parallel workers (which inherit only a pgstat
socket, not the client connection socket).  The pgstat socket is
different in that it is a UDP socket.  I wonder if there is something
special about UDP that is upsetting your network stack, perhaps a
firewall somewhere that objects specifically to some limit on UDP
activity being exceeded.  But I'm not a Windows guy so I have no
real clue.

[1] https://learn.microsoft.com/en-us/windows/win32/winsock/windows-sockets-error-codes-2
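
For anyone who wants to poke at the duplicate-and-recreate step outside
PostgreSQL, here is a minimal sketch in Python. It is not PostgreSQL's code.
It relies on socket.share() and socket.fromshare(), which exist only on
Windows and which, as far as I know, wrap WSADuplicateSocket() and
WSASocket() with a WSAPROTOCOL_INFO blob. The function name
duplicate_udp_socket_locally is made up for this example, and the socket is
handed back to the same process rather than to a child, which is an
assumption made to keep the sketch self-contained.

```python
import os
import socket


def duplicate_udp_socket_locally():
    """Round-trip a UDP socket through its protocol-info blob.

    Returns the local address of the recreated socket, or None on
    platforms without socket.fromshare() (anything but Windows).
    Hypothetical helper for illustration only.
    """
    if not hasattr(socket, "fromshare"):
        return None  # share()/fromshare() are Windows-only

    original = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Bind to an ephemeral loopback port so the socket has an address.
        original.bind(("127.0.0.1", 0))

        # Serialize the socket for a target process (here: ourselves).
        # A real parent would pass the child's PID and ship the bytes to it.
        info = original.share(os.getpid())

        # Recreate the socket from the blob. This is the step that, in the
        # report above, failed with error code 10106.
        duplicate = socket.fromshare(info)
        try:
            return duplicate.getsockname()
        finally:
            duplicate.close()
    finally:
        original.close()


if __name__ == "__main__":
    print(duplicate_udp_socket_locally())
```

If the Winsock provider on a machine is in a bad state, I would expect the
fromshare() call to raise OSError carrying the same Winsock error code, which
could help tell a machine-level problem from a PostgreSQL one. That
expectation is untested.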