Thread: BUG #18168: Parallel worker failed to initialize: could not create inherited socket: error code 10106
BUG #18168: Parallel worker failed to initialize: could not create inherited socket: error code 10106
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference: 18168
Logged by: Maxime Boyer
Email address: maxime.boyer@cra-arc.gc.ca
PostgreSQL version: 11.17
Operating system: Windows Server 2019

Description:

This error happened in PostgreSQL 11.17, slightly more than 12 hours after startup. I believe it's the first time this happens. New firmware were installed on the server, this happened after restarting the server. Since then, the database runs on a secondary server without issue. That server received the same updates. We haven't tried moving it back as this is our production database. Was this a fluke?

Looking online, the error code seems to refer to Windows sockets, with most pages from the Windows XP era.

Startup: Oct 17 @ 11:40:20 PM
Shutdown: Oct 18 @ 11:43:15 PM

83 identical errors were logged in the system Event Viewer:
could not create inherited socket: error code 10106

Normal log until then. The application was running normally.

Server:
Windows Server 2019 Standard 64-bit
Microsoft Visual C++ 2015-2022 Redistributable (x64) - 14.36.32532
HP ProLiant BL460c Gen9
Intel Xeon CPU E5-2620 v4 @ 2.10 GHz (32 CPUs)
128 GB RAM

Logs:
2023-10-18 11:43:15.658 EDT,,,8164,,652f53a2.1fe4,3,,2023-10-17 23:40:18 EDT,,0,LOG,00000,"background worker ""parallel worker"" (PID 9740) exited with exit code 1",,,,,,,,"LogChildExit, d:\pginstaller.auto\postgres.windows-x64\src\backend\postmaster\postmaster.c:3590",""
2023-10-18 11:43:15.680 EDT,,,8164,,652f53a2.1fe4,4,,2023-10-17 23:40:18 EDT,,0,LOG,00000,"background worker ""parallel worker"" (PID 9516) exited with exit code 1",,,,,,,,"LogChildExit, d:\pginstaller.auto\postgres.windows-x64\src\backend\postmaster\postmaster.c:3590",""
2023-10-18 11:43:16.129 EDT,"dbuser","database",808,"app_node_ip:53838",652ff955.328,3,"",2023-10-18 11:27:17 EDT,43/1242715,0,ERROR,55000,"parallel worker failed to initialize",,"More details may be available in the server log.",,,,"SELECT COUNT(I.ID) FROM public.jiraissue I WHERE (I.ARCHIVED = $1 ) OR (I.ARCHIVED IS NULL )",,"WaitForParallelWorkersToFinish, d:\pginstaller.auto\postgres.windows-x64\src\backend\access\transam\parallel.c:799","PostgreSQL JDBC Driver"
2023-10-18 11:43:21.476 EDT,,,8164,,652f53a2.1fe4,5,,2023-10-17 23:40:18 EDT,,0,LOG,00000,"received fast shutdown request",,,,,,,,"pmdie, d:\pginstaller.auto\postgres.windows-x64\src\backend\postmaster\postmaster.c:2722",""
2023-10-18 11:43:21.478 EDT,,,8164,,652f53a2.1fe4,6,,2023-10-17 23:40:18 EDT,,0,LOG,00000,"aborting any active transactions",,,,,,,,"pmdie, d:\pginstaller.auto\postgres.windows-x64\src\backend\postmaster\postmaster.c:2740",""

Thank you,
Maxime
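For readers who want to pick the log entries above apart, here is a minimal sketch of parsing a csvlog line in Python. The column list is an assumption based on the csvlog layout documented for PostgreSQL 11 (23 columns; later major versions append more), and the names parse_csvlog and CSVLOG_COLUMNS are made up for this illustration. The sample line is the first log entry from the report.

```python
import csv
import io

# Assumed column order for PostgreSQL 11 csvlog output (23 columns).
# Later major versions append further columns, so adjust as needed.
CSVLOG_COLUMNS = [
    "log_time", "user_name", "database_name", "process_id",
    "connection_from", "session_id", "session_line_num", "command_tag",
    "session_start_time", "virtual_transaction_id", "transaction_id",
    "error_severity", "sql_state_code", "message", "detail", "hint",
    "internal_query", "internal_query_pos", "context", "query",
    "query_pos", "location", "application_name",
]


def parse_csvlog(text):
    """Return one dict per csvlog record found in text (hypothetical helper)."""
    # The csv module handles the doubled quotes ("") used inside quoted fields.
    reader = csv.reader(io.StringIO(text, newline=""))
    return [dict(zip(CSVLOG_COLUMNS, row)) for row in reader if row]


# First log entry from the report, as a raw string so backslashes stay literal.
SAMPLE = r'2023-10-18 11:43:15.658 EDT,,,8164,,652f53a2.1fe4,3,,2023-10-17 23:40:18 EDT,,0,LOG,00000,"background worker ""parallel worker"" (PID 9740) exited with exit code 1",,,,,,,,"LogChildExit, d:\pginstaller.auto\postgres.windows-x64\src\backend\postmaster\postmaster.c:3590",""'

records = parse_csvlog(SAMPLE)
for record in records:
    print(record["log_time"], record["error_severity"], record["message"])
```

Filtering the resulting dicts on error_severity or process_id is one way to separate the postmaster's LOG entries (PID 8164 here) from the client-facing ERROR raised in the session backend.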
RE: BUG #18168: Parallel worker failed to initialize: could not create inherited socket: error code 10106
From
"Boyer, Maxime (he/him | il/lui)"
Date:
Correction, shutdown time was Oct 18 @ 11:43:15 AM
Re: BUG #18168: Parallel worker failed to initialize: could not create inherited socket: error code 10106
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> New firmware were installed on the server, this happened after restarting
> the server. Since then, the database runs on a secondary server without
> issue. That server received the same updates. We haven't tried moving it
> back as this is our production database. Was this a fluke?

> 83 identical errors were logged in the system Event Viewer:
> could not create inherited socket: error code 10106

FWIW, the PG code that throws that error message is old enough to vote; it's not something we changed in a recent minor release. So I doubt the PG upgrade was the triggering factor.

I am guessing you saw the impact of some external event, but I don't know what.

regards, tom lane
RE: BUG #18168: Parallel worker failed to initialize: could not create inherited socket: error code 10106
From
"Boyer, Maxime (he/him | il/lui)"
Date:
> FWIW, the PG code that throws that error message is old enough to vote;
> it's not something we changed in a recent minor release.

Yeah, that's what I thought :'D

> I am guessing you saw the impact of some external event, but I don't know what.

Fair enough. This happened the day after reverting to 11, because of the memory error on 14, but I also doubt it's related. I was stopping one of the application nodes at the time. Maybe a Windows thing, or something related to the firmware updates.

We can leave it there for now then. We might try to fall back to see if it happens again.

Thanks,
Maxime
Re: BUG #18168: Parallel worker failed to initialize: could not create inherited socket: error code 10106
From
Thomas Munro
Date:
On Thu, Oct 26, 2023 at 3:44 AM Boyer, Maxime (he/him | il/lui) <Maxime.Boyer@cra-arc.gc.ca> wrote:
> > FWIW, the PG code that throws that error message is old enough to vote;
> > it's not something we changed in a recent minor release.
>
> Yeah, that's what I thought :'D
>
> > I am guessing you saw the impact of some external event, but I don't know what.
>
> Fair enough. This happened the day after reverting to 11, because of the memory error on 14, but I also doubt it's related. I was stopping one of the application nodes at the time. Maybe a Windows thing, or something related to the firmware updates.

Re-bonjour Maxime,

FWIW that comes from WSASocket() trying to inherit/duplicate a socket used for communication with the pgstat process (a process and a socket that don't exist in PostgreSQL 15, where that mechanism was replaced with a new shared memory system; but given you were trying to upgrade to 14 you probably don't want to hear about 15 today...). I have no idea why that would happen, but for the record the manual[1] says:

"WSAEPROVIDERFAILEDINIT 10106 Service provider failed to initialize. The requested service provider could not be loaded or initialized. This error is returned if either a service provider's DLL could not be loaded (LoadLibrary failed) or the provider's WSPStartup or NSPStartup function failed."

That seems pretty low level. If this were PostgreSQL's fault I suppose it would have to come from corruption of the WSAPROTOCOL_INFO struct (a sort of cookie we need to duplicate the socket), but I doubt it.

I see there were a few reports years ago about this error message from pre-parallel-query times. It's interesting that you see this specifically with parallel workers (which inherit only a pgstat socket, not the client connection socket). The pgstat socket is different in that it is a UDP socket. I wonder if there is something special about UDP that is upsetting your network stack, perhaps a firewall thing somewhere that is upset specifically by some limit on UDP activity or something. But I'm not a Windows guy so I have no real clue.

[1] https://learn.microsoft.com/en-us/windows/win32/winsock/windows-sockets-error-codes-2
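As a small aid for triaging messages like the one in this thread, the sketch below pulls the numeric code out of a "could not create inherited socket: error code N" message and looks it up in a hand-written table. The table is deliberately partial and is an assumption of this example: only the 10106 entry is taken from the manual excerpt quoted above, and the others are common Winsock codes included for context. The names WINSOCK_ERRORS and decode_winsock_error are hypothetical.

```python
import re

# Partial, hand-maintained table of Winsock error codes (illustrative only).
# The 10106 entry matches the manual text quoted in the thread.
WINSOCK_ERRORS = {
    10024: ("WSAEMFILE", "Too many open files."),
    10048: ("WSAEADDRINUSE", "Address already in use."),
    10054: ("WSAECONNRESET", "Connection reset by peer."),
    10055: ("WSAENOBUFS", "No buffer space available."),
    10060: ("WSAETIMEDOUT", "Connection timed out."),
    10061: ("WSAECONNREFUSED", "Connection refused."),
    10093: ("WSANOTINITIALISED", "Successful WSAStartup not yet performed."),
    10106: ("WSAEPROVIDERFAILEDINIT", "Service provider failed to initialize."),
}

ERROR_CODE_RE = re.compile(r"error code (\d+)")


def decode_winsock_error(message):
    """Return (code, name, summary) for a log message, or None if no code is found.

    Codes missing from the table are reported with the name "UNKNOWN".
    """
    match = ERROR_CODE_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1))
    name, summary = WINSOCK_ERRORS.get(code, ("UNKNOWN", ""))
    return code, name, summary


decoded = decode_winsock_error("could not create inherited socket: error code 10106")
print(decoded)
```

A lookup like this only names the error; it does not say why the service provider failed to load, which in this thread remains unexplained.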