PATCH: pg_restore parallel-execution-deadlock issue - Mailing list pgsql-hackers

From Armin Schöffmann
Subject PATCH: pg_restore parallel-execution-deadlock issue
Date
Msg-id zarafa.570306bd.3418.074bf1420d8f2ba2@root.aegaeon.de
Whole thread Raw
Responses Re: PATCH: pg_restore parallel-execution-deadlock issue  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
worthy hackers,
I propose the below patches to parallels.c and pg_backup_utils.c fixing deadlocks in pg_restore (windows only) if
runningmore than 2 parallel jobs. 
This problem was reported by me earlier this year.
http://www.postgresql.org/message-id/20160307161619.25731.78653@wrigleys.postgresql.org

- Winsock's "recv(...)" called in piperead() is a blocking read by default, therefor, signalizing termEvent as used in
ShutdownWorkersHard()is not enough to make worker-threads go away. 
We need a preceding shutdown(pipeWrite, SD_BOTH), first, to abort blocking IO in this case.
Otherwise, the main-thread will wait forever, if more than one additional worker is active (e.g. option -j3) and a
prematureEOF occurs in the input-file. 

Findings in pg_backup_utils.c/ parallels.c, which could impact other tools, too:

- threads created with _beginthreadex need to be exited by either a "return exitcode"  or "_endthreadex(exitcode)". It
mightbe obsolete in fire-and-forget-scenarios, but it matters in other cases. 
As of current, pg_backup_utils uses EndThread to retire additional worker-threads., which are spawned by _beginthreadex
inparallel.c. The corresponding call for ExitThread would be CreateThread, 
nevertheless, _beginthreadex is the correct choice here, as we do call-out into CRT and need to retain the
thread-handlefor after-death synchronization with the main-thread. 
The thread-handle needs to be closed explicitly.

If this is not the correct place to discuss patches, I'd be glad if somebody can notify the tool's maintainer, to take
alook into it. 

best regards,
Armin Schöffmann.


--
Aegaeon technologies GmbH
phone: +49.941.8107344
fax:   +49.941.8107356

Legal disclaimer:
http://aegaeon.de/disclaimer/email_all_int.txt


parallel.c

@@ -350,7 +350,8 @@ static void
 ShutdownWorkersHard(ParallelState *pstate)
 {
-#ifndef WIN32
+
     int            i;

+#ifndef WIN32
     signal(SIGPIPE, SIG_IGN);

@@ -367,4 +368,7 @@ ShutdownWorkersHard(ParallelState *pstate)
     /* The workers monitor this event via checkAborting(). */
     SetEvent(termEvent);
+
+    for (i = 0; i < pstate->numWorkers; i++)
+        shutdown(pstate->parallelSlot[i].pipeWrite, SD_BOTH);
 #endif

@@ -408,5 +412,8 @@ WaitForTerminatingWorkers(ParallelState *pstate)
         for (j = 0; j < pstate->numWorkers; j++)
             if (pstate->parallelSlot[j].hThread == hThread)
+            {
                 slot = &(pstate->parallelSlot[j]);
+                CloseHandle(hThread);
+            }

         free(lpHandles);



pg_backup_utils.c

@@ -120,5 +120,5 @@ exit_nicely(int code)
 #ifdef WIN32
     if (parallel_init_done && GetCurrentThreadId() != mainThreadId)
-        ExitThread(code);
+        _endthreadex(code);
 #endif






Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: show xl_prev in xlog.c errcontext
Next
From: Michael Paquier
Date:
Subject: Re: Yet another small patch - reorderbuffer.c:1099