Re: Parallel pg_dump's error reporting doesn't work worth squat - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Parallel pg_dump's error reporting doesn't work worth squat
Date
Msg-id 20160526.152559.266074794.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: Parallel pg_dump's error reporting doesn't work worth squat  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: Parallel pg_dump's error reporting doesn't work worth squat  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Parallel pg_dump's error reporting doesn't work worth squat  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> Sounds reasonable. I look into this further.

I looked into that and found one problem in the patch.

> Next, I got the following behavior for the following command,
> then freeze. Maybe stopping at the same point with the next
> paragraph but I'm not sure. The same thing occurs this patch on
> top of the current master but doesn't on Linux.

This occurs in the following steps.

1. One of the workers dies for some reason. (In this case, InitCompressorZlib immediately goes into exit_horribly.)

2. The main thread detects in ListenToWorkers that the worker is dead.

3. ListenToWorkers calls exit_horribly, which then calls exit_nicely.

4. exit_nicely calls archive_close_connection as a callback, and that callback calls ShutdownWorkersHard.

5. ShutdownWorkersHard should close the write side of the pipe, but the patch skips that on WIN32, so the surviving workers never see the pipe close and keep waiting.

The attached patch, applied on top of that patch, fixes this: with
it, pg_dump returns to the command prompt even in this case.
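
To illustrate why step 5 matters, here is a minimal sketch of the general mechanism, not the pg_dump code itself. It assumes a POSIX pipe and fork() in place of the socket pair and threads that pg_dump uses on Windows, and the function name shutdown_worker_by_closing_pipe is made up for this example. A worker blocked in read() wakes up only once every write end of its pipe is closed. If the master never closes its end, the worker waits forever.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo, not pg_dump code: fork one "worker" that blocks
 * reading commands from the master, then shut it down by closing the
 * master's write end of the pipe.
 *
 * Returns the worker's exit status (0 when it saw EOF and left cleanly),
 * or -1 on a setup error.
 */
int
shutdown_worker_by_closing_pipe(void)
{
    int     fds[2];
    pid_t   pid;
    int     status;

    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0)
    {
        /* Worker: drop the inherited write end, or EOF never arrives. */
        char    buf[64];
        ssize_t n;

        close(fds[1]);
        while ((n = read(fds[0], buf, sizeof(buf))) > 0)
            ;                   /* a real worker would act on the command */
        _exit(n == 0 ? 0 : 1);  /* 0: EOF seen; 1: read error */
    }

    /* Master: keep only the write end ... */
    close(fds[0]);
    /* ... and closing it is the "please exit" signal to the worker. */
    close(fds[1]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling shutdown_worker_by_closing_pipe() from a main() should return 0. If the master's close(fds[1]) is removed, the waitpid() call never returns, which is the same kind of hang as described above.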

By the way, the reason for the "invalid snapshot identifier" error
is that some worker threads try to use the snapshot after the
connection on the first worker has been closed. Some of the workers
had successfully used it before the connection was closed, and they
remained listening to their master, which prevented the entire
process from terminating.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index f650d3f..6c08426 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -308,7 +308,6 @@ checkAborting(ArchiveHandle *AH)
 static void
 ShutdownWorkersHard(ParallelState *pstate)
 {
-#ifndef WIN32
     int            i;
 
     /*
@@ -318,6 +317,7 @@ ShutdownWorkersHard(ParallelState *pstate)
     for (i = 0; i < pstate->numWorkers; i++)
         closesocket(pstate->parallelSlot[i].pipeWrite);
 
+#ifndef WIN32
     for (i = 0; i < pstate->numWorkers; i++)
         kill(pstate->parallelSlot[i].pid, SIGTERM);
 #else
