Thread: pgsql: Support Parallel Append plan nodes.
Support Parallel Append plan nodes.

When we create an Append node, we can spread out the workers over the
subplans instead of piling on to each subplan one at a time, which should
typically be a bit more efficient, both because the startup cost of any
plan executed entirely by one worker is paid only once and also because of
reduced contention.  We can also construct Append plans using a mix of
partial and non-partial subplans, which may allow for parallelism in
places that otherwise couldn't support it.

Unfortunately, this patch doesn't handle the important case of
parallelizing UNION ALL by running each branch in a separate worker; the
executor infrastructure is added here, but more planner work is needed.

Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by Ashutosh
Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and Rajkumar Raghuwanshi.

Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/ab72716778128fb63d54ac256adf7fe6820a1185

Modified Files
--------------
doc/src/sgml/config.sgml                      |  14 ++
doc/src/sgml/monitoring.sgml                  |   7 +-
src/backend/executor/execParallel.c           |  19 ++
src/backend/executor/nodeAppend.c             | 331 +++++++++++++++++++++-----
src/backend/nodes/copyfuncs.c                 |   1 +
src/backend/nodes/list.c                      |  38 +++
src/backend/nodes/outfuncs.c                  |   1 +
src/backend/nodes/readfuncs.c                 |   1 +
src/backend/optimizer/path/allpaths.c         | 216 ++++++++++++++---
src/backend/optimizer/path/costsize.c         | 164 +++++++++++++
src/backend/optimizer/path/joinrels.c         |   3 +-
src/backend/optimizer/plan/createplan.c       |  10 +-
src/backend/optimizer/plan/planner.c          |   5 +-
src/backend/optimizer/prep/prepunion.c        |   7 +-
src/backend/optimizer/util/pathnode.c         |  88 +++++--
src/backend/storage/lmgr/lwlock.c             |   1 +
src/backend/utils/misc/guc.c                  |   9 +
src/backend/utils/misc/postgresql.conf.sample |   1 +
src/include/executor/nodeAppend.h             |   5 +
src/include/nodes/execnodes.h                 |  14 +-
src/include/nodes/pg_list.h                   |   3 +
src/include/nodes/plannodes.h                 |   1 +
src/include/nodes/relation.h                  |   6 +
src/include/optimizer/cost.h                  |   2 +
src/include/optimizer/pathnode.h              |   9 +-
src/include/storage/lwlock.h                  |   1 +
src/test/regress/expected/inherit.out         |   2 +
src/test/regress/expected/select_parallel.out |  91 ++++++-
src/test/regress/expected/sysviews.out        |   3 +-
src/test/regress/sql/inherit.sql              |   2 +
src/test/regress/sql/select_parallel.sql      |  33 ++-

31 files changed, 959 insertions(+), 129 deletions(-)
Robert Haas <rhaas@postgresql.org> writes:
> Support Parallel Append plan nodes.

Buildfarm member sidewinder doesn't like this:

Program terminated with signal SIGABRT, Aborted.
#0  0x00007f7ff5b0e01a in _lwp_kill () from /usr/lib/libc.so.12
#0  0x00007f7ff5b0e01a in _lwp_kill () from /usr/lib/libc.so.12
#1  0x00007f7ff5b0dca5 in abort () from /usr/lib/libc.so.12
#2  0x000000000082b5e3 in ExceptionalCondition (conditionName=conditionName@entry=0x9ad548 "!(append->first_partial_plan < node->as_nplans)", errorType=errorType@entry=0x87489a "FailedAssertion", fileName=fileName@entry=0x9ad4d6 "nodeAppend.c", lineNumber=lineNumber@entry=509) at assert.c:54
#3  0x000000000060feb6 in choose_next_subplan_for_worker (node=0x7f7ff7357980) at nodeAppend.c:509
#4  0x000000000061007d in ExecAppend (pstate=0x7f7ff7357980) at nodeAppend.c:222
#5  0x0000000000610464 in ExecProcNode (node=0x7f7ff7357980) at ../../../src/include/executor/executor.h:241
#6  fetch_input_tuple (aggstate=aggstate@entry=0x7f7ff7357a98) at nodeAgg.c:699
#7  0x0000000000612531 in agg_retrieve_direct (aggstate=0x7f7ff7357a98) at nodeAgg.c:2456
#8  ExecAgg (pstate=0x7f7ff7357a98) at nodeAgg.c:2166
#9  0x0000000000603cbd in ExecProcNode (node=0x7f7ff7357a98) at ../../../src/include/executor/executor.h:241
#10 ExecutePlan (execute_once=<optimized out>, dest=0x7f7ff73556b8, direction=<optimized out>, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x7f7ff7357a98, estate=0x7f7ff7357040) at execMain.c:1718
#11 standard_ExecutorRun (queryDesc=0x7f7ff7369f58, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:361
#12 0x000000000060833b in ParallelQueryMain (seg=0x7f7ff7b65908, toc=0x7f7ff7fab000) at execParallel.c:1319
#13 0x00000000004f6e54 in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1150
#14 0x00000000006ba07f in StartBackgroundWorker () at bgworker.c:841
#15 0x00000000006c51f9 in do_start_bgworker (rw=<optimized out>) at postmaster.c:5741

Append.first_partial_plan is sadly underdocumented, but considering that
the loop above this Assert is prepared for the possibility that
first_partial_plan >= node->as_nplans, I'm not sure why this code supposes
that that can't happen.

BTW, I find it confusing and bad style that some of the references in this
function to node->as_pstate go through the local variable pstate but
others do not.

			regards, tom lane
I wrote:
> Robert Haas <rhaas@postgresql.org> writes:
>> Support Parallel Append plan nodes.

> Buildfarm member sidewinder doesn't like this:

It transpires that my machine prairiedog also shows the failure.
I instrumented nodeAppend.c (as per attached patch) to see what was going
on, and I get this trace on prairiedog:

2017-12-05 22:50:30.228 EST [28812] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
2017-12-05 22:50:30.228 EST [28812] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
2017-12-05 22:50:30.241 EST [28813] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
2017-12-05 22:50:30.241 EST [28813] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
2017-12-05 22:50:30.252 EST [28814] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
2017-12-05 22:50:30.252 EST [28814] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
2017-12-05 22:50:30.363 EST [28816] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan 0, fpp 2, nplans 6
2017-12-05 22:50:30.363 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.363 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 0, fpp 2, nplans 6
2017-12-05 22:50:30.363 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.363 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 0, nextplan 1, fpp 2, nplans 6
2017-12-05 22:50:30.363 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 1, fpp 2, nplans 6
2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 1, nextplan 2, fpp 2, nplans 6
2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 2, fpp 2, nplans 6
2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 2, nextplan 3, fpp 2, nplans 6
2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 3, fpp 2, nplans 6
2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 3, nextplan 4, fpp 2, nplans 6
2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 4, fpp 2, nplans 6
2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 4, nextplan 5, fpp 2, nplans 6
2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 5, fpp 2, nplans 6
2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 5, nextplan 2, fpp 2, nplans 6
2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.407 EST [28815] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
2017-12-05 22:50:30.407 EST [28815] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.474 EST [28817] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
2017-12-05 22:50:30.474 EST [28817] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.568 EST [28818] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan 0, fpp 6, nplans 6
2017-12-05 22:50:30.568 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.568 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 0, fpp 6, nplans 6
2017-12-05 22:50:30.568 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.569 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 0, nextplan 1, fpp 6, nplans 6
2017-12-05 22:50:30.569 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.569 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 1, fpp 6, nplans 6
2017-12-05 22:50:30.569 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.569 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 1, nextplan 2, fpp 6, nplans 6
2017-12-05 22:50:30.569 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.569 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 2, fpp 6, nplans 6
2017-12-05 22:50:30.569 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 2, nextplan 3, fpp 6, nplans 6
2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 3, fpp 6, nplans 6
2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 3, nextplan 4, fpp 6, nplans 6
2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 4, fpp 6, nplans 6
2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 4, nextplan 5, fpp 6, nplans 6
2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 5, fpp 6, nplans 6
2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
TRAP: FailedAssertion("!(append->first_partial_plan < node->as_nplans)", File: "nodeAppend.c", Line: 520)
2017-12-05 22:50:38.211 EST [28158] LOG: background worker "parallel worker" (PID 28818) was terminated by signal 6: Abort trap
2017-12-05 22:50:38.211 EST [28158] DETAIL: Failed process was running: select round(avg(aa)), sum(aa) from a_star a3;

That makes it pretty clear what's going wrong, but why don't we see it
elsewhere?

On my development machine, the same patch yields

2017-12-05 22:42:24.187 EST [843] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
2017-12-05 22:42:24.187 EST [843] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
2017-12-05 22:42:24.188 EST [844] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
2017-12-05 22:42:24.188 EST [844] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
2017-12-05 22:42:24.188 EST [845] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
2017-12-05 22:42:24.188 EST [845] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
2017-12-05 22:42:24.197 EST [846] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
2017-12-05 22:42:24.197 EST [846] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:42:24.200 EST [847] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
2017-12-05 22:42:24.200 EST [847] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:42:24.200 EST [848] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
2017-12-05 22:42:24.200 EST [848] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:42:24.214 EST [849] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 6, nplans 6
2017-12-05 22:42:24.214 EST [849] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:42:24.214 EST [852] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 6, nplans 6
2017-12-05 22:42:24.214 EST [852] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:42:24.214 EST [853] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 6, nplans 6
2017-12-05 22:42:24.214 EST [853] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;

I conclude that the reason we don't see the crash except on the slowest
buildfarm critters is that on most machines the leader process manages to
finish off all the subplans before any of the workers get as far as doing
something.  This doesn't give me a warm fuzzy feeling about how much
testing this patch has gotten.

			regards, tom lane

diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 246a0b2..4c4355b 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -463,6 +463,12 @@ choose_next_subplan_for_worker(AppendState *node)
 
 	LWLockAcquire(&pstate->pa_lock, LW_EXCLUSIVE);
 
+	elog(LOG, "choose_next_subplan_for_worker: whichplan %d, nextplan %d, fpp %d, nplans %d",
+		 node->as_whichplan,
+		 pstate->pa_next_plan,
+		 append->first_partial_plan,
+		 node->as_nplans);
+
 	/* Mark just-completed subplan as finished. */
 	if (node->as_whichplan != INVALID_SUBPLAN_INDEX)
 		node->as_pstate->pa_finished[node->as_whichplan] = true;
@@ -502,6 +508,11 @@ choose_next_subplan_for_worker(AppendState *node)
 		}
 	}
 
+	elog(LOG, "choose_next_subplan_for_worker: picking nextplan %d, fpp %d, nplans %d",
+		 pstate->pa_next_plan,
+		 append->first_partial_plan,
+		 node->as_nplans);
+
 	/* Pick the plan we found, and advance pa_next_plan one more time. */
 	node->as_whichplan = pstate->pa_next_plan++;
 	if (pstate->pa_next_plan >= node->as_nplans)
Hi,

On 2017-12-05 23:01:59 -0500, Tom Lane wrote:
> I conclude that the reason we don't see the crash except on the slowest
> buildfarm critters is that on most machines the leader process manages
> to finish off all the subplans before any of the workers get as far as
> doing something. This doesn't give me a warm fuzzy feeling about how
> much testing this patch has gotten.

If that's the case, it should be reproducible on fast machines if one
sets parallel_leader_participation = off during an installcheck.  Which
indeed crashes here, albeit on a heavily modified tree:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f4c05afb19a in __GI_abort () at abort.c:89
#2  0x000055d89651e83a in ExceptionalCondition (conditionName=0x55d8966edef8 "!(append->first_partial_plan < node->as_nplans)", errorType=0x55d8966edd86 "FailedAssertion", fileName=0x55d8966eddd8 "/home/andres/src/postgresql/src/backend/executor/nodeAppend.c", lineNumber=503) at /home/andres/src/postgresql/src/backend/utils/error/assert.c:54
#3  0x000055d8961eab63 in choose_next_subplan_for_worker (node=0x55d8974535a0) at /home/andres/src/postgresql/src/backend/executor/nodeAppend.c:503
#4  0x000055d8961ea47b in ExecAppend (pstate=0x55d8974535a0) at /home/andres/src/postgresql/src/backend/executor/nodeAppend.c:216
#5  0x000055d8961d6c55 in ExecProcNode (node=0x55d8974535a0) at /home/andres/src/postgresql/src/include/executor/executor.h:239

Greetings,

Andres Freund
Thanks Tom for the crash analysis; I think this is the same reason that
Rajkumar's reported case[1] was crashing only with partition-wise-join = on.
I tried to fix this crash[2] but missed the place where I had added an
assert check, on the assumption that we always come there via the previous
check in the while loop.

Instead of the assert check, we need similar bailout logic[2] before
looping back to the first partial plan.  The attached patch does that.
I've tested with parallel_leader_participation = off as suggested by
Andres; make check looks good except for some obvious regression diffs.

1] https://postgr.es/m/CAKcux6m+6nTO6C08kKaj-Waffvgvp-9SD3RCGStX=Mzk0gQU8g@mail.gmail.com
2] https://postgr.es/m/CAAJ_b975k58H+Ed4=p0vbJunwO2reOMX5CVB8_R=JmXxY3uW=Q@mail.gmail.com

Regards,
Amul

On Wed, Dec 6, 2017 at 9:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Robert Haas <rhaas@postgresql.org> writes:
>>> Support Parallel Append plan nodes.
>
>> Buildfarm member sidewinder doesn't like this:
>
> It transpires that my machine prairiedog also shows the failure.
> [ instrumented log traces and analysis quoted in full upthread; trimmed ]
>
> 			regards, tom lane
Attachment
On 6 November 2017 16:50, amul sul <sulamul@gmail.com> wrote:
> Thanks Tom for the crash analysis, I think this is the same reason that
> Rajkumar's reported case[1] was crashing only with partition-wise-join = on.
> I tried to fix this crash[2] but missed the place where I have added assert
> check in the assumption that we always coming from the previous check in the
> while loop.
>
> Instead, assert check we need a similar bailout logic[2] before looping back to
> first partial plan. Attached patch does the same, I've tested with
> parallel_leader_participation = off setting as suggested by Andres, make check
> looks good except there is some obvious regression diff.
>
> 1] https://postgr.es/m/CAKcux6m+6nTO6C08kKaj-Waffvgvp-9SD3RCGStX=Mzk0gQU8g@mail.gmail.com
> 2] https://postgr.es/m/CAAJ_b975k58H+Ed4=p0vbJunwO2reOMX5CVB8_R=JmXxY3uW=Q@mail.gmail.com

@@ -506,7 +506,14 @@ choose_next_subplan_for_worker(AppendState *node)
 	node->as_whichplan = pstate->pa_next_plan++;
 	if (pstate->pa_next_plan >= node->as_nplans)
 	{
-		Assert(append->first_partial_plan < node->as_nplans);
+		/* No partial plans then bail out. */
+		if (append->first_partial_plan >= node->as_nplans)
+		{
+			pstate->pa_next_plan = INVALID_SUBPLAN_INDEX;
+			node->as_whichplan = INVALID_SUBPLAN_INDEX;
+			LWLockRelease(&pstate->pa_lock);
+			return false;
+		}
 		pstate->pa_next_plan = append->first_partial_plan;

In the above code, the fact that we have not bailed out from the earlier
for loop means that we have already found an unfinished plan, and
node->as_whichplan is set to that plan.  So while setting the next plan
above for the other workers to pick, we should not return false, nor
should we set node->as_whichplan to INVALID_SUBPLAN_INDEX.  Instead, just
set pa_next_plan to INVALID_SUBPLAN_INDEX.  Otherwise, the chosen plan
won't get executed at all.

Thanks
-Amit Khandekar
Copying & replying to Amit Khandekar's email here:

On Wed, Dec 6, 2017 at 11:45 AM, amul sul <sulamul@gmail.com> wrote:
>> [ earlier analysis and links trimmed ]
>
> @@ -506,7 +506,14 @@ choose_next_subplan_for_worker(AppendState *node)
> [ patch hunk trimmed ]
>
> In the above code, the fact that we have not bailed out from the
> earlier for loop means that we have already found an unfinished plan
> and node->as_whichplan is set to that plan. So while setting the next
> plan above for the other workers to pick, we should not return false,
> nor should we set node->as_whichplan to INVALID_SUBPLAN_INDEX.
> Instead, just set pa_next_plan to INVALID_SUBPLAN_INDEX. Otherwise,
> the chosen plan won't get executed at all.

Understood, thanks for the review.  Updated patch attached.

1] https://postgr.es/m/CAJ3gD9e3_D3fFqzWBFYoaF-6yCXgbOFn3Mb-pgd_mxvjpsw7Rw@mail.gmail.com

Regards,
Amul
Attachment
On 6 December 2017 at 14:31, amul sul <sulamul@gmail.com> wrote: > Copying & reverting to Amit Khandekar's email here: > > On Wed, Dec 6, 2017 at 11:45 AM, amul sul <sulamul@gmail.com> wrote: >>> Thanks Tom for the crash analysis, I think this is the same reason that >>> Rajkumar's reported case[1] was crashing only with partition-wise-join = on. >>> I tried to fix this crash[2] but missed the place where I have added assert >>> check in the assumption that we always coming from the previous check in the >>> while loop. >>> >>> Instead, assert check we need a similar bailout logic[2] before looping back to >>> first partial plan. Attached patch does the same, I've tested with >>> parallel_leader_participation = off setting as suggested by Andres, make check >>> looks good except there is some obvious regression diff. >>> >>> 1] https://postgr.es/m/CAKcux6m+6nTO6C08kKaj-Waffvgvp-9SD3RCGStX=Mzk0gQU8g@mail.gmail.com >>> 2] https://postgr.es/m/CAAJ_b975k58H+Ed4=p0vbJunwO2reOMX5CVB8_R=JmXxY3uW=Q@mail.gmail.com >>> >> >> @@ -506,7 +506,14 @@ choose_next_subplan_for_worker(AppendState *node) >> node->as_whichplan = pstate->pa_next_plan++; >> if (pstate->pa_next_plan >= node->as_nplans) >> { >> - Assert(append->first_partial_plan < node->as_nplans); >> + /* No partial plans then bail out. */ >> + if (append->first_partial_plan >= node->as_nplans) >> + { >> + pstate->pa_next_plan = INVALID_SUBPLAN_INDEX; >> + node->as_whichplan = INVALID_SUBPLAN_INDEX; >> + LWLockRelease(&pstate->pa_lock); >> + return false; >> + } >> pstate->pa_next_plan = append->first_partial_plan; >> >> In the above code, the fact that we have not bailed out from the >> earlier for loop means that we have already found an unfinished plan >> and node->as_whichplan is set to that plan. So while setting the next >> plan above for the other workers to pick, we should not return false, >> nor should we set node->as_whichplan to INVALID_SUBPLAN_INDEX. 
>> Instead, just set pa_next_plan to INVALID_SUBPLAN_INDEX. Otherwise,
>> the chosen plan won't get executed at all.
>>
>
> Understood, thanks for the review. Updated patch attached.
>
> 1] https://postgr.es/m/CAJ3gD9e3_D3fFqzWBFYoaF-6yCXgbOFn3Mb-pgd_mxvjpsw7Rw@mail.gmail.com

This looks good. In attached revised patch, just added some comments in
the changes that you did.

>
> Regards,
> Amul

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Wed, Dec 6, 2017 at 5:01 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> This looks good.
>
> In attached revised patch, just added some comments in the changes that you did.

Committed, thanks. It's rather embarrassing that I didn't notice this
problem, because I did compare that logic with the preceding loop. I
concluded it was OK on the theory the previous loop would have already
given up if there were no partial plans. But that's wrong, of course:
the previous loop will not have given up if it grabbed the last plan in
a list of only non-partial plans. Oops.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Dec 6, 2017 at 5:01 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>> In attached revised patch, just added some comments in the changes that you did.

> Committed, thanks.

While this patch (looks like it) fixes the logic error, it does nothing
for the problem that the committed test cases don't actually exercise
any of this code on most machines -- certainly not whichever one is
producing the code coverage report:
https://coverage.postgresql.org/src/backend/executor/nodeAppend.c.gcov.html

Can we do something with Andres' suggestion of running these test cases
under parallel_leader_participation = off?

			regards, tom lane
On 6 December 2017 at 20:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Dec 6, 2017 at 5:01 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>>> In attached revised patch, just added some comments in the changes that you did.
>
>> Committed, thanks.
>
> While this patch (looks like it) fixes the logic error, it does nothing
> for the problem that the committed test cases don't actually exercise
> any of this code on most machines -- certainly not whichever one is
> producing the code coverage report:
> https://coverage.postgresql.org/src/backend/executor/nodeAppend.c.gcov.html
>
> Can we do something with Andres' suggestion of running these test cases
> under parallel_leader_participation = off?
>
> 			regards, tom lane

Yes, I am planning to send a patch that makes all those Parallel-Append
test cases run once with parallel_leader_participation "on" and then run
again with the guc "off". Thanks.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On 7 December 2017 at 12:32, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> On 6 December 2017 at 20:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> While this patch (looks like it) fixes the logic error, it does nothing
>> for the problem that the committed test cases don't actually exercise
>> any of this code on most machines -- certainly not whichever one is
>> producing the code coverage report:
>> https://coverage.postgresql.org/src/backend/executor/nodeAppend.c.gcov.html
>>
>> Can we do something with Andres' suggestion of running these test cases
>> under parallel_leader_participation = off?
>
> Yes, I am planning to send a patch that makes all those
> Parallel-Append test cases run once with parallel_leader_participation
> "on" and then run again with the guc "off". Thanks.

Attached is a patch that adds the above test cases. This allows
coverage for the function choose_next_subplan_for_worker().

The code to advance pa_next_plan is there in the for-loop (place_1) and
again below the for loop (place_2). At both these places, the logic
involves wrapping around to the first (partial) plan. The code coverage
exists for this wrap-around logic at place_2, but not at place_1 (i.e.
in the for loop):

   470           :     /* If all the plans are already done, we have nothing to do */
   471        72 :     if (pstate->pa_next_plan == INVALID_SUBPLAN_INDEX)
   472           :     {
   473        32 :         LWLockRelease(&pstate->pa_lock);
   474        32 :         return false;
   475           :     }
   476           :
   477           :     /* Loop until we find a subplan to execute. */
   478        92 :     while (pstate->pa_finished[pstate->pa_next_plan])
   479           :     {
   480        16 :         if (pstate->pa_next_plan < node->as_nplans - 1)
   481           :         {
   482           :             /* Advance to next plan. */
   483        16 :             pstate->pa_next_plan++;
   484           :         }
   485         0 :         else if (append->first_partial_plan < node->as_nplans)
   486           :         {
   487           :             /* Loop back to first partial plan. */
   488         0 :             pstate->pa_next_plan = append->first_partial_plan;
   489           :         }
   490           :         else
   491           :         {
   492           :             /* At last plan, no partial plans, arrange to bail out. */
   493         0 :             pstate->pa_next_plan = node->as_whichplan;
   494           :         }
   495           :
   496        16 :         if (pstate->pa_next_plan == node->as_whichplan)
   497           :         {
   498           :             /* We've tried everything! */
   499         4 :             pstate->pa_next_plan = INVALID_SUBPLAN_INDEX;
   500         4 :             LWLockRelease(&pstate->pa_lock);
   501         4 :             return false;
   502           :         }
   503           :     }

I have not managed to make the regression test cases execute the above
not-covered case. I could do that with my local test that I have, but
that test has lots of data, so it is slow, and not suitable for
regression.

In select_parallel.sql, by the time a worker w1 gets into the function
choose_next_subplan_for_worker(), an earlier worker w2 must have already
wrapped around the pa_next_plan at place_2, so this worker doesn't get a
chance to hit place_1. It's a timing issue. I will see if I can solve
this.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company