Thread: pgsql: Support Parallel Append plan nodes.
Support Parallel Append plan nodes.

When we create an Append node, we can spread out the workers over the
subplans instead of piling on to each subplan one at a time, which should
typically be a bit more efficient, both because the startup cost of any
plan executed entirely by one worker is paid only once and also because of
reduced contention.  We can also construct Append plans using a mix of
partial and non-partial subplans, which may allow for parallelism in
places that otherwise couldn't support it.

Unfortunately, this patch doesn't handle the important case of
parallelizing UNION ALL by running each branch in a separate worker; the
executor infrastructure is added here, but more planner work is needed.

Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by Ashutosh
Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and Rajkumar Raghuwanshi.

Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/ab72716778128fb63d54ac256adf7fe6820a1185

Modified Files
--------------
doc/src/sgml/config.sgml                      |  14 ++
doc/src/sgml/monitoring.sgml                  |   7 +-
src/backend/executor/execParallel.c           |  19 ++
src/backend/executor/nodeAppend.c             | 331 +++++++++++++++++++++-----
src/backend/nodes/copyfuncs.c                 |   1 +
src/backend/nodes/list.c                      |  38 +++
src/backend/nodes/outfuncs.c                  |   1 +
src/backend/nodes/readfuncs.c                 |   1 +
src/backend/optimizer/path/allpaths.c         | 216 ++++++++++++++---
src/backend/optimizer/path/costsize.c         | 164 +++++++++++++
src/backend/optimizer/path/joinrels.c         |   3 +-
src/backend/optimizer/plan/createplan.c       |  10 +-
src/backend/optimizer/plan/planner.c          |   5 +-
src/backend/optimizer/prep/prepunion.c        |   7 +-
src/backend/optimizer/util/pathnode.c         |  88 +++++--
src/backend/storage/lmgr/lwlock.c             |   1 +
src/backend/utils/misc/guc.c                  |   9 +
src/backend/utils/misc/postgresql.conf.sample |   1 +
src/include/executor/nodeAppend.h             |   5 +
src/include/nodes/execnodes.h                 |  14 +-
src/include/nodes/pg_list.h                   |   3 +
src/include/nodes/plannodes.h                 |   1 +
src/include/nodes/relation.h                  |   6 +
src/include/optimizer/cost.h                  |   2 +
src/include/optimizer/pathnode.h              |   9 +-
src/include/storage/lwlock.h                  |   1 +
src/test/regress/expected/inherit.out         |   2 +
src/test/regress/expected/select_parallel.out |  91 ++++++-
src/test/regress/expected/sysviews.out        |   3 +-
src/test/regress/sql/inherit.sql              |   2 +
src/test/regress/sql/select_parallel.sql      |  33 ++-

31 files changed, 959 insertions(+), 129 deletions(-)
Robert Haas <rhaas@postgresql.org> writes:
> Support Parallel Append plan nodes.

Buildfarm member sidewinder doesn't like this:

Program terminated with signal SIGABRT, Aborted.
#0  0x00007f7ff5b0e01a in _lwp_kill () from /usr/lib/libc.so.12
#0  0x00007f7ff5b0e01a in _lwp_kill () from /usr/lib/libc.so.12
#1  0x00007f7ff5b0dca5 in abort () from /usr/lib/libc.so.12
#2  0x000000000082b5e3 in ExceptionalCondition (conditionName=conditionName@entry=0x9ad548 "!(append->first_partial_plan < node->as_nplans)", errorType=errorType@entry=0x87489a "FailedAssertion", fileName=fileName@entry=0x9ad4d6 "nodeAppend.c", lineNumber=lineNumber@entry=509) at assert.c:54
#3  0x000000000060feb6 in choose_next_subplan_for_worker (node=0x7f7ff7357980) at nodeAppend.c:509
#4  0x000000000061007d in ExecAppend (pstate=0x7f7ff7357980) at nodeAppend.c:222
#5  0x0000000000610464 in ExecProcNode (node=0x7f7ff7357980) at ../../../src/include/executor/executor.h:241
#6  fetch_input_tuple (aggstate=aggstate@entry=0x7f7ff7357a98) at nodeAgg.c:699
#7  0x0000000000612531 in agg_retrieve_direct (aggstate=0x7f7ff7357a98) at nodeAgg.c:2456
#8  ExecAgg (pstate=0x7f7ff7357a98) at nodeAgg.c:2166
#9  0x0000000000603cbd in ExecProcNode (node=0x7f7ff7357a98) at ../../../src/include/executor/executor.h:241
#10 ExecutePlan (execute_once=<optimized out>, dest=0x7f7ff73556b8, direction=<optimized out>, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x7f7ff7357a98, estate=0x7f7ff7357040) at execMain.c:1718
#11 standard_ExecutorRun (queryDesc=0x7f7ff7369f58, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:361
#12 0x000000000060833b in ParallelQueryMain (seg=0x7f7ff7b65908, toc=0x7f7ff7fab000) at execParallel.c:1319
#13 0x00000000004f6e54 in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1150
#14 0x00000000006ba07f in StartBackgroundWorker () at bgworker.c:841
#15 0x00000000006c51f9 in do_start_bgworker (rw=<optimized out>) at postmaster.c:5741

Append.first_partial_plan is sadly underdocumented, but considering that
the loop above this Assert is prepared for the possibility that
first_partial_plan >= node->as_nplans, I'm not sure why this code supposes
that that can't happen.

BTW, I find it confusing and bad style that some of the references in this
function to node->as_pstate go through the local variable pstate but
others do not.

			regards, tom lane
I wrote:
> Robert Haas <rhaas@postgresql.org> writes:
>> Support Parallel Append plan nodes.

> Buildfarm member sidewinder doesn't like this:

It transpires that my machine prairiedog also shows the failure.
I instrumented nodeAppend.c (as per attached patch) to see what was going
on, and I get this trace on prairiedog:

2017-12-05 22:50:30.228 EST [28812] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
2017-12-05 22:50:30.228 EST [28812] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
2017-12-05 22:50:30.241 EST [28813] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
2017-12-05 22:50:30.241 EST [28813] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
2017-12-05 22:50:30.252 EST [28814] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
2017-12-05 22:50:30.252 EST [28814] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
2017-12-05 22:50:30.363 EST [28816] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan 0, fpp 2, nplans 6
2017-12-05 22:50:30.363 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.363 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 0, fpp 2, nplans 6
2017-12-05 22:50:30.363 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.363 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 0, nextplan 1, fpp 2, nplans 6
2017-12-05 22:50:30.363 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 1, fpp 2, nplans 6
2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 1, nextplan 2, fpp 2, nplans 6
2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 2, fpp 2, nplans 6
2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 2, nextplan 3, fpp 2, nplans 6
2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 3, fpp 2, nplans 6
2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 3, nextplan 4, fpp 2, nplans 6
2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 4, fpp 2, nplans 6
2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 4, nextplan 5, fpp 2, nplans 6
2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 5, fpp 2, nplans 6
2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 5, nextplan 2, fpp 2, nplans 6
2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.407 EST [28815] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
2017-12-05 22:50:30.407 EST [28815] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.474 EST [28817] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
2017-12-05 22:50:30.474 EST [28817] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:50:30.568 EST [28818] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan 0, fpp 6, nplans 6
2017-12-05 22:50:30.568 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.568 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 0, fpp 6, nplans 6
2017-12-05 22:50:30.568 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.569 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 0, nextplan 1, fpp 6, nplans 6
2017-12-05 22:50:30.569 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.569 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 1, fpp 6, nplans 6
2017-12-05 22:50:30.569 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.569 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 1, nextplan 2, fpp 6, nplans 6
2017-12-05 22:50:30.569 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.569 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 2, fpp 6, nplans 6
2017-12-05 22:50:30.569 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 2, nextplan 3, fpp 6, nplans 6
2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 3, fpp 6, nplans 6
2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 3, nextplan 4, fpp 6, nplans 6
2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 4, fpp 6, nplans 6
2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 4, nextplan 5, fpp 6, nplans 6
2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 5, fpp 6, nplans 6
2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
TRAP: FailedAssertion("!(append->first_partial_plan < node->as_nplans)", File: "nodeAppend.c", Line: 520)
2017-12-05 22:50:38.211 EST [28158] LOG: background worker "parallel worker" (PID 28818) was terminated by signal 6: Abort trap
2017-12-05 22:50:38.211 EST [28158] DETAIL: Failed process was running: select round(avg(aa)), sum(aa) from a_star a3;

That makes it pretty clear what's going wrong, but why don't we see it
elsewhere?

On my development machine, the same patch yields

2017-12-05 22:42:24.187 EST [843] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
2017-12-05 22:42:24.187 EST [843] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
2017-12-05 22:42:24.188 EST [844] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
2017-12-05 22:42:24.188 EST [844] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
2017-12-05 22:42:24.188 EST [845] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
2017-12-05 22:42:24.188 EST [845] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1;
2017-12-05 22:42:24.197 EST [846] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
2017-12-05 22:42:24.197 EST [846] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:42:24.200 EST [847] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
2017-12-05 22:42:24.200 EST [847] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:42:24.200 EST [848] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
2017-12-05 22:42:24.200 EST [848] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2;
2017-12-05 22:42:24.214 EST [849] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 6, nplans 6
2017-12-05 22:42:24.214 EST [849] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:42:24.214 EST [852] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 6, nplans 6
2017-12-05 22:42:24.214 EST [852] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;
2017-12-05 22:42:24.214 EST [853] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 6, nplans 6
2017-12-05 22:42:24.214 EST [853] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3;

I conclude that the reason we don't see the crash except on the slowest
buildfarm critters is that on most machines the leader process manages to
finish off all the subplans before any of the workers get as far as doing
something.  This doesn't give me a warm fuzzy feeling about how much
testing this patch has gotten.

			regards, tom lane

diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 246a0b2..4c4355b 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -463,6 +463,12 @@ choose_next_subplan_for_worker(AppendState *node)
 
 	LWLockAcquire(&pstate->pa_lock, LW_EXCLUSIVE);
 
+	elog(LOG, "choose_next_subplan_for_worker: whichplan %d, nextplan %d, fpp %d, nplans %d",
+		 node->as_whichplan,
+		 pstate->pa_next_plan,
+		 append->first_partial_plan,
+		 node->as_nplans);
+
 	/* Mark just-completed subplan as finished. */
 	if (node->as_whichplan != INVALID_SUBPLAN_INDEX)
 		node->as_pstate->pa_finished[node->as_whichplan] = true;
@@ -502,6 +508,11 @@ choose_next_subplan_for_worker(AppendState *node)
 		}
 	}
 
+	elog(LOG, "choose_next_subplan_for_worker: picking nextplan %d, fpp %d, nplans %d",
+		 pstate->pa_next_plan,
+		 append->first_partial_plan,
+		 node->as_nplans);
+
 	/* Pick the plan we found, and advance pa_next_plan one more time. */
 	node->as_whichplan = pstate->pa_next_plan++;
 	if (pstate->pa_next_plan >= node->as_nplans)
Hi,

On 2017-12-05 23:01:59 -0500, Tom Lane wrote:
> I conclude that the reason we don't see the crash except on the slowest
> buildfarm critters is that on most machines the leader process manages
> to finish off all the subplans before any of the workers get as far as
> doing something. This doesn't give me a warm fuzzy feeling about how
> much testing this patch has gotten.

If that's the case, it should be reproducible on fast machines if one
sets parallel_leader_participation = off during an installcheck.  Which
indeed crashes here, albeit on a heavily modified tree:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f4c05afb19a in __GI_abort () at abort.c:89
#2  0x000055d89651e83a in ExceptionalCondition (conditionName=0x55d8966edef8 "!(append->first_partial_plan < node->as_nplans)", errorType=0x55d8966edd86 "FailedAssertion", fileName=0x55d8966eddd8 "/home/andres/src/postgresql/src/backend/executor/nodeAppend.c", lineNumber=503) at /home/andres/src/postgresql/src/backend/utils/error/assert.c:54
#3  0x000055d8961eab63 in choose_next_subplan_for_worker (node=0x55d8974535a0) at /home/andres/src/postgresql/src/backend/executor/nodeAppend.c:503
#4  0x000055d8961ea47b in ExecAppend (pstate=0x55d8974535a0) at /home/andres/src/postgresql/src/backend/executor/nodeAppend.c:216
#5  0x000055d8961d6c55 in ExecProcNode (node=0x55d8974535a0) at /home/andres/src/postgresql/src/include/executor/executor.h:239

Greetings,

Andres Freund
Thanks Tom for the crash analysis; I think this is the same reason that
Rajkumar's reported case[1] was crashing only with partition-wise-join = on.
I tried to fix this crash[2] but missed the place where I had added an
assert check, on the assumption that we always come there via the previous
check in the while loop.

Instead of the assert check, we need similar bailout logic[2] before
looping back to the first partial plan.  The attached patch does that.
I've tested with parallel_leader_participation = off as suggested by
Andres; make check looks good except for some obvious regression diffs.

1] https://postgr.es/m/CAKcux6m+6nTO6C08kKaj-Waffvgvp-9SD3RCGStX=Mzk0gQU8g@mail.gmail.com
2] https://postgr.es/m/CAAJ_b975k58H+Ed4=p0vbJunwO2reOMX5CVB8_R=JmXxY3uW=Q@mail.gmail.com

Regards,
Amul

On Wed, Dec 6, 2017 at 9:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Robert Haas <rhaas@postgresql.org> writes:
>>> Support Parallel Append plan nodes.
>
>> Buildfarm member sidewinder doesn't like this:
>
> It transpires that my machine prairiedog also shows the failure.
> [ instrumented log traces and analysis quoted in full upthread; trimmed ]
>
> 			regards, tom lane
Attachment
On 6 November 2017 16:50, amul sul <sulamul@gmail.com> wrote:
> Thanks Tom for the crash analysis, I think this is the same reason that
> Rajkumar's reported case[1] was crashing only with partition-wise-join = on.
> I tried to fix this crash[2] but missed the place where I have added assert
> check in the assumption that we always coming from the previous check in the
> while loop.
>
> Instead, assert check we need a similar bailout logic[2] before looping back to
> first partial plan. Attached patch does the same, I've tested with
> parallel_leader_participation = off setting as suggested by Andres, make check
> looks good except there is some obvious regression diff.
>
> 1] https://postgr.es/m/CAKcux6m+6nTO6C08kKaj-Waffvgvp-9SD3RCGStX=Mzk0gQU8g@mail.gmail.com
> 2] https://postgr.es/m/CAAJ_b975k58H+Ed4=p0vbJunwO2reOMX5CVB8_R=JmXxY3uW=Q@mail.gmail.com

@@ -506,7 +506,14 @@ choose_next_subplan_for_worker(AppendState *node)
 	node->as_whichplan = pstate->pa_next_plan++;
 	if (pstate->pa_next_plan >= node->as_nplans)
 	{
-		Assert(append->first_partial_plan < node->as_nplans);
+		/* No partial plans then bail out. */
+		if (append->first_partial_plan >= node->as_nplans)
+		{
+			pstate->pa_next_plan = INVALID_SUBPLAN_INDEX;
+			node->as_whichplan = INVALID_SUBPLAN_INDEX;
+			LWLockRelease(&pstate->pa_lock);
+			return false;
+		}
 		pstate->pa_next_plan = append->first_partial_plan;

In the above code, the fact that we have not bailed out from the earlier
for loop means that we have already found an unfinished plan, and
node->as_whichplan is set to that plan.  So while setting the next plan
above for the other workers to pick, we should not return false, nor
should we set node->as_whichplan to INVALID_SUBPLAN_INDEX.  Instead, just
set pa_next_plan to INVALID_SUBPLAN_INDEX.  Otherwise, the chosen plan
won't get executed at all.

Thanks
-Amit Khandekar
Copying & replying to Amit Khandekar's email here:

On Wed, Dec 6, 2017 at 11:45 AM, amul sul <sulamul@gmail.com> wrote:
>> [ earlier analysis and links trimmed ]
>
> @@ -506,7 +506,14 @@ choose_next_subplan_for_worker(AppendState *node)
> [ patch hunk trimmed ]
>
> In the above code, the fact that we have not bailed out from the
> earlier for loop means that we have already found an unfinished plan
> and node->as_whichplan is set to that plan. So while setting the next
> plan above for the other workers to pick, we should not return false,
> nor should we set node->as_whichplan to INVALID_SUBPLAN_INDEX.
> Instead, just set pa_next_plan to INVALID_SUBPLAN_INDEX. Otherwise,
> the chosen plan won't get executed at all.

Understood, thanks for the review.  Updated patch attached.

1] https://postgr.es/m/CAJ3gD9e3_D3fFqzWBFYoaF-6yCXgbOFn3Mb-pgd_mxvjpsw7Rw@mail.gmail.com

Regards,
Amul
Attachment
On 6 December 2017 at 14:31, amul sul <sulamul@gmail.com> wrote: > Copying & reverting to Amit Khandekar's email here: > > On Wed, Dec 6, 2017 at 11:45 AM, amul sul <sulamul@gmail.com> wrote: >>> Thanks Tom for the crash analysis, I think this is the same reason that >>> Rajkumar's reported case[1] was crashing only with partition-wise-join = on. >>> I tried to fix this crash[2] but missed the place where I have added assert >>> check in the assumption that we always coming from the previous check in the >>> while loop. >>> >>> Instead, assert check we need a similar bailout logic[2] before looping back to >>> first partial plan. Attached patch does the same, I've tested with >>> parallel_leader_participation = off setting as suggested by Andres, make check >>> looks good except there is some obvious regression diff. >>> >>> 1] https://postgr.es/m/CAKcux6m+6nTO6C08kKaj-Waffvgvp-9SD3RCGStX=Mzk0gQU8g@mail.gmail.com >>> 2] https://postgr.es/m/CAAJ_b975k58H+Ed4=p0vbJunwO2reOMX5CVB8_R=JmXxY3uW=Q@mail.gmail.com >>> >> >> @@ -506,7 +506,14 @@ choose_next_subplan_for_worker(AppendState *node) >> node->as_whichplan = pstate->pa_next_plan++; >> if (pstate->pa_next_plan >= node->as_nplans) >> { >> - Assert(append->first_partial_plan < node->as_nplans); >> + /* No partial plans then bail out. */ >> + if (append->first_partial_plan >= node->as_nplans) >> + { >> + pstate->pa_next_plan = INVALID_SUBPLAN_INDEX; >> + node->as_whichplan = INVALID_SUBPLAN_INDEX; >> + LWLockRelease(&pstate->pa_lock); >> + return false; >> + } >> pstate->pa_next_plan = append->first_partial_plan; >> >> In the above code, the fact that we have not bailed out from the >> earlier for loop means that we have already found an unfinished plan >> and node->as_whichplan is set to that plan. So while setting the next >> plan above for the other workers to pick, we should not return false, >> nor should we set node->as_whichplan to INVALID_SUBPLAN_INDEX. 
>> Instead, just set pa_next_plan to INVALID_SUBPLAN_INDEX. Otherwise,
>> the chosen plan won't get executed at all.
>>
>
> Understood, thanks for the review. Updated patch attached.
>
> 1] https://postgr.es/m/CAJ3gD9e3_D3fFqzWBFYoaF-6yCXgbOFn3Mb-pgd_mxvjpsw7Rw@mail.gmail.com

This looks good. In attached revised patch, just added some comments in
the changes that you did.

>
> Regards,
> Amul

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Wed, Dec 6, 2017 at 5:01 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> This looks good.
>
> In attached revised patch, just added some comments in the changes that you did.

Committed, thanks. It's rather embarrassing that I didn't notice this
problem, because I did compare that logic with the preceding loop. I
concluded it was OK on the theory the previous loop would have already
given up if there were no partial plans. But that's wrong, of course:
the previous loop will not have given up if it grabbed the last plan in
a list of only non-partial plans. Oops.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Dec 6, 2017 at 5:01 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>> In attached revised patch, just added some comments in the changes that you did.

> Committed, thanks.

While this patch (looks like it) fixes the logic error, it does nothing
for the problem that the committed test cases don't actually exercise
any of this code on most machines -- certainly not whichever one is
producing the code coverage report:
https://coverage.postgresql.org/src/backend/executor/nodeAppend.c.gcov.html

Can we do something with Andres' suggestion of running these test cases
under parallel_leader_participation = off?

			regards, tom lane
On 6 December 2017 at 20:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Dec 6, 2017 at 5:01 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>>> In attached revised patch, just added some comments in the changes that you did.
>
>> Committed, thanks.
>
> While this patch (looks like it) fixes the logic error, it does nothing
> for the problem that the committed test cases don't actually exercise
> any of this code on most machines -- certainly not whichever one is
> producing the code coverage report:
> https://coverage.postgresql.org/src/backend/executor/nodeAppend.c.gcov.html
>
> Can we do something with Andres' suggestion of running these test cases
> under parallel_leader_participation = off?
>
> 			regards, tom lane

Yes, I am planning to send a patch that makes all those Parallel-Append
test cases run once with parallel_leader_participation "on" and then run
again with the guc "off". Thanks.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On 7 December 2017 at 12:32, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> On 6 December 2017 at 20:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> While this patch (looks like it) fixes the logic error, it does nothing
>> for the problem that the committed test cases don't actually exercise
>> any of this code on most machines -- certainly not whichever one is
>> producing the code coverage report:
>> https://coverage.postgresql.org/src/backend/executor/nodeAppend.c.gcov.html
>>
>> Can we do something with Andres' suggestion of running these test cases
>> under parallel_leader_participation = off?
>
> Yes, I am planning to send a patch that makes all those
> Parallel-Append test cases run once with parallel_leader_participation
> "on" and then run again with the guc "off". Thanks.

Attached is a patch that adds the above test cases. This allows
coverage for the function choose_next_subplan_for_worker().

The code to advance pa_next_plan is there in the for-loop (place_1) and
again below the for loop (place_2). At both these places, the logic
involves wrapping around to the first (partial) plan. The code coverage
exists for this wrap-around logic at place_2, but not at place_1 (i.e.
in the for loop):

   470           :     /* If all the plans are already done, we have nothing to do */
   471        72 :     if (pstate->pa_next_plan == INVALID_SUBPLAN_INDEX)
   472           :     {
   473        32 :         LWLockRelease(&pstate->pa_lock);
   474        32 :         return false;
   475           :     }
   476           :
   477           :     /* Loop until we find a subplan to execute. */
   478        92 :     while (pstate->pa_finished[pstate->pa_next_plan])
   479           :     {
   480        16 :         if (pstate->pa_next_plan < node->as_nplans - 1)
   481           :         {
   482           :             /* Advance to next plan. */
   483        16 :             pstate->pa_next_plan++;
   484           :         }
   485         0 :         else if (append->first_partial_plan < node->as_nplans)
   486           :         {
   487           :             /* Loop back to first partial plan. */
   488         0 :             pstate->pa_next_plan = append->first_partial_plan;
   489           :         }
   490           :         else
   491           :         {
   492           :             /* At last plan, no partial plans, arrange to bail out. */
   493         0 :             pstate->pa_next_plan = node->as_whichplan;
   494           :         }
   495           :
   496        16 :         if (pstate->pa_next_plan == node->as_whichplan)
   497           :         {
   498           :             /* We've tried everything! */
   499         4 :             pstate->pa_next_plan = INVALID_SUBPLAN_INDEX;
   500         4 :             LWLockRelease(&pstate->pa_lock);
   501         4 :             return false;
   502           :         }
   503           :     }

I have not managed to make the regression test cases execute the above
not-covered case. I could do that with my local test that I have, but
that test has lots of data, so it is slow, and not suitable for
regression.

In select_parallel.sql, by the time a worker w1 gets into the function
choose_next_subplan_for_worker(), an earlier worker w2 must have already
wrapped around the pa_next_plan at place_2, so this worker doesn't get a
chance to hit place_1. It's a timing issue. I will see if I can solve
this.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company