Re: pgsql: Support Parallel Append plan nodes. - Mailing list pgsql-committers

From amul sul
Subject Re: pgsql: Support Parallel Append plan nodes.
Date
Msg-id CAAJ_b97Bfn32NJB_7P0mvgFWDmo+Ago7dPbKBr08eWKyprwHsA@mail.gmail.com
Whole thread Raw
In response to Re: pgsql: Support Parallel Append plan nodes.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pgsql: Support Parallel Append plan nodes.  (amul sul <sulamul@gmail.com>)
List pgsql-committers
Thanks Tom for the crash analysis, I think this is the same reason that
Rajkumar's reported case[1] was crashing only with partition-wise-join = on.
I tried to fix this crash[2] but missed the place where I have added assert
check in the assumption that we always coming from the previous check in the
while loop.

Instead, assert check we need a similar bailout logic[2] before looping back to
first partial plan. Attached patch does the same, I've tested with
parallel_leader_participation = off setting as suggested by Andres, make check
looks good except there is some obvious regression diff.

1] https://postgr.es/m/CAKcux6m+6nTO6C08kKaj-Waffvgvp-9SD3RCGStX=Mzk0gQU8g@mail.gmail.com
2] https://postgr.es/m/CAAJ_b975k58H+Ed4=p0vbJunwO2reOMX5CVB8_R=JmXxY3uW=Q@mail.gmail.com

Regards,
Amul

On Wed, Dec 6, 2017 at 9:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Robert Haas <rhaas@postgresql.org> writes:
>>> Support Parallel Append plan nodes.
>
>> Buildfarm member sidewinder doesn't like this:
>
> It transpires that my machine prairiedog also shows the failure.
> I instrumented nodeAppend.c (as per attached patch) to see what
> was going on, and I get this trace on prairiedog:
>
> 2017-12-05 22:50:30.228 EST [28812] LOG:  choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
> 2017-12-05 22:50:30.228 EST [28812] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a1;
> 2017-12-05 22:50:30.241 EST [28813] LOG:  choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
> 2017-12-05 22:50:30.241 EST [28813] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a1;
> 2017-12-05 22:50:30.252 EST [28814] LOG:  choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
> 2017-12-05 22:50:30.252 EST [28814] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a1;
> 2017-12-05 22:50:30.363 EST [28816] LOG:  choose_next_subplan_for_worker: whichplan -1, nextplan 0, fpp 2, nplans 6
> 2017-12-05 22:50:30.363 EST [28816] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.363 EST [28816] LOG:  choose_next_subplan_for_worker: picking nextplan 0, fpp 2, nplans 6
> 2017-12-05 22:50:30.363 EST [28816] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.363 EST [28816] LOG:  choose_next_subplan_for_worker: whichplan 0, nextplan 1, fpp 2, nplans 6
> 2017-12-05 22:50:30.363 EST [28816] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.364 EST [28816] LOG:  choose_next_subplan_for_worker: picking nextplan 1, fpp 2, nplans 6
> 2017-12-05 22:50:30.364 EST [28816] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.364 EST [28816] LOG:  choose_next_subplan_for_worker: whichplan 1, nextplan 2, fpp 2, nplans 6
> 2017-12-05 22:50:30.364 EST [28816] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.364 EST [28816] LOG:  choose_next_subplan_for_worker: picking nextplan 2, fpp 2, nplans 6
> 2017-12-05 22:50:30.364 EST [28816] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.364 EST [28816] LOG:  choose_next_subplan_for_worker: whichplan 2, nextplan 3, fpp 2, nplans 6
> 2017-12-05 22:50:30.364 EST [28816] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.364 EST [28816] LOG:  choose_next_subplan_for_worker: picking nextplan 3, fpp 2, nplans 6
> 2017-12-05 22:50:30.364 EST [28816] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.365 EST [28816] LOG:  choose_next_subplan_for_worker: whichplan 3, nextplan 4, fpp 2, nplans 6
> 2017-12-05 22:50:30.365 EST [28816] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.365 EST [28816] LOG:  choose_next_subplan_for_worker: picking nextplan 4, fpp 2, nplans 6
> 2017-12-05 22:50:30.365 EST [28816] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.365 EST [28816] LOG:  choose_next_subplan_for_worker: whichplan 4, nextplan 5, fpp 2, nplans 6
> 2017-12-05 22:50:30.365 EST [28816] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.365 EST [28816] LOG:  choose_next_subplan_for_worker: picking nextplan 5, fpp 2, nplans 6
> 2017-12-05 22:50:30.365 EST [28816] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.365 EST [28816] LOG:  choose_next_subplan_for_worker: whichplan 5, nextplan 2, fpp 2, nplans 6
> 2017-12-05 22:50:30.365 EST [28816] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.407 EST [28815] LOG:  choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
> 2017-12-05 22:50:30.407 EST [28815] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.474 EST [28817] LOG:  choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
> 2017-12-05 22:50:30.474 EST [28817] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:50:30.568 EST [28818] LOG:  choose_next_subplan_for_worker: whichplan -1, nextplan 0, fpp 6, nplans 6
> 2017-12-05 22:50:30.568 EST [28818] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.568 EST [28818] LOG:  choose_next_subplan_for_worker: picking nextplan 0, fpp 6, nplans 6
> 2017-12-05 22:50:30.568 EST [28818] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.569 EST [28818] LOG:  choose_next_subplan_for_worker: whichplan 0, nextplan 1, fpp 6, nplans 6
> 2017-12-05 22:50:30.569 EST [28818] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.569 EST [28818] LOG:  choose_next_subplan_for_worker: picking nextplan 1, fpp 6, nplans 6
> 2017-12-05 22:50:30.569 EST [28818] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.569 EST [28818] LOG:  choose_next_subplan_for_worker: whichplan 1, nextplan 2, fpp 6, nplans 6
> 2017-12-05 22:50:30.569 EST [28818] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.569 EST [28818] LOG:  choose_next_subplan_for_worker: picking nextplan 2, fpp 6, nplans 6
> 2017-12-05 22:50:30.569 EST [28818] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.570 EST [28818] LOG:  choose_next_subplan_for_worker: whichplan 2, nextplan 3, fpp 6, nplans 6
> 2017-12-05 22:50:30.570 EST [28818] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.570 EST [28818] LOG:  choose_next_subplan_for_worker: picking nextplan 3, fpp 6, nplans 6
> 2017-12-05 22:50:30.570 EST [28818] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.570 EST [28818] LOG:  choose_next_subplan_for_worker: whichplan 3, nextplan 4, fpp 6, nplans 6
> 2017-12-05 22:50:30.570 EST [28818] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.570 EST [28818] LOG:  choose_next_subplan_for_worker: picking nextplan 4, fpp 6, nplans 6
> 2017-12-05 22:50:30.570 EST [28818] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.570 EST [28818] LOG:  choose_next_subplan_for_worker: whichplan 4, nextplan 5, fpp 6, nplans 6
> 2017-12-05 22:50:30.570 EST [28818] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:50:30.570 EST [28818] LOG:  choose_next_subplan_for_worker: picking nextplan 5, fpp 6, nplans 6
> 2017-12-05 22:50:30.570 EST [28818] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a3;
> TRAP: FailedAssertion("!(append->first_partial_plan < node->as_nplans)", File: "nodeAppend.c", Line: 520)
> 2017-12-05 22:50:38.211 EST [28158] LOG:  background worker "parallel worker" (PID 28818) was terminated by signal 6:
Aborttrap
 
> 2017-12-05 22:50:38.211 EST [28158] DETAIL:  Failed process was running: select round(avg(aa)), sum(aa) from a_star
a3;
>
> That makes it pretty clear what's going wrong, but why don't we see it
> elsewhere?
>
> On my development machine, the same patch yields
>
> 2017-12-05 22:42:24.187 EST [843] LOG:  choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
> 2017-12-05 22:42:24.187 EST [843] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a1;
> 2017-12-05 22:42:24.188 EST [844] LOG:  choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
> 2017-12-05 22:42:24.188 EST [844] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a1;
> 2017-12-05 22:42:24.188 EST [845] LOG:  choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6
> 2017-12-05 22:42:24.188 EST [845] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a1;
> 2017-12-05 22:42:24.197 EST [846] LOG:  choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
> 2017-12-05 22:42:24.197 EST [846] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:42:24.200 EST [847] LOG:  choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
> 2017-12-05 22:42:24.200 EST [847] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:42:24.200 EST [848] LOG:  choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6
> 2017-12-05 22:42:24.200 EST [848] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a2;
> 2017-12-05 22:42:24.214 EST [849] LOG:  choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 6, nplans 6
> 2017-12-05 22:42:24.214 EST [849] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:42:24.214 EST [852] LOG:  choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 6, nplans 6
> 2017-12-05 22:42:24.214 EST [852] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a3;
> 2017-12-05 22:42:24.214 EST [853] LOG:  choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 6, nplans 6
> 2017-12-05 22:42:24.214 EST [853] STATEMENT:  select round(avg(aa)), sum(aa) from a_star a3;
>
> I conclude that the reason we don't see the crash except on the slowest
> buildfarm critters is that on most machines the leader process manages
> to finish off all the subplans before any of the workers get as far as
> doing something.  This doesn't give me a warm fuzzy feeling about how
> much testing this patch has gotten.
>
>                         regards, tom lane
>

Attachment

pgsql-committers by date:

Previous
From: Andres Freund
Date:
Subject: Re: pgsql: Support Parallel Append plan nodes.
Next
From: Andres Freund
Date:
Subject: Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple