Re: pgsql: Support Parallel Append plan nodes. - Mailing list pgsql-committers
From | amul sul |
---|---|
Subject | Re: pgsql: Support Parallel Append plan nodes. |
Date | |
Msg-id | CAAJ_b97Bfn32NJB_7P0mvgFWDmo+Ago7dPbKBr08eWKyprwHsA@mail.gmail.com Whole thread Raw |
In response to | Re: pgsql: Support Parallel Append plan nodes. (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: pgsql: Support Parallel Append plan nodes.
|
List | pgsql-committers |
Thanks Tom for the crash analysis, I think this is the same reason that Rajkumar's reported case[1] was crashing only with partition-wise-join = on. I tried to fix this crash[2] but missed the place where I have added assert check in the assumption that we always coming from the previous check in the while loop. Instead, assert check we need a similar bailout logic[2] before looping back to first partial plan. Attached patch does the same, I've tested with parallel_leader_participation = off setting as suggested by Andres, make check looks good except there is some obvious regression diff. 1] https://postgr.es/m/CAKcux6m+6nTO6C08kKaj-Waffvgvp-9SD3RCGStX=Mzk0gQU8g@mail.gmail.com 2] https://postgr.es/m/CAAJ_b975k58H+Ed4=p0vbJunwO2reOMX5CVB8_R=JmXxY3uW=Q@mail.gmail.com Regards, Amul On Wed, Dec 6, 2017 at 9:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >> Robert Haas <rhaas@postgresql.org> writes: >>> Support Parallel Append plan nodes. > >> Buildfarm member sidewinder doesn't like this: > > It transpires that my machine prairiedog also shows the failure. > I instrumented nodeAppend.c (as per attached patch) to see what > was going on, and I get this trace on prairiedog: > > 2017-12-05 22:50:30.228 EST [28812] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6 > 2017-12-05 22:50:30.228 EST [28812] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1; > 2017-12-05 22:50:30.241 EST [28813] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6 > 2017-12-05 22:50:30.241 EST [28813] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1; > 2017-12-05 22:50:30.252 EST [28814] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6 > 2017-12-05 22:50:30.252 EST [28814] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1; > 2017-12-05 22:50:30.363 EST [28816] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan 0, fpp 2, nplans 6 > 2017-12-05 22:50:30.363 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2; > 2017-12-05 22:50:30.363 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 0, fpp 2, nplans 6 > 2017-12-05 22:50:30.363 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2; > 2017-12-05 22:50:30.363 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 0, nextplan 1, fpp 2, nplans 6 > 2017-12-05 22:50:30.363 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2; > 2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 1, fpp 2, nplans 6 > 2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2; > 2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 1, nextplan 2, fpp 2, nplans 6 > 2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2; > 2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 2, fpp 2, nplans 6 > 2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2; > 2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 2, nextplan 3, fpp 2, nplans 6 > 2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2; > 2017-12-05 22:50:30.364 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 3, fpp 2, nplans 6 > 2017-12-05 22:50:30.364 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2; > 2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 3, nextplan 4, fpp 2, nplans 6 > 2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2; > 2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 4, fpp 2, nplans 6 > 2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2; > 2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 4, nextplan 5, fpp 2, nplans 6 > 2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2; > 2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: picking nextplan 5, fpp 2, nplans 6 > 2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2; > 2017-12-05 22:50:30.365 EST [28816] LOG: choose_next_subplan_for_worker: whichplan 5, nextplan 2, fpp 2, nplans 6 > 2017-12-05 22:50:30.365 EST [28816] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2; > 2017-12-05 22:50:30.407 EST [28815] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6 > 2017-12-05 22:50:30.407 EST [28815] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2; > 2017-12-05 22:50:30.474 EST [28817] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6 > 2017-12-05 22:50:30.474 EST [28817] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2; > 2017-12-05 22:50:30.568 EST [28818] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan 0, fpp 6, nplans 6 > 2017-12-05 22:50:30.568 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3; > 2017-12-05 22:50:30.568 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 0, fpp 6, nplans 6 > 2017-12-05 22:50:30.568 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3; > 2017-12-05 22:50:30.569 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 0, nextplan 1, fpp 6, nplans 6 > 2017-12-05 22:50:30.569 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3; > 2017-12-05 22:50:30.569 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 1, fpp 6, nplans 6 > 2017-12-05 22:50:30.569 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3; > 2017-12-05 22:50:30.569 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 1, nextplan 2, fpp 6, nplans 6 > 2017-12-05 22:50:30.569 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3; > 2017-12-05 22:50:30.569 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 2, fpp 6, nplans 6 > 2017-12-05 22:50:30.569 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3; > 2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 2, nextplan 3, fpp 6, nplans 6 > 2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3; > 2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 3, fpp 6, nplans 6 > 2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3; > 2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 3, nextplan 4, fpp 6, nplans 6 > 2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3; > 2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 4, fpp 6, nplans 6 > 2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3; > 2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: whichplan 4, nextplan 5, fpp 6, nplans 6 > 2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3; > 2017-12-05 22:50:30.570 EST [28818] LOG: choose_next_subplan_for_worker: picking nextplan 5, fpp 6, nplans 6 > 2017-12-05 22:50:30.570 EST [28818] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3; > TRAP: FailedAssertion("!(append->first_partial_plan < node->as_nplans)", File: "nodeAppend.c", Line: 520) > 2017-12-05 22:50:38.211 EST [28158] LOG: background worker "parallel worker" (PID 28818) was terminated by signal 6: Aborttrap > 2017-12-05 22:50:38.211 EST [28158] DETAIL: Failed process was running: select round(avg(aa)), sum(aa) from a_star a3; > > That makes it pretty clear what's going wrong, but why don't we see it > elsewhere? > > On my development machine, the same patch yields > > 2017-12-05 22:42:24.187 EST [843] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6 > 2017-12-05 22:42:24.187 EST [843] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1; > 2017-12-05 22:42:24.188 EST [844] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6 > 2017-12-05 22:42:24.188 EST [844] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1; > 2017-12-05 22:42:24.188 EST [845] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 0, nplans 6 > 2017-12-05 22:42:24.188 EST [845] STATEMENT: select round(avg(aa)), sum(aa) from a_star a1; > 2017-12-05 22:42:24.197 EST [846] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6 > 2017-12-05 22:42:24.197 EST [846] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2; > 2017-12-05 22:42:24.200 EST [847] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6 > 2017-12-05 22:42:24.200 EST [847] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2; > 2017-12-05 22:42:24.200 EST [848] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 2, nplans 6 > 2017-12-05 22:42:24.200 EST [848] STATEMENT: select round(avg(aa)), sum(aa) from a_star a2; > 2017-12-05 22:42:24.214 EST [849] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 6, nplans 6 > 2017-12-05 22:42:24.214 EST [849] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3; > 2017-12-05 22:42:24.214 EST [852] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 6, nplans 6 > 2017-12-05 22:42:24.214 EST [852] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3; > 2017-12-05 22:42:24.214 EST [853] LOG: choose_next_subplan_for_worker: whichplan -1, nextplan -1, fpp 6, nplans 6 > 2017-12-05 22:42:24.214 EST [853] STATEMENT: select round(avg(aa)), sum(aa) from a_star a3; > > I conclude that the reason we don't see the crash except on the slowest > buildfarm critters is that on most machines the leader process manages > to finish off all the subplans before any of the workers get as far as > doing something. This doesn't give me a warm fuzzy feeling about how > much testing this patch has gotten. > > regards, tom lane >
Attachment
pgsql-committers by date: