Re: SegFault on 9.6.14 - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: SegFault on 9.6.14 |
Date | |
Msg-id | CAA4eK1LE=Nnrc007sS00_nNt0eYdOejmikKcZ7S92mXqFi7n=A@mail.gmail.com Whole thread Raw |
In response to | Re: SegFault on 9.6.14 (vignesh C <vignesh21@gmail.com>) |
List | pgsql-hackers |
On Wed, Aug 7, 2019 at 3:15 PM vignesh C <vignesh21@gmail.com> wrote: > > On Wed, Jul 31, 2019 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Jul 31, 2019 at 12:05 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > The other idea we had discussed which comes closer to adopting Tom's > > position was that during ExecShutdownNode, we just destroy parallel > > workers, collect instrumentation data and don't destroy the parallel > > context. The parallel context could be destroyed in ExecEndNode > > (ExecEndGather(Merge)) code path. The problem with this idea is that > > ExitParallelMode doesn't expect parallel context to be active. Now, > > we can either change the location of Exit/EnterParallelMode or relax > > that restriction. As mentioned above that restriction appears good to > > me, so I am not in favor of changing it unless we have some other > > solid way to install it. I am not sure if this idea is better than > > other approaches we are discussing. > > > > > I have made a patch based on the above lines. > I have tested the scenarios which Thomas had shared in the earlier > mail and few more tests based on Thomas's tests. > I'm not sure if we will be going ahead with this solution or not. > Let me know your opinion on the same. > If you feel this approach is ok, we can add few of this tests into pg tests. > This patch is on the lines of what I had in mind, but I see some problems in this which are explained below. The other approach to fix this was to move Enter/ExitParallelMode to the outer layer. For ex., can we enter in parallel mode during InitPlan and exit from it during ExecEndPlan? That might not be good to backpatch, but it might turn out to be more robust than the current approach. Few comments on your patch: 1. @@ -569,13 +569,6 @@ ExecParallelCleanup(ParallelExecutorInfo *pei) if (pei->instrumentation) ExecParallelRetrieveInstrumentation(pei->planstate, pei->instrumentation); - - if (pei->pcxt != NULL) - { - DestroyParallelContext(pei->pcxt); - pei->pcxt = NULL; - } - pfree(pei); } Here, you have just removed parallel context-free, but I think we can't detach from parallel context area here as well, otherwise, it will create similar problems in other cases. Note, that we create the area only in ExecInitParallelPlan and just reuse it in ExecParallelReinitialize. So, if we allow getting it destroyed in ExecParallelCleanup (which is called via ExecShutdownNode), we won't have access to it in rescan code path. IT is better if we have a test for the same as well. I think we should only retrieve the instrumentation information here. Also, if we do that, then we might also want to change function name and comments atop of this function. 2. ExecEndGather(GatherState *node) { + ParallelExecutorInfo *pei = node->pei; ExecShutdownGather(node); + + if (pei != NULL) + { + if (pei->pcxt != NULL) + { + DestroyParallelContext(pei->pcxt); + pei->pcxt = NULL; + } + + pfree(pei); + node->pei = NULL; + } I feel that it is better you move a collection of instrumentation information from ExecParallelCleanup to a separate function and then use ExecParallelCleanup here. 3. extern bool IsInParallelMode(void); +extern bool getParallelModeLevel(void); To be consistent, it better to name the function as GetParallelModeLevel. 4. @@ -1461,6 +1461,8 @@ ExecEndPlan(PlanState *planstate, EState *estate) ExecEndNode(subplanstate); } + if (estate->es_use_parallel_mode) + Assert (getParallelModeLevel() > 0 || !ParallelContextActive()); Add some comments here to explain about this Assert. I am not sure if this is correct because it won't fail even if the parallel mode is non-zero and there is no parallel context. At this stage, we must have exited the parallel mode. 5. explain analyze select count(*) from join_foo left join (select b1.id from join_bar b1 limit 1000) ss All the tests in your test file use left join to reproduce the issue, but I think it should be reproducible with inner join as well. This comment is not that your test case is wrong, but I want to see if we can further simplify it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: