Re: [sqlsmith] Failed assertions on parallel worker shutdown - Mailing list pgsql-hackers
From | Andreas Seltenreich |
---|---|
Subject | Re: [sqlsmith] Failed assertions on parallel worker shutdown |
Date | |
Msg-id | 87shx7ip0u.fsf@credativ.de |
In response to | Re: [sqlsmith] Failed assertions on parallel worker shutdown (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: [sqlsmith] Failed assertions on parallel worker shutdown |
List | pgsql-hackers |
Amit Kapila writes:

> On Mon, May 23, 2016 at 4:48 PM, Andreas Seltenreich <seltenreich@gmx.de>
> wrote:
>> plan6 corresponds to this query:
>>
> Are you sure that the core dumps you are seeing are due to plan6?

Each of the plans sent was harvested from a controlling process when the
above assertion failed in its workers.  I do not know whether the plans
themselves really are at fault, as most of the collected plans look ok
to me.  The backtrace in the controlling process always looks like the
one reported.  (Except when the coredumping took so long as to trigger a
statement_timeout in the still-running master.  There are no
plans/queries available in this case, as the state is no longer
available in an aborted transaction.)

> I have tried to generate a parallel plan for above query and it seems to me that
> after applying the patches (avoid_restricted_clause_below_gather_v1.patch
> and prohibit_parallel_clause_below_rel_v1.patch), the plan it generates
> doesn't have subplan below gather node [1].
> Without patch avoid_restricted_clause_below_gather_v1.patch, it will allow to push
> subplan below gather node, so I think either there is some other plan
> (query) due to which you are seeing core dumps or the above two patches
> haven't been applied before testing.

According to my notes, the patches were applied in the instance that
crashed.  Given that I no longer see the other variants of the crashes
the patches fix, and that the probability of this failed assertion per
random query is reduced by about a factor of 20 compared to testing
without the patches, I'm pretty certain that this is not a bookkeeping
error on my part.

> Is it possible that core dump is due to plan2 or some other similar
> plan (I am not sure at this stage about the cause of the problem you
> are seeing, but if due to some reason PARAM_EXEC params are pushed
> below gather, then such a plan might not work)?  If you think plan
> other than plan6 can cause such a problem, then can you share the
> query for plan2?

Each of the sent plans was collected when a worker dumped core due to
the failed assertion.  More core dumps than plans were actually
observed, since with this failed assertion, multiple workers usually
trip on it and dump core simultaneously.

The following query corresponds to plan2:

--8<---------------cut here---------------start------------->8---
select
  pg_catalog.pg_stat_get_bgwriter_requested_checkpoints() as c0,
  subq_0.c3 as c1, subq_0.c1 as c2, 31 as c3, 18 as c4,
  (select unique1 from public.bprime limit 1 offset 9) as c5,
  subq_0.c2 as c6
from
  (select ref_0.tablename as c0, ref_0.inherited as c1,
          ref_0.histogram_bounds as c2, 100 as c3
     from pg_catalog.pg_stats as ref_0
    where 49 is not NULL limit 55) as subq_0
where true
limit 58;
--8<---------------cut here---------------end--------------->8---

regards,
Andreas