Re: [sqlsmith] Failed assertions on parallel worker shutdown - Mailing list pgsql-hackers

From	Andreas Seltenreich
Subject	Re: [sqlsmith] Failed assertions on parallel worker shutdown
Date	May 24, 2016 16:06:39
Msg-id	87shx7ip0u.fsf@credativ.de
In response to	Re: [sqlsmith] Failed assertions on parallel worker shutdown (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: [sqlsmith] Failed assertions on parallel worker shutdown
List	pgsql-hackers

Amit Kapila writes:

> On Mon, May 23, 2016 at 4:48 PM, Andreas Seltenreich <seltenreich@gmx.de>
> wrote:
>> plan6 corresponds to this query:
>>
> Are you sure that the core dumps you are seeing are due to plan6?

Each of the plans sent was harvested from a controlling process when the
above assertion failed in its workers.  I do not know whether the plans
themselves really are at fault, as most of the collected plans look ok
to me.  The backtrace in the controlling process always looks like the
one reported. (Except when the core dump took so long as to trigger a
statement_timeout in the still-running master. There are no
plans/queries available in this case, as the state is no longer
available in an aborted transaction.)

> I have tried to generate a parallel plan for above query and it seems to me that
> after applying the patches (avoid_restricted_clause_below_gather_v1.patch
> and prohibit_parallel_clause_below_rel_v1.patch), the plan it generates
> doesn't have subplan below gather node [1].

> Without patch avoid_restricted_clause_below_gather_v1.patch, it will allow to push
> subplan below gather node, so I think either there is some other plan
> (query) due to which you are seeing core dumps or the above two patches
> haven't been applied before testing.

According to my notes, the patches were applied in the instance that
crashed.  Given that I no longer see the other variants of the crashes
the patches fix, and that the probability of this failed assertion
per random query is reduced by about a factor of 20 compared to
testing without the patches applied, I'm pretty certain that this is
not a bookkeeping error on my part.

> Is it possible that core dump is due to plan2 or some other similar
> plan (I am not sure at this stage about the cause of the problem you
> are seeing, but if due to some reason PARAM_EXEC params are pushed
> below gather, then such a plan might not work)?  If you think plan
> other than plan6 can cause such a problem, then can you share the
> query for plan2?

Each of the sent plans was collected when a worker dumped core due to
the failed assertion.  More core dumps than plans were actually
observed, since with this failed assertion, multiple workers usually
trip on it and dump core simultaneously.

The following query corresponds to plan2:

--8<---------------cut here---------------start------------->8---
select pg_catalog.pg_stat_get_bgwriter_requested_checkpoints() as c0,
       subq_0.c3 as c1,
       subq_0.c1 as c2,
       31 as c3,
       18 as c4,
       (select unique1 from public.bprime limit 1 offset 9) as c5,
       subq_0.c2 as c6
from (select ref_0.tablename as c0,
             ref_0.inherited as c1,
             ref_0.histogram_bounds as c2,
             100 as c3
      from pg_catalog.pg_stats as ref_0
      where 49 is not NULL
      limit 55) as subq_0
where true
limit 58;
--8<---------------cut here---------------end--------------->8---
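
To check where the scalar subquery ends up relative to the Gather node,
something along the following lines might help.  This is only a sketch:
the two settings are assumptions on my part and not taken from the
crashing instance, and it presumes the regression database, where
public.bprime exists.  The setting names are those of the 9.6
development tree and differ in later releases.

```sql
-- Assumed settings to encourage a parallel plan; not from the report.
set force_parallel_mode = on;  -- later renamed debug_parallel_query
set max_parallel_degree = 2;   -- later renamed max_parallel_workers_per_gather

-- Show the plan only; the query itself is not executed.
explain (verbose, costs off)
select pg_catalog.pg_stat_get_bgwriter_requested_checkpoints() as c0,
       subq_0.c3 as c1,
       subq_0.c1 as c2,
       31 as c3,
       18 as c4,
       (select unique1 from public.bprime limit 1 offset 9) as c5,
       subq_0.c2 as c6
from (select ref_0.tablename as c0,
             ref_0.inherited as c1,
             ref_0.histogram_bounds as c2,
             100 as c3
      from pg_catalog.pg_stats as ref_0
      where 49 is not NULL
      limit 55) as subq_0
where true
limit 58;
```

If the InitPlan or SubPlan for c5 shows up underneath Gather in the
output, that would match the case Amit describes; whether it does
depends on the build and on which patches are applied.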

regards,
Andreas
