Re: FailedAssertion("pd_idx == pinfo->nparts", File: "execPartition.c", Line: 1689) - Mailing list pgsql-hackers

From Amit Langote
Subject Re: FailedAssertion("pd_idx == pinfo->nparts", File: "execPartition.c", Line: 1689)
Date
Msg-id CA+HiwqH=nLqn4e9j-Xwwz06fLeNh7rCBzp7iUG4iWrSMmWzBUQ@mail.gmail.com
In response to Re: FailedAssertion("pd_idx == pinfo->nparts", File: "execPartition.c", Line: 1689)  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On Wed, Aug 5, 2020 at 10:04 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Wed, Aug 05, 2020 at 09:53:44AM +0900, Amit Langote wrote:
> > On Wed, Aug 5, 2020 at 9:52 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > On Wed, Aug 5, 2020 at 9:32 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > > On Wed, Aug 05, 2020 at 09:26:20AM +0900, Amit Langote wrote:
> > > > > On Wed, Aug 5, 2020 at 12:11 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > > > >
> > > > > > On Tue, Aug 04, 2020 at 08:12:10PM +0900, Amit Langote wrote:
> > > > > > > It may be this commit that went into PG 12 that is causing the problem:
> > > > > >
> > > > > > Thanks for digging into this.
> > > > > >
> > > > > > > to account for partitions that were pruned by the planner for which we
> > > > > > > decided to put 0 into relid_map, but it only considered the case where
> > > > > > > the number of partitions doesn't change since the plan was created.
> > > > > > > The crash reported here is in the other case where the concurrently
> > > > > > > added partitions cause the execution-time PartitionDesc to have more
> > > > > > > partitions than the one that PartitionedRelPruneInfo is based on.
> > > > > >
> > > > > > Is there anything else needed to check that my crash matches your analysis ?
> > > > >
> > > > > If you can spot a 0 in the output of the following, then yes.
> > > > >
> > > > > (gdb) p *pinfo->relid_map@pinfo->nparts
> > > >
> > > > I guess you knew that an earlier message has just that.  Thanks.
> > > > https://www.postgresql.org/message-id/20200803161133.GA21372@telsasoft.com
> > >
> > > Yeah, you showed:
> > >
> > > (gdb) p *pinfo->relid_map@414
> > >
> > > And there is indeed a 0 in there, but I wasn't sure if it was actually
> > > in the array or a stray zero due to forcing gdb to show beyond the
> > > array bound.  Does pinfo->nparts match 414?
>
> Yes.  I typed 414 manually since the array lengths were suspect.
>
> (gdb) p pinfo->nparts
> $1 = 414
> (gdb) set print elements 0
> (gdb) p *pinfo->relid_map@pinfo->nparts
> $3 = {....
>   21151836, 21151726, 21151608, 21151498, 21151388, 21151278, 21151168, 21151055, 2576248, 2576255, 2576262, 2576269,
>   2576276, 21456497, 22064128, 0}
 

Thanks.  There is a 0 in there, which can only be there if the planner was
able to prune that last partition.  So, the planner saw a table with
414 partitions, was able to prune the last one and constructed an
Append plan with 413 subplans for unpruned partitions as you showed
upthread:

> (gdb) p *node->appendplans
> $17 = {type = T_List, length = 413, max_length = 509, elements = 0x7037400, initial_elements = 0x7037400}

This suggests that the crash I was able to produce is similar to what you saw.
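
To make the failure mode concrete, here is a toy model in C.  It is not
the actual execPartition.c code: the function name match_partitions, its
parameters, and the small OID values are made up for illustration, and it
assumes the matching loop advances pd_idx only when a plan-time relid_map
entry equals the current execution-time partition OID.  Under that
assumption, a 0 entry (a partition pruned by the planner) can never match,
so pd_idx stalls on it once extra partitions are present.

```c
#include <assert.h>

typedef unsigned int Oid;

/*
 * Toy model only; NOT the real execPartition.c code.
 *
 * relid_map[]   : plan-time partition OIDs, 0 for planner-pruned ones
 * exec_oids[]   : OIDs in the execution-time PartitionDesc
 * subplan_map[] : output, one entry per execution-time partition;
 *                 the matched plan-time index, or -1 if no match
 *
 * Assumption: pd_idx advances only when the current plan-time entry
 * matches the current execution-time OID.
 *
 * Returns the final pd_idx, to be compared with nparts_plan.
 */
static int
match_partitions(const Oid *relid_map, int nparts_plan,
                 const Oid *exec_oids, int nparts_exec,
                 int *subplan_map)
{
    int pd_idx = 0;

    for (int pp_idx = 0; pp_idx < nparts_exec; pp_idx++)
    {
        if (pd_idx < nparts_plan && relid_map[pd_idx] == exec_oids[pp_idx])
            subplan_map[pp_idx] = pd_idx++;   /* partition known at plan time */
        else
            subplan_map[pp_idx] = -1;         /* treated as if pruned */
    }

    /* bounds invariant of this toy loop, not the assertion from the report */
    assert(pd_idx <= nparts_plan);
    return pd_idx;
}
```

With relid_map = {101, 102, 103, 0} (4 plan-time partitions, the last one
pruned) and execution-time OIDs {101, 102, 103, 104, 105} (105 added
concurrently), the function returns 3 while nparts_plan is 4, so a check
of the form pd_idx == nparts would not hold in this model.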

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com


