Re: Internal error XX000 with enable_partition_pruning=on, pg 11beta1 on Debian - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Internal error XX000 with enable_partition_pruning=on, pg 11beta1 on Debian
Date
Msg-id 2776b664-b175-5ebc-c8c9-d234a2f733b4@lab.ntt.co.jp
Whole thread Raw
In response to Re: Internal error XX000 with enable_partition_pruning=on, pg 11beta1 on Debian  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Internal error XX000 with enable_partition_pruning=on, pg 11beta1 on Debian
List pgsql-hackers
On 2018/06/11 16:49, David Rowley wrote:
> On 11 June 2018 at 12:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> David Rowley <david.rowley@2ndquadrant.com> writes:
>>> On 10 June 2018 at 04:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> So, IIUC, the issue is that for partitioning cases Append expects *all*
>>>> its children to be partitions of the *same* partitioned table?  That
>>>> is, you could also break it with
>>>>
>>>> select * from partitioned_table_a
>>>> union all
>>>> select * from partitioned_table_b
>>>>
>>>> ?
>>
>>> Not quite.

That would be correct I think.  An Append may contain multiple partitioned
tables that all appear under an UNION ALL parent, as in the OP's case and
the example above.  In this case, the partitioned_rels list of Append
consist of non-leaf tables from *all* of the partitioned tables.  Before
run-time pruning arrived, the only purpose of partitioned_rels list was to
make sure that the executor goes through it and locks all of those
non-leaf tables (ExecLockNonLeafAppendTables).  Run-time pruning expanded
its usage by depending it to generate run-time pruning info.

>> I just had a thought that might lead to a nice solution to that, or
>> might be totally crazy.  What if we inverted the sense of the bitmaps
>> that track partition pruning state, so that instead of a bitmap of
>> valid partitions that need to be scanned, we had a bitmap of pruned
>> partitions that we know we don't need to scan?  (The indexes of this
>> bitmap would be subplan indexes not partition indexes.)  With this
>> representation, it doesn't matter if some of the Append's children
>> are not supposed to participate in pruning; they just don't ever get
>> added to the bitmap of what to skip.  It's also fairly clear, I think,
>> how to handle independent pruning rules for different top-level tables
>> that are being unioned together: just OR the what-to-skip bitmaps.
>> But there may be some reason why this isn't workable.
> 
> I think it would be less efficient. A common case and one that I very
> much would like to make as fast as possible is when all but a single
> partition is pruned. Doing the opposite sounds like more effort would
> need to be expended to get the subplans that we do need to scan.
> 
> I don't really see the way it works now as a huge problem to overcome
> in pruning. We'd just a list of subplans that don't belong to the
> hierarchy and tag them on to the matches found in
> ExecFindInitialMatchingSubPlans and ExecFindMatchingSubPlans. The
> bigger issue to overcome is the mixed flattened list of partition RT
> indexes in partitioned_rels.  Perhaps having a list of Lists for
> partitioned_rels could be used to resolve that. The question is more,
> how to solve for PG11. Do we need that?
> 
> I think we'll very soon be wanting to have ordered partition scans
> where something like:
> 
> create table listp(a int) partition by list(a);
> create index on listp(a);
> create table listp1 partition of listp for values in (1);
> create table listp2 partition of listp for values in (2);
> 
> and
> 
> select * from listp order by a;
> 
> would be possible with an Append and Index Scan, rather than having a
> MergeAppend or Sort. In which case we'll not want mixed partition
> hierarchies in the Append subplans. Although, perhaps that would mean
> we just wouldn't pullup AppendPaths which have PathKeys.
> 
> I have written and attached the patch to stop flattening of
> partitioned tables into UNION ALL parent's paths, meaning we can now
> get nested Append and MergeAppend paths.
> 
> I've added Robert too as I know he was the committer of partitioning
> and parallel Append. Maybe he has a view on what should be done about
> this? Is not flattening the paths a problem?

Not speaking for Robert here, just saying from what I know.

I don't think your patch breaks anything, even if does change the shape of
the plan.  So, for:

select * from partitioned_table_a
union all
select * from partitioned_table_b

The only thing that changes with the patch is that
ExecLockNonLeafAppendTables is called *twice* for the two nested Appends
corresponding to partitioned_table_a and partitioned_table_b, resp.,
instead of just once for the top level Append corresponding to the UNION
ALL parent.  In fact, when called for the top level Append,
ExecLockNonLeafAppendTables is now a no-op.

Thanks,
Amit



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Index maintenance function for BRIN doesn't check RecoveryInProgress()
Next
From: Amit Langote
Date:
Subject: Re: why partition pruning doesn't work?