Re: Parallel Append can break run-time partition pruning - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: Parallel Append can break run-time partition pruning |
Date | |
Msg-id | CA+HiwqHjzbSj4MbrBTA34b8Un-rz56f8twhj_Sur1agfduNYMw@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel Append can break run-time partition pruning (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Parallel Append can break run-time partition pruning
|
List | pgsql-hackers |
On Sun, Oct 25, 2020 at 10:06 AM David Rowley <dgrowleyml@gmail.com> wrote: > On Fri, 16 Oct 2020 at 03:01, Amit Langote <amitlangote09@gmail.com> wrote: > > > > Going over the last few emails, it seems that David held off from > > committing the patch, because of the lack of confidence in its > > robustness. With the patch, a sub-partitioned child's partial > > Append's child paths will *always* be pulled up into the parent's set > > of partial child paths thus preventing the nesting of Appends, which > > the run-time pruning code can't currently cope with. The lack of > > confidence seems to be due to the fact that always pulling up a > > sub-Append's child paths into the parent partial Append's child paths > > *may* cause the latter to behave wrongly under parallelism and the new > > code structure will prevent add_partial_path() from being the > > arbitrator of whether such a path is really the best in terms of cost. > > > > If we can't be confident in that approach, maybe we should consider > > making the run-time pruning code cope with nested Appends? > > I've been thinking about this and my thoughts are: Thanks for working on this. > There are other cases where we don't pullup sub-Merge Append nodes > anyway, so I think we should just make run-time pruning work for more > than just the top-level Append/Merge Append. > > The case I'm thinking about is the code added in 959d00e9dbe for > ordered Append scans. It's possible a sub-partitioned table has > partitions which don't participate in the same ordering. We need to > keep the sub-Merge Append in those cases... well, at least when > there's more than 1 partition remaining after plan-time pruning. Ah, I guess that case would also likewise fail to use runtime pruning properly. > I've attached the patch which, pending a final look, I'm proposing to > commit for master only. I just don't quite think this is a bug fix, > and certainly, due to the invasiveness of the proposed patch, that > means no backpatch. > > I fixed all the stuff about the incorrectly set partitioned_rels list. > What I ended up with there is making it accumulate_append_subpath's > job to also pullup the sub-paths partitioned_rels fields when pulling > up a nested Append/MergeAppend. If there's no pullup, there then we > don't care about the sub-path's partitioned_rels. That's for it to > deal with. With that, I think that run-time pruning will only get the > RT indexes for partitions that we actually have sub-paths for. That's > pretty good, so I added an Assert() to verify that in > make_partitionedrel_pruneinfo(). (I hope I don't regret that later) So AIUI, the fix here is to make any given Append/MergeAppend's part_prune_info only contain PartitionedRelPruneInfos of the (sub-) partitioned tables whose partitions' subplans are directly in appendplan/mergeplans, such that the partition indexes can be mapped to the subplan indexes. That does imply present_parts must be non-empty, so the Assert seems warranted. > This does mean we need to maintain a different partitioned_rels list > for each Append path we consider. So there's a number (six) of these > now between add_paths_to_append_rel() and > generate_orderedappend_paths(). To try to minimise the impact of > that, I've changed the field so that instead of being a List of > IntLists, it's just a List of Relids. The top-level list just > contains a single element until you get a UNION ALL that selects from > a partitioned table on each side of the union. Merging sub-path > partitioned rel RT indexes into the existing element is now pretty > cheap as it just uses bms_add_members() rather the list_concat we'd > have had to have used if it was still a List of IntLists. The refactoring seemed complicated on a first look, but overall looks good. > After fixing up how partitioned_rels is built, I saw there were no > usages of RelOptInfo.partitioned_child_rels, so I got rid of it. +1 > I did another couple of little cleanups and wrote some regression > tests to test all this. > > Overall I'm fairly happy with this, especially getting rid of a > partitioned table related field from RelOptInfo. Some comments: + * For partitioned tables, we accumulate a list of the partitioned RT + * indexes for the subpaths that are directly under this Append. Maybe: For partitioned tables, accumulate a list of the RT indexes of partitioned tables in the tree whose partitions' subpaths are directly under this Append. + * lists for each Append Path that we create as accumulate_append_subpath + * sometimes can't flatten sub-Appends into the top-level Append. How about expanding that reason a little bit as: ...can't flatten sub-Appends into the top-level Append which prevents the former's partitioned_rels from being pulled into the latter's. + * most one element which is a RelIds of the partitioned relations which there s/RelIds/Relids + * are subpaths for. In this case we just add the RT indexes for the + * partitioned tables for the subpath we're pulling up to the single entry in + * 'partitioned_rels'. How about: In this case, we just pull the RT indexes contained in sub-Append/MergeAppend's partitioned_rels into the single entry of *partitioned_rels, which belongs to the parent Append. * relid_subpart_map maps relid of a non-leaf partition to the index - * in 'partitioned_rels' of that rel (which will also be the index in - * the returned PartitionedRelPruneInfo list of the info for that + * in 'partrelids' of that rel (which will also be the index in the + * returned PartitionedRelPruneInfo list of the info for that ...the index in 'partrelids', which in the new code is a bitmap set, sounds a bit odd. Maybe just mention the index in the list of PartitionedRelPruneInfos as: relid_subpart_map maps relid of a given non-leaf partition in 'partrelids' to the index of its PartitionedRelPruneInfo in the returned list. + /* + * Ensure there were no stray PartitionedRelPruneInfo generated for + * partitioned tables that had no sub-paths for. + */ + Assert(!bms_is_empty(present_parts)); Maybe you meant: ...for partitioned tables for which we had neither leaf subpaths nor sub-PartitionedRelPruneInfos. + List *partitioned_rels; /* List of Relids for each non-leaf + * partitioned table in the partition + * tree. One for each partition hierarchy. + */ List *subpaths; /* list of component Paths */ How about describing partitioned_rels as follows: List of Relids set containing RT indexes of non-leaf tables for each partition hierarchy whose paths are in 'subpaths' -- Amit Langote EDB: http://www.enterprisedb.com
pgsql-hackers by date: