Re: Parallel Append can break run-time partition pruning - Mailing list pgsql-hackers

From David Rowley
Subject Re: Parallel Append can break run-time partition pruning
Date
Msg-id CAApHDvqQR7i6feyEvFaE_gr2UgX0=BNHW63tyHHZrsvu5UEYrA@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Append can break run-time partition pruning  (Amit Langote <amitlangote09@gmail.com>)
Responses Re: Parallel Append can break run-time partition pruning
List pgsql-hackers
On Fri, 16 Oct 2020 at 03:01, Amit Langote <amitlangote09@gmail.com> wrote:
>
> Going over the last few emails, it seems that David held off from
> committing the patch, because of the lack of confidence in its
> robustness.   With the patch, a sub-partitioned child's partial
> Append's child paths will *always* be pulled up into the parent's set
> of partial child paths thus preventing the nesting of Appends, which
> the run-time pruning code can't currently cope with.  The lack of
> confidence seems to be due to the fact that always pulling up a
> sub-Append's child paths into the parent partial Append's child paths
> *may* cause the latter to behave wrongly under parallelism and the new
> code structure will prevent add_partial_path() from being the
> arbitrator of whether such a path is really the best in terms of cost.
>
> If we can't be confident in that approach, maybe we should consider
> making the run-time pruning code cope with nested Appends?

I've been thinking about this and my thoughts are:

There are other cases where we don't pullup sub-Merge Append nodes
anyway, so I think we should just make run-time pruning work for more
than just the top-level Append/Merge Append.

The case I'm thinking about is the code added in 959d00e9dbe for
ordered Append scans.  It's possible a sub-partitioned table has
partitions which don't participate in the same ordering.  We need to
keep the sub-Merge Append in those cases... well, at least when
there's more than 1 partition remaining after plan-time pruning.

I've attached the patch which, pending a final look, I'm proposing to
commit for master only.  I just don't quite think this is a bug fix,
and certainly, due to the invasiveness of the proposed patch, that
means no backpatch.

I fixed all the stuff about the incorrectly set partitioned_rels list.
What I ended up with there is making it accumulate_append_subpath's
job to also pullup the sub-paths partitioned_rels fields when pulling
up a nested Append/MergeAppend.   If there's no pullup, there then we
don't care about the sub-path's partitioned_rels.  That's for it to
deal with.  With that, I think that run-time pruning will only get the
RT indexes for partitions that we actually have sub-paths for. That's
pretty good, so I added an Assert() to verify that in
make_partitionedrel_pruneinfo().  (I hope I don't regret that later)

This does mean we need to maintain a different partitioned_rels list
for each Append path we consider.  So there's a number (six) of these
now between add_paths_to_append_rel() and
generate_orderedappend_paths().  To try to minimise the impact of
that, I've changed the field so that instead of being a List of
IntLists, it's just a List of Relids.  The top-level list just
contains a single element until you get a UNION ALL that selects from
a partitioned table on each side of the union.  Merging sub-path
partitioned rel RT indexes into the existing element is now pretty
cheap as it just uses bms_add_members() rather the list_concat we'd
have had to have used if it was still a List of IntLists.

After fixing up how partitioned_rels is built, I saw there were no
usages of RelOptInfo.partitioned_child_rels, so I got rid of it.

I did another couple of little cleanups and wrote some regression
tests to test all this.

Overall I'm fairly happy with this, especially getting rid of a
partitioned table related field from RelOptInfo.

David

Attachment

pgsql-hackers by date:

Previous
From: Dmitry Dolgov
Date:
Subject: Re: Index Skip Scan (new UniqueKeys)
Next
From: "Hou, Zhijie"
Date:
Subject: Fix typo in src/backend/utils/cache/lsyscache.c