Thread: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
It seems like a good idea for the planner not to generate Append or MergeAppend paths when the node contains just a single subpath. If we were able to remove these then the planner would have more flexibility to build a better plan. As of today, because we include this needless [Merge]Append node, we cannot parallelise scans below the Append. We must also include a Materialize node in Merge Joins, since neither MergeAppend nor Append supports mark and restore. Also, as shown in [1], there's an overhead to pulling tuples through nodes.

I've been looking into resolving this issue, but the ways I've explored so far seem to bend the current planner a bit out of shape.

Method 1:

In set_append_rel_size() detect when just a single subpath would be added to the Append path. Set a promotion_child RelOptInfo field in the base rel's RelOptInfo, and during make_one_rel, after set_base_rel_sizes(), modify the joinlist to get rid of the parent and use the child instead.

This pretty much breaks the assumption we have that the final rel will have all the relids in root->all_baserels. We have an Assert in make_one_rel() which checks this is true. There are also complications around set_rel_size() generating paths for subqueries, which may be parameterized paths with the Append parent required for the parameterization to work.

Method 2:

Invent a ProxyPath concept that allows us to add Paths belonging to one relation to another relation. The ProxyPaths can have translation_vars to translate targetlists to reference the correct Vars. These ProxyPaths could exist up as far as createplan, where we could perform the translation and build the ProxyPath's subnode instead.

This method is not exactly very clean either, as there are various places around the planner where we'd need to give special treatment to these ProxyPaths. For example, is_projection_capable_path() would need to return the projection capability of the subpath within the ProxyPath, since we never actually create a "Proxy" node.
Probably either of these two methods could be made to work. I prefer method 2, since that infrastructure could one day be put towards scanning a Materialized View instead of the relation in order to satisfy some query. However, method 2 appears to also require some Var translation in Path targetlists which contain this ProxyPath; either that, or some global Var replacement would need to be done during setrefs.

I'm posting here in the hope that it will steer my development of this in a direction that's acceptable to the community. Perhaps there is also a "Method 3" which I've not thought about which would be much cleaner.

[1] https://www.postgresql.org/message-id/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B%3DZRh-rxy9qxfPA5Gw%40mail.gmail.com

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Oct 26, 2017 at 3:29 AM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> However, method 2 appears to also require some Var
> translation in Path targetlists which contain this Proxy path, either
> that or some global Var replacement would need to be done during
> setrefs.

For inheritance and partitioning, we translate parent expressions to child expressions by applying Var translations. The translated expressions differ only in the Var nodes (at least for partitioning this is true, and I believe it to be true for inheritance). I have been toying with the idea of having some kind of a (polymorphic?) Var node which can represent both parent and children. That would greatly reduce the need to translate the expression trees and would also help in this implementation.

I am wondering if ProxyPath could be helpful in avoiding the cost of reparameterize_path_by_child() introduced for partition-wise join.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Wed, Oct 25, 2017 at 11:59 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> As of today, because we include this needless [Merge]Append node, we
> cannot parallelise scans below the Append.

Without disputing your general notion that single-child Append or MergeAppend nodes are a pointless waste, how does such a needless node prevent parallelizing scans beneath it?

> Invent a ProxyPath concept that allows us to add Paths belonging to
> one relation to another relation. The ProxyPaths can have
> translation_vars to translate targetlists to reference the correct
> Vars. These ProxyPaths could exist up as far as createplan, where we
> could perform the translation and build the ProxyPath's subnode
> instead.

This idea sounds like it might have some legs, although I'm not sure "proxy" is really the right terminology. Like both you and Ashutosh, I suspect that there might be some other applications for this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
David Rowley <david.rowley@2ndquadrant.com> wrote:
> Method 1:
>
> In set_append_rel_size() detect when just a single subpath would be
> added to the Append path.

I spent some time reviewing the partition-wise join patch during the last CF and have an impression that set_append_rel_size() is called too early to find out how many child paths will eventually exist if Append represents a join of a partitioned table. I think the partition matching takes place at a later stage, and that determines the actual number of partitions, and therefore the number of Append subpaths.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
On 26 October 2017 at 23:30, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Oct 25, 2017 at 11:59 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> As of today, because we include this needless [Merge]Append node, we
>> cannot parallelise scans below the Append.
>
> Without disputing your general notion that single-child Append or
> MergeAppend nodes are a pointless waste, how does such a needless
> node prevent parallelizing scans beneath it?

hmm, I think I was wrong about that now. I had been looking over the regression test diffs after having made tenk1 a partitioned table with a single partition containing all the rows, but it looks like I read the diff backwards. The planner actually parallelized the Append version, not the non-Append version, as I had previously thought.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 26 October 2017 at 23:42, Antonin Houska <ah@cybertec.at> wrote:
> David Rowley <david.rowley@2ndquadrant.com> wrote:
>
>> Method 1:
>>
>> In set_append_rel_size() detect when just a single subpath would be
>> added to the Append path.
>
> I spent some time reviewing the partition-wise join patch during the last CF
> and have an impression that set_append_rel_size() is called too early to find
> out how many child paths will eventually exist if Append represents a join of
> a partitioned table. I think the partition matching takes place at later stage
> and that determines the actual number of partitions, and therefore the number
> of Append subpaths.

I understand that we must wait until set_append_rel_size() is being called on the RELOPT_BASEREL, since partitions may themselves be partitioned tables and we'll only know what's left after we've recursively checked all partitions of the baserel.

I've not studied the partition-wise join code yet, but if we've eliminated all but one partition in set_append_rel_size() then I imagine we'd just need to ensure partition-wise join is not attempted, since we'd be joining directly to the only partition anyway.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Oct 26, 2017 at 5:07 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 26 October 2017 at 23:42, Antonin Houska <ah@cybertec.at> wrote:
>> I spent some time reviewing the partition-wise join patch during the last CF
>> and have an impression that set_append_rel_size() is called too early to find
>> out how many child paths will eventually exist if Append represents a join of
>> a partitioned table. I think the partition matching takes place at later stage
>> and that determines the actual number of partitions, and therefore the number
>> of Append subpaths.
>
> I understand that we must wait until set_append_rel_size() is being
> called on the RELOPT_BASEREL since partitions may themselves be
> partitioned tables and we'll only know what's left after we've
> recursively checked all partitions of the baserel.
>
> I've not studied the partition-wise join code yet, but if we've
> eliminated all but one partition in set_append_rel_size() then I
> imagine we'd just need to ensure partition-wise join is not attempted
> since we'd be joining directly to the only partition anyway.

I think Antonin has a point. The join processing may deem some base relations dummy (see populate_joinrel_with_paths()). So an Append relation which had multiple children alive at the end of set_append_rel_size() might ultimately have only one child after partition-wise join has worked on it.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
David Rowley <david.rowley@2ndquadrant.com> writes:
> It seems like a good idea for the planner not to generate Append or
> MergeAppend paths when the node contains just a single subpath.
> ...
> Perhaps there is also a "Method 3" which I've not thought about which
> would be much cleaner.

Have you considered turning AppendPath with a single child into more of a special case? I think this is morally equivalent to your ProxyPath proposal, but it would take less new boilerplate code to support. If you taught create_append_path to inherit the child's pathkeys when there's only one child, and taught createplan.c to skip making the useless Append node, I think you might be nearly done.

This might seem a bit ugly, but considering that zero-child AppendPaths are already a special case (and a much bigger one), I don't have much of a problem with decreeing that one-child is also a special case.

As an example of why this might be better than a separate ProxyPath node type, consider this call in add_paths_to_append_rel:

    /*
     * If we found unparameterized paths for all children, build an unordered,
     * unparameterized Append path for the rel.  (Note: this is correct even
     * if we have zero or one live subpath due to constraint exclusion.)
     */
    if (subpaths_valid)
        add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0,
                                                  partitioned_rels));

That comment could stand a bit of adjustment with this approach, but the code itself requires no extra work. Now, you would have to do something about this in create_append_plan:

    /*
     * XXX ideally, if there's just one child, we'd not bother to generate an
     * Append node but just return the single child.  At the moment this does
     * not work because the varno of the child scan plan won't match the
     * parent-rel Vars it'll be asked to emit.
     */

but that's going to come out in the wash someplace, no matter what you do. Leaving it for the last minute is likely to reduce the number of places you have to teach about this.
regards, tom lane
On 27 October 2017 at 01:44, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
> I think Antonin has a point. The join processing may deem some base
> relations dummy (See populate_joinrel_with_paths()). So an Append
> relation which had multiple child alive at the end of
> set_append_rel_size() might ultimately have only one child after
> partition-wise join has worked on it.

hmm, I see. Partition-wise join is able to eliminate partitions on the other side of the join that can't be matched to:

# set enable_partition_wise_join=on;
SET
# explain select count(*) from ab ab1 inner join ab ab2 on ab1.a=ab2.a
where ab1.a between 1 and 2 and ab2.a between 100000000 and 100000001;
                   QUERY PLAN
------------------------------------------------
 Aggregate  (cost=0.00..0.01 rows=1 width=8)
   ->  Result  (cost=0.00..0.00 rows=0 width=0)
         One-Time Filter: false
(3 rows)

# \d+ ab
                                     Table "public.ab"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 a      | integer |           |          |         | plain   |              |
 b      | integer |           |          |         | plain   |              |
Partition key: RANGE (a)
Partitions: ab_p1 FOR VALUES FROM (1) TO (100000000),
            ab_p2 FOR VALUES FROM (100000000) TO (200000000)

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 27 October 2017 at 01:44, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> I think Antonin has a point. The join processing may deem some base
>> relations dummy (See populate_joinrel_with_paths()). So an Append
>> relation which had multiple child alive at the end of
>> set_append_rel_size() might ultimately have only one child after
>> partition-wise join has worked on it.

TBH, that means partition-wise join planning is broken and needs to be redesigned. It's impossible that we're going to make sane planning choices if the sizes of relations change after we've already done some planning work.

regards, tom lane
On Fri, Oct 27, 2017 at 12:49 AM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 27 October 2017 at 01:44, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> I think Antonin has a point. The join processing may deem some base
>> relations dummy (See populate_joinrel_with_paths()). So an Append
>> relation which had multiple child alive at the end of
>> set_append_rel_size() might ultimately have only one child after
>> partition-wise join has worked on it.
>
> hmm, I see. partition-wise join is able to eliminate partitions
> on the other side of the join that can't be matched to:
>
> # set enable_partition_wise_join=on;
> SET
> # explain select count(*) from ab ab1 inner join ab ab2 on ab1.a=ab2.a
> where ab1.a between 1 and 2 and ab2.a between 100000000 and 100000001;
>                    QUERY PLAN
> ------------------------------------------------
>  Aggregate  (cost=0.00..0.01 rows=1 width=8)
>    ->  Result  (cost=0.00..0.00 rows=0 width=0)
>          One-Time Filter: false
> (3 rows)
>
> # \d+ ab
>                      Table "public.ab"
>  Column |  Type   | Collation | Nullable | Default | Storage | Stats
> target | Description
> --------+---------+-----------+----------+---------+---------+--------------+-------------
>  a      | integer |           |          |         | plain   |              |
>  b      | integer |           |          |         | plain   |              |
> Partition key: RANGE (a)
> Partitions: ab_p1 FOR VALUES FROM (1) TO (100000000),
>             ab_p2 FOR VALUES FROM (100000000) TO (200000000)

IIUC, ab1_p2 and ab2_p1 are marked dummy by constraint exclusion. Because of partition-wise join, when the corresponding partitions are joined, their inner joins are marked dummy, resulting in an empty join. This isn't the case of join processing marking one of the joining relations dummy that I was referring to. But I think it's relevant here, since partition-wise join might create single-child joins this way.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Fri, Oct 27, 2017 at 1:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> On 27 October 2017 at 01:44, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>>> I think Antonin has a point. The join processing may deem some base
>>> relations dummy (See populate_joinrel_with_paths()). So an Append
>>> relation which had multiple child alive at the end of
>>> set_append_rel_size() might ultimately have only one child after
>>> partition-wise join has worked on it.
>
> TBH, that means partition-wise join planning is broken and needs to be
> redesigned. It's impossible that we're going to make sane planning
> choices if the sizes of relations change after we've already done some
> planning work.

The mark_dummy_rel() call that marks a joining relation dummy was added by 719012e013f10f72938520c46889c52df40fa329. The joining relations are planned before the joins they participate in are planned. Further, some of the joins in which a relation participates might have been planned before the joining pair which causes it to be marked dummy is processed by populate_joinrel_with_paths(). So we already have places where the sizes of relations change during planning.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 27 October 2017 at 04:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> It seems like a good idea for the planner not to generate Append or
>> MergeAppend paths when the node contains just a single subpath.
>> ...
>> Perhaps there is also a "Method 3" which I've not thought about which
>> would be much cleaner.
>
> Have you considered turning AppendPath with a single child into more
> of a special case? I think this is morally equivalent to your ProxyPath
> proposal, but it would take less new boilerplate code to support.
> If you taught create_append_path to inherit the child's pathkeys
> when there's only one child, and taught createplan.c to skip making the
> useless Append node, I think you might be nearly done.

That seems reasonable, although after going and implementing this, it does not seem quite as convenient as dummy paths are. I had to make the AppendPath carry a bit more weight than I originally thought. It now stores translate to/from expressions and also a single bool (isproxy) to mark if it's a proxy type Append. I tried to do without this bool and just check if there's a single subpath and some translation expressions, but that was getting fooled by Paths which had empty targetlists, which meant there were no translations, and going on the single subpath alone is no good since we create append paths in more places than just add_paths_to_append_rel().

I've attached a patch which goes and implements all this. I think it still needs a bit of work. I'm not 100% confident I'm translating expressions in all the required places in createplan.c.

I did modify tenk1 to become the parent of a single partition and ran the regression tests. There appear to be 4 subtle differences between querying a parent with an only-child and querying the child directly:

1. EXPLAIN VERBOSE shows the columns prefixed by the table name. This is caused by "useprefix = list_length(es->rtable) > 1;" in explain.c.

2. There's a small change in an error message in portals.out. "ERROR: cursor "c" is not a simply updatable scan of table "tenk1a"" now mentions the child table instead of the parent. Nothing to worry about, as it appears to already do this with native partitioning.

3. Some plans are parallelized that were not before (see join.out). This is caused by compute_parallel_worker() having a special case for non-baserels.

4. Some gating plans don't work the same as they used to; this is highlighted in aggregates.out and is caused by child rel Const TRUE baserestrictinfo clauses being dropped in set_append_rel_size.

For differences 2-4, I've attached another file which modifies tenk1 to become a partitioned table with a single partition. I've modified all the expected results for #1 to reduce the noise, so 3 tests are still failing for reasons 2-4. I'm not at all concerned with #2. #3 might not be worth fixing, I've just no idea how yet, and #4 seems like it's on purpose.

Other things I'm not 100% happy with: in prepunion.c I had to invent a new function named find_all_appinfos_for_child, which is a bit similar to adjust_appendrel_attrs_multilevel, only it modifies the attributes, but the use I have for find_all_appinfos_for_child includes also calling add_child_rel_equivalences for the children at each level. Probably a bit more thinking will come up with some way to modify the existing function sets to be more reusable; I've just not done that yet.

I decided in createplan.c to have a generic expression translator rather than something that only understands AppendRelInfos. The reason for this is two-fold:

1. There may be multiple AppendRelInfos required for a single translation between a parent and child if there are multiple other children in between. It seems wasteful to have intermediate translations.

2. I have other uses in mind for this feature that are not Append child relations.

A sanity check on what I have would be most welcome if anyone has time.
I'm going to add this to the November commitfest.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 1 November 2017 at 02:47, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 27 October 2017 at 04:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Have you considered turning AppendPath with a single child into more
>> of a special case? I think this is morally equivalent to your ProxyPath
>> proposal, but it would take less new boilerplate code to support.
>> If you taught create_append_path to inherit the child's pathkeys
>> when there's only one child, and taught createplan.c to skip making the
>> useless Append node, I think you might be nearly done.

I've made another pass over this patch today to try to ensure it's in a better state. I'm away on leave until the end of the month starting from this Friday, so likely I won't get any time to resolve any issues found until maybe the last day of the month.

I've attached an updated version of the patch which fixes a bug that I discovered today where I had neglected to translate the quals of a grouping set. I noticed this when I made all the main regression tables partitioned tables containing a single child partition with a MINVALUE to MAXVALUE range. The fact that all plans now appear to work as planned after having removed the Append has given me a bit more confidence that I've not missed any places to translate Vars, but I'm still not 100% certain. I just can't think of any other way of testing that at the moment. Any that I have missed should result in an error like "variable not found in subplan target list".

I've also modified the patch to make use of some existing functions in prepunion.c which gather up AppendRelInfos for a child rel. I'd previously invented my own function because the existing ones were not quite exactly what I needed. I'd planned to come back to it, but only just got around to fixing that. Apart from that, the changes are just in the code comments.

The remove_singleton_appends_examples_of_differences_2017-11-15.patch which I've attached applies changes to the regression tests to make many of the major tables partitioned tables with a single partition. It also changes the expected results of the small insignificant plan changes and leaves only the actual true differences. This file is not intended for commit, but is more just a demo of it working for a larger variety of plan shapes.

There are actually no additional tests in the patch. It relies on the places in the existing tests which have changed. I didn't add any extra as I can't think of any particular area to target, given that it's such a generic thing that can apply to so many different cases.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From: Michael Paquier
On Wed, Nov 15, 2017 at 3:17 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> The remove_singleton_appends_examples_of_differences_2017-11-15.patch
> which I've attached applies changes to the regression tests to make
> many of the major tables partitioned tables with a single partition.
> It also changes the expected results of the small insignificant plan
> changes and leaves only the actual true differences. This file is not
> intended for commit, but more just a demo of it working for a larger
> variety of plan shapes. There is actually no additional tests with the
> patch. It relies on the places in the existing tests which have
> changed. I didn't add any extra as I can't think of any particular
> area to target given that it's such a generic thing that can apply to
> so many different cases.

The last patch set does not apply and did not get any reviews. So I am moving this item to the next CF with "waiting on author" as status.

--
Michael
On 30 November 2017 at 15:34, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Wed, Nov 15, 2017 at 3:17 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> The remove_singleton_appends_examples_of_differences_2017-11-15.patch
>> which I've attached applies changes to the regression tests to make
>> many of the major tables partitioned tables with a single partition.
>> It also changes the expected results of the small insignificant plan
>> changes and leaves only the actual true differences. This file is not
>> intended for commit, but more just a demo of it working for a larger
>> variety of plan shapes. There is actually no additional tests with the
>> patch. It relies on the places in the existing tests which have
>> changed. I didn't add any extra as I can't think of any particular
>> area to target given that it's such a generic thing that can apply to
>> so many different cases.
>
> The last patch set does not apply and did not get any reviews. So I am
> moving this item to the next CF with "waiting on author" as status.
(returning from 2 weeks leave)
Thanks for managing the commitfest.
I've attached a patch which fixes the conflict with the regression test expected output in inherits.out. I've also made changes to the expected output in the new partition_prune test's expected output.
I'll set this back to waiting on review now.
On 30 November 2017 at 16:04, David Rowley <david.rowley@2ndquadrant.com> wrote:
> I've attached a patch which fixes the conflict with the regression test
> expected output in inherits.out. I've also made changes to the expected
> output in the new partition_prune test's expected output.

I've attached an updated patch which fixes the conflicts caused by ab727167.

Also, I've attached a patch which is not intended for commit, which shows the subtle differences between directly querying a partitioned table with a single remaining leaf partition to scan vs directly querying the leaf partition itself. The regression test failures seen with both patches applied highlight the differences.

While rebasing this today I also noticed that we won't properly detect unique joins in add_paths_to_joinrel(), as we're still testing for uniqueness against the partitioned parent rather than the only child. This is likely not a huge problem, since we'll always get a false negative and never a false positive, but it is a missing optimisation. I've not thought of how to solve it yet; it's perhaps not worth going to too much trouble over.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Dec 7, 2017 at 5:11 AM, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> While rebasing this today I also noticed that we won't properly detect
> unique joins in add_paths_to_joinrel() as we're still testing for
> uniqueness against the partitioned parent rather than the only child.
> This is likely not a huge problem since we'll always get a false
> negative and never a false positive, but it is a missing optimisation.
> I've not thought of how to solve it yet, it's perhaps not worth going
> to too much trouble over.
>

It's only the jointrelids that are the parent's relids; all the following tests for unique joins use RelOptInfo::relids, which are child relids.

    case JOIN_UNIQUE_INNER:
        extra.inner_unique = bms_is_subset(sjinfo->min_lefthand,
                                           outerrel->relids);
        break;
    case JOIN_UNIQUE_OUTER:
        extra.inner_unique = innerrel_is_unique(root,
                                                outerrel->relids,
                                                innerrel,
                                                JOIN_INNER,
                                                restrictlist,
                                                false);
        break;
    default:
        extra.inner_unique = innerrel_is_unique(root,
                                                outerrel->relids,
                                                innerrel,
                                                jointype,
                                                restrictlist,
                                                false);
        break;

Do you have a testcase which shows the problem?

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 11 December 2017 at 21:18, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
> On Thu, Dec 7, 2017 at 5:11 AM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> While rebasing this today I also noticed that we won't properly detect
>> unique joins in add_paths_to_joinrel() as we're still testing for
>> uniqueness against the partitioned parent rather than the only child.
>> This is likely not a huge problem since we'll always get a false
>> negative and never a false positive, but it is a missing optimisation.
>> I've not thought of how to solve it yet, it's perhaps not worth going
>> to too much trouble over.
>>
> [...]
> Do you have a testcase, which shows the problem?

I do indeed:

create table p (a int not null) partition by range (a);
create table p1 partition of p for values from (minvalue) to (maxvalue);
create unique index on p1 (a);
create table t (a int not null);

insert into p values(1),(2);
insert into t values(1);

analyze p;
analyze t;

explain (verbose, costs off) select * from p inner join t on p.a = t.a;
         QUERY PLAN
-----------------------------
 Nested Loop
   Output: p1.a, t.a
   Join Filter: (p1.a = t.a)
   ->  Seq Scan on public.t
         Output: t.a
   ->  Seq Scan on public.p1
         Output: p1.a
(7 rows)

explain (verbose, costs off) select * from p1 inner join t on p1.a = t.a;
         QUERY PLAN
-----------------------------
 Nested Loop
   Output: p1.a, t.a
   Inner Unique: true
   Join Filter: (p1.a = t.a)
   ->  Seq Scan on public.t
         Output: t.a
   ->  Seq Scan on public.p1
         Output: p1.a
(8 rows)

Notice that when we join to the partitioned table directly and the Append is removed, we don't get the "Inner Unique: true".

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Dec 11, 2017 at 2:42 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 11 December 2017 at 21:18, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> On Thu, Dec 7, 2017 at 5:11 AM, David Rowley
>> <david.rowley@2ndquadrant.com> wrote:
>>> While rebasing this today I also noticed that we won't properly detect
>>> unique joins in add_paths_to_joinrel() as we're still testing for
>>> uniqueness against the partitioned parent rather than the only child.
>>> This is likely not a huge problem since we'll always get a false
>>> negative and never a false positive, but it is a missing optimisation.
>>> I've not thought of how to solve it yet, it's perhaps not worth going
>>> to too much trouble over.
>>>
> [...]
>
>> Do you have a testcase, which shows the problem?
>
> I do indeed:
>
> create table p (a int not null) partition by range (a);
> create table p1 partition of p for values from (minvalue) to (maxvalue);
> create unique index on p1 (a);
> create table t (a int not null);
>
> insert into p values(1),(2);
> insert into t values(1);
>
> analyze p;
> analyze t;
>
> explain (verbose, costs off) select * from p inner join t on p.a = t.a;
>           QUERY PLAN
> -----------------------------
>  Nested Loop
>    Output: p1.a, t.a
>    Join Filter: (p1.a = t.a)
>    ->  Seq Scan on public.t
>          Output: t.a
>    ->  Seq Scan on public.p1
>          Output: p1.a
> (7 rows)
>
> explain (verbose, costs off) select * from p1 inner join t on p1.a = t.a;
>           QUERY PLAN
> -----------------------------
>  Nested Loop
>    Output: p1.a, t.a
>    Inner Unique: true
>    Join Filter: (p1.a = t.a)
>    ->  Seq Scan on public.t
>          Output: t.a
>    ->  Seq Scan on public.p1
>          Output: p1.a
> (8 rows)
>
> Notice that when we join to the partitioned table directly and the
> Append is removed, we don't get the "Inner Unique: true"

Ok. I thought we don't use unique joins in partition-wise joins. Sorry for the misunderstanding.
I think this didn't work with inheritance before partition-wise join as well for the same reason. Probably, we could do it if unique paths (sorted paths which can be unique-ified) bubble up the Append hierarchy. We already have code in add_paths_to_append_rel() to create sorted paths when even a single child has an index on it. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On 11 December 2017 at 22:23, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
> I think this didn't work with inheritance before partition-wise join
> as well for the same reason.

Yeah, I was just demonstrating a reason that the plans are not identical when querying a partitioned table where partition pruning eliminates all but one partition vs directly querying that single partition. I've been attaching some regression test changes which demonstrate some other subtle differences along with each version of the patch. I just wanted to be open about these subtle differences that I've encountered while working on this.

> Probably, we could do it if unique paths (sorted paths which can be
> unique-ified) bubble up the Append hierarchy. We already have code in
> add_paths_to_append_rel() to create sorted paths when even a single
> child has an index on it.

Sometime in the future, I'd like to see some sort of UniqueKeys List in RelOptInfo which is initially populated by looking at the unique indexes during build_simple_rel(). The idea is that these UniqueKeys would get propagated into RELOPT_JOINRELs when the join condition matches a set of unique keys on the other side of the join. e.g. if the outer side of the join had a UniqueKey matching the join condition then we could take all of the UniqueKeys from the RelOptInfo for the inner side of the join (and vice versa). This infrastructure would allow us to no-op a DISTINCT when the RelOptInfo has targetlist items which completely cover at least one UniqueKey set, or to skip the sort for a GROUP BY when the GROUP BY clause is covered by a UniqueKey.

To fix the case we're discussing, we could also propagate those UniqueKeys to an AppendPath with a single subpath, similar to how I'm propagating the PathKeys for the same case in this patch; that way the unique join code would find the UniqueKeys instead of doing what it does today and finding no unique indexes on the partitioned table.
-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
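[Editor's note: the proposed UniqueKeys coverage check can be pictured with a toy Python model — my own sketch of the idea described above, not an actual PostgreSQL design; all names and the example keys are invented.]

```python
def covered_by_unique_key(clause_cols, unique_keys):
    """True if clause_cols contains every column of at least one unique
    key set, making the rel provably distinct on those columns."""
    cols = set(clause_cols)
    return any(set(key) <= cols for key in unique_keys)

# Hypothetical rel with a unique index on (id) and another on (a, b).
unique_keys = [("id",), ("a", "b")]

# SELECT DISTINCT id, x FROM t -- the targetlist covers the (id) key,
# so the DISTINCT could become a no-op.
print(covered_by_unique_key(["id", "x"], unique_keys))   # True

# GROUP BY a -- covers neither key fully, so grouping is still needed.
print(covered_by_unique_key(["a"], unique_keys))         # False
```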
On Mon, Dec 11, 2017 at 3:16 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> Sometime in the future, I'd like to see some sort of UniqueKeys List
> in RelOptInfo which is initially populated by looking at the
> unique indexes during build_simple_rel(). The idea is that these
> UniqueKeys would get propagated into RELOPT_JOINRELs when the join
> condition matches a set of unique keys on the other side of the join.
> e.g. If the outer side of the join had a UniqueKey matching the join
> condition then we could take all of the UniqueKeys from the RelOptInfo
> for the inner side of the join (and vice versa). This infrastructure
> would allow us to no-op a DISTINCT when the RelOptInfo has targetlist
> items which completely cover at least one UniqueKey set, or not bother
> performing a sort for a GROUP BY when the GROUP BY clause is covered
> by a UniqueKey.

That looks neat.

> To fix the case we're discussing, we can also propagate those
> UniqueKeys to an AppendPath with a single subpath similar to how I'm
> propagating the PathKeys for the same case in this patch, that way the
> unique join code would find the UniqueKeys instead of what it does
> today and not find any unique indexes on the partitioned table.

Another possibility is to support partition-wise join between an unpartitioned table and a partitioned table by joining the unpartitioned table with each partition separately and appending the results. That's a lot of work and quite some new infrastructure. Each of those joins would pick up the unique key if an appropriate index exists on the partition.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
I've attached a rebased patch. The previous patch was conflicting with parallel Hash Join (180428404) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 28 December 2017 at 18:31, David Rowley <david.rowley@2ndquadrant.com> wrote: > I've attached a rebased patch. > > The previous patch was conflicting with parallel Hash Join (180428404) And another one. Thanks to the patch tester [1], I realised that I didn't make check-world and there was a postgres_fdw test that was failing. The attached fixes. [1] http://commitfest.cputube.org/ -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
I've attached an updated patch which takes care of the build failure caused by the new call to create_append_path added in bb94ce4. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Hi,

On 01/25/2018 01:55 PM, David Rowley wrote:
> I've attached an updated patch which takes care of the build failure
> caused by the new call to create_append_path added in bb94ce4.

I've been looking at the patch over the past few days. I haven't found any obvious issues with it, but I do have a couple of questions.

1) I can confirm that it indeed eliminates the Append overhead, using the example from [1], modified to use a table with a single partition. Of course, this is a pretty formal check, because the patch simply removes the Append node and so it'd be utterly broken if the overhead did not disappear too.

[1] https://www.postgresql.org/message-id/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B%3DZRh-rxy9qxfPA5Gw%40mail.gmail.com

2) If I had to guess where the bugs will be, my guess would be a missing finalize_plan_tlist call - not necessarily in existing code, but perhaps in future improvements. I wonder if we could automate this call somehow, say by detecting when create_plan_recurse is called after build_path_tlist (for a given path), and if needed calling finalize_plan_tlist from create_plan_recurse. Although, now that I look at it, promote_child_relation may be doing something like that by adding stuff to the translate_* variables. I'm not quite sure why we need to append to the lists, though?

3) I'm not quite sure what replace_translatable_exprs does, or more precisely why we actually need it. At this point the comment pretty much says "We need to do translation. This function does translation." Perhaps it should mention why removing/skipping the AppendPath implies the need for translation or something?

4) The thread talks about "[Merge]Append" but the patch apparently only tweaks create_append_path and does absolutely nothing to the "merge" case. While looking into why, I noticed that generate_mergeappend_paths always gets all_child_pathkeys=NULL in this case (with a single subpath), and so we never create the MergePath at all.
I suppose there's some condition somewhere responsible for doing that, but I haven't found it ...

5) The comment before the AppendPath typedef says this:

 * An AppendPath with a single subpath can be set up to become a "proxy"
 * path. This allows a Path which belongs to one relation to be added to
 * the pathlist of some other relation. This is intended as generic
 * infrastructure, but its primary use case is to allow Appends with
 * only a single subpath to be removed from the final plan.

I'm not quite sure how adding an 'isproxy' flag into the AppendPath node makes this generic infrastructure? Wouldn't an explicit ProxyPath be a better way to achieve that? What other proxy paths do you envision, and why should they be represented as AppendPath? Although, there seems to be a precedent in using AppendPath to represent dummy paths ...

FWIW one advantage of a separate ProxyPath would be that it would be a much clearer breakage for third-party code (say, extensions tweaking paths using hooks), either at compile or execution time. With just a new flag in an existing node, that may easily slip through. On the other hand, no one promises the structures to be a stable API ...

6) I suggest we add this assert to create_append_path:

    Assert(!isproxy || (isproxy && (list_length(subpaths) == 1)));

and perhaps we should do s/isproxy/is_proxy/, which seems like the usual naming for boolean variables.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 19 February 2018 at 15:11, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> 1) I can confirm that it indeed eliminates the Append overhead, using
> the example from [1], modified to use a table with a single partition.
> Of course, this is a pretty formal check, because the patch simply
> removes the Append node and so it'd be utterly broken if the overhead
> did not disappear too.
>
> [1]
> https://www.postgresql.org/message-id/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B%3DZRh-rxy9qxfPA5Gw%40mail.gmail.com

Thanks for testing that.

> 2) If I had to guess where the bugs will be, my guess would be a missing
> finalize_plan_tlist call - not necessarily in existing code, but perhaps
> in future improvements.

I agree with that. I'd add that, in the future, we might also end up with some missing translation calls to replace_translatable_exprs(). To try to ensure I didn't miss anything, I previously modified the regression test tables "tenk" and "tenk1" to become partitioned tables with a single partition allowing all possible values. I just failed to find a reasonable way to make this a more permanent test without forcing that change on the tables permanently.

> I wonder if we could automate this call somehow, say by detecting when
> create_plan_recurse is called after build_path_tlist (for a given path),
> and if needed calling finalize_plan_tlist from create_plan_recurse.

Sounds nice, but I'm unsure how we could do that.

> Although, now that I look at it, promote_child_relation may be doing
> something like that by adding stuff to the translate_* variables.

Do you mean this?

    /*
     * Record this child as having been promoted. Some places treat child
     * relations in a special way, and this will give them a VIP ticket to
     * adulthood, where required.
     */
    root->translated_childrelids =
        bms_add_members(root->translated_childrelids, child->relids);

That's to get around a problem in use_physical_tlist() where there's different behaviour for RELOPT_OTHER_MEMBER_REL than there is for RELOPT_BASEREL. Another option might be to instead change the RELOPT_OTHER_MEMBER_REL into a RELOPT_BASEREL, although I was a little too scared that might result in many other areas requiring changes. I may be wrong about that though. We do, for example, occasionally change a RELOPT_BASEREL into a RELOPT_DEADREL, so RelOptInfos changing their RelOptKind is not a new concept; perhaps it's fine.

> I'm
> not quite sure why we need to append to the lists, though?

Multiple Append rels may be replaced by their only-child relation, so we need to be able to translate the targetlists for each of them. For example, joins above the Appends may contain Vars which belong to either of the Appends, and those need to be translated into the promoted child relation's Vars.

> 3) I'm not quite sure what replace_translatable_exprs does, or more
> precisely why we actually need it. At this point the comment pretty
> much says "We need to do translation. This function does translation."
> Perhaps it should mention why removing/skipping the AppendPath implies
> the need for translation or something?

It does mention "Expr translation is required to translate the targetlists of nodes above the Append", but perhaps I can make that a bit more clear. Let's say we have a query such as:

SELECT * FROM fact f INNER JOIN dim1 d ON f.dim1_id = d.id
WHERE f.date BETWEEN '2018-01-01' AND '2018-01-31';

which results in a hash join and a single Jan 2018 partition being scanned for "fact". A plan without the Append node having been removed would have the Append targetlist containing the Vars from the "fact" table, as would the Hash Join node...
The problem this function is trying to solve is translating the Vars in the Hash Join node (or any node above the Append) to change the "fact" Vars into "fact_jan2018" Vars. In create_plan.c we skip over any isproxy Append paths and instead return the recursive call on the single proxied path. Without the translation, we'd run into problems when trying to find Vars in the targetlists later.

> 4) The thread talks about "[Merge]Append" but the patch apparently only
> tweaks create_append_path and does absolutely nothing to the "merge"
> case. While looking into why, I noticed that generate_mergeappend_paths
> always gets all_child_pathkeys=NULL in this case (with a single
> subpath), and so we never create the MergePath at all. I suppose
> there's some condition somewhere responsible for doing that, but I
> haven't found it ...

Yeah, only Append paths are used as proxy paths. The confusion here is that the path creation logic has also been changed in allpaths.c (see generate_proxy_paths()), so that we no longer build MergeAppend paths when there's a single subpath. It's probably just the fact that Append is being hi-jacked to act as a ProxyPath that's causing the confusion there. If you look over generate_proxy_paths() and where it gets called from, you'll see that, instead of creating Append/MergeAppend paths, we just add each path from the pathlist and partial_pathlist of the only-child subpath to the Append rel. This is what gives the planner the same flexibility to generate a plan as if the query had been written against the only child partition instead of the parent.

> 5) The comment before the AppendPath typedef says this:
>
> * An AppendPath with a single subpath can be set up to become a "proxy"
> * path. This allows a Path which belongs to one relation to be added to
> * the pathlist of some other relation. This is intended as generic
> * infrastructure, but its primary use case is to allow Appends with
> * only a single subpath to be removed from the final plan.
>
> I'm not quite sure how adding an 'isproxy' flag into the AppendPath
> node makes this generic infrastructure? Wouldn't an explicit ProxyPath
> be a better way to achieve that?

It's supposed to be generic in the sense that I didn't aim to code it just to allow Appends with a single subpath to build a plan with the single subpath instead of the Append. I'll give an example a bit below.

It evolved this way due to an email early on in this thread. I proposed two ways to do this, one of which was ProxyPath. Tom proposed hi-jacking Append to save on the additional boilerplate code. When implementing that, I imagined I could have gotten away with just adding the two translation Lists to the AppendPath struct and having code determine it's a proxy by the fact that the Lists are non-empty, but that fails due to there being nothing to translate for SELECT COUNT(*) FROM ... type queries, which have empty targetlists.

I don't have a huge preference as to which gets used here, but I don't want to change it just to have to change it back again later. Although, I'm not ruling out that Tom might change his mind when he sees that I had to add an "isproxy" flag to AppendPath to get the whole thing to work. It's certainly not quite as convenient as how dummy paths work.

> What other proxy paths do you envision, and why should they be
> represented as AppendPath?

For example:

CREATE MATERIALIZED VIEW mytable_count AS SELECT COUNT(*) FROM mytable;

would allow queries such as:

SELECT COUNT(*) FROM mytable;

to have an isproxy AppendPath added to the pathlist of the RelOptInfo for mytable which just performs a seqscan on mytable_count instead. It would likely come out a much cheaper Path. The translation exprs, in this case, would translate the COUNT(*) Aggref into the Var belonging to mytable_count.
Of course, there are other unrelated reasons why that particular example can't work yet, but it shows why I tried to keep this generic.

> FWIW one advantage of a separate ProxyPath would be that it would be a
> much clearer breakage for third-party code (say, extensions tweaking
> paths using hooks), either at compile or execution time. With just a new
> flag in an existing node, that may easily slip through. On the other
> hand, no one promises the structures to be a stable API ...

Hmm, true, but you could probably have said the same about overloading the Agg node to become the Partial Aggregate and Finalize Aggregate nodes too.

> 6) I suggest we add this assert to create_append_path:
>
> Assert(!isproxy || (isproxy && (list_length(subpaths)==1)));

That indirectly exists already, although what I have is equivalent to:

Assert((!isproxy && translate_from == NIL) ||
       (isproxy && (list_length(subpaths) == 1)));

... as two separate Asserts(), which, if I'm not mistaken, is slightly stricter than the one you mentioned.

> and perhaps we should do s/isproxy/is_proxy/ which seems like the usual
> naming for boolean variables.

You're right. I'll change that and post an updated patch. I'll also reprocess your email again and try to improve some comments to make the intent of the code more clear.

Thanks for the review!

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
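[Editor's note: the Var translation discussed in this exchange — rewriting "fact" Vars into "fact_jan2018" Vars in any node above a removed Append — can be pictured with a toy Python model. This is illustrative only, not the patch's actual replace_translatable_exprs code; the varno numbers are invented.]

```python
from collections import namedtuple

# A toy Var: varno identifies the range-table entry, varattno the column.
Var = namedtuple("Var", ["varno", "varattno"])

def translate_exprs(exprs, translate_from, translate_to):
    """Replace each Var matching an entry of translate_from with the
    corresponding (by position) entry of translate_to."""
    mapping = dict(zip(translate_from, translate_to))
    return [mapping.get(e, e) for e in exprs]

# "fact" planned at varno 1; its only scanned partition "fact_jan2018"
# promoted at varno 4; "dim1" at varno 2.
fact_a = Var(1, 1)
fact_jan2018_a = Var(4, 1)
dim_id = Var(2, 1)

# Hash Join targetlist referencing the Append parent ...
tlist = [fact_a, dim_id]

# ... rewritten to reference the promoted child instead.
print(translate_exprs(tlist, [fact_a], [fact_jan2018_a]))
# [Var(varno=4, varattno=1), Var(varno=2, varattno=1)]
```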
On 19 February 2018 at 18:01, David Rowley <david.rowley@2ndquadrant.com> wrote: > On 19 February 2018 at 15:11, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> and perhaps we should do s/isproxy/is_proxy/ which seems like the usual >> naming for boolean variables. > > You're right. I'll change that and post an updated patch. I'll also > reprocess your email again and try to improve some comments to make > the intent of the code more clear. I've made a few changes to the patch. "isproxy" is now "is_proxy". I've made the comments a bit more verbose at the top of the definition of struct AppendPath. Also shuffled some comments around in allpaths.c I've attached both a delta patch with just these changes, and also a completely new patch based on a recent master. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Mon, Feb 19, 2018 at 4:02 AM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 19 February 2018 at 18:01, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> On 19 February 2018 at 15:11, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>> and perhaps we should do s/isproxy/is_proxy/ which seems like the usual
>>> naming for boolean variables.
>>
>> You're right. I'll change that and post an updated patch. I'll also
>> reprocess your email again and try to improve some comments to make
>> the intent of the code more clear.
>
> I've made a few changes to the patch. "isproxy" is now "is_proxy".
> I've made the comments a bit more verbose at the top of the definition
> of struct AppendPath. Also shuffled some comments around in allpaths.c
>
> I've attached both a delta patch with just these changes, and also a
> completely new patch based on a recent master.

While I was working on http://postgr.es/m/CA+TgmoakT5gmahbPWGqrR2nAdFOMAOnOXYoWHRdVfGWs34t6_A@mail.gmail.com I noticed that a projection path is very close to being usable as a proxy path, because we've already got code to strip out unnecessary projection paths at plan generation time. I noticed that there were a few problems with that which I wrote patches to fix (see 0001, 0002 attached to that email) but they seem to be only minor issues.

What do you think about the idea of using a projection path as a proxy path instead of inventing a new method? It seems simple enough to do:

    new_path = (Path *) create_projection_path(root, new_rel, old_path,
                                               old_path->pathtarget);

...when we need a proxy path.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 14 March 2018 at 09:25, Robert Haas <robertmhaas@gmail.com> wrote:
> What do you think about the idea of using a projection path as a proxy
> path instead of inventing a new method? It seems simple enough to do:
>
> new_path = (Path *) create_projection_path(root, new_rel, old_path,
>                                            old_path->pathtarget);
>
> ...when we need a proxy path.

I'm very open to finding a better way to do this, but does that not just handle the targetlist issue? The proxy path also carries information which allows the translation of Vars in other places in the plan from the old rel into the Vars of the new rel: join conditions, sort clauses, etc.

Wouldn't a ProjectionPath just need the same additional translation fields that I've bolted onto AppendPath to make it work properly?

I've looked at the projection path code but got a bit confused by the dummypp field. I see where it gets set, but not where it gets checked. A comment in create_projection_plan() seems to explain that dummypp is pretty useless since targetlists are modified sometimes, so I'm a bit unsure what the point of it is. Maybe that just goes to show that my understanding of projection paths is not that great.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 13, 2018 at 8:20 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 14 March 2018 at 09:25, Robert Haas <robertmhaas@gmail.com> wrote:
>> What do you think about the idea of using a projection path as a proxy
>> path instead of inventing a new method? It seems simple enough to do:
>>
>> new_path = (Path *) create_projection_path(root, new_rel, old_path,
>>                                            old_path->pathtarget);
>>
>> ...when we need a proxy path.
>
> I'm very open to finding a better way to do this, but does that not
> just handle the targetlist issue? The proxy path also carries
> information which allows the translation of Vars in other places in
> the plan from the old rel into the vars of the new rel. Join
> conditions, sort clauses etc.
>
> Wouldn't a ProjectionPath just need the same additional translation
> fields that I've bolted onto AppendPath to make it work properly?

Well, I guess I'm not sure.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Mar 15, 2018 at 9:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Wouldn't a ProjectionPath just need the same additional translation
>> fields that I've bolted onto AppendPath to make it work properly?
>
> Well, I guess I'm not sure.

Sorry, hit send too soon there.

I'm not sure I entirely understand the purpose of those changes; I think the comments could use some work. For example:

+ /*
+  * First we must record the translation expressions in the PlannerInfo.
+  * These need to be found when the expression translation is being done
+  * when the final plan is being assembled.
+  */

+ /*
+  * Record this child as having been promoted. Some places treat child
+  * relations in a special way, and this will give them a VIP ticket to
+  * adulthood, where required.
+  */

These comments just recapitulate what the code does, without really explaining what problem we're trying to solve ("need" for unspecified reasons, unspecified "special" handling).

That said, I gather that one problem is that the path might contain references to child varnos where we need to reference parent varnos. That does seem like something we need to handle, but I'm not sure whether this is really the right method. I wonder if we couldn't deduce the necessary translations from the parent pointers of the paths. That is, if we've got a proxy path associated with the parent rel and the path it's proxying is associated with the child rel, then we need to translate from realpath->parent->relids to proxypath->parent_relids.

Not that this is an argument of overwhelming import, but note that ExecSupportsMarkRestore wouldn't need to change if we used ProjectionPath; that's already got the required handling.

If we stick with your idea of using AppendPath, do we actually need generate_proxy_paths()? What paths get lost if we don't have a special case for them here?

I find promote_child_rel_equivalences() kind of scary. It seems like a change that might have unintended and hard-to-predict consequences.
Perhaps that's only my own lack of knowledge. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> That said, I gather that one problem is that the path might contain
> references to child varnos where we need to reference parent varnos.
> That does seem like something we need to handle, but I'm not sure
> whether this is really the right method. I wonder if we couldn't
> deduce the necessary translations from the parent pointers of the
> paths. That is, if we've got a proxy path associated with the parent
> rel and the path it's proxying is associated with the child rel, then
> we need to translate from realpath->parent->relids to
> proxypath->parent_relids.

I hadn't been paying much attention to this thread, but I've now taken a quick look at the 2018-02-19 patch, and I've got to say I do not like it much. The changes in createplan.c in particular seem like hack-and-slash rather than anything principled or maintainable.

The core issue here is that Paths involving the appendrel and higher will contain Vars referencing the appendrel's varno, whereas the child is set up to emit Vars containing its own varno, and somewhere we've got to match those up. I don't think though that this problem is exactly specific to single-member Appends, and so I would rather we not invent a solution that's specific to that. A nearly identical issue is getting rid of no-op SubqueryScan nodes. I've long wished we could simply not emit those in the first place, but it's really hard to do because of the fact that Vars inside the subquery have different varnos from those outside. (I've toyed with the idea of globally flattening the rangetable before we start planning, not at the end, but haven't made it happen yet; and anyway that would be only one step towards such a goal.)

It might be worth looking at whether we couldn't fix the single-member-Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c get rid of them. That's not the most beautiful solution perhaps, but it'd be very localized and low-risk.
In general setrefs.c is the right place to deal with variable-matching issues. So even if you don't like that specific idea, it'd probably be worth thinking about handling this by recording instructions telling setrefs what to do, instead of actually doing it at earlier stages. regards, tom lane
On Thu, Mar 15, 2018 at 11:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It might be worth looking at whether we couldn't fix the single-member-
> Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c
> get rid of them. That's not the most beautiful solution perhaps, but
> it'd be very localized and low-risk.

That's definitely a thought; it's probably the simplest way of saving the run-time cost of the Append node. However, I don't think it's a great solution overall because it doesn't get us the other advantages that David mentions in his original post. I think that to gain those advantages we'll need to know at path-creation time that there won't ultimately be an Append node in the finished plan.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Mar 15, 2018 at 11:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It might be worth looking at whether we couldn't fix the single-member- >> Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c >> get rid of them. That's not the most beautiful solution perhaps, but >> it'd be very localized and low-risk. > That's definitely a thought; it's a probably the simplest way of > saving the run-time cost of the Append node. However, I don't think > it's a great solution overall because it doesn't get us the other > advantages that David mentions in his original post. I think that to > gain those advantages we'll need to know at path-creation time that > there won't ultimately be an Append node in the finished plan. Meh. We could certainly know that by inspection ("only one child? it'll be history"). I remain of the opinion that this is a big patch with a small patch struggling to get out. regards, tom lane
On 16 March 2018 at 02:46, Robert Haas <robertmhaas@gmail.com> wrote:
> If we stick with your idea of using AppendPath, do we actually need
> generate_proxy_paths()? What paths get lost if we don't have a
> special case for them here?

Actually, we do a surprisingly good job of allowing plan shapes to stay the same when planning an only-child scan for an Append or MergeAppend. The main difference I did find was that Append and MergeAppend don't support Mark and Restore, so if you just generated the same paths and simply skipped over Appends and MergeAppends you'd still be left with Materialize nodes which might not actually be required at all. This might not be huge, but seeing it made me worried that there might be some valid reason, if not today then sometime in the future, why it might not be safe to simply pluck the singleton Append/MergeAppend nodes out of the plan tree.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 16 March 2018 at 04:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I hadn't been paying much attention to this thread, but I've now taken
> a quick look at the 2018-02-19 patch, and I've got to say I do not like
> it much. The changes in createplan.c in particular seem like hack-and-
> slash rather than anything principled or maintainable.

Thanks for looking at this. I didn't manage to discover any other working solution for where the Vars can be replaced. If we don't do this in createplan.c then it's going to cause problems in places such as apply_pathtarget_labeling_to_tlist(), which runs well before setrefs.c gets hold of the plan.

> The core issue here is that Paths involving the appendrel and higher
> will contain Vars referencing the appendrel's varno, whereas the child
> is set up to emit Vars containing its own varno, and somewhere we've got
> to match those up. I don't think though that this problem is exactly
> specific to single-member Appends, and so I would rather we not invent a
> solution that's specific to that. A nearly identical issue is getting
> rid of no-op SubqueryScan nodes. I've long wished we could simply not
> emit those in the first place, but it's really hard to do because of
> the fact that Vars inside the subquery have different varnos from those
> outside. (I've toyed with the idea of globally flattening the rangetable
> before we start planning, not at the end, but haven't made it happen yet;
> and anyway that would be only one step towards such a goal.)

I'm not quite sure why you think the solution I came up with is specific to single-member Appends. The solution merely uses single-member Append paths as the proxy path in a mechanism which I've tried to make generic to any node type. For example, the patch also resolves the issue for MergeAppend, so certainly nothing in there is specific to single-member Appends.
I could have made the proxy any other path type, it's just that you had suggested Append would be better than inventing ProxyPath, which is what I originally proposed. > It might be worth looking at whether we couldn't fix the single-member- > Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c > get rid of them. That's not the most beautiful solution perhaps, but > it'd be very localized and low-risk. It might be possible, but wouldn't that only solve 1 out of 2 problems? The problem of the planner not generating the optimal plan is ignored with this. For example, it does not make much sense to bolt a Materialize node on top of an IndexScan node in order to provide the IndexScan with mark/restore capabilities... IndexScans already support that. > In general setrefs.c is the right place to deal with variable-matching > issues. So even if you don't like that specific idea, it'd probably be > worth thinking about handling this by recording instructions telling > setrefs what to do, instead of actually doing it at earlier stages. From what I can see, setrefs.c is too late. ERRORs are generated before setrefs.c gets hold of the plan if we don't replace Vars. I'm not opposed to finding a better way to do this. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Now that the September 'fest is open for new patches I'm going to move this patch over there. This patch has become slightly less important than some other stuff, but I'd still like to come back to it. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jul 02, 2018 at 09:37:31AM +1200, David Rowley wrote: > Now that the September 'fest is open for new patches I'm going to move > this patch over there. This patch has become slightly less important > than some other stuff, but I'd still like to come back to it. Please note that the latest patch does not apply anymore, so I moved it to CF 2018-11 (already this time of the year!), waiting for your input. -- Michael
Attachment
On Mon, 2 Jul 2018 at 09:37, David Rowley <david.rowley@2ndquadrant.com> wrote: > Now that the September 'fest is open for new patches I'm going to move > this patch over there. This patch has become slightly less important > than some other stuff, but I'd still like to come back to it. This had bit-rotted quite a bit, so I've rebased it on top of today's master. There's not really any other changes to speak of. I've not reevaluated the patch to see if there is any more simple way to take care of this problem yet. However, I do recall getting off to quite a few false starts with this patch and the way I have it now was the only way I saw to make it possible. Although, perhaps some more experience will show me something different. Anyway, I've attached the rebased version. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 4/1/18 9:26 AM, David Rowley wrote: > On 16 March 2018 at 04:01, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I hadn't been paying much attention to this thread, but I've now taken >> a quick look at the 2018-02-19 patch, and I've got to say I do not like >> it much. The changes in createplan.c in particular seem like hack-and- >> slash rather than anything principled or maintainable. > > Thanks for looking at this. I didn't manage to discover any other > working solutions to when the Vars can be replaced. If we don't do > this in createplan.c then it's going to cause problems in places such > as apply_pathtarget_labeling_to_tlist, which is well before setrefs.c > gets hold of the plan. > >> The core issue here is that Paths involving the appendrel and higher >> will contain Vars referencing the appendrel's varno, whereas the child >> is set up to emit Vars containing its own varno, and somewhere we've got >> to match those up. I don't think though that this problem is exactly >> specific to single-member Appends, and so I would rather we not invent a >> solution that's specific to that. A nearly identical issue is getting >> rid of no-op SubqueryScan nodes. I've long wished we could simply not >> emit those in the first place, but it's really hard to do because of >> the fact that Vars inside the subquery have different varnos from those >> outside. (I've toyed with the idea of globally flattening the rangetable >> before we start planning, not at the end, but haven't made it happen yet; >> and anyway that would be only one step towards such a goal.) > > I'm not quite sure why you think the solution I came up with is > specific to single-member Appends. The solution merely uses > single-member Append paths as a proxy path for the solution which I've > tried to make generic to any node type. For example, the patch also > resolves the issue for MergeAppend, so certainly nothing in there is > specific to single-member Appends. 
I could have made the proxy any > other path type, it's just that you had suggested Append would be > better than inventing ProxyPath, which is what I originally proposed. > I think a natural question is how the approach devised in this thread (based on single-member Append paths) could be used to deal with no-op Subquery nodes. I don't see an obvious way to do that, and I guess that's what Tom meant by "specific to single-member Appends". >> It might be worth looking at whether we couldn't fix the single-member- >> Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c >> get rid of them. That's not the most beautiful solution perhaps, but >> it'd be very localized and low-risk. > > It might be possible, but wouldn't that only solve 1 out of 2 > problems. The problem of the planner not generating the most optimal > plan is ignored with this. For example, it does not make much sense to > bolt a Materialize node on top of an IndexScan node in order to > provide the IndexScan with mark/restore capabilities... They already > allow that. > Yeah, I think we can't do all the magic in setrefs.c if we want the decisions to affect plan shape in any significant way - the decision whether an Append node has only a single child needs to happen while constructing the path or shortly after it (before we build other paths on top of it). And before costing the path. So setrefs.c seems too late for that :-( I suppose this limitation was acceptable for no-op Subqueries, but I'm not sure the same thing applies to single-member Appends. >> In general setrefs.c is the right place to deal with variable-matching >> issues. So even if you don't like that specific idea, it'd probably be >> worth thinking about handling this by recording instructions telling >> setrefs what to do, instead of actually doing it at earlier stages. > > From what I can see, setrefs.c is too late. ERRORs are generated > before setrefs.c gets hold of the plan if we don't replace Vars. 
> > I'm not opposed to finding a better way to do this. > AFAICS the patch essentially does two things: (a) identifies Append paths with a single member and (b) ensures the Vars are properly mapped when the Append node is skipped when creating the plan. I agree doing (a) in setrefs.c is too late, as mentioned earlier. But perhaps we could move at least (b) to setrefs.c? I'd say mapping the Vars is the most fragile and error-prone piece of this patch. It's a bit annoying that it's inventing a new infrastructure to do that, translates expressions in various places, establishes new rules for tlists (having to call finalize_plan_tlist when calling build_path_tlist before create_plan_recurse), etc. But we already do such mappings (not exactly the same, but similar) for other purposes - from setrefs.c. So why can't we do it the same way here? FWIW I haven't tried to do any of that, and I haven't looked at this patch for quite a long time, so perhaps I'm missing an obvious obstacle preventing us from doing that ... regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
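To make the Var-mapping problem concrete, here is a deliberately simplified Python sketch of the kind of expression rewrite being discussed: walk an expression tree and retarget Vars from the appendrel's varno to the child's varno and column numbers. The tuple encoding and names are invented for the example; in PostgreSQL the real work is done by expression mutators over node trees:

```python
# Illustrative sketch of the Var-mapping problem: rewrite Vars that
# reference the appendrel's varno so they reference the child instead.
# Expressions are modelled as nested tuples; all names here are made up.

def translate_vars(expr, parent_varno, child_varno, attno_map):
    """Recursively rewrite (varno, attno) pairs in a nested expression."""
    if isinstance(expr, tuple) and expr[0] == "Var":
        _, varno, attno = expr
        if varno == parent_varno:
            return ("Var", child_varno, attno_map[attno])
        return expr  # Var of some other relation: leave untouched
    if isinstance(expr, tuple):  # operator node: (op, arg1, arg2, ...)
        return (expr[0],) + tuple(translate_vars(a, parent_varno,
                                                 child_varno, attno_map)
                                  for a in expr[1:])
    return expr  # bare constants

# Parent rel 1's column 2 maps to child rel 3's column 1.
tlist = ("OpExpr", ("Var", 1, 2), ("Const", 42))
print(translate_vars(tlist, 1, 3, {2: 1}))
# ('OpExpr', ('Var', 3, 1), ('Const', 42))
```

The fragility David and Tomas describe comes from having to find every place that holds such an expression and apply this rewrite consistently.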
On Thu, 3 Jan 2019 at 08:01, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > AFAICS the patch essentially does two things: (a) identifies Append > paths with a single member and (b) ensures the Vars are properly mapped > when the Append node is skipped when creating the plan. Yes, but traditional Append and MergeAppend paths with a single member are never actually generated. I say "traditional" as we do happen to use a single-subpath Append as the "proxy" path to represent a path that needs to not have a plan node generated during createplan. Originally I had a new Path type named "ProxyPath" to do this, but per recommendation from Tom, I overloaded Append to mean that. (Append already has its meaning overloaded in regards to dummy paths.) > I agree doing (a) in setrefs.c is too late, as mentioned earlier. But > perhaps we could do at least (b) to setrefs.c? The problem I found with doing var translations in setrefs.c was that the plan tree is traversed there breadth-first and in createplan.c we build the plan depth-first. The problem I didn't manage to overcome with that is that when we replace a "ProxyPath" (actually an Append path marked as is_proxy=true) it only alters targetlists, etc. of nodes above that in the plan tree. The nodes below it and their targetlists don't care, as these don't fetch Vars from nodes above them. If we do the Var translation in setrefs then we've not yet looked at the nodes that don't care and we've already done the nodes that do care. So the tree traversal is backwards. I sort of borrowed the traversal in createplan.c since it was in the correct depth-first order. Equally, I could have invented an entirely new stage that traverses the path tree depth-first, but I imagined that would have added more overhead and raised even more questions. Piggy-backing on createplan seemed like the best option at the time. > I'd say mapping the Vars is the most fragile and error-prone piece of > this patch. 
It's a bit annoying that it's inventing a new infrastructure > to do that, translates expressions in various places, establishes new > rules for tlists (having to call finalize_plan_tlist when calling > build_path_tlist before create_plan_recurse), etc. I entirely agree. It's by far the worst part of the patch. Getting the code to a working state to do this was hard. I kept finding places that I'd missed the translation. I'd rather it worked some other way. I just don't yet know what that way is. I changed the core regression table "tenk" to become a partitioned table with a single partition in the hope of catching all of the places that need the translations to be performed. I'm not entirely confident that I've caught them all by doing this. I've attached an updated patch that's really just a rebased version of v8. The old version conflicted with changes made in 1db5667b. The only real change was to the commit message. I'd previously left out the part about how not generating needless Append/MergeAppend paths can allow plan shapes that were previously not possible. I'd only mentioned the executor not having to pull tuples through these nodes, which increases performance. Not having this perhaps caused some of the confusion earlier in this thread. One drawback to this patch, which I'm unsure if I previously mentioned, is that run-time pruning only works for Append and MergeAppend. If we remove the Append/MergeAppend node then the single Append/MergeAppend sub-plan has no hope of being pruned. Going from 1 to 0 sub-plans may not be that common, but no longer supporting that is something that needs to be considered. I had imagined that the gained flexibility of plan shapes plus less executor overhead was better than that, but there likely is a small use-case for keeping that ability. It seems impossible to have both advantages of this patch without that disadvantage. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
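David's traversal-order argument above can be modelled with a toy depth-first plan builder: when it reaches a single-subpath proxy Append, it emits the child directly and reports a varno substitution upward, which only the not-yet-built ancestors need to apply. This is an illustrative sketch, not createplan.c's actual mechanism; the dict shapes and names are invented:

```python
# Toy depth-first "create_plan": collapsing a proxy node affects only
# the ancestors still waiting to be built, matching David's point that
# nodes below the proxy don't care about the translation.

def create_plan(path, subst):
    """Return (plan, subst); subst accumulates varno translations."""
    if path["kind"] == "Append" and len(path["subpaths"]) == 1:
        child, subst = create_plan(path["subpaths"][0], subst)
        # Ancestors must now emit the child's varno, not the parent's.
        subst = {**subst, path["varno"]: path["subpaths"][0]["varno"]}
        return child, subst
    children = []
    for sp in path.get("subpaths", []):
        c, subst = create_plan(sp, subst)
        children.append(c)
    return {"kind": path["kind"], "children": children}, subst

proxy = {"kind": "Append", "varno": 1,
         "subpaths": [{"kind": "SeqScan", "varno": 3, "subpaths": []}]}
tree = {"kind": "Sort", "subpaths": [proxy]}
plan, subst = create_plan(tree, {})
print(plan)   # Sort directly over the SeqScan; no Append node emitted
print(subst)  # {1: 3}
```

A breadth-first pass in setrefs.c would visit the Sort, whose Vars need rewriting, before discovering the proxy below it; the depth-first build has the substitution in hand by the time the Sort's targetlist is constructed.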
I've attached an updated patch. The last one no longer applied due to the changes made in d723f5687. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
David Rowley <david.rowley@2ndquadrant.com> writes: > On Thu, 3 Jan 2019 at 08:01, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> AFAICS the patch essentially does two things: (a) identifies Append >> paths with a single member and (b) ensures the Vars are properly mapped >> when the Append node is skipped when creating the plan. >> I agree doing (a) in setrefs.c is too late, as mentioned earlier. But >> perhaps we could do at least (b) to setrefs.c? > The problem I found with doing var translations in setrefs.c was that > the plan tree is traversed there breadth-first and in createplan.c we > build the plan depth-first. The problem I didn't manage to overcome > with that is that when we replace a "ProxyPath" (actually an Append > path marked as is_proxy=true) it only alter targetlists, etc of nodes > above that in the plan tree. The nodes below it and their targetlists > don't care, as these don't fetch Vars from nodes above them. If we do > the Var translation in setrefs then we've not yet looked at the nodes > that don't care and we've already done the nodes that do care. So the > tree traversal is backwards. I don't quite buy this argument, because you haven't given a reason why it doesn't apply with equal force to setrefs.c's elimination of no-op SubqueryScan nodes. Yet that does work. Stepping back for a minute: even if we get this to work, how often is it going to do anything useful? It seems like most of the other development that's going on is trying to postpone pruning till later, so I wonder how often we'll really know at the beginning of planning that only one child will survive. regards, tom lane
On Sat, 26 Jan 2019 at 13:51, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <david.rowley@2ndquadrant.com> writes: > > On Thu, 3 Jan 2019 at 08:01, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > >> AFAICS the patch essentially does two things: (a) identifies Append > >> paths with a single member and (b) ensures the Vars are properly mapped > >> when the Append node is skipped when creating the plan. > >> I agree doing (a) in setrefs.c is too late, as mentioned earlier. But > >> perhaps we could do at least (b) to setrefs.c? > > > The problem I found with doing var translations in setrefs.c was that > > the plan tree is traversed there breadth-first and in createplan.c we > > build the plan depth-first. The problem I didn't manage to overcome > > with that is that when we replace a "ProxyPath" (actually an Append > > path marked as is_proxy=true) it only alter targetlists, etc of nodes > > above that in the plan tree. The nodes below it and their targetlists > > don't care, as these don't fetch Vars from nodes above them. If we do > > the Var translation in setrefs then we've not yet looked at the nodes > > that don't care and we've already done the nodes that do care. So the > > tree traversal is backwards. > > I don't quite buy this argument, because you haven't given a reason > why it doesn't apply with equal force to setrefs.c's elimination of > no-op SubqueryScan nodes. Yet that does work. I assume you're talking about the code that's in set_subqueryscan_references() inside the trivial_subqueryscan() condition? If so, then that seems pretty different from what I'm doing here because trivial_subqueryscan() only ever returns true when the Vars/Exprs from the subquery's target list match the scan's targetlist exactly, in the same order. It's not quite that simple with the single-subpath Append/MergeAppend case since the Vars/Exprs won't be the same since they're from different relations and possibly in some different order. 
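A hypothetical sketch of the distinction David draws: trivial_subqueryscan()-style removal only fires when the outer targetlist is exactly the inner one, same Vars in the same order, whereas the single-subpath Append case fails that test because its Vars belong to a different relation. Names and the tuple encoding here are invented for illustration:

```python
# Sketch of the "trivial" condition: entry i of the outer targetlist
# must be nothing more than a plain Var for inner column i.  Purely
# illustrative; not trivial_subqueryscan()'s actual implementation.

def tlists_match_exactly(outer_tlist, inner_tlist):
    """True when the outer list is just Vars 1..n of the inner list."""
    if len(outer_tlist) != len(inner_tlist):
        return False
    for pos, entry in enumerate(outer_tlist, start=1):
        # Any expression, reordering, or renumbering disqualifies it.
        if entry != ("Var", pos):
            return False
    return True

# Same columns, same order: removable with no translation at all.
print(tlists_match_exactly([("Var", 1), ("Var", 2)], ["a", "b"]))  # True
# Reordered columns: the simple no-op removal trick no longer applies.
print(tlists_match_exactly([("Var", 2), ("Var", 1)], ["a", "b"]))  # False
```

When this check fails, removing the node would require a genuine Var translation of the kind discussed earlier, which is exactly what the no-op SubqueryScan path in setrefs.c never has to do.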
> Stepping back for a minute: even if we get this to work, how often > is it going to do anything useful? It seems like most of the other > development that's going on is trying to postpone pruning till later, > so I wonder how often we'll really know at the beginning of planning > that only one child will survive. It'll do something useful every time the planner would otherwise produce an Append or MergeAppend with a single subpath. Pruning down to just 1 relation is pretty common, so it seems worth optimising for. There's no other development that postpones pruning until later. If you're talking about run-time pruning then nothing is "postponed". We still attempt to prune during planning, it's just that we might not be able to prune in some cases until during execution, so we may do both. You make it sound like we're trying to do away with plan-time pruning, but that's not happening; in fact, we're working pretty hard to speed up the planner when plan-time pruning *can* be done. So far there are no patches to speed up the planner for partitioned tables when the planner can't prune any partitions, although it is something that I've been giving some thought to. I've also attached a rebased patch. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Thu, 31 Jan 2019 at 17:22, David Rowley <david.rowley@2ndquadrant.com> wrote: > I've also attached a rebased patch. Rebased again. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Tue, 5 Feb 2019 at 12:05, David Rowley <david.rowley@2ndquadrant.com> wrote: > Rebased again. and again, due to new create_append_path() call added in ab5fcf2b04f9. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
David Rowley <david.rowley@2ndquadrant.com> writes: > [ v13-0001-Forgo-generating-single-subpath-Append-and-Merge.patch ] I continue to think that this is the wrong way to go about it, and as proof of concept present the attached, which reproduces all of the regression-test plan changes appearing in v13 --- with a whole lot less mechanism and next to no added planning cycles (which certainly cannot be said of v13). I was a bit surprised to find that I didn't need to fool around with lying about whether [Merge]Append can project. I've not dug into the exact reason why, but I suspect it's that previous changes made in support of parallelization have resulted in ensuring that we push the upper tlist down to the children anyway, at some earlier stage. I haven't looked into whether this does the right things for parallel planning --- possibly create_[merge]append_path need to propagate up parallel-related path fields from the single child? Also, I wonder why you didn't teach ExecSupportsMarkRestore that a single-child MergeAppend can support mark/restore. I didn't add such code here either, but I suspect that's an oversight. One other remark is that the division of labor between create_[merge]append_path and their costsize.c subroutines seems pretty unprincipled. I'd be inclined to push all the relevant logic into costsize.c, but have not done so here. Moving the existing cost calculations in create_mergeappend_path into costsize.c would better be done as a separate refactoring patch, perhaps. 
regards, tom lane diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 42108bd..2be67bc 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -8427,17 +8427,16 @@ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER J 400 | 400 | 0008 (4 rows) --- left outer join + nullable clasue -EXPLAIN (COSTS OFF) +-- left outer join + nullable clause +EXPLAIN (VERBOSE, COSTS OFF) SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3; - QUERY PLAN ------------------------------------------------------------------------------------ - Sort - Sort Key: t1.a, ftprt2_p1.b, ftprt2_p1.c - -> Append - -> Foreign Scan - Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2) -(5 rows) + QUERY PLAN +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + Foreign Scan + Output: t1.a, ftprt2_p1.b, ftprt2_p1.c + Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2) + Remote SQL: SELECT r6.a, r9.b, r9.c FROM (public.fprt1_p1 r6 LEFT JOIN public.fprt2_p1 r9 ON (((r6.a = r9.b)) AND ((r6.b = r9.a)) AND ((r9.a < 10)))) WHERE ((r6.a < 10)) ORDER BY r6.a ASC NULLS LAST, r9.b ASC NULLS LAST, r9.c ASC NULLS LAST +(4 rows) SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3; a | b | c diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql index eb9d1ad..4728511 100644 --- a/contrib/postgres_fdw/sql/postgres_fdw.sql +++ b/contrib/postgres_fdw/sql/postgres_fdw.sql @@ -2309,8 +2309,8 @@ EXPLAIN (COSTS OFF) SELECT 
t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 = 0 ORDER BY 1,2,3; SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 = 0 ORDER BY 1,2,3; --- left outer join + nullable clasue -EXPLAIN (COSTS OFF) +-- left outer join + nullable clause +EXPLAIN (VERBOSE, COSTS OFF) SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3; SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3; diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c index 187f892..14ad738 100644 --- a/src/backend/executor/execAmi.c +++ b/src/backend/executor/execAmi.c @@ -447,6 +447,21 @@ ExecSupportsMarkRestore(Path *pathnode) return false; /* childless Result */ } + case T_Append: + { + AppendPath *appendPath = castNode(AppendPath, pathnode); + + /* + * If there's exactly one child, then there will be no Append + * in the final plan, so we can handle mark/restore if the + * child plan node can. + */ + if (list_length(appendPath->subpaths) == 1) + return ExecSupportsMarkRestore((Path *) linitial(appendPath->subpaths)); + /* Otherwise, Append can't handle it */ + return false; + } + default: break; } diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index 236f506..4c823e9 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -1134,10 +1134,10 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path) } /* - * XXX ideally, if there's just one child, we'd not bother to generate an - * Append node but just return the single child. At the moment this does - * not work because the varno of the child scan plan won't match the - * parent-rel Vars it'll be asked to emit. 
+ * And build the Append plan. Note that if there's just one child, the + * Append is pretty useless; but we wait till setrefs.c to get rid of it. + * Doing so here doesn't work because the varno of the child scan plan + * won't match the parent-rel Vars it'll be asked to emit. */ plan = make_append(subplans, best_path->first_partial_path, diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c index 0213a37..4204ca4 100644 --- a/src/backend/optimizer/plan/setrefs.c +++ b/src/backend/optimizer/plan/setrefs.c @@ -94,12 +94,19 @@ static Plan *set_subqueryscan_references(PlannerInfo *root, SubqueryScan *plan, int rtoffset); static bool trivial_subqueryscan(SubqueryScan *plan); +static Plan *clean_up_removed_plan_level(Plan *parent, Plan *child); static void set_foreignscan_references(PlannerInfo *root, ForeignScan *fscan, int rtoffset); static void set_customscan_references(PlannerInfo *root, CustomScan *cscan, int rtoffset); +static Plan *set_append_references(PlannerInfo *root, + Append *aplan, + int rtoffset); +static Plan *set_mergeappend_references(PlannerInfo *root, + MergeAppend *mplan, + int rtoffset); static Node *fix_scan_expr(PlannerInfo *root, Node *node, int rtoffset); static Node *fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context); static bool fix_scan_expr_walker(Node *node, fix_scan_expr_context *context); @@ -181,19 +188,22 @@ static List *set_returning_clause_references(PlannerInfo *root, * 8. We assign every plan node in the tree a unique ID. * * We also perform one final optimization step, which is to delete - * SubqueryScan plan nodes that aren't doing anything useful (ie, have - * no qual and a no-op targetlist). The reason for doing this last is that + * SubqueryScan, Append, and MergeAppend plan nodes that aren't doing + * anything useful. 
The reason for doing this last is that * it can't readily be done before set_plan_references, because it would - * break set_upper_references: the Vars in the subquery's top tlist - * wouldn't match up with the Vars in the outer plan tree. The SubqueryScan + * break set_upper_references: the Vars in the child plan's top tlist + * wouldn't match up with the Vars in the outer plan tree. A SubqueryScan * serves a necessary function as a buffer between outer query and subquery * variable numbering ... but after we've flattened the rangetable this is * no longer a problem, since then there's only one rtindex namespace. + * Likewise, Append and MergeAppend buffer between the parent and child vars + * of an appendrel, but we don't need to worry about that once we've done + * set_plan_references. * * set_plan_references recursively traverses the whole plan tree. * * The return value is normally the same Plan node passed in, but can be - * different when the passed-in Plan is a SubqueryScan we decide isn't needed. + * different when the passed-in Plan is a node we decide isn't needed. * * The flattened rangetable entries are appended to root->glob->finalrtable. * Also, rowmarks entries are appended to root->glob->finalrowmarks, and the @@ -897,71 +907,15 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) } break; case T_Append: - { - Append *splan = (Append *) plan; - - /* - * Append, like Sort et al, doesn't actually evaluate its - * targetlist or check quals. 
- */ - set_dummy_tlist_references(plan, rtoffset); - Assert(splan->plan.qual == NIL); - foreach(l, splan->appendplans) - { - lfirst(l) = set_plan_refs(root, - (Plan *) lfirst(l), - rtoffset); - } - if (splan->part_prune_info) - { - foreach(l, splan->part_prune_info->prune_infos) - { - List *prune_infos = lfirst(l); - ListCell *l2; - - foreach(l2, prune_infos) - { - PartitionedRelPruneInfo *pinfo = lfirst(l2); - - pinfo->rtindex += rtoffset; - } - } - } - } - break; + /* Needs special treatment, see comments below */ + return set_append_references(root, + (Append *) plan, + rtoffset); case T_MergeAppend: - { - MergeAppend *splan = (MergeAppend *) plan; - - /* - * MergeAppend, like Sort et al, doesn't actually evaluate its - * targetlist or check quals. - */ - set_dummy_tlist_references(plan, rtoffset); - Assert(splan->plan.qual == NIL); - foreach(l, splan->mergeplans) - { - lfirst(l) = set_plan_refs(root, - (Plan *) lfirst(l), + /* Needs special treatment, see comments below */ + return set_mergeappend_references(root, + (MergeAppend *) plan, rtoffset); - } - if (splan->part_prune_info) - { - foreach(l, splan->part_prune_info->prune_infos) - { - List *prune_infos = lfirst(l); - ListCell *l2; - - foreach(l2, prune_infos) - { - PartitionedRelPruneInfo *pinfo = lfirst(l2); - - pinfo->rtindex += rtoffset; - } - } - } - } - break; case T_RecursiveUnion: /* This doesn't evaluate targetlist or check quals either */ set_dummy_tlist_references(plan, rtoffset); @@ -1086,30 +1040,7 @@ set_subqueryscan_references(PlannerInfo *root, /* * We can omit the SubqueryScan node and just pull up the subplan. */ - ListCell *lp, - *lc; - - result = plan->subplan; - - /* We have to be sure we don't lose any initplans */ - result->initPlan = list_concat(plan->scan.plan.initPlan, - result->initPlan); - - /* - * We also have to transfer the SubqueryScan's result-column names - * into the subplan, else columns sent to client will be improperly - * labeled if this is the topmost plan level. 
Copy the "source - * column" information too. - */ - forboth(lp, plan->scan.plan.targetlist, lc, result->targetlist) - { - TargetEntry *ptle = (TargetEntry *) lfirst(lp); - TargetEntry *ctle = (TargetEntry *) lfirst(lc); - - ctle->resname = ptle->resname; - ctle->resorigtbl = ptle->resorigtbl; - ctle->resorigcol = ptle->resorigcol; - } + result = clean_up_removed_plan_level((Plan *) plan, plan->subplan); } else { @@ -1191,6 +1122,30 @@ trivial_subqueryscan(SubqueryScan *plan) } /* + * clean_up_removed_plan_level + * Do necessary cleanup when we strip out a SubqueryScan, Append, etc + * + * We are dropping the "parent" plan in favor of returning just its "child". + * A few small tweaks are needed. + */ +static Plan * +clean_up_removed_plan_level(Plan *parent, Plan *child) +{ + /* We have to be sure we don't lose any initplans */ + child->initPlan = list_concat(parent->initPlan, + child->initPlan); + + /* + * We also have to transfer the parent's column labeling info into the + * child, else columns sent to client will be improperly labeled if this + * is the topmost plan level. resjunk and so on may be important too. + */ + apply_tlist_labeling(child->targetlist, parent->targetlist); + + return child; +} + +/* * set_foreignscan_references * Do set_plan_references processing on a ForeignScan */ @@ -1341,6 +1296,131 @@ set_customscan_references(PlannerInfo *root, } /* + * set_append_references + * Do set_plan_references processing on an Append + * + * We try to strip out the Append entirely; if we can't, we have + * to do the normal processing on it. + */ +static Plan * +set_append_references(PlannerInfo *root, + Append *aplan, + int rtoffset) +{ + ListCell *l; + + /* + * Append, like Sort et al, doesn't actually evaluate its targetlist or + * check quals. If it's got exactly one child plan, then it's not doing + * anything useful at all, and we can strip it out. 
+ */ + Assert(aplan->plan.qual == NIL); + + /* First, we gotta recurse on the children */ + foreach(l, aplan->appendplans) + { + lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset); + } + + /* Now, if there's just one, forget the Append and return that child */ + if (list_length(aplan->appendplans) == 1) + return clean_up_removed_plan_level((Plan *) aplan, + (Plan *) linitial(aplan->appendplans)); + + /* + * Otherwise, clean up the Append as needed. It's okay to do this after + * recursing to the children, because set_dummy_tlist_references doesn't + * look at those. + */ + set_dummy_tlist_references((Plan *) aplan, rtoffset); + + if (aplan->part_prune_info) + { + foreach(l, aplan->part_prune_info->prune_infos) + { + List *prune_infos = lfirst(l); + ListCell *l2; + + foreach(l2, prune_infos) + { + PartitionedRelPruneInfo *pinfo = lfirst(l2); + + pinfo->rtindex += rtoffset; + } + } + } + + /* We don't need to recurse to lefttree or righttree ... */ + Assert(aplan->plan.lefttree == NULL); + Assert(aplan->plan.righttree == NULL); + + return (Plan *) aplan; +} + +/* + * set_mergeappend_references + * Do set_plan_references processing on a MergeAppend + * + * We try to strip out the MergeAppend entirely; if we can't, we have + * to do the normal processing on it. + */ +static Plan * +set_mergeappend_references(PlannerInfo *root, + MergeAppend *mplan, + int rtoffset) +{ + ListCell *l; + + /* + * MergeAppend, like Sort et al, doesn't actually evaluate its targetlist + * or check quals. If it's got exactly one child plan, then it's not + * doing anything useful at all, and we can strip it out. 
+	 */
+	Assert(mplan->plan.qual == NIL);
+
+	/* First, we gotta recurse on the children */
+	foreach(l, mplan->mergeplans)
+	{
+		lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset);
+	}
+
+	/* Now, if there's just one, forget the MergeAppend and return that child */
+	if (list_length(mplan->mergeplans) == 1)
+		return clean_up_removed_plan_level((Plan *) mplan,
+										   (Plan *) linitial(mplan->mergeplans));
+
+	/*
+	 * Otherwise, clean up the MergeAppend as needed.  It's okay to do this
+	 * after recursing to the children, because set_dummy_tlist_references
+	 * doesn't look at those.
+	 */
+	set_dummy_tlist_references((Plan *) mplan, rtoffset);
+
+	if (mplan->part_prune_info)
+	{
+		foreach(l, mplan->part_prune_info->prune_infos)
+		{
+			List	   *prune_infos = lfirst(l);
+			ListCell   *l2;
+
+			foreach(l2, prune_infos)
+			{
+				PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+				pinfo->rtindex += rtoffset;
+			}
+		}
+	}
+
+	/* We don't need to recurse to lefttree or righttree ... */
+	Assert(mplan->plan.lefttree == NULL);
+	Assert(mplan->plan.righttree == NULL);
+
+	return (Plan *) mplan;
+}
+
+
+/*
  * copyVar
  *		Copy a Var node.
  *
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 169e51e..dde9e30 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1276,7 +1276,21 @@ create_append_path(PlannerInfo *root,
 
 	Assert(!parallel_aware || pathnode->path.parallel_safe);
 
-	cost_append(pathnode);
+	/*
+	 * If there's exactly one child path, the Append is a no-op and will be
+	 * discarded later (in setrefs.c); therefore, we can inherit the child's
+	 * size and cost.  Otherwise, do the normal costsize calculation.
+	 */
+	if (list_length(pathnode->subpaths) == 1)
+	{
+		Path	   *child = (Path *) linitial(pathnode->subpaths);
+
+		pathnode->path.rows = child->rows;
+		pathnode->path.startup_cost = child->startup_cost;
+		pathnode->path.total_cost = child->total_cost;
+	}
+	else
+		cost_append(pathnode);
 
 	/* If the caller provided a row estimate, override the computed value. */
 	if (rows >= 0)
@@ -1408,11 +1422,21 @@ create_merge_append_path(PlannerInfo *root,
 		Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
 	}
 
-	/* Now we can compute total costs of the MergeAppend */
-	cost_merge_append(&pathnode->path, root,
-					  pathkeys, list_length(subpaths),
-					  input_startup_cost, input_total_cost,
-					  pathnode->path.rows);
+	/*
+	 * Now we can compute total costs of the MergeAppend.  If there's exactly
+	 * one child path, the MergeAppend is a no-op and will be discarded later
+	 * (in setrefs.c); otherwise we do the normal cost calculation.
+	 */
+	if (list_length(subpaths) == 1)
+	{
+		pathnode->path.startup_cost = input_startup_cost;
+		pathnode->path.total_cost = input_total_cost;
+	}
+	else
+		cost_merge_append(&pathnode->path, root,
+						  pathkeys, list_length(subpaths),
+						  input_startup_cost, input_total_cost,
+						  pathnode->path.rows);
 
 	return pathnode;
 }
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 565d947..7518148 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1770,12 +1770,11 @@ explain (costs off) select * from list_parted;
 (4 rows)
 
 explain (costs off) select * from list_parted where a is null;
-          QUERY PLAN
---------------------------------
- Append
-   ->  Seq Scan on part_null_xy
-         Filter: (a IS NULL)
-(3 rows)
+        QUERY PLAN
+--------------------------
+ Seq Scan on part_null_xy
+   Filter: (a IS NULL)
+(2 rows)
 
 explain (costs off) select * from list_parted where a is not null;
          QUERY PLAN
@@ -1800,20 +1799,18 @@ explain (costs off) select * from list_parted where a in ('ab', 'cd', 'ef');
 (5 rows)
explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'); - QUERY PLAN ---------------------------------------------------------------------------------------- - Append - -> Seq Scan on part_ab_cd - Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[]))) -(3 rows) + QUERY PLAN +--------------------------------------------------------------------------------- + Seq Scan on part_ab_cd + Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[]))) +(2 rows) explain (costs off) select * from list_parted where a = 'ab'; - QUERY PLAN ------------------------------------------- - Append - -> Seq Scan on part_ab_cd - Filter: ((a)::text = 'ab'::text) -(3 rows) + QUERY PLAN +------------------------------------ + Seq Scan on part_ab_cd + Filter: ((a)::text = 'ab'::text) +(2 rows) create table range_list_parted ( a int, @@ -1893,12 +1890,11 @@ explain (costs off) select * from range_list_parted where a is null; /* Should only select rows from the null-accepting partition */ explain (costs off) select * from range_list_parted where b is null; - QUERY PLAN ------------------------------------- - Append - -> Seq Scan on part_40_inf_null - Filter: (b IS NULL) -(3 rows) + QUERY PLAN +------------------------------ + Seq Scan on part_40_inf_null + Filter: (b IS NULL) +(2 rows) explain (costs off) select * from range_list_parted where a is not null and a < 67; QUERY PLAN @@ -2021,12 +2017,11 @@ explain (costs off) select * from mcrparted where a > -1; -- scans all partition (15 rows) explain (costs off) select * from mcrparted where a = 20 and abs(b) = 10 and c > 10; -- scans mcrparted4 - QUERY PLAN ------------------------------------------------------------ - Append - -> Seq Scan on mcrparted4 - Filter: ((c > 10) AND (a = 20) AND (abs(b) = 10)) -(3 rows) + QUERY PLAN +----------------------------------------------------- + Seq Scan on mcrparted4 + Filter: ((c > 10) AND (a = 20) AND (abs(b) = 10)) +(2 rows) explain 
(costs off) select * from mcrparted where a = 20 and c > 20; -- scans mcrparted3, mcrparted4, mcrparted5, mcrparted_def QUERY PLAN @@ -2050,22 +2045,18 @@ create table parted_minmax1 partition of parted_minmax for values from (1) to (1 create index parted_minmax1i on parted_minmax1 (a, b); insert into parted_minmax values (1,'12345'); explain (costs off) select min(a), max(a) from parted_minmax where b = '12345'; - QUERY PLAN -------------------------------------------------------------------------------------------------------- + QUERY PLAN +------------------------------------------------------------------------------------------------- Result InitPlan 1 (returns $0) -> Limit - -> Merge Append - Sort Key: parted_minmax1.a - -> Index Only Scan using parted_minmax1i on parted_minmax1 - Index Cond: ((a IS NOT NULL) AND (b = '12345'::text)) + -> Index Only Scan using parted_minmax1i on parted_minmax1 + Index Cond: ((a IS NOT NULL) AND (b = '12345'::text)) InitPlan 2 (returns $1) -> Limit - -> Merge Append - Sort Key: parted_minmax1_1.a DESC - -> Index Only Scan Backward using parted_minmax1i on parted_minmax1 parted_minmax1_1 - Index Cond: ((a IS NOT NULL) AND (b = '12345'::text)) -(13 rows) + -> Index Only Scan Backward using parted_minmax1i on parted_minmax1 parted_minmax1_1 + Index Cond: ((a IS NOT NULL) AND (b = '12345'::text)) +(9 rows) select min(a), max(a) from parted_minmax where b = '12345'; min | max diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out index bbdc373..e19535d 100644 --- a/src/test/regress/expected/partition_join.out +++ b/src/test/regress/expected/partition_join.out @@ -186,19 +186,18 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM (SELECT 50 phv, * FROM prt1 WHERE prt1.b = 0) -- Join with pruned partitions from joining relations EXPLAIN (COSTS OFF) SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.a < 450 AND t2.b > 250 AND t1.b = 0 ORDER BY t1.a, t2.b; - QUERY PLAN
------------------------------------------------------------ + QUERY PLAN +----------------------------------------------------- Sort Sort Key: t1.a - -> Append - -> Hash Join - Hash Cond: (t2.b = t1.a) - -> Seq Scan on prt2_p2 t2 - Filter: (b > 250) - -> Hash - -> Seq Scan on prt1_p2 t1 - Filter: ((a < 450) AND (b = 0)) -(10 rows) + -> Hash Join + Hash Cond: (t2.b = t1.a) + -> Seq Scan on prt2_p2 t2 + Filter: (b > 250) + -> Hash + -> Seq Scan on prt1_p2 t1 + Filter: ((a < 450) AND (b = 0)) +(9 rows) SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.a < 450 AND t2.b > 250 AND t1.b = 0 ORDER BY t1.a, t2.b; a | c | b | c @@ -1480,10 +1479,9 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1, prt2_l t2 WHERE t1.a = t2.b AND t1 -> Seq Scan on prt2_l_p3_p1 t2_3 -> Seq Scan on prt2_l_p3_p2 t2_4 -> Hash - -> Append - -> Seq Scan on prt1_l_p3_p1 t1_3 - Filter: (b = 0) -(29 rows) + -> Seq Scan on prt1_l_p3_p1 t1_3 + Filter: (b = 0) +(28 rows) SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1, prt2_l t2 WHERE t1.a = t2.b AND t1.b = 0 ORDER BY t1.a, t2.b; a | c | b | c @@ -1526,10 +1524,9 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1 LEFT JOIN prt2_l t2 ON t1.a = t2.b -> Seq Scan on prt2_l_p3_p1 t2_3 -> Seq Scan on prt2_l_p3_p2 t2_4 -> Hash - -> Append - -> Seq Scan on prt1_l_p3_p1 t1_3 - Filter: (b = 0) -(30 rows) + -> Seq Scan on prt1_l_p3_p1 t1_3 + Filter: (b = 0) +(29 rows) SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1 LEFT JOIN prt2_l t2 ON t1.a = t2.b AND t1.c = t2.c WHERE t1.b = 0 ORDER BY t1.a, t2.b; a | c | b | c @@ -1580,10 +1577,9 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1 RIGHT JOIN prt2_l t2 ON t1.a = t2.b -> Seq Scan on prt1_l_p3_p1 t1_3 -> Seq Scan on prt1_l_p3_p2 t1_4 -> Hash - -> Append - -> Seq Scan on prt2_l_p3_p1 t2_3 - Filter: (a = 0) -(30 rows) + -> Seq Scan on prt2_l_p3_p1 t2_3 + Filter: (a = 0) +(29 rows) SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1 RIGHT JOIN prt2_l t2 ON t1.a = t2.b AND t1.c = t2.c WHERE t2.a = 0 ORDER BY t1.a,
t2.b; a | c | b | c @@ -1629,14 +1625,12 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM (SELECT * FROM prt1_l WHERE prt1_l.b = 0) t1 Filter: (a = 0) -> Hash Full Join Hash Cond: ((prt1_l_p3_p1.a = prt2_l_p3_p1.b) AND ((prt1_l_p3_p1.c)::text = (prt2_l_p3_p1.c)::text)) - -> Append - -> Seq Scan on prt1_l_p3_p1 - Filter: (b = 0) + -> Seq Scan on prt1_l_p3_p1 + Filter: (b = 0) -> Hash - -> Append - -> Seq Scan on prt2_l_p3_p1 - Filter: (a = 0) -(33 rows) + -> Seq Scan on prt2_l_p3_p1 + Filter: (a = 0) +(31 rows) SELECT t1.a, t1.c, t2.b, t2.c FROM (SELECT * FROM prt1_l WHERE prt1_l.b = 0) t1 FULL JOIN (SELECT * FROM prt2_l WHERE prt2_l.a = 0) t2 ON (t1.a = t2.b AND t1.c = t2.c) ORDER BY t1.a, t2.b; a | c | b | c @@ -1697,9 +1691,8 @@ SELECT * FROM prt1_l t1 LEFT JOIN LATERAL -> Seq Scan on prt1_l_p2_p2 t2_2 Filter: ((t1_2.a = a) AND ((t1_2.c)::text = (c)::text)) -> Nested Loop Left Join - -> Append - -> Seq Scan on prt1_l_p3_p1 t1_3 - Filter: (b = 0) + -> Seq Scan on prt1_l_p3_p1 t1_3 + Filter: (b = 0) -> Hash Join Hash Cond: ((t3_3.b = t2_3.a) AND ((t3_3.c)::text = (t2_3.c)::text)) -> Append @@ -1711,7 +1704,7 @@ SELECT * FROM prt1_l t1 LEFT JOIN LATERAL Filter: ((t1_3.a = a) AND ((t1_3.c)::text = (c)::text)) -> Seq Scan on prt1_l_p3_p2 t2_4 Filter: ((t1_3.a = a) AND ((t1_3.c)::text = (c)::text)) -(45 rows) +(44 rows) SELECT * FROM prt1_l t1 LEFT JOIN LATERAL (SELECT t2.a AS t2a, t2.c AS t2c, t2.b AS t2b, t3.b AS t3b, least(t1.a,t2.a,t3.b) FROM prt1_l t2 JOIN prt2_l t3 ON (t2.a = t3.b AND t2.c = t3.c)) ss diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out index 30946f7..7659ef7 100644 --- a/src/test/regress/expected/partition_prune.out +++ b/src/test/regress/expected/partition_prune.out @@ -43,20 +43,18 @@ explain (costs off) select * from lp where a > 'a' and a <= 'd'; (7 rows) explain (costs off) select * from lp where a = 'a'; - QUERY PLAN ------------------------------------ - Append - -> Seq Scan on lp_ad - Filter: (a =
'a'::bpchar) -(3 rows) + QUERY PLAN +----------------------------- + Seq Scan on lp_ad + Filter: (a = 'a'::bpchar) +(2 rows) explain (costs off) select * from lp where 'a' = a; /* commuted */ - QUERY PLAN ------------------------------------ - Append - -> Seq Scan on lp_ad - Filter: ('a'::bpchar = a) -(3 rows) + QUERY PLAN +----------------------------- + Seq Scan on lp_ad + Filter: ('a'::bpchar = a) +(2 rows) explain (costs off) select * from lp where a is not null; QUERY PLAN @@ -75,12 +73,11 @@ explain (costs off) select * from lp where a is not null; (11 rows) explain (costs off) select * from lp where a is null; - QUERY PLAN ------------------------------ - Append - -> Seq Scan on lp_null - Filter: (a IS NULL) -(3 rows) + QUERY PLAN +----------------------- + Seq Scan on lp_null + Filter: (a IS NULL) +(2 rows) explain (costs off) select * from lp where a = 'a' or a = 'c'; QUERY PLAN @@ -150,12 +147,11 @@ create table coll_pruning_a partition of coll_pruning for values in ('a'); create table coll_pruning_b partition of coll_pruning for values in ('b'); create table coll_pruning_def partition of coll_pruning default; explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C"; - QUERY PLAN ---------------------------------------------- - Append - -> Seq Scan on coll_pruning_a - Filter: (a = 'a'::text COLLATE "C") -(3 rows) + QUERY PLAN +--------------------------------------- + Seq Scan on coll_pruning_a + Filter: (a = 'a'::text COLLATE "C") +(2 rows) -- collation doesn't match the partitioning collation, no pruning occurs explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX"; @@ -192,20 +188,18 @@ create table rlp5 partition of rlp for values from (31) to (maxvalue) partition create table rlp5_default partition of rlp5 default; create table rlp5_1 partition of rlp5 for values from (31) to (40); explain (costs off) select * from rlp where a < 1; - QUERY PLAN -------------------------- - Append - -> 
Seq Scan on rlp1 - Filter: (a < 1) -(3 rows) + QUERY PLAN +------------------- + Seq Scan on rlp1 + Filter: (a < 1) +(2 rows) explain (costs off) select * from rlp where 1 > a; /* commuted */ - QUERY PLAN -------------------------- - Append - -> Seq Scan on rlp1 - Filter: (1 > a) -(3 rows) + QUERY PLAN +------------------- + Seq Scan on rlp1 + Filter: (1 > a) +(2 rows) explain (costs off) select * from rlp where a <= 1; QUERY PLAN @@ -218,20 +212,18 @@ explain (costs off) select * from rlp where a <= 1; (5 rows) explain (costs off) select * from rlp where a = 1; - QUERY PLAN -------------------------- - Append - -> Seq Scan on rlp2 - Filter: (a = 1) -(3 rows) + QUERY PLAN +------------------- + Seq Scan on rlp2 + Filter: (a = 1) +(2 rows) explain (costs off) select * from rlp where a = 1::bigint; /* same as above */ - QUERY PLAN ------------------------------------ - Append - -> Seq Scan on rlp2 - Filter: (a = '1'::bigint) -(3 rows) + QUERY PLAN +----------------------------- + Seq Scan on rlp2 + Filter: (a = '1'::bigint) +(2 rows) explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */ QUERY PLAN @@ -384,20 +376,18 @@ explain (costs off) select * from rlp where a = 16; (9 rows) explain (costs off) select * from rlp where a = 16 and b in ('not', 'in', 'here'); - QUERY PLAN ----------------------------------------------------------------------------- - Append - -> Seq Scan on rlp3_default - Filter: ((a = 16) AND ((b)::text = ANY ('{not,in,here}'::text[]))) -(3 rows) + QUERY PLAN +---------------------------------------------------------------------- + Seq Scan on rlp3_default + Filter: ((a = 16) AND ((b)::text = ANY ('{not,in,here}'::text[]))) +(2 rows) explain (costs off) select * from rlp where a = 16 and b < 'ab'; - QUERY PLAN ---------------------------------------------------------- - Append - -> Seq Scan on rlp3_default - Filter: (((b)::text < 'ab'::text) AND (a = 16)) -(3 rows) + QUERY PLAN 
+--------------------------------------------------- + Seq Scan on rlp3_default + Filter: (((b)::text < 'ab'::text) AND (a = 16)) +(2 rows) explain (costs off) select * from rlp where a = 16 and b <= 'ab'; QUERY PLAN @@ -410,12 +400,11 @@ explain (costs off) select * from rlp where a = 16 and b <= 'ab'; (5 rows) explain (costs off) select * from rlp where a = 16 and b is null; - QUERY PLAN --------------------------------------------- - Append - -> Seq Scan on rlp3nullxy - Filter: ((b IS NULL) AND (a = 16)) -(3 rows) + QUERY PLAN +-------------------------------------- + Seq Scan on rlp3nullxy + Filter: ((b IS NULL) AND (a = 16)) +(2 rows) explain (costs off) select * from rlp where a = 16 and b is not null; QUERY PLAN @@ -432,12 +421,11 @@ explain (costs off) select * from rlp where a = 16 and b is not null; (9 rows) explain (costs off) select * from rlp where a is null; - QUERY PLAN ------------------------------------- - Append - -> Seq Scan on rlp_default_null - Filter: (a IS NULL) -(3 rows) + QUERY PLAN +------------------------------ + Seq Scan on rlp_default_null + Filter: (a IS NULL) +(2 rows) explain (costs off) select * from rlp where a is not null; QUERY PLAN @@ -486,12 +474,11 @@ explain (costs off) select * from rlp where a > 30; (7 rows) explain (costs off) select * from rlp where a = 30; /* only default is scanned */ - QUERY PLAN ----------------------------------- - Append - -> Seq Scan on rlp_default_30 - Filter: (a = 30) -(3 rows) + QUERY PLAN +---------------------------- + Seq Scan on rlp_default_30 + Filter: (a = 30) +(2 rows) explain (costs off) select * from rlp where a <= 31; QUERY PLAN @@ -528,12 +515,11 @@ explain (costs off) select * from rlp where a <= 31; (29 rows) explain (costs off) select * from rlp where a = 1 or a = 7; - QUERY PLAN --------------------------------------- - Append - -> Seq Scan on rlp2 - Filter: ((a = 1) OR (a = 7)) -(3 rows) + QUERY PLAN +-------------------------------- + Seq Scan on rlp2 + Filter: ((a = 1) OR (a 
= 7)) +(2 rows) explain (costs off) select * from rlp where a = 1 or b = 'ab'; QUERY PLAN @@ -580,12 +566,11 @@ explain (costs off) select * from rlp where a > 20 and a < 27; (9 rows) explain (costs off) select * from rlp where a = 29; - QUERY PLAN --------------------------------- - Append - -> Seq Scan on rlp4_default - Filter: (a = 29) -(3 rows) + QUERY PLAN +-------------------------- + Seq Scan on rlp4_default + Filter: (a = 29) +(2 rows) explain (costs off) select * from rlp where a >= 29; QUERY PLAN @@ -605,12 +590,11 @@ explain (costs off) select * from rlp where a >= 29; -- redundant clauses are eliminated explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */ - QUERY PLAN ----------------------------------------- - Append - -> Seq Scan on rlp_default_10 - Filter: ((a > 1) AND (a = 10)) -(3 rows) + QUERY PLAN +---------------------------------- + Seq Scan on rlp_default_10 + Filter: ((a > 1) AND (a = 10)) +(2 rows) explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */ QUERY PLAN @@ -797,20 +781,18 @@ explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10; (9 rows) explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; - QUERY PLAN ---------------------------------------------- - Append - -> Seq Scan on mc3p_default - Filter: ((a = 11) AND (abs(b) = 0)) -(3 rows) + QUERY PLAN +--------------------------------------- + Seq Scan on mc3p_default + Filter: ((a = 11) AND (abs(b) = 0)) +(2 rows) explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100; - QUERY PLAN ------------------------------------------------------------- - Append - -> Seq Scan on mc3p6 - Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10)) -(3 rows) + QUERY PLAN +------------------------------------------------------ + Seq Scan on mc3p6 + Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10)) +(2 rows) explain (costs off) select * from mc3p where a > 20; QUERY PLAN @@ -962,12 
+944,11 @@ explain (costs off) select * from mc2p where a < 2; (9 rows) explain (costs off) select * from mc2p where a = 2 and b < 1; - QUERY PLAN ---------------------------------------- - Append - -> Seq Scan on mc2p3 - Filter: ((b < 1) AND (a = 2)) -(3 rows) + QUERY PLAN +--------------------------------- + Seq Scan on mc2p3 + Filter: ((b < 1) AND (a = 2)) +(2 rows) explain (costs off) select * from mc2p where a > 1; QUERY PLAN @@ -986,53 +967,47 @@ explain (costs off) select * from mc2p where a > 1; (11 rows) explain (costs off) select * from mc2p where a = 1 and b > 1; - QUERY PLAN ---------------------------------------- - Append - -> Seq Scan on mc2p2 - Filter: ((b > 1) AND (a = 1)) -(3 rows) + QUERY PLAN +--------------------------------- + Seq Scan on mc2p2 + Filter: ((b > 1) AND (a = 1)) +(2 rows) -- all partitions but the default one should be pruned explain (costs off) select * from mc2p where a = 1 and b is null; - QUERY PLAN -------------------------------------------- - Append - -> Seq Scan on mc2p_default - Filter: ((b IS NULL) AND (a = 1)) -(3 rows) + QUERY PLAN +------------------------------------- + Seq Scan on mc2p_default + Filter: ((b IS NULL) AND (a = 1)) +(2 rows) explain (costs off) select * from mc2p where a is null and b is null; - QUERY PLAN ------------------------------------------------ - Append - -> Seq Scan on mc2p_default - Filter: ((a IS NULL) AND (b IS NULL)) -(3 rows) + QUERY PLAN +----------------------------------------- + Seq Scan on mc2p_default + Filter: ((a IS NULL) AND (b IS NULL)) +(2 rows) explain (costs off) select * from mc2p where a is null and b = 1; - QUERY PLAN -------------------------------------------- - Append - -> Seq Scan on mc2p_default - Filter: ((a IS NULL) AND (b = 1)) -(3 rows) + QUERY PLAN +------------------------------------- + Seq Scan on mc2p_default + Filter: ((a IS NULL) AND (b = 1)) +(2 rows) explain (costs off) select * from mc2p where a is null; - QUERY PLAN --------------------------------- 
- Append - -> Seq Scan on mc2p_default - Filter: (a IS NULL) -(3 rows) + QUERY PLAN +-------------------------- + Seq Scan on mc2p_default + Filter: (a IS NULL) +(2 rows) explain (costs off) select * from mc2p where b is null; - QUERY PLAN --------------------------------- - Append - -> Seq Scan on mc2p_default - Filter: (b IS NULL) -(3 rows) + QUERY PLAN +-------------------------- + Seq Scan on mc2p_default + Filter: (b IS NULL) +(2 rows) -- boolean partitioning create table boolpart (a bool) partition by list (a); @@ -1050,20 +1025,18 @@ explain (costs off) select * from boolpart where a in (true, false); (5 rows) explain (costs off) select * from boolpart where a = false; - QUERY PLAN ------------------------------- - Append - -> Seq Scan on boolpart_f - Filter: (NOT a) -(3 rows) + QUERY PLAN +------------------------ + Seq Scan on boolpart_f + Filter: (NOT a) +(2 rows) explain (costs off) select * from boolpart where not a = false; - QUERY PLAN ------------------------------- - Append - -> Seq Scan on boolpart_t - Filter: a -(3 rows) + QUERY PLAN +------------------------ + Seq Scan on boolpart_t + Filter: a +(2 rows) explain (costs off) select * from boolpart where a is true or a is not true; QUERY PLAN @@ -1076,12 +1049,11 @@ explain (costs off) select * from boolpart where a is true or a is not true; (5 rows) explain (costs off) select * from boolpart where a is not true; - QUERY PLAN ---------------------------------- - Append - -> Seq Scan on boolpart_f - Filter: (a IS NOT TRUE) -(3 rows) + QUERY PLAN +--------------------------- + Seq Scan on boolpart_f + Filter: (a IS NOT TRUE) +(2 rows) explain (costs off) select * from boolpart where a is not true and a is not false; QUERY PLAN @@ -1190,10 +1162,9 @@ EXPLAIN (COSTS OFF) SELECT tableoid::regclass as part, a, b FROM part WHERE a IS --------------------------------------------------------------------------- Sort Sort Key: ((part_p2_p1.tableoid)::regclass), part_p2_p1.a, part_p2_p1.b - -> Append - -> Seq 
Scan on part_p2_p1 - Filter: (a IS NULL) -(5 rows) + -> Seq Scan on part_p2_p1 + Filter: (a IS NULL) +(4 rows) -- -- some more cases -- @@ -1260,13 +1231,12 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 -- also here, because values for all keys are provided explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1; - QUERY PLAN --------------------------------------------------------------------- + QUERY PLAN +-------------------------------------------------------------- Nested Loop -> Aggregate - -> Append - -> Seq Scan on mc3p1 t2 - Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1)) + -> Seq Scan on mc3p1 t2 + Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1)) -> Append -> Seq Scan on mc2p1 t1 Filter: (a = 1) @@ -1274,7 +1244,7 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 Filter: (a = 1) -> Seq Scan on mc2p_default t1_2 Filter: (a = 1) -(12 rows) +(11 rows) -- -- pruning with clauses containing <> operator -- @@ -1395,12 +1365,11 @@ explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' co -- pruning, with values provided for both keys explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX"; - QUERY PLAN --------------------------------------------------------------------------------------------------------- - Append - -> Seq Scan on coll_pruning_multi2 - Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX")) -(3 rows) + QUERY PLAN +--------------------------------------------------------------------------------------------------- + Seq Scan on coll_pruning_multi2 + Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX")) +(2 rows) -- -- LIKE operators don't prune -- @@ -1445,12 +1414,11 @@ explain (costs off) select * from rparted_by_int2 where
a > 100000000000000; create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue); -- all partitions but rparted_by_int2_maxvalue pruned explain (costs off) select * from rparted_by_int2 where a > 100000000000000; - QUERY PLAN -------------------------------------------------- - Append - -> Seq Scan on rparted_by_int2_maxvalue - Filter: (a > '100000000000000'::bigint) -(3 rows) + QUERY PLAN +------------------------------------------- + Seq Scan on rparted_by_int2_maxvalue + Filter: (a > '100000000000000'::bigint) +(2 rows) drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2; -- @@ -1584,52 +1552,46 @@ explain (costs off) select * from hp where a <> 1 and b <> 'xxx'; -- pruning should work if either a value or a IS NULL clause is provided for -- each of the keys explain (costs off) select * from hp where a is null and b is null; - QUERY PLAN ------------------------------------------------ - Append - -> Seq Scan on hp0 - Filter: ((a IS NULL) AND (b IS NULL)) -(3 rows) + QUERY PLAN +----------------------------------------- + Seq Scan on hp0 + Filter: ((a IS NULL) AND (b IS NULL)) +(2 rows) explain (costs off) select * from hp where a = 1 and b is null; - QUERY PLAN -------------------------------------------- - Append - -> Seq Scan on hp1 - Filter: ((b IS NULL) AND (a = 1)) -(3 rows) + QUERY PLAN +------------------------------------- + Seq Scan on hp1 + Filter: ((b IS NULL) AND (a = 1)) +(2 rows) explain (costs off) select * from hp where a = 1 and b = 'xxx'; - QUERY PLAN -------------------------------------------------- - Append - -> Seq Scan on hp0 - Filter: ((a = 1) AND (b = 'xxx'::text)) -(3 rows) + QUERY PLAN +------------------------------------------- + Seq Scan on hp0 + Filter: ((a = 1) AND (b = 'xxx'::text)) +(2 rows) explain (costs off) select * from hp where a is null and b = 'xxx'; - QUERY PLAN 
------------------------------------------------------ - Append - -> Seq Scan on hp2 - Filter: ((a IS NULL) AND (b = 'xxx'::text)) -(3 rows) + QUERY PLAN +----------------------------------------------- + Seq Scan on hp2 + Filter: ((a IS NULL) AND (b = 'xxx'::text)) +(2 rows) explain (costs off) select * from hp where a = 2 and b = 'xxx'; - QUERY PLAN -------------------------------------------------- - Append - -> Seq Scan on hp3 - Filter: ((a = 2) AND (b = 'xxx'::text)) -(3 rows) + QUERY PLAN +------------------------------------------- + Seq Scan on hp3 + Filter: ((a = 2) AND (b = 'xxx'::text)) +(2 rows) explain (costs off) select * from hp where a = 1 and b = 'abcde'; - QUERY PLAN ---------------------------------------------------- - Append - -> Seq Scan on hp2 - Filter: ((a = 1) AND (b = 'abcde'::text)) -(3 rows) + QUERY PLAN +--------------------------------------------- + Seq Scan on hp2 + Filter: ((a = 1) AND (b = 'abcde'::text)) +(2 rows) explain (costs off) select * from hp where (a = 1 and b = 'abcde') or (a = 2 and b = 'xxx') or (a is null and b is null); QUERY PLAN @@ -2878,12 +2840,11 @@ execute part_abc_q1 (1, 2, 3); -- Single partition should be scanned. 
 explain (analyze, costs off, summary off, timing off) execute part_abc_q1 (1, 2, 3);
-                       QUERY PLAN
--------------------------------------------------------
- Append (actual rows=0 loops=1)
-   ->  Seq Scan on part_abc_p1 (actual rows=0 loops=1)
-         Filter: ((a = $1) AND (b = $2) AND (c = $3))
-(3 rows)
+                   QUERY PLAN
+-------------------------------------------------
+ Seq Scan on part_abc_p1 (actual rows=0 loops=1)
+   Filter: ((a = $1) AND (b = $2) AND (c = $3))
+(2 rows)
 deallocate part_abc_q1;
 drop table part_abc;
@@ -3205,12 +3166,11 @@
 create table pp_arrpart (a int[]) partition by list (a);
 create table pp_arrpart1 partition of pp_arrpart for values in ('{1}');
 create table pp_arrpart2 partition of pp_arrpart for values in ('{2, 3}', '{4, 5}');
 explain (costs off) select * from pp_arrpart where a = '{1}';
-               QUERY PLAN
----------------------------------------
- Append
-   ->  Seq Scan on pp_arrpart1
-         Filter: (a = '{1}'::integer[])
-(3 rows)
+            QUERY PLAN
+----------------------------------
+ Seq Scan on pp_arrpart1
+   Filter: (a = '{1}'::integer[])
+(2 rows)
 explain (costs off) select * from pp_arrpart where a = '{1, 2}';
 QUERY PLAN
@@ -3262,20 +3222,18 @@
 select tableoid::regclass, * from pph_arrpart order by 1;
 (3 rows)
 explain (costs off) select * from pph_arrpart where a = '{1}';
-               QUERY PLAN
----------------------------------------
- Append
-   ->  Seq Scan on pph_arrpart2
-         Filter: (a = '{1}'::integer[])
-(3 rows)
+            QUERY PLAN
+----------------------------------
+ Seq Scan on pph_arrpart2
+   Filter: (a = '{1}'::integer[])
+(2 rows)
 explain (costs off) select * from pph_arrpart where a = '{1, 2}';
-                QUERY PLAN
------------------------------------------
- Append
-   ->  Seq Scan on pph_arrpart1
-         Filter: (a = '{1,2}'::integer[])
-(3 rows)
+             QUERY PLAN
+------------------------------------
+ Seq Scan on pph_arrpart1
+   Filter: (a = '{1,2}'::integer[])
+(2 rows)
 explain (costs off) select * from pph_arrpart where a in ('{4, 5}', '{1}');
 QUERY PLAN
@@ -3294,12 +3252,11 @@
 create table pp_enumpart (a pp_colors) partition by list (a);
 create table pp_enumpart_green partition of pp_enumpart for values in ('green');
 create table pp_enumpart_blue partition of pp_enumpart for values in ('blue');
 explain (costs off) select * from pp_enumpart where a = 'blue';
-                QUERY PLAN
-----------------------------------------
- Append
-   ->  Seq Scan on pp_enumpart_blue
-         Filter: (a = 'blue'::pp_colors)
-(3 rows)
+            QUERY PLAN
+-----------------------------------
+ Seq Scan on pp_enumpart_blue
+   Filter: (a = 'blue'::pp_colors)
+(2 rows)
 explain (costs off) select * from pp_enumpart where a = 'black';
 QUERY PLAN
@@ -3316,12 +3273,11 @@
 create table pp_recpart (a pp_rectype) partition by list (a);
 create table pp_recpart_11 partition of pp_recpart for values in ('(1,1)');
 create table pp_recpart_23 partition of pp_recpart for values in ('(2,3)');
 explain (costs off) select * from pp_recpart where a = '(1,1)'::pp_rectype;
-                 QUERY PLAN
-------------------------------------------
- Append
-   ->  Seq Scan on pp_recpart_11
-         Filter: (a = '(1,1)'::pp_rectype)
-(3 rows)
+             QUERY PLAN
+-------------------------------------
+ Seq Scan on pp_recpart_11
+   Filter: (a = '(1,1)'::pp_rectype)
+(2 rows)
 explain (costs off) select * from pp_recpart where a = '(1,2)'::pp_rectype;
 QUERY PLAN
@@ -3337,12 +3293,11 @@
 create table pp_intrangepart (a int4range) partition by list (a);
 create table pp_intrangepart12 partition of pp_intrangepart for values in ('[1,2]');
 create table pp_intrangepart2inf partition of pp_intrangepart for values in ('[2,)');
 explain (costs off) select * from pp_intrangepart where a = '[1,2]'::int4range;
-                 QUERY PLAN
------------------------------------------
- Append
-   ->  Seq Scan on pp_intrangepart12
-         Filter: (a = '[1,3)'::int4range)
-(3 rows)
+             QUERY PLAN
+------------------------------------
+ Seq Scan on pp_intrangepart12
+   Filter: (a = '[1,3)'::int4range)
+(2 rows)
 explain (costs off) select * from pp_intrangepart where a = '(1,2)'::int4range;
 QUERY PLAN
@@ -3359,12 +3314,11 @@
 create table pp_lp (a int, value int) partition by list (a);
 create table pp_lp1 partition of pp_lp for values in(1);
 create table pp_lp2 partition of pp_lp for values in(2);
 explain (costs off) select * from pp_lp where a = 1;
-        QUERY PLAN
--------------------------
- Append
-   ->  Seq Scan on pp_lp1
-         Filter: (a = 1)
-(3 rows)
+     QUERY PLAN
+--------------------
+ Seq Scan on pp_lp1
+   Filter: (a = 1)
+(2 rows)
 explain (costs off) update pp_lp set value = 10 where a = 1;
 QUERY PLAN
@@ -3529,12 +3483,11 @@
 explain (costs off) select * from pp_temp_parent where true;
 (3 rows)
 explain (costs off) select * from pp_temp_parent where a = 2;
-             QUERY PLAN
------------------------------------
- Append
-   ->  Seq Scan on pp_temp_part_def
-         Filter: (a = 2)
-(3 rows)
+          QUERY PLAN
+------------------------------
+ Seq Scan on pp_temp_part_def
+   Filter: (a = 2)
+(2 rows)
 drop table pp_temp_parent;
 -- Stress run-time partition pruning a bit more, per bug reports
@@ -3628,13 +3581,12 @@
 create table listp2 partition of listp for values in(2) partition by list(b);
 create table listp2_10 partition of listp2 for values in (10);
 explain (analyze, costs off, summary off, timing off) select * from listp where a = (select 2) and b <> 10;
-                 QUERY PLAN
-------------------------------------------
- Append (actual rows=0 loops=1)
+                 QUERY PLAN
+--------------------------------------------
+ Seq Scan on listp1 (actual rows=0 loops=1)
+   Filter: ((b <> 10) AND (a = $0))
   InitPlan 1 (returns $0)
-     ->  Result (actual rows=1 loops=1)
-   ->  Seq Scan on listp1 (never executed)
-         Filter: ((b <> 10) AND (a = $0))
-(5 rows)
+     ->  Result (never executed)
+(4 rows)
 drop table listp;
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 2e17049..0cc5851 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -1057,15 +1057,14 @@
 NOTICE:  f_leak => awesome science fiction
 (4 rows)
 EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
-                             QUERY PLAN
---------------------------------------------------------------------
- Append
+                          QUERY PLAN
+--------------------------------------------------------------
+ Seq Scan on part_document_fiction
+   Filter: ((cid < 55) AND (dlevel <= $0) AND f_leak(dtitle))
   InitPlan 1 (returns $0)
     ->  Index Scan using uaccount_pkey on uaccount
           Index Cond: (pguser = CURRENT_USER)
-   ->  Seq Scan on part_document_fiction
-         Filter: ((cid < 55) AND (dlevel <= $0) AND f_leak(dtitle))
-(6 rows)
+(5 rows)
 -- pp1 ERROR
 INSERT INTO part_document VALUES (100, 11, 5, 'regress_rls_dave', 'testing pp1'); -- fail
@@ -1136,15 +1135,14 @@
 NOTICE:  f_leak => awesome science fiction
 (4 rows)
 EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
-                             QUERY PLAN
---------------------------------------------------------------------
- Append
+                          QUERY PLAN
+--------------------------------------------------------------
+ Seq Scan on part_document_fiction
+   Filter: ((cid < 55) AND (dlevel <= $0) AND f_leak(dtitle))
   InitPlan 1 (returns $0)
     ->  Index Scan using uaccount_pkey on uaccount
           Index Cond: (pguser = CURRENT_USER)
-   ->  Seq Scan on part_document_fiction
-         Filter: ((cid < 55) AND (dlevel <= $0) AND f_leak(dtitle))
-(6 rows)
+(5 rows)
 -- viewpoint from regress_rls_carol
 SET SESSION AUTHORIZATION regress_rls_carol;
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 92d427a..da70438 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -812,11 +812,10 @@
 explain (costs off)
         UNION ALL SELECT 2 AS t, * FROM tenk1 b) c
  WHERE t = 2;
-        QUERY PLAN
---------------------------
- Append
-   ->  Seq Scan on tenk1 b
-(2 rows)
+     QUERY PLAN
+---------------------
+ Seq Scan on tenk1 b
+(1 row)
 -- Test that we push quals into UNION sub-selects only when it's safe
 explain (costs off)
On Mon, 4 Mar 2019 at 16:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > [ v13-0001-Forgo-generating-single-subpath-Append-and-Merge.patch ]
>
> I continue to think that this is the wrong way to go about it,
> and as proof of concept present the attached, which reproduces
> all of the regression-test plan changes appearing in v13 ---
> with a whole lot less mechanism and next to no added planning
> cycles (which certainly cannot be said of v13).

Nice. I see you solved the problem I had been complaining about by
pulling the surplus Append out of the final plan after setrefs. At
that point the varnos have all been set to the special varnos, so you
don't suffer from the same problem of nodes higher in the plan having
the varnos of the parent instead of the Append child.

This method is certainly much, much better than what I had. Thanks for
taking the time to show me how this should be done.

> Also, I wonder why you didn't teach ExecSupportsMarkRestore that a
> single-child MergeAppend can support mark/restore.  I didn't add such
> code here either, but I suspect that's an oversight.

The reason for that was that I didn't ever create any single-subpath
MergeAppends. I instead created Append paths with isproxy set to true.
This is slightly confusing, as I was just borrowing Append to act as
this proxy path, and it was only these proxy Appends I was dealing
with in ExecSupportsMarkRestore. I'll need to modify your version of
the patch, since you're keeping single-subpath MergeAppend paths, so
ExecSupportsMarkRestore needs to know about those.

> One other remark is that the division of labor between
> create_[merge]append_path and their costsize.c subroutines
> seems pretty unprincipled.  I'd be inclined to push all the
> relevant logic into costsize.c, but have not done so here.
> Moving the existing cost calculations in create_mergeappend_path
> into costsize.c would better be done as a separate refactoring patch,
> perhaps.
The part I don't like about that is the fact that we don't generate
sort paths for the MergeAppend subpaths; instead, we rely on checking
pathkeys_contained_in() again in create_merge_append_plan() and just
create a Sort node there, if required. There is some repeated pathkeys
checking there, but I wonder if we'll see much of a performance
regression if we go to the trouble of making sort paths. Is this what
you meant?

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
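[For readers following along: pathkeys_contained_in() is essentially a
prefix test on sort orderings. A rough Python sketch — the real
function in pathkeys.c compares canonical PathKey nodes, and the string
lists here are purely illustrative:]

```python
def pathkeys_contained_in(keys1, keys2):
    # keys1 is "contained in" keys2 when keys1 is a leading prefix of
    # keys2: any stream sorted by keys2 is necessarily sorted by keys1.
    return keys2[:len(keys1)] == keys1

# A subpath already sorted by (a, b) satisfies a required ordering of
# (a,), so no Sort node is needed above it:
print(pathkeys_contained_in(["a"], ["a", "b"]))       # True
print(pathkeys_contained_in(["a", "c"], ["a", "b"]))  # False
```

This is the check create_merge_append_plan() repeats when deciding
whether to slip a Sort in underneath.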
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Mon, 4 Mar 2019 at 16:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> One other remark is that the division of labor between
>> create_[merge]append_path and their costsize.c subroutines
>> seems pretty unprincipled.  I'd be inclined to push all the
>> relevant logic into costsize.c, but have not done so here.
>> Moving the existing cost calculations in create_mergeappend_path
>> into costsize.c would better be done as a separate refactoring patch,
>> perhaps.

> The part I don't like about that is the fact that we don't generate
> sort paths in the MergeAppend subpaths, instead, we rely on checking
> pathkeys_contained_in() again in create_merge_append_plan() and just
> creating a sort node, if required. There is some repeat pathkeys
> checking there but I wonder if we'll see much a performance regression
> if we go to the trouble of making sort paths. Is this what you meant?

No, I'm just suggesting taking the stanza "Add up the sizes and costs
of the input paths" out of create_merge_append_path and putting it
into cost_merge_append.  It seems weird to me that in the plain Append
case, cost_append does the adding-up of the child costs, but in the
MergeAppend case it's not done similarly.

			regards, tom lane
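[To illustrate the adding-up being discussed, here is a hedged Python
sketch of the basic arithmetic — not the actual costsize.c code, which
also charges for heap comparisons, I/O, and so on:]

```python
def append_costs(children):
    """children: list of (startup_cost, total_cost) pairs, one per subpath."""
    # Append can emit its first tuple as soon as its first child can,
    # so only the first child's startup cost is paid before any output.
    startup = children[0][0]
    total = sum(t for _, t in children)
    return startup, total

def merge_append_costs(children):
    # MergeAppend must pull a tuple from every child to find the smallest,
    # so every child's startup cost is paid before the first row emerges.
    startup = sum(s for s, _ in children)
    total = sum(t for _, t in children)  # plus comparison costs, omitted here
    return startup, total
```

With a single child, both reduce to that child's own costs, which is
consistent with the single-subpath node being removable.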
On Mon, 4 Mar 2019 at 16:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I was a bit surprised to find that I didn't need to fool around
> with lying about whether [Merge]Append can project.  I've not dug
> into the exact reason why, but I suspect it's that previous changes
> made in support of parallelization have resulted in ensuring that
> we push the upper tlist down to the children anyway, at some earlier
> stage.

I can get a plan that does end up with a Result node above a Scan node:

create table rangep (a int, b int) partition by range (a);
create table rangep1 partition of rangep for values from (0) to (1000000);
explain select r1::text from rangep r1
  inner join rangep r2 on r1::text = r2::text
  order by r1::text;

However, if we modify is_projection_capable_plan() similarly to how
ExecSupportsMarkRestore() has been modified, then we'd need to ensure
that any changes made to the Append/MergeAppend's tlist are also made
to the underlying node's tlist. I'm unsure if that's worth the
complexity. What do you think?

> I haven't looked into whether this does the right things for parallel
> planning --- possibly create_[merge]append_path need to propagate up
> parallel-related path fields from the single child?

I looked at this. For Append, with the current code in
add_paths_to_append_rel() we won't increase parallel_workers beyond
the parallel workers of the single child rel. The current code does:

    parallel_workers = Max(parallel_workers,
                           fls(list_length(live_childrels)));

and fls(1) == 1. However, I wonder if it's better to set that
explicitly in create_append_path() for the
list_length(pathnode->subpaths) == 1 case.

The other issue I found when looking into the parallel code was that
partial paths with pathkeys will get made, but will never be used. The
reason is that Append can't normally do anything with these, as it
can't maintain their sort order, and MergeAppend is not a
parallel-aware node.
I ended up adding some new code at the bottom of
generate_append_paths() to add these paths as single subpaths to
Append nodes. This also required changing create_append_path() so that
it records the child's pathkeys when dealing with a single-subpath
Append.

> Also, I wonder why you didn't teach ExecSupportsMarkRestore that a
> single-child MergeAppend can support mark/restore.  I didn't add such
> code here either, but I suspect that's an oversight.

Added code for that.

> One other remark is that the division of labor between
> create_[merge]append_path and their costsize.c subroutines
> seems pretty unprincipled.  I'd be inclined to push all the
> relevant logic into costsize.c, but have not done so here.
> Moving the existing cost calculations in create_mergeappend_path
> into costsize.c would better be done as a separate refactoring patch,
> perhaps.

I've not touched that, but happy to do it as a separate patch.

I've attached my changes to your v14 patch.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
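[As a footnote to the fls() arithmetic mentioned above: fls() is "find
last set", the 1-based position of the highest set bit, which is why a
one-element child list yields at most one worker. A quick Python
equivalent — the name is borrowed from the C function and is
illustrative only:]

```python
def fls(n):
    # 1-based position of the most significant set bit; fls(0) == 0.
    # Python's int.bit_length() computes exactly this.
    return n.bit_length()

print(fls(1))  # 1 -> a single live child caps parallel_workers at 1
print(fls(4))  # 3
print(fls(8))  # 4
```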
Attachment
David Rowley <david.rowley@2ndquadrant.com> writes:
> [ drop-useless-merge-appends-15.patch ]

Pushed with some minor adjustments.

> I can get a plan that does end up with Result nodes above a Scan node.
> create table rangep (a int, b int) partition by range (a);
> create table rangep1 partition of rangep for values from(0) to (1000000);
> explain select r1::text from rangep r1 inner join rangep r2 on
> r1::text = r2::Text order by r1::text;
> However, if we modify is_projection_capable_plan() similar to how
> ExecSupportsMarkRestore() has been modified then we'd need to ensure
> that any changes made to the Append/MergeAppend's tlist also are made
> to the underlying node's tlist. I'm unsure if that's worth the
> complexity. What do you think?

I'm inclined to think not, at least not without more compelling
examples than that one ;-).  Note that we had Result-above-the-Append
before, so we're not regressing for such cases, we're just leaving
something on the table.

>> One other remark is that the division of labor between
>> create_[merge]append_path and their costsize.c subroutines
>> seems pretty unprincipled.  I'd be inclined to push all the
>> relevant logic into costsize.c, but have not done so here.
>> Moving the existing cost calculations in create_mergeappend_path
>> into costsize.c would better be done as a separate refactoring patch,
>> perhaps.

> I've not touched that, but happy to do it as a separate patch.

After looking at this, I'm less excited than I was before.  It seems
it'd be only marginally less crufty than the way it is now.  In
particular, costsize.c would have to be aware of the rule about
single-child MergeAppend being removable, which seems a little outside
its purview: typically we only ask costsize.c to estimate the cost of
a plan node as-presented, not to make decisions about what the shape
of the plan tree will be.  So I left it alone.

			regards, tom lane
On Tue, 26 Mar 2019 at 09:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > [ drop-useless-merge-appends-15.patch ]
>
> Pushed with some minor adjustments.

Thanks for all your work on this, and thanks for the final push. FWIW,
you should probably have credited yourself as the main author, since I
don't think there was much of my patch left.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services