Thread: Parallel Append can break run-time partition pruning
I had a report from the wilds that run-time partition pruning was not working in certain cases. After some investigation and obtaining a mockup of the actual case, I discovered that the problem was down to accumulate_append_subpath() hitting the case where it does not pull up a parallel-aware Append whose first_partial_path index is > 0, i.e. one that also contains non-partial subpaths.

What's actually happening is that the plan is left with a nested Append, and in this particular case the top-level Append only has a single subpath, so the code added in 8edd0e794 (Suppress Append and MergeAppend plan nodes that have a single child) causes the nested Append to be pulled up to become the main Append. This causes run-time pruning to break since we only attach the pruning information to the top-level Append.

The most simplified test case I can find to demonstrate this issue is:

create table list (a int, b int) partition by list(a);
create table list_12 partition of list for values in(1,2) partition by list(a);
create table list_12_1 partition of list_12 for values in(1);
create table list_12_2 partition of list_12 for values in(2);

insert into list select 2,0 from generate_Series(1,1000000) x;
vacuum analyze list;

explain (analyze on, costs off, timing off, summary off)
select * from list where a = (select 1) and b > 0;

-- force the 2nd subnode of the Append to be non-parallel.
alter table list_12_1 set (parallel_workers=0);

explain (analyze on, costs off, timing off, summary off)
select * from list where a = (select 1) and b > 0;

The results of this in master are:

postgres=# explain (analyze on, costs off, timing off, summary off)
select * from list where a = (select 1) and b > 0;
                                 QUERY PLAN
---------------------------------------------------------------------------
 Gather (actual rows=0 loops=1)
   Workers Planned: 2
   Params Evaluated: $0
   Workers Launched: 2
   InitPlan 1 (returns $0)
     ->  Result (actual rows=1 loops=1)
   ->  Parallel Append (actual rows=0 loops=3)
         ->  Parallel Seq Scan on list_12_2 list_2 (never executed)
               Filter: ((b > 0) AND (a = $0))
         ->  Parallel Seq Scan on list_12_1 list_1 (actual rows=0 loops=1)
               Filter: ((b > 0) AND (a = $0))
(11 rows)

postgres=# alter table list_12_1 set (parallel_workers=0);
ALTER TABLE
postgres=# explain (analyze on, costs off, timing off, summary off)
select * from list where a = (select 1) and b > 0;
                                 QUERY PLAN
---------------------------------------------------------------------------
 Gather (actual rows=0 loops=1)
   Workers Planned: 2
   Params Evaluated: $0
   Workers Launched: 2
   InitPlan 1 (returns $0)
     ->  Result (actual rows=1 loops=1)
   ->  Parallel Append (actual rows=0 loops=3)
         ->  Seq Scan on list_12_1 list_1 (actual rows=0 loops=1)
               Filter: ((b > 0) AND (a = $0))
         ->  Parallel Seq Scan on list_12_2 list_2 (actual rows=0 loops=3)
               Filter: ((b > 0) AND (a = $0))
               Rows Removed by Filter: 333333
(12 rows)

Notice that we don't get "(never executed)" for list_12_2 in the 2nd case.

I'm a bit divided on what the correct fix is. If I blame Parallel Append for not trying hard enough to pull up the lower Append in accumulate_append_subpath(), then clearly the parallel append code is to blame. However, perhaps run-time pruning should be tagging on PartitionPruneInfo to more than just top-level Appends. Fixing the latter case, code-wise, is about as simple as removing the "rel->reloptkind == RELOPT_BASEREL &&" line from create_append_plan(). Certainly, if the outer Append hadn't been a single-subpath Append, then we wouldn't have pulled up the lower-level Append, so perhaps we should be run-time pruning lower-level ones too.
What do other people think?

(copying in Robert and Amit K due to their work on Parallel Append, Tom as I seem to remember him complaining about accumulate_append_subpath() at some point, and Amit L because... partitioning...)

David
On Wed, Apr 15, 2020 at 4:18 PM David Rowley <dgrowleyml@gmail.com> wrote:
> I'm a bit divided on what the correct fix is. If I blame Parallel Append for not trying hard enough to pull up the lower Append in accumulate_append_subpath(), then clearly the parallel append code is to blame.

I spent some time trying to understand how Append parallelism works and I am tempted to agree with you that there might be problems with how accumulate_append_subpath() interacts with parallelism. Maybe it would be better to disregard a non-parallel-aware partial Append if it requires us to fail to flatten a child Append. I have attached a PoC fix to show that. While a nested Append is not really a problem in general, it appears to me that our run-time pruning code is not in a position to work correctly with them, or at least not with how things stand today...

> However, perhaps run-time pruning should be tagging on PartitionPruneInfo to more than just top-level Appends. Fixing the latter case, code-wise, is about as simple as removing the "rel->reloptkind == RELOPT_BASEREL &&" line from create_append_plan(). Certainly, if the outer Append hadn't been a single-subpath Append, then we wouldn't have pulled up the lower-level Append, so perhaps we should be run-time pruning lower-level ones too.

While looking at this, I observed that the PartitionPruneInfo of the top-level Append (the one that later gets thrown out) contains bogus information:

{PARTITIONPRUNEINFO
:prune_infos ((
{PARTITIONEDRELPRUNEINFO
:rtindex 1
:present_parts (b 0)
:nparts 1
:subplan_map 0
:subpart_map 1

One of these should be -1.

{PARTITIONEDRELPRUNEINFO
:rtindex 2
:present_parts (b)
:nparts 2
:subplan_map -1 -1
:subpart_map -1 -1

subplan_map values are not correct, because the subpaths list that would have been passed would not include paths of lower-level partitions, as the flattening didn't occur.

))
:other_subplans (b)
}

I guess the problem is that we let an Append be nested, but don't account for that in how the parent Append's partitioned_rels list is constructed. The top-level Append's partitioned_rels should not have contained the sub-partitioned table's RTI if the latter got its own Append. Maybe if we want to make run-time pruning work with nested Appends, we need to fix how partitioned_rels is gathered.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
On Fri, 17 Apr 2020 at 19:08, Amit Langote <amitlangote09@gmail.com> wrote:
> While looking at this, I observed that the PartitionPruneInfo of the top-level Append (the one that later gets thrown out) contains bogus information:
>
> {PARTITIONPRUNEINFO
> :prune_infos ((
> {PARTITIONEDRELPRUNEINFO
> :rtindex 1
> :present_parts (b 0)
> :nparts 1
> :subplan_map 0
> :subpart_map 1
>
> One of these should be -1.
>
> {PARTITIONEDRELPRUNEINFO
> :rtindex 2
> :present_parts (b)
> :nparts 2
> :subplan_map -1 -1
> :subpart_map -1 -1
>
> subplan_map values are not correct, because the subpaths list that would have been passed would not include paths of lower-level partitions, as the flattening didn't occur.

It's not great that we're generating that, but as far as I can see, it's not going to cause any misbehaviour. It'll cause a small slowdown in run-time pruning due to perhaps having to perform an additional call to find_matching_subplans_recurse() during execution. In this case, it'll never find any subnodes that match due to both maps having all -1 element values.

Since f2343653f5, we're not using partitioned_rels for anything else, so we should likely fix this so that we don't add the item to partitioned_rels when we don't pull up the sub-Append. I think we should hold off on fixing that until we decide if any adjustments need to be made to the sub-Append pullup code.

David
On Fri, 17 Apr 2020 at 19:08, Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Wed, Apr 15, 2020 at 4:18 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > I'm a bit divided on what the correct fix is. If I blame Parallel Append for not trying hard enough to pull up the lower Append in accumulate_append_subpath(), then clearly the parallel append code is to blame.
>
> I spent some time trying to understand how Append parallelism works and I am tempted to agree with you that there might be problems with how accumulate_append_subpath() interacts with parallelism. Maybe it would be better to disregard a non-parallel-aware partial Append if it requires us to fail to flatten a child Append. I have attached a PoC fix to show that. While a nested Append is not really a problem in general, it appears to me that our run-time pruning code is not in a position to work correctly with them, or at least not with how things stand today...

Thanks for taking a look at this. I've now looked at this in more detail and below is my understanding of what's going on.

It seems, in this case, that on the following line:

    accumulate_append_subpath(cheapest_partial_path,
                              &partial_subpaths, NULL);

we don't manage to pull up the sub-Append due to passing a NULL pointer for the final special_subpaths argument. This results in just taking the child's Append path verbatim, i.e. a nested Append.

Later, when we do:

    else if (nppath == NULL ||
             (cheapest_partial_path != NULL &&
              cheapest_partial_path->total_cost < nppath->total_cost))
    {
        /* Partial path is cheaper or the only option. */
        Assert(cheapest_partial_path != NULL);
        accumulate_append_subpath(cheapest_partial_path,
                                  &pa_partial_subpaths,
                                  &pa_nonpartial_subpaths);

we do pass a non-NULL special_subpaths argument to allow the sub-Append to be pulled up.

So, now we have 2 paths, one with a nested Append and one with a flattened Append. Both paths have the same cost, but because we call add_partial_path() for the nested Append version first, the logic in add_partial_path() accepts that path. However, in the subsequent call of add_partial_path(), the one for the non-nested Append, that path is rejected due to its total cost being too similar to that of the existing partial path. We just end up keeping the nested Append as the cheapest partial path... That path, since in the example case it only has a single subpath, is pulled up into the main Append by the logic added in 8edd0e794.

I think you've realised this and that's why your PoC patch just rejected the first path when it's unable to do the pullup. We'll get a better path later when we allow mixed partial and non-partial paths. (We'll never fail to do a pullup when calling accumulate_append_subpath() for "nppath", since that's a non-parallel path and accumulate_append_subpath() will always pull Append paths up when they're not parallel aware.)

I wonder if the fix should be more something along the lines of trying to merge things so we only generate a single partial path. That way we wouldn't be at the mercy of the logic in add_partial_path() to accept or reject the path based on the order the paths are added.

David
Hi David,

On Tue, Apr 21, 2020 at 12:03 PM David Rowley <dgrowleyml@gmail.com> wrote:
> On Fri, 17 Apr 2020 at 19:08, Amit Langote <amitlangote09@gmail.com> wrote:
> > On Wed, Apr 15, 2020 at 4:18 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > > I'm a bit divided on what the correct fix is. If I blame Parallel Append for not trying hard enough to pull up the lower Append in accumulate_append_subpath(), then clearly the parallel append code is to blame.
> >
> > I spent some time trying to understand how Append parallelism works and I am tempted to agree with you that there might be problems with how accumulate_append_subpath() interacts with parallelism. Maybe it would be better to disregard a non-parallel-aware partial Append if it requires us to fail to flatten a child Append. I have attached a PoC fix to show that. While a nested Append is not really a problem in general, it appears to me that our run-time pruning code is not in a position to work correctly with them, or at least not with how things stand today...
>
> Thanks for taking a look at this. I've now looked at this in more detail and below is my understanding of what's going on.
>
> It seems, in this case, that on the following line:
>
>     accumulate_append_subpath(cheapest_partial_path,
>                               &partial_subpaths, NULL);
>
> we don't manage to pull up the sub-Append due to passing a NULL pointer for the final special_subpaths argument. This results in just taking the child's Append path verbatim, i.e. a nested Append.
>
> Later, when we do:
>
>     else if (nppath == NULL ||
>              (cheapest_partial_path != NULL &&
>               cheapest_partial_path->total_cost < nppath->total_cost))
>     {
>         /* Partial path is cheaper or the only option. */
>         Assert(cheapest_partial_path != NULL);
>         accumulate_append_subpath(cheapest_partial_path,
>                                   &pa_partial_subpaths,
>                                   &pa_nonpartial_subpaths);
>
> we do pass a non-NULL special_subpaths argument to allow the sub-Append to be pulled up.
>
> So, now we have 2 paths, one with a nested Append and one with a flattened Append. Both paths have the same cost, but because we call add_partial_path() for the nested Append version first, the logic in add_partial_path() accepts that path. However, in the subsequent call of add_partial_path(), the one for the non-nested Append, that path is rejected due to its total cost being too similar to that of the existing partial path. We just end up keeping the nested Append as the cheapest partial path... That path, since in the example case it only has a single subpath, is pulled up into the main Append by the logic added in 8edd0e794.
>
> I think you've realised this and that's why your PoC patch just rejected the first path when it's unable to do the pullup.

Right.

> We'll get a better path later when we allow mixed partial and non-partial paths.

Yes, but only if parallel-aware Append is allowed (pa_subpaths_valid). So it's possible that the Append may not participate in any parallelism whatsoever if we reject partial Append on failing to fold a child Append, which does somewhat suck.

> (We'll never fail to do a pullup when calling accumulate_append_subpath() for "nppath", since that's a non-parallel path and accumulate_append_subpath() will always pull Append paths up when they're not parallel aware.)
>
> I wonder if the fix should be more something along the lines of trying to merge things so we only generate a single partial path. That way we wouldn't be at the mercy of the logic in add_partial_path() to accept or reject the path based on the order the paths are added.

So as things stand, a parallel-aware partial Append (Parallel Append) path competes with a non-parallel partial Append path on cost grounds. As far as I can see, it's only the latter that can contain among its subpaths a (nested) Append which can be problematic. Given that, our choice between the two types of partial Append paths becomes based on something that is not cost, but is that okay?

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
On Tue, 21 Apr 2020 at 15:03, David Rowley <dgrowleyml@gmail.com> wrote:
> I wonder if the fix should be more something along the lines of trying to merge things so we only generate a single partial path. That way we wouldn't be at the mercy of the logic in add_partial_path() to accept or reject the path based on the order the paths are added.

I took a shot at doing things this way.

First, I'll recap the problem this is trying to solve: add_paths_to_append_rel() attempts to create two separate partial Append paths. I'll describe both of these below:

Path1: This path is generated regardless of whether Parallel Append is enabled and contains all the cheapest partial paths from each child relation. If parallel append is enabled this will become a Parallel Append. If it's not, then a non-parallel Append will be created containing the list of partial subpaths. Here's an example from select_parallel.out:

                          QUERY PLAN
--------------------------------------------------------------
 Finalize Aggregate
   ->  Gather
         Workers Planned: 1
         ->  Partial Aggregate
               ->  Append
                     ->  Parallel Seq Scan on a_star a_star_1
                     ->  Parallel Seq Scan on b_star a_star_2
                     ->  Parallel Seq Scan on c_star a_star_3
                     ->  Parallel Seq Scan on d_star a_star_4
                     ->  Parallel Seq Scan on e_star a_star_5
                     ->  Parallel Seq Scan on f_star a_star_6

Path2: We only ever consider this one when enable_parallel_append == true and the append rel's consider_parallel == true. When this path is generated, it'll always be a Parallel Append. This path may contain a mix of partial paths for child rels and parallel_safe child paths, the latter of which will only be visited by a single worker.

The problem is that path1 does not pull up child Appends when the child Append path contains a mix of partial and parallel safe paths (i.e. a sub-path2, per above). Since we create path2 in addition to path1, the costs come out the same even though path1 couldn't pull up the sub-Append paths. Unfortunately, since the costs are the same, path1 is preferred because it's added first. add_partial_path() just rejects path2 based on it being too similar in cost to the existing path1.

In the attached, I'm trying to solve this by only creating 1 partial Append path in the first place. This path will always try to use the cheapest partial path, or the cheapest parallel safe path, if parallel append is allowed and that path is cheaper than the cheapest partial path.

I believe the attached gives us what we want and additionally, since it should now always pull up the sub-Appends, there's no need to consider adjusting partitioned_rels based on whether the pull-up occurred or not. Those should now be right in all cases. This should also fix the run-time pruning issue too since in my original test case it'll pull up the sub-Append, which means that the code added in 8edd0e794 is no longer going to do anything with it as the top-level Append will never contain just 1 subpath.

I'm reasonably certain that this is correct, but I did find it a bit mind-bending considering all the possible cases, so it could do with some more eyes on it. I've not really done a final polish of the comments yet. I'll do that if the patch is looking promising.

The output of the original test with the attached is as follows:

postgres=# explain (analyze on, costs off, timing off, summary off)
postgres-# select * from list where a = (select 1) and b > 0;
                                 QUERY PLAN
---------------------------------------------------------------------------
 Gather (actual rows=0 loops=1)
   Workers Planned: 2
   Params Evaluated: $0
   Workers Launched: 2
   InitPlan 1 (returns $0)
     ->  Result (actual rows=1 loops=1)
   ->  Parallel Append (actual rows=0 loops=3)
         ->  Parallel Seq Scan on list_12_2 list_2 (never executed)
               Filter: ((b > 0) AND (a = $0))
         ->  Parallel Seq Scan on list_12_1 list_1 (actual rows=0 loops=1)
               Filter: ((b > 0) AND (a = $0))
(11 rows)

postgres=# -- force the 2nd subnode of the Append to be non-parallel.
postgres=# alter table list_12_1 set (parallel_workers=0);
ALTER TABLE
postgres=# explain (analyze on, costs off, timing off, summary off)
postgres-# select * from list where a = (select 1) and b > 0;
                             QUERY PLAN
--------------------------------------------------------------------
 Gather (actual rows=0 loops=1)
   Workers Planned: 2
   Params Evaluated: $0
   Workers Launched: 2
   InitPlan 1 (returns $0)
     ->  Result (actual rows=1 loops=1)
   ->  Parallel Append (actual rows=0 loops=3)
         ->  Seq Scan on list_12_1 list_1 (actual rows=0 loops=1)
               Filter: ((b > 0) AND (a = $0))
         ->  Parallel Seq Scan on list_12_2 list_2 (never executed)
               Filter: ((b > 0) AND (a = $0))
(11 rows)

Notice we get "(never executed)" in both cases.

David
On Wed, Apr 22, 2020 at 12:22 PM David Rowley <dgrowleyml@gmail.com> wrote:
> On Tue, 21 Apr 2020 at 15:03, David Rowley <dgrowleyml@gmail.com> wrote:
> > I wonder if the fix should be more something along the lines of trying to merge things so we only generate a single partial path. That way we wouldn't be at the mercy of the logic in add_partial_path() to accept or reject the path based on the order the paths are added.
>
> In the attached, I'm trying to solve this by only creating 1 partial Append path in the first place. This path will always try to use the cheapest partial path, or the cheapest parallel safe path, if parallel append is allowed and that path is cheaper than the cheapest partial path.
>
> I believe the attached gives us what we want and additionally, since it should now always pull up the sub-Appends, there's no need to consider adjusting partitioned_rels based on whether the pull-up occurred or not. Those should now be right in all cases. This should also fix the run-time pruning issue too since in my original test case it'll pull up the sub-Append, which means that the code added in 8edd0e794 is no longer going to do anything with it as the top-level Append will never contain just 1 subpath.
>
> I'm reasonably certain that this is correct, but I did find it a bit mind-bending considering all the possible cases, so it could do with some more eyes on it. I've not really done a final polish of the comments yet. I'll do that if the patch is looking promising.

Thanks for the patch. It's good to see that unfolded sub-Appends will not occur with the new code structure, or so one hopes. Also, I am finding it somewhat easier to understand how partial Appends get built due to the smaller code footprint after patching.

One thing I remain concerned about is that it appears like we are no longer leaving the choice between parallel and non-parallel Append to the cost machinery, which is currently the case. AFAICS with patched, as long as parallel Append is enabled and allowed, it will be chosen over a non-parallel Append as the partial path.

Regarding the patch, I had been assuming that the "pa" in pa_subpaths_valid stands for "parallel append", so using the variable as is in the new code structure would be misleading. Maybe parallel_subpaths_valid?

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
On Thu, 23 Apr 2020 at 02:37, Amit Langote <amitlangote09@gmail.com> wrote:
> One thing I remain concerned about is that it appears like we are no longer leaving the choice between parallel and non-parallel Append to the cost machinery, which is currently the case. AFAICS with patched, as long as parallel Append is enabled and allowed, it will be chosen over a non-parallel Append as the partial path.

Given the same set of paths, when would a non-parallel append be cheaper than a parallel one? I don't see anything in cost_append() that could cause the costs to come out higher for the parallel version. However, I might have misunderstood something. Can you give an example of a case that you think might change?

The cost comparison is still there for the cheapest parallel safe normal path vs the cheapest partial path, so, when each of those paths is allowed, we will still end up with the cheapest paths for each child.

> Regarding the patch, I had been assuming that the "pa" in pa_subpaths_valid stands for "parallel append", so using the variable as is in the new code structure would be misleading. Maybe parallel_subpaths_valid?

Yeah, I had wondered if it would be better to rename it, I'd just not thought too hard on what to call it yet.

David
David Rowley <dgrowleyml@gmail.com> writes:
> Given the same set of paths, when would a non-parallel append be cheaper than a parallel one?

Well, anytime the parallel startup cost is significant, for starters. But maybe we account for that at some other point, like when building the Gather?

            regards, tom lane
On Thu, 23 Apr 2020 at 11:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > Given the same set of paths, when would a non-parallel append be cheaper than a parallel one?
>
> Well, anytime the parallel startup cost is significant, for starters. But maybe we account for that at some other point, like when building the Gather?

Yeah. There's no mention of parallel_setup_cost or parallel_tuple_cost in any of the Append costing code. Those are only applied when we cost Gather / GatherMerge. At the point Amit and I are talking about, we're only comparing two Append paths. No Gather / GatherMerge in sight yet, so any additional costs from those are not applicable.

If there was some reason that a Parallel Append could come out more expensive, then maybe we could just create a non-parallel Append using the same subpath list and add_partial_path() it. I just don't quite see how that would ever win though. I'm willing to be proven wrong though.

David
David Rowley <dgrowleyml@gmail.com> writes:
> On Thu, 23 Apr 2020 at 11:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Well, anytime the parallel startup cost is significant, for starters. But maybe we account for that at some other point, like when building the Gather?

> Yeah. There's no mention of parallel_setup_cost or parallel_tuple_cost in any of the Append costing code. Those are only applied when we cost Gather / GatherMerge. At the point Amit and I are talking about, we're only comparing two Append paths. No Gather / GatherMerge in sight yet, so any additional costs from those are not applicable.

Right, so really the costs of partial and non-partial paths are not commensurable, and comparing them directly is just misleading. I trust we're not throwing away non-partial paths on that basis?

            regards, tom lane
On Thu, 23 Apr 2020 at 11:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > On Thu, 23 Apr 2020 at 11:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Well, anytime the parallel startup cost is significant, for starters. But maybe we account for that at some other point, like when building the Gather?
>
> > Yeah. There's no mention of parallel_setup_cost or parallel_tuple_cost in any of the Append costing code. Those are only applied when we cost Gather / GatherMerge. At the point Amit and I are talking about, we're only comparing two Append paths. No Gather / GatherMerge in sight yet, so any additional costs from those are not applicable.
>
> Right, so really the costs of partial and non-partial paths are not commensurable, and comparing them directly is just misleading. I trust we're not throwing away non-partial paths on that basis?

There is a case in both master and in the patch where we compare the cost of the cheapest path in the partial_pathlist against the cheapest parallel safe path in the pathlist. However, in this case, the pathlist path will be used in an Append or Parallel Append below a Gather, so those parallel_(setup|tuple)_costs will be applied regardless. The non-parallel Append in this case still requires a Gather since it is still using multiple workers to execute the subpaths, e.g. the plan I posted in [1].

The code comparing the path costs is:

    else if (nppath == NULL ||
             (cheapest_partial_path != NULL &&
              cheapest_partial_path->total_cost < nppath->total_cost))

[1] https://www.postgresql.org/message-id/CAApHDvqcOD3ObPgPAeU+3qyFL_wzE5kmczw70qMAh7qJ-3wuzw@mail.gmail.com
On Wed, Apr 22, 2020 at 7:36 PM David Rowley <dgrowleyml@gmail.com> wrote:
> If there was some reason that a Parallel Append could come out more expensive, then maybe we could just create a non-parallel Append using the same subpath list and add_partial_path() it. I just don't quite see how that would ever win though. I'm willing to be proven wrong though.

I think you're talking about the thing that this comment is trying to explain:

    /*
     * Consider a parallel-aware append using a mix of partial and non-partial
     * paths.  (This only makes sense if there's at least one child which has
     * a non-partial path that is substantially cheaper than any partial path;
     * otherwise, we should use the append path added in the previous step.)
     */

Like, suppose there are two relations A and B, and we're appending them. A has no indexes, so we can only choose between a Seq Scan and a Parallel Seq Scan. B has a GIST index that is well-suited to the query, but GIST indexes don't support parallel scans. So we've got three choices:

1. Don't use parallelism at all. Then, we can do a normal Append with a Seq Scan on a and an Index Scan on b. ("If we found unparameterized paths for all children, build an unordered, unparameterized Append path for the rel.")

2. Use parallelism for both relations. Then, we can do a Gather over a Parallel Append (or a regular Append, if Parallel Append is disabled) with a Parallel Seq Scan on a and a Parallel Seq Scan on b. As compared with #1, this should win for a, but it might lose heavily for b, because switching from an index scan to a Seq Scan could be a big loser. ("Consider an append of unordered, unparameterized partial paths. Make it parallel-aware if possible.")

3. Use parallelism for a but not for b. The only way to do this is a Parallel Append, because there's no other way to mix partial and non-partial paths at present. This lets us get the benefit of a Parallel Seq Scan on a while still being able to do a non-parallel GIST Index Scan on b. This has a good chance of being better than #2, but it's fundamentally a costing decision, because if the table is small enough or the query isn't very selective, #2 will actually be faster, just on the raw power of more workers and less random I/O. ("Consider a parallel-aware append using a mix of partial and non-partial paths.")

It seems to me that all three strategies are viable. The third one is much less likely to be used now that we have parallel index scans for btree and parallel bitmap heap scans, but I believe it can be a winner if you have the right case. You want to think about cases where there are parallel plans available for everything in the tree, but at least some of the children have much better non-parallel plans.

Note that for strategy #2 we always prefer Parallel Append to non-Parallel Append on the optimistic assumption that Parallel Append will always be better; we only use regular Append if Parallel Append is disabled. But for strategy #3 there is no such choice to be made: a regular Append would not be valid. If placed under a Gather, it would execute the non-partial paths more than once; if not placed under a Gather, we'd have partial paths without any Gather above them, which is an invalid plan shape. So you can't "just use a regular Append" in case #3.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, 23 Apr 2020 at 14:37, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Apr 22, 2020 at 7:36 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > If there was some reason that a Parallel Append could come out more expensive, then maybe we could just create a non-parallel Append using the same subpath list and add_partial_path() it. I just don't quite see how that would ever win though. I'm willing to be proven wrong though.
>
> I think you're talking about the thing that this comment is trying to explain:
>
>     /*
>      * Consider a parallel-aware append using a mix of partial and non-partial
>      * paths.  (This only makes sense if there's at least one child which has
>      * a non-partial path that is substantially cheaper than any partial path;
>      * otherwise, we should use the append path added in the previous step.)
>      */
>
> Like, suppose there are two relations A and B, and we're appending them. A has no indexes, so we can only choose between a Seq Scan and a Parallel Seq Scan. B has a GIST index that is well-suited to the query, but GIST indexes don't support parallel scans. So we've got three choices:
>
> 1. Don't use parallelism at all. Then, we can do a normal Append with a Seq Scan on a and an Index Scan on b. ("If we found unparameterized paths for all children, build an unordered, unparameterized Append path for the rel.")
>
> 2. Use parallelism for both relations. Then, we can do a Gather over a Parallel Append (or a regular Append, if Parallel Append is disabled) with a Parallel Seq Scan on a and a Parallel Seq Scan on b. As compared with #1, this should win for a, but it might lose heavily for b, because switching from an index scan to a Seq Scan could be a big loser. ("Consider an append of unordered, unparameterized partial paths. Make it parallel-aware if possible.")
>
> 3. Use parallelism for a but not for b. The only way to do this is a Parallel Append, because there's no other way to mix partial and non-partial paths at present. This lets us get the benefit of a Parallel Seq Scan on a while still being able to do a non-parallel GIST Index Scan on b. This has a good chance of being better than #2, but it's fundamentally a costing decision, because if the table is small enough or the query isn't very selective, #2 will actually be faster, just on the raw power of more workers and less random I/O. ("Consider a parallel-aware append using a mix of partial and non-partial paths.")

Thanks for those examples. I ran this situation through the code but used a hash index instead of GIST. The 3 settings which give us control over this plan are enable_parallel_append, enable_indexscan, and enable_bitmapscan. enable_bitmapscan must be included since we can still get a parallel bitmap scan with a hash index.

For completeness, I just tried each of the 8 combinations of the GUCs, and I've noted as a comment which of your cases each plan corresponds to. There are 4 cases since #2 works with both parallel and non-parallel Append. Naturally, the aim is that the patched version does not change the behaviour.

-- Test case
create table listp (a int, b int) partition by list(a);
create table listp1 partition of listp for values in(1);
create table listp2 partition of listp for values in(2);
insert into listp select x,y from generate_Series(1,2) x, generate_Series(1,1000000) y;
create index on listp2 using hash(b);
vacuum analyze listp;

explain (costs off) select * from listp where b = 1;
SET enable_indexscan = off;
explain (costs off) select * from listp where b = 1;
SET enable_indexscan = on;
SET enable_bitmapscan = off;
explain (costs off) select * from listp where b = 1; -- case #3, Mixed scan of parallel and non-parallel paths with a Parallel Append
SET enable_indexscan = off;
explain (costs off) select * from listp where b = 1; -- case #2 with Parallel Append
SET enable_indexscan = on;
SET enable_bitmapscan = on;
SET enable_parallel_append = off;
explain (costs off) select * from listp where b = 1;
SET enable_indexscan = off;
explain (costs off) select * from listp where b = 1; -- case #2 with non-Parallel Append
SET enable_indexscan = on;
SET enable_bitmapscan = off;
explain (costs off) select * from listp where b = 1; -- case #1, best serial plan
SET enable_indexscan = off;
explain (costs off) select * from listp where b = 1;

The results, patched/unpatched, are the same.

David
On Thu, 23 Apr 2020 at 02:37, Amit Langote <amitlangote09@gmail.com> wrote:
> Regarding the patch, I had been assuming that the "pa" in pa_subpaths_valid stands for "parallel append", so using the variable as is in the new code structure would be misleading. Maybe parallel_subpaths_valid?

I started making another pass over this patch. I did the renaming you've mentioned plus the other two List variables for the subpaths.

I did notice that I had forgotten to properly disable parallel append when it was disallowed by the rel's consider_parallel flag. For us to have done something wrong, we'd have needed a parallel safe path in the pathlist and also have needed the rel to be consider_parallel == false. It was easy enough to make up a case for that by sticking a parallel restricted function in the target list. I've fixed that issue in the attached and also polished up the comments a little and removed the unused variable that I had forgotten about.

For now, I'd still like to get a bit more confidence that the only noticeable change in the outcome here is that we're now pulling up sub-Appends in all cases. I've read the code a number of times and just can't quite see any room for anything changing. My tests per Robert's case all matched what the previous version did, but I'm still only about 93% on this. Given that I'm aiming to fix a bug in master, v11 and v12 here, I need to get that confidence level up to about the 100% mark.

I've attached the v2 patch.

David
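As a rough illustration of the parallel restricted function trick mentioned above (this sketch is not part of the original email; the function name is invented), reusing the "list" table from the first message:

-- Hypothetical illustration only.  A PARALLEL RESTRICTED function must be
-- evaluated in the leader, so putting one in the target list is a simple way
-- of constructing the consider_parallel == false situation described above
-- while the children still have parallel-safe paths of their own.
create function restricted_fn(int) returns int
    as 'select $1' language sql parallel restricted;

explain (costs off)
select restricted_fn(a) from list where a = (select 1) and b > 0;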
On Thu, Apr 23, 2020 at 7:37 PM David Rowley <dgrowleyml@gmail.com> wrote:
> On Thu, 23 Apr 2020 at 02:37, Amit Langote <amitlangote09@gmail.com> wrote:
> > Regarding the patch, I had been assuming that the "pa" in pa_subpaths_valid stands for "parallel append", so using the variable as is in the new code structure would be misleading. Maybe parallel_subpaths_valid?
>
> I started making another pass over this patch. I did the renaming you've mentioned plus the other two List variables for the subpaths.
>
> I did notice that I had forgotten to properly disable parallel append when it was disallowed by the rel's consider_parallel flag. For us to have done something wrong, we'd have needed a parallel safe path in the pathlist and also have needed the rel to be consider_parallel == false. It was easy enough to make up a case for that by sticking a parallel restricted function in the target list. I've fixed that issue in the attached and also polished up the comments a little and removed the unused variable that I had forgotten about.

Thanks for updating the patch.

> For now, I'd still like to get a bit more confidence that the only noticeable change in the outcome here is that we're now pulling up sub-Appends in all cases.

Yeah, I would have liked us to have enough confidence to remove the following text in the header comment of accumulate_append_subpath:

    ...  If the latter is
    * NULL, we don't flatten the path at all (unless it contains only partial
    * paths).

> I've read the code a number of times and just can't quite see any room for anything changing. My tests per Robert's case all matched what the previous version did, but I'm still only about 93% on this. Given that I'm aiming to fix a bug in master, v11 and v12 here, I need to get that confidence level up to about the 100% mark.

I think I have managed to convince myself (still < 100% though) that it's not all that bad that we won't be leaving it up to add_partial_path() to decide between a Parallel Append whose subpaths set consists only of partial paths and another whose subpaths set consists of a mix of partial and non-partial paths. That's because we will be building only the latter if that one looks cheaper to begin with. If Parallel Append is disabled, there could only ever be one partial Append path for add_partial_path() to consider, one whose subpaths set consists only of partial paths.

On the patch itself:

+     * We also build a set of paths for each child by trying to use the
+     * cheapest partial path, or the cheapest parallel safe normal path
+     * either when that is cheaper, or if parallel append is not allowed.
      */
-    if (pa_subpaths_valid)
+    if (parallel_subpaths_valid)
     {

In the comment above, ", or parallel append is not allowed" should be "provided parallel append is allowed". Or how about writing it as follows:

    /*
     * Add the child's cheapest partial path, or if parallel append is
     * allowed, its cheapest parallel safe normal path if that one is
     * cheaper, to the partial Append path we are constructing for the
     * parent.
     */

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
On Thu, 23 Apr 2020 at 22:37, David Rowley <dgrowleyml@gmail.com> wrote:
> For now, I'd still like to get a bit more confidence that the only noticeable change in the outcome here is that we're now pulling up sub-Appends in all cases. I've read the code a number of times and just can't quite see any room for anything changing. My tests per Robert's case all matched what the previous version did, but I'm still only about 93% on this. Given that I'm aiming to fix a bug in master, v11 and v12 here, I need to get that confidence level up to about the 100% mark.

I looked at this again and I don't think what I've got is right.

The situation that I'm concerned about is that we generate a Parallel Append path for a sub-partitioned table and that path has a mix of partial and parallel safe paths. When we perform add_paths_to_append_rel() for the top-level partition, if for some reason we didn't do a Parallel Append, then we could pull up the mixed set of partial and parallel safe paths from the child Append. We can't execute a parallel_safe subpath under a normal Append that's below a Gather, as multiple workers could execute the same parallel safe scan, which won't lead to anything good, probably wrong results at best.

Now, we'll only ever have a Parallel Append for the sub-partitioned table if enable_parallel_append is on and the rel's consider_parallel is switched on too, but if for some reason the top-level partitioned table had consider_parallel set to off, then we could end up pulling up the subpaths from a Parallel Append path, which could contain a mix of partial and parallel safe paths, and we'd then throw those into a non-Parallel Append partial path! The parallel safe path just can't go in one of those, as multiple workers could then execute that path. Only Parallel Append knows how to execute those safely, as it ensures only 1 worker touches it.

Now, looking at the code today, it does seem that a child rel will do parallel only if the parent rel does, but I don't think it's a great idea to assume that will never change. Additionally, it seems fragile to also assume the value of the enable_parallel_append global variable would stay the same between calls too.

I wonder if we could fix this by, when we call add_paths_to_append_rel() for the top-level partitioned table, just recursively getting all the child rels and their children too. accumulate_append_subpath() would then never see Appends or MergeAppends as we'd just be dealing with scan-type Paths. add_paths_to_append_rel() for sub-partitioned tables wouldn't have to do very much then. I'll need to see how that idea would fit in with partition-wise joins. My guess is, not very well. There's also apply_scanjoin_target_to_paths() which calls add_paths_to_append_rel() too.

The only other idea I have so far is to either not generate the partial path using the partial_subpaths list in add_paths_to_append_rel() when pa_subpaths_valid and pa_nonpartial_subpaths != NIL, i.e. only create the first of those partial paths if we're not going to create the 2nd one. Or even just swap the order in which we add_partial_path() them so that add_partial_path() does not reject the path with the flattened Appends.

David
On Sat, Apr 25, 2020 at 7:25 AM David Rowley <dgrowleyml@gmail.com> wrote:
> I looked at this again and I don't think what I've got is right.

Apart from the considerations which you raised, I think there's also a costing issue here. For instance, suppose we have an Append with three children. Two of them have partial paths which will complete after consuming 1 second of CPU time, and using a partial path is unquestionably the best strategy for those children. The third one has a partial path which will complete after using 10 seconds of CPU time and a non-partial path which will complete after using 8 seconds of CPU time. What strategy is best?

If we pick the non-partial path, the time that the whole thing takes to complete will be limited by the non-partial path, which, since it cannot use parallelism, will take 8 seconds of wall clock time. If we pick the partial path, we have a total of 12 seconds of CPU time which we can finish in 6 wall clock seconds with 2 participants, 4 seconds with 3 participants, or 3 seconds with 4 participants. This is clearly a lot better. Incidentally, the planner knows about the fact that you can't finish an Append faster than you can finish its non-partial participants. See append_nonpartial_cost().

Now, I don't think anything necessarily gets broken here by your patch as written. Planner decisions are made on estimated cost, which is a proxy for wall clock time, not CPU time. Right now, we only prefer the non-partial path if we think it can be executed in less time by ONE worker than the partial path could be executed by ALL workers:

    /*
     * Either we've got only a non-partial path, or we think that
     * a single backend can execute the best non-partial path
     * faster than all the parallel backends working together can
     * execute the best partial path.

So, in the above example, we wouldn't even consider the non-partial path. Even say we just have 2 workers. The partial path will have a cost of 6, which is less than 8, so the non-partial path gets rejected. But as the comment goes on to note:

     * It might make sense to be more aggressive here. Even if
     * the best non-partial path is more expensive than the best
     * partial path, it could still be better to choose the
     * non-partial path if there are several such paths that can
     * be given to different workers. For now, we don't try to
     * figure that out.

This kind of strategy is particularly appealing when the number of Append children is large compared to the number of workers. For instance, if you have an Append with 100 children and you are planning with 4 workers, it's probably a pretty good idea to be very aggressive about picking the path that uses the least resources, which the current algorithm would not do. You're unlikely to end up with idle workers, because you have so many plans to which they can be allocated. However, it's possible: it could be that there's one child which is way bigger than all the others and the really important thing is to get a partial path for that child, so that it can be attacked in parallel, even if that means that overall resource consumption is higher. As the number of children decreases relative to the number of workers, having partial paths becomes increasingly appealing. To take a degenerate case, suppose you have 4 workers but only 2 children. Partial paths should look really appealing, because the alternative is leaving workers unused.

I *think* your patch is based around the idea of merging the turn-the-append-of-partial-paths-into-a-parallel-append case with the consider-a-mix-of-partial-and-nonpartial-paths-for-parallel-append case. That seems possible to do given that the current heuristic is to compare the raw path costs, but I think that it wouldn't work if we wanted to be more aggressive about considering the mixed strategy and letting the costing machinery sort out which way is better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert forwarded me a link to an email sent to the -general list, where the reported problem seems to be the same problem that was being discussed here:

https://www.postgresql.org/message-id/flat/d0f6d811-8946-eb9f-68e2-1a8a7f80ff21%40a-kretschmer.de

Going over the last few emails, it seems that David held off from committing the patch because of the lack of confidence in its robustness. With the patch, a sub-partitioned child's partial Append's child paths will *always* be pulled up into the parent's set of partial child paths, thus preventing the nesting of Appends, which the run-time pruning code can't currently cope with. The lack of confidence seems to be due to the fact that always pulling up a sub-Append's child paths into the parent partial Append's child paths *may* cause the latter to behave wrongly under parallelism, and the new code structure will prevent add_partial_path() from being the arbitrator of whether such a path is really the best in terms of cost.

If we can't be confident in that approach, maybe we should consider making the run-time pruning code cope with nested Appends?

--
Amit Langote
EDB: http://www.enterprisedb.com
On Fri, 16 Oct 2020 at 03:01, Amit Langote <amitlangote09@gmail.com> wrote:
>
> Going over the last few emails, it seems that David held off from committing the patch because of the lack of confidence in its robustness. With the patch, a sub-partitioned child's partial Append's child paths will *always* be pulled up into the parent's set of partial child paths, thus preventing the nesting of Appends, which the run-time pruning code can't currently cope with. The lack of confidence seems to be due to the fact that always pulling up a sub-Append's child paths into the parent partial Append's child paths *may* cause the latter to behave wrongly under parallelism, and the new code structure will prevent add_partial_path() from being the arbitrator of whether such a path is really the best in terms of cost.
>
> If we can't be confident in that approach, maybe we should consider making the run-time pruning code cope with nested Appends?

I've been thinking about this and my thoughts are:

There are other cases where we don't pull up sub-MergeAppend nodes anyway, so I think we should just make run-time pruning work for more than just the top-level Append / MergeAppend.

The case I'm thinking about is the code added in 959d00e9dbe for ordered Append scans. It's possible a sub-partitioned table has partitions which don't participate in the same ordering. We need to keep the sub-MergeAppend in those cases... well, at least when there's more than 1 partition remaining after plan-time pruning.

I've attached the patch which, pending a final look, I'm proposing to commit for master only. I just don't quite think this is a bug fix, and certainly, due to the invasiveness of the proposed patch, that means no backpatch.

I fixed all the stuff about the incorrectly set partitioned_rels list. What I ended up with there is making it accumulate_append_subpath's job to also pull up the sub-path's partitioned_rels fields when pulling up a nested Append/MergeAppend. If there's no pullup, then we don't care about the sub-path's partitioned_rels. That's for it to deal with. With that, I think that run-time pruning will only get the RT indexes for partitions that we actually have sub-paths for. That's pretty good, so I added an Assert() to verify that in make_partitionedrel_pruneinfo(). (I hope I don't regret that later.)

This does mean we need to maintain a different partitioned_rels list for each Append path we consider. So there's a number (six) of these now between add_paths_to_append_rel() and generate_orderedappend_paths(). To try to minimise the impact of that, I've changed the field so that instead of being a List of IntLists, it's just a List of Relids. The top-level list just contains a single element until you get a UNION ALL that selects from a partitioned table on each side of the union. Merging sub-path partitioned rel RT indexes into the existing element is now pretty cheap as it just uses bms_add_members() rather than the list_concat we'd have had to have used if it was still a List of IntLists.

After fixing up how partitioned_rels is built, I saw there were no usages of RelOptInfo.partitioned_child_rels, so I got rid of it.

I did another couple of little cleanups and wrote some regression tests to test all this.

Overall I'm fairly happy with this, especially getting rid of a partitioned table related field from RelOptInfo.

David
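To illustrate the ordered Append case David refers to (this sketch is not from the thread; the table names, data and expected plan shape are invented), a sub-partitioned child whose list partitions interleave on the sort column keeps its own Merge Append under the parent's ordered Append; this is exactly the kind of nested node the run-time pruning code now needs to handle:

-- Illustrative sketch only.
create table mlparted (a int) partition by range (a);
create table mlparted1 partition of mlparted for values from (1) to (10);
create table mlparted2 partition of mlparted
    for values from (10) to (20) partition by list (a);
create table mlparted2a partition of mlparted2 for values in (10, 12, 14, 16, 18);
create table mlparted2b partition of mlparted2 for values in (11, 13, 15, 17, 19);
create index on mlparted (a);

insert into mlparted select x % 19 + 1 from generate_series(1, 100000) x;
vacuum analyze mlparted;

-- mlparted's range partitions don't overlap, so an ordered Append can be used
-- at the top level, but mlparted2's list partitions interleave on "a", so its
-- ordered path is a Merge Append, which can't be flattened into the parent's
-- Append without destroying the sort order.  One would expect a plan shaped
-- roughly like:
--   Limit
--     -> Append
--          -> Index Only Scan using mlparted1_a_idx on mlparted1
--          -> Merge Append
--               -> Index Only Scan using mlparted2a_a_idx on mlparted2a
--               -> Index Only Scan using mlparted2b_a_idx on mlparted2b
explain (costs off) select a from mlparted order by a limit 10;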
On Sun, Oct 25, 2020 at 10:06 AM David Rowley <dgrowleyml@gmail.com> wrote:
> On Fri, 16 Oct 2020 at 03:01, Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > Going over the last few emails, it seems that David held off from committing the patch because of the lack of confidence in its robustness. With the patch, a sub-partitioned child's partial Append's child paths will *always* be pulled up into the parent's set of partial child paths, thus preventing the nesting of Appends, which the run-time pruning code can't currently cope with. The lack of confidence seems to be due to the fact that always pulling up a sub-Append's child paths into the parent partial Append's child paths *may* cause the latter to behave wrongly under parallelism, and the new code structure will prevent add_partial_path() from being the arbitrator of whether such a path is really the best in terms of cost.
> >
> > If we can't be confident in that approach, maybe we should consider making the run-time pruning code cope with nested Appends?
>
> I've been thinking about this and my thoughts are:

Thanks for working on this.

> There are other cases where we don't pull up sub-MergeAppend nodes anyway, so I think we should just make run-time pruning work for more than just the top-level Append / MergeAppend.
>
> The case I'm thinking about is the code added in 959d00e9dbe for ordered Append scans. It's possible a sub-partitioned table has partitions which don't participate in the same ordering. We need to keep the sub-MergeAppend in those cases... well, at least when there's more than 1 partition remaining after plan-time pruning.

Ah, I guess that case would also likewise fail to use runtime pruning properly.

> I've attached the patch which, pending a final look, I'm proposing to commit for master only. I just don't quite think this is a bug fix, and certainly, due to the invasiveness of the proposed patch, that means no backpatch.
>
> I fixed all the stuff about the incorrectly set partitioned_rels list. What I ended up with there is making it accumulate_append_subpath's job to also pull up the sub-path's partitioned_rels fields when pulling up a nested Append/MergeAppend. If there's no pullup, then we don't care about the sub-path's partitioned_rels. That's for it to deal with. With that, I think that run-time pruning will only get the RT indexes for partitions that we actually have sub-paths for. That's pretty good, so I added an Assert() to verify that in make_partitionedrel_pruneinfo(). (I hope I don't regret that later.)

So AIUI, the fix here is to make any given Append/MergeAppend's part_prune_info only contain PartitionedRelPruneInfos of the (sub-)partitioned tables whose partitions' subplans are directly in appendplans/mergeplans, such that the partition indexes can be mapped to the subplan indexes. That does imply present_parts must be non-empty, so the Assert seems warranted.

> This does mean we need to maintain a different partitioned_rels list for each Append path we consider. So there's a number (six) of these now between add_paths_to_append_rel() and generate_orderedappend_paths(). To try to minimise the impact of that, I've changed the field so that instead of being a List of IntLists, it's just a List of Relids. The top-level list just contains a single element until you get a UNION ALL that selects from a partitioned table on each side of the union. Merging sub-path partitioned rel RT indexes into the existing element is now pretty cheap as it just uses bms_add_members() rather than the list_concat we'd have had to have used if it was still a List of IntLists.

The refactoring seemed complicated on a first look, but overall looks good.

> After fixing up how partitioned_rels is built, I saw there were no usages of RelOptInfo.partitioned_child_rels, so I got rid of it.

+1

> I did another couple of little cleanups and wrote some regression tests to test all this.
>
> Overall I'm fairly happy with this, especially getting rid of a partitioned table related field from RelOptInfo.

Some comments:

+     * For partitioned tables, we accumulate a list of the partitioned RT
+     * indexes for the subpaths that are directly under this Append.

Maybe: For partitioned tables, accumulate a list of the RT indexes of partitioned tables in the tree whose partitions' subpaths are directly under this Append.

+ * lists for each Append Path that we create as accumulate_append_subpath
+ * sometimes can't flatten sub-Appends into the top-level Append.

How about expanding that reason a little bit as: ...can't flatten sub-Appends into the top-level Append, which prevents the former's partitioned_rels from being pulled into the latter's.

+ * most one element which is a RelIds of the partitioned relations which there

s/RelIds/Relids

+ * are subpaths for.  In this case we just add the RT indexes for the
+ * partitioned tables for the subpath we're pulling up to the single entry in
+ * 'partitioned_rels'.

How about: In this case, we just pull the RT indexes contained in the sub-Append/MergeAppend's partitioned_rels into the single entry of *partitioned_rels, which belongs to the parent Append.

  * relid_subpart_map maps relid of a non-leaf partition to the index
- * in 'partitioned_rels' of that rel (which will also be the index in
- * the returned PartitionedRelPruneInfo list of the info for that
+ * in 'partrelids' of that rel (which will also be the index in the
+ * returned PartitionedRelPruneInfo list of the info for that

...the index in 'partrelids', which in the new code is a bitmap set, sounds a bit odd. Maybe just mention the index in the list of PartitionedRelPruneInfos, as: relid_subpart_map maps relid of a given non-leaf partition in 'partrelids' to the index of its PartitionedRelPruneInfo in the returned list.

+    /*
+     * Ensure there were no stray PartitionedRelPruneInfo generated for
+     * partitioned tables that had no sub-paths for.
+     */
+    Assert(!bms_is_empty(present_parts));

Maybe you meant: ...for partitioned tables for which we had neither leaf subpaths nor sub-PartitionedRelPruneInfos.

+    List       *partitioned_rels;   /* List of Relids for each non-leaf
+                                     * partitioned table in the partition
+                                     * tree. One for each partition hierarchy.
+                                     */
     List       *subpaths;           /* list of component Paths */

How about describing partitioned_rels as follows: List of Relids set containing RT indexes of non-leaf tables for each partition hierarchy whose paths are in 'subpaths'

--
Amit Langote
EDB: http://www.enterprisedb.com
On Tue, 27 Oct 2020 at 19:40, Amit Langote <amitlangote09@gmail.com> wrote:
> Some comments:

Thanks for having a look at this.

I've made some adjustments to those comments and pushed.

David
On Mon, Nov 2, 2020 at 9:51 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Tue, 27 Oct 2020 at 19:40, Amit Langote <amitlangote09@gmail.com> wrote:
> > Some comments:
>
> Thanks for having a look at this.
>
> I've made some adjustments to those comments and pushed.

Thank you.

--
Amit Langote
EDB: http://www.enterprisedb.com
On Mon, Nov 02, 2020 at 01:50:57PM +1300, David Rowley wrote:
> On Tue, 27 Oct 2020 at 19:40, Amit Langote <amitlangote09@gmail.com> wrote:
> > Some comments:
>
> Thanks for having a look at this.
>
> I've made some adjustments to those comments and pushed.

Commit a929e17e5 doesn't appear in the v14 release notes, but I wanted to mention that it appears to allow fixing a rowcount mis-estimate for us, which I had reported before:

https://www.postgresql.org/message-id/20170326193344.GS31628%40telsasoft.com
https://www.postgresql.org/message-id/20170415002322.GA24216@telsasoft.com
https://www.postgresql.org/message-id/20170524211730.GM31097@telsasoft.com

And others have reported before:

https://www.postgresql.org/message-id/flat/7DF51702-0F6A-4571-80BB-188AAEF260DA@gmail.com
https://www.postgresql.org/message-id/SG2PR01MB29673BE6F7AA24424FDBFF60BC670%40SG2PR01MB2967.apcprd01.prod.exchangelabs.com

For years, our reports have included a generated WHERE clause for each table being queried, to allow each table's partitions to be properly pruned/excluded (not just one table, as happened if we used a single WHERE clause). That worked, but then the planner underestimates the rowcount, since it doesn't realize that the conditions are redundant ("equality classes" do not handle inequality conditions).

In v14, one WHERE clause per table still gives an underestimate; but now multiple WHERE clauses aren't required, because a single WHERE clause excludes partitions from each table, and the rowcount from the elided partitions is excluded from the Append rowcount at plan time.

Thanks for this feature!

-- 
Justin
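For readers unfamiliar with the workaround Justin describes, here is a rough sketch (schema, table names and dates are invented for illustration, not taken from his reports) of the redundant per-table filters such reports used to generate, versus the single filter that he says now suffices:

-- Illustrative only: two tables partitioned the same way, joined on the
-- partition key.
create table fact1 (ts timestamptz, v int) partition by range (ts);
create table fact1_202101 partition of fact1
    for values from ('2021-01-01') to ('2021-02-01');
create table fact1_202102 partition of fact1
    for values from ('2021-02-01') to ('2021-03-01');
create table fact2 (ts timestamptz, v int) partition by range (ts);
create table fact2_202101 partition of fact2
    for values from ('2021-01-01') to ('2021-02-01');
create table fact2_202102 partition of fact2
    for values from ('2021-02-01') to ('2021-03-01');

-- The old workaround: repeat the time filter for each table so that both
-- tables' partitions can be pruned, at the cost of the planner counting the
-- redundant condition's selectivity twice and underestimating the join's
-- row count.
select *
from fact1 join fact2 using (ts)
where fact1.ts >= '2021-01-01' and fact1.ts < '2021-02-01'
  and fact2.ts >= '2021-01-01' and fact2.ts < '2021-02-01';

-- What Justin reports working in v14: a single filter, with the other
-- table's partitions excluded by pruning rather than by a duplicated clause.
select *
from fact1 join fact2 using (ts)
where fact1.ts >= '2021-01-01' and fact1.ts < '2021-02-01';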