Thread: FailedAssertion on partprune
Hi, I was trying sqlsmith on REL_11_STABLE (commit 1b957e59b92dc44c14708762f882d7910463a9ac) with a database I have at hand, and got an assertion failure. It seems to happen during planning, at pruning time, but only when tables get bigger than a certain size. I configured it with "--enable-debug --enable-depend --enable-cassert"; attached are the backtrace and a script to reproduce the problem. -- Jaime Casanova www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 24 July 2018 at 16:55, Jaime Casanova <jaime.casanova@2ndquadrant.com> wrote: > I was trying sqlsmith on REL_11_STABLE (commit > 1b957e59b92dc44c14708762f882d7910463a9ac) with a database i have at > hand, and got an assertion failure. > It seems to happen during planning on prunning time but only when > tables get bigger than certain size. > > I configured it with "--enable-debug --enable-depend > --enable-cassert", attached is the backtrace and a script to reproduce > the problem. It looks like an AppendPath which has a non-NIL partitioned_rels List won't always only contain leaf partitions as subpaths. The plan that the planner is trying to make here is: https://explain.depesz.com/s/26zj In this case, the 2nd to 5th Append subplan's parents are actually partitioned tables rather than leaf partitions. make_partition_pruneinfo() assumes these will always be leaf partitions. Looking more closely at accumulate_append_subpath() I can see that there are cases where it won't pull sub-Append paths into the main Append. That's what we're seeing in this plan. The fix is either to have make_partition_pruneinfo() return NIL to disable run-time pruning when any of the subpaths are not a leaf partition (it might be a good safety precaution regardless), or we could maybe think harder about what can be done in accumulate_append_subpath() to flatten the Append/MergeAppend paths in this case. I've attached a patch for the former. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
David Rowley <david.rowley@2ndquadrant.com> writes: > On 24 July 2018 at 16:55, Jaime Casanova <jaime.casanova@2ndquadrant.com> wrote: >> I was trying sqlsmith on REL_11_STABLE (commit >> 1b957e59b92dc44c14708762f882d7910463a9ac) with a database i have at >> hand, and got an assertion failure. > In this case, the 2nd to 5th Append subplan's parents are actually > partitioned tables rather than leaf partitions. > make_partition_pruneinfo() assumes these will always be leaf > partitions. Looking more closely at accumulate_append_subpath() I can > see that there are cases where it won't pull sub-Append paths into the > main Append. That's what we're seeing in this plan. Hm, wouldn't this be fixed by your pending patch at <CAKJS1f_eYwHk2x0xX7qW42rV_GRsJGBMe3AqN9MYLRSs1S+CiA@mail.gmail.com> ? (I apologize for having seemingly dropped the ball on that thread; RL has intruded on my time a bit lately. I hope to get it dealt with next week.) regards, tom lane
On 25 July 2018 at 01:46, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hm, wouldn't this be fixed by your pending patch at > <CAKJS1f_eYwHk2x0xX7qW42rV_GRsJGBMe3AqN9MYLRSs1S+CiA@mail.gmail.com> > ? It's a different issue. I coded run-time pruning with the incorrect assumption that we only get leaf partitions under an Append that has a non-empty partitioned_rels List. The other patch fixes it to support mixed hierarchies from UNION ALLs. It'll still trip up on anything apart from leaf partitions being in the subpaths list. Thinking again about the patch I submitted upthread, I wonder whether it's actually possible to support pruning with Jamie's query. Without looking at the code, I don't quite see the reason that the sub-partitioned table wouldn't be correctly pruned by the run-time pruning code. It could just be a matter of removing the failing Assert(). I'll do a bit more testing and confirm. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
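[ For illustration, a minimal sketch of the sort of "mixed hierarchy from a UNION ALL" being discussed; the table and column names here are invented and not taken from this thread: ]

create table parted (a int) partition by list (a);
create table parted_p1 partition of parted for values in (1);
create table parted_p2 partition of parted for values in (2);
create table plain (a int);

-- The UNION ALL yields an Append whose children mix the scans of parted's
-- leaf partitions with a scan of the non-partitioned table, so run-time
-- pruning has to cope with subplans that are not partitions at all.
explain (costs off)
select a from parted where a = (select 1)
union all
select a from plain;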
On 2018-Jul-25, David Rowley wrote: > Thinking again about the patch I submitted upthread; I wonder if it's > actually possible to support pruning with Jamie's query. Without > looking at the code, I don't quite see the reason that the > sub-partitioned table wouldn't be correctly pruned by the run-time > pruning code. It could just be a matter of removing the failing > Assert(). I'll do a bit more testing and confirm. Not looking at the code right now either, but removing that assert and then removing the TABLESAMPLE clause, the query returns identical results with and without pruning, so maybe you're right. No time for further looking now. (SELECT 'Jaime' <> 'Jamie' COLLATE es_EC) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 31 July 2018 at 11:25, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Not looking at the code right now either, but removing that assert and > then removing the TABLESAMPLE clause, the query returns identical > results with and without pruning, so maybe you're right. No time for > further looking now. I thought about this a bit more and it seems fine just to remove the Assert. The patch that's pending in [1] adds all the correct handling for subplans that don't belong in any partition hierarchy. That's not the case for Jaime's plan, but those sub-partitioned tables will be identified just like any other partition by the run-time pruning code and can be pruned in the same way. The attached patch removes the Assert, but I think it should probably be done as part of [1]'s patch since that's also adding the code to handle subplans for tables that don't belong in the partition hierarchy. I also spent a bit of time trying to create a simple test case for this, and I've discovered that it's really not very easy; I also have doubts about how stable such a plan might be. > (SELECT 'Jaime' <> 'Jamie' COLLATE es_EC) Apologies. It was a finger auto-pilot malfunction. [1] https://www.postgresql.org/message-id/CAKJS1f_eYwHk2x0xX7qW42rV_GRsJGBMe3AqN9MYLRSs1S%2BCiA%40mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
[ bringing this discussion back to the original thread ] David Rowley <david.rowley@2ndquadrant.com> writes: > The attached patch removes the Assert, but I think it should be > probably be done as part of [1]'s patch since that's also adding the > code to handle subplans for tables that don't belong in the partition > hierarchy. Over in that thread, we had David Rowley <david.rowley@2ndquadrant.com> writes: > On 2 August 2018 at 11:48, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I didn't include this change because (a) it's late, (b) no test >> case was included, and (c) I don't entirely believe it anyway. >> How would a rel be both leaf and nonleaf? Isn't this indicative >> of a problem somewhere upstream in the planner? > It's probably best discussed on the other thread, but it seems to be > by design in accumulate_append_subpath(). Parallel Append nodes don't > get flattened if they contain a mix of parallel aware and non-parallel > aware subplans. Yeah, looking at the explain posted upthread, the issue is specifically that some of the child paths might be for Gathers over subsets of the partitioning hierarchy. It's not real clear to me whether such a subset would necessarily correspond to a subtree of the hierarchy, but if it does, you'd want to be able to apply run-time pruning at the parent Append, in hopes of not having to execute the Gather at all. Separately from that, if you do execute the Gather, applying runtime pruning within its child Append is also a good thing. So what's unclear to me is whether removing that Assert is sufficient to make all of that work nicely. I'm concerned in particular that the planner might get confused about which pruning tests to apply at each level. (EXPLAIN isn't a very adequate tool for testing this, since it fails to show anything about what pruning tests are attached to an Append. I wonder if we need to create something that does show that.) I think it'd be a real good idea to have a test case to exercise this. Jaime's presented test case is too expensive to consider as a regression test, but I bet that a skinnied-down version could be made to work once you'd set all the parallelization parameters to liberal values, like in select_parallel.sql. regards, tom lane
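[ For reference, "liberal values" here means settings along these lines; the exact values are illustrative rather than copied from select_parallel.sql: ]

set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = 0;
set max_parallel_workers_per_gather = 2;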
On Thu, Aug 2, 2018 at 1:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > (EXPLAIN isn't a very adequate tool for testing this, since > it fails to show anything about what pruning tests are attached to an > Append. I wonder if we need to create something that does show that.) I've asked for that more than once and still think it's a good idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
David Rowley <david.rowley@2ndquadrant.com> writes: >> It's probably best discussed on the other thread, but it seems to be >> by design in accumulate_append_subpath(). Parallel Append nodes don't >> get flattened if they contain a mix of parallel aware and non-parallel >> aware subplans. I don't really understand the issue at hand, but let me just make a comment about accumulate_append_subpath(). If we have a regular Append on top of another regular Append or on top of a MergeAppend, we can flatten the lower Merge(Append) into the upper one. No problem. If, however, the lower path is a Parallel Append, we can't. Consider this plan:

Gather
  -> Append
       -> Partial Subpath #1
       -> Parallel Append
            -> Partial Subpath #2
            -> Non-partial Subpath

Making Partial Subpath #2 a child of the Append rather than the Parallel Append would be correct; the only downside is that we'd lose the nice worker-balancing stuff that the Parallel Append does. However, making Non-partial Subpath a child of the Append rather than the Parallel Append would be dead wrong, because the Parallel Append will make sure that it only gets executed by one of the cooperating processes, and the regular Append won't. If the upper Append were changed to a Parallel Append, then we could flatten the whole thing just fine, as long as we're careful to keep partial paths and non-partial paths separated. Also, if the situation is reversed, with an upper Parallel Append and a regular Append beneath it, that's always flatten-able. > Yeah, looking at the explain posted upthread, the issue is specifically > that some of the child paths might be for Gathers over subsets of the > partitioning hierarchy. It's not real clear to me whether such a subset > would necessarily correspond to a subtree of the hierarchy, but if it > does, you'd want to be able to apply run-time pruning at the parent > Append, in hopes of not having to execute the Gather at all. Separately > from that, if you do execute the Gather, applying runtime pruning within > its child Append is also a good thing. That sounds right to me. If I'm not mistaken, in that EXPLAIN, the radicado2012 and radicado2017 partitions are ineligible for parallel query for some reason, maybe a reloption setting, but the other partitions are all eligible, or rather, their subpartitions are eligible. If they were all OK for parallel query, the plan would probably just end up with Gather over Parallel Append over individual sample scans. In general, I think the only ways to get an Append are inheritance, partitioning, and set operations. If an upper Append is due to partitioning or inheritance, then I think that any (Merge)Append nodes beneath it must be a subtree of the partitioning or inheritance hierarchy, respectively. Partitioning and inheritance can't be mixed, and set operations can't occur below partition expansion. However, if the upper Append was created by a set operation, then its child Append nodes could be from any of those causes. For instance, it seems possible to imagine an upper Append that is implementing EXCEPT and a lower Append that is implementing both partition expansion and UNION ALL. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
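[ If a reloption is indeed what makes those two partitions ineligible, the kind of setting Robert has in mind would look roughly like this; whether Jaime's schema actually does so is an assumption: ]

-- disallow parallel scans of just these two partitions
alter table radicado2012 set (parallel_workers = 0);
alter table radicado2017 set (parallel_workers = 0);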
On 3 August 2018 at 05:54, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Yeah, looking at the explain posted upthread, the issue is specifically > that some of the child paths might be for Gathers over subsets of the > partitioning hierarchy. It's not real clear to me whether such a subset > would necessarily correspond to a subtree of the hierarchy, but if it > does, you'd want to be able to apply run-time pruning at the parent > Append, in hopes of not having to execute the Gather at all. Separately > from that, if you do execute the Gather, applying runtime pruning within > its child Append is also a good thing. Yeah, that "good thing" was the reason I changed my mind from aborting run-time pruning in my first patch idea to allowing it: if pruning says so, it's fine to remove an entire sub-partitioned hierarchy. If we look at the Assert in question:

/* Same rel cannot be both leaf and non-leaf */
Assert(relid_subplan_map[rti] == 0);

If this fails, then the subplan path that was found must have a parent that's a partitioned table. If we're worried about those containing subplans which are not partitions, then we can probably add some Asserts somewhere else. One place where we could do that with a bit of cover would be inside the bms_num_members(allmatchedsubplans) < list_length(subpaths) condition; there we could do:

Assert(root->simple_rte_array[parentrel->relid]->relkind != RELKIND_PARTITIONED_TABLE);

There should never be any other_subplans when the parent is a partitioned table. I've attached a patch to demonstrate what I mean here. > So what's unclear to me is whether removing that Assert is sufficient > to make all of that work nicely. I'm concerned in particular that the > planner might get confused about which pruning tests to apply at each > level. (EXPLAIN isn't a very adequate tool for testing this, since > it fails to show anything about what pruning tests are attached to an > Append. I wonder if we need to create something that does show that.) Pruning is only ever applied at the top-level when the RelOptInfo is a RELOPT_BASEREL. It might well be that we want to reconsider that now that we've seen some plans where the subpaths are not always pulled into the top-level [Merge]Append path, but as of now I don't quite see how this could get broken. We're just pruning with a coarser granularity. > I think it'd be a real good idea to have a test case to exercise this. > Jaime's presented test case is too expensive to consider as a regression > test, but I bet that a skinnied-down version could be made to work > once you'd set all the parallelization parameters to liberal values, > like in select_parallel.sql. Yeah, I did spend a bit of time trying to cut down that test and failed at it. The fact that it seemed so hard to do that didn't give me much confidence that the plan produced by the test would be very stable, but perhaps we can just deal with that if we see it's unstable. I'll try again with the test. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
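[ A rough sketch of the kind of multi-level hierarchy such a test would need; this is only an illustration with made-up names, not the test being worked on. Actually getting the planner to keep the sub-partitioned level as an unflattened child Append also requires the parallel-costing corner case discussed later in the thread, which is what makes a stable regression test hard: ]

create table rt (a int, b int) partition by list (a);
create table rt_1 partition of rt for values in (1);
create table rt_2 partition of rt for values in (2) partition by list (b);
create table rt_2_1 partition of rt_2 for values in (1);
create table rt_2_2 partition of rt_2 for values in (2);

-- the comparison value is only known at executor startup, so pruning must
-- happen at run time rather than at plan time
explain (costs off) select * from rt where a = (select 2);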
On 3 August 2018 at 07:53, Robert Haas <robertmhaas@gmail.com> wrote: > I don't really understand the issue at hand, but let me just make a > comment about accumulate_append_subpath(). If we have a regular > Append on top of another regular Append or on top of a MergeAppend, we > can flatten the lower Merge(Append) into the upper one. No problem. > If, however, the lower path is a Parallel Append, we can't. Consider > this plan: > > Gather > -> Append > -> Partial Subpath #1 > -> Parallel Append > -> Partial Subpath #2 > -> Non-partial Subpath > > Making Partial Subpath #2 a child of the Append rather than the > Parallel Append would be correct; the only downside is that we'd lose > the nice worker-balancing stuff that the Parallel Append does. > However, making Non-partial Subpath a child of the Append rather than > the Parallel Append would be dead wrong, because the Parallel Append > will make sure that it only gets executed by one of the cooperating > processes, and the regular Append won't. If the upper Append were > changed to a Parallel Append, then we could flatten the whole thing > just fine, as long as we're careful to keep partial paths and > non-partial paths separated. Also, if the situation is reversed, with > an upper Parallel Append and a regular Append beneath it, that's > always flatten-able. Wouldn't that code have more flexibility to flatten the Append if it were to, instead of looking at the Append's subpaths, look at the subpath's Parent RelOptInfo paths and just use the Append and MergeAppend as a container to identify the relations that must be included, rather than the paths that should be? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Aug 2, 2018 at 8:42 PM, David Rowley <david.rowley@2ndquadrant.com> wrote: > Wouldn't that code have more flexibility to flatten the Append if it > were to, instead of looking at the Append's subpaths, look at the > subpath's Parent RelOptInfo paths and just use the Append and > MergeAppend as a container to identify the relations that must be > included, rather than the paths that should be? Well, we actually start with a list of RelOptInfos, take the cheapest path from each whether partial or non-partial, and then build a Parallel Append out of the result, so there's no need to work backward from the chosen path to the Parallel Append. As to whether it's ever better to take a path other than the cheapest to facilitate flattening, I'm not sure. At the moment, I can't think of a scenario in which that would pay off, but there may be one. I think that if we're using Parallel Append at all, we're likely going to use it throughout the plan tree, in which case everything will get flattened just fine. There might well be corner cases where that isn't so, though, and in some of those flattening may be advantageous, but I can't think of anything really clear and obvious. You might have better ideas than I do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
I spent some more time poking at Jaime's example. I can reduce the problem query to this and still get the Assert crash:

select random() from radicado tablesample bernoulli (9.7)
where radi_texto = radi_inst_actu limit 33;

None of the elements of this query can be removed without causing the Assert to go away, which is weird because none of them have any apparent connection to partition pruning. With the Assert taken out, we find that this is the plan that's causing the problem:

                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..919.71 rows=33 width=8)
   ->  Append  (cost=0.00..12792.40 rows=459 width=8)
         ->  Sample Scan on radicado2012  (cost=0.00..2939.18 rows=93 width=8)
               Sampling: bernoulli ('9.7'::real)
               Filter: (radi_texto = radi_inst_actu)
         ->  Gather  (cost=1000.00..2485.72 rows=93 width=8)
               Workers Planned: 2
               ->  Parallel Append  (cost=0.00..1476.18 rows=39 width=0)
                     ->  Sample Scan on radicado2013_part00  (cost=0.00..1475.99 rows=47 width=0)
                           Sampling: bernoulli ('9.7'::real)
                           Filter: (radi_texto = radi_inst_actu)
                     ->  Sample Scan on radicado2013_part01  (cost=0.00..1461.88 rows=46 width=0)
                           Sampling: bernoulli ('9.7'::real)
                           Filter: (radi_texto = radi_inst_actu)
         ->  Gather  (cost=1000.00..2491.15 rows=93 width=8)
               Workers Planned: 2
               ->  Parallel Append  (cost=0.00..1481.62 rows=39 width=0)
                     ->  Sample Scan on radicado2014_part00  (cost=0.00..1481.42 rows=47 width=0)
                           Sampling: bernoulli ('9.7'::real)
                           Filter: (radi_texto = radi_inst_actu)
                     ->  Sample Scan on radicado2014_part01  (cost=0.00..1464.05 rows=46 width=0)
                           Sampling: bernoulli ('9.7'::real)
                           Filter: (radi_texto = radi_inst_actu)
         ->  Gather  (cost=1000.00..2482.47 rows=93 width=8)
               Workers Planned: 2
               ->  Parallel Append  (cost=0.00..1472.93 rows=39 width=0)
                     ->  Sample Scan on radicado2015_part00  (cost=0.00..1472.74 rows=47 width=0)
                           Sampling: bernoulli ('9.7'::real)
                           Filter: (radi_texto = radi_inst_actu)
                     ->  Sample Scan on radicado2015_part01  (cost=0.00..1454.28 rows=46 width=0)
                           Sampling: bernoulli ('9.7'::real)
                           Filter: (radi_texto = radi_inst_actu)
         ->  Gather  (cost=1000.00..2380.72 rows=86 width=8)
               Workers Planned: 2
               ->  Parallel Append  (cost=0.00..1371.90 rows=36 width=0)
                     ->  Sample Scan on radicado2016_part00  (cost=0.00..1371.72 rows=43 width=0)
                           Sampling: bernoulli ('9.7'::real)
                           Filter: (radi_texto = radi_inst_actu)
                     ->  Sample Scan on radicado2016_part01  (cost=0.00..1365.21 rows=43 width=0)
                           Sampling: bernoulli ('9.7'::real)
                           Filter: (radi_texto = radi_inst_actu)
         ->  Sample Scan on radicado2017  (cost=0.00..10.87 rows=1 width=8)
               Sampling: bernoulli ('9.7'::real)
               Filter: (radi_texto = radi_inst_actu)
(44 rows)

Now that seems to me to be a rather weird plan: why doesn't it prefer to flatten everything into one parallel append? Indeed, if you take out any of the remaining query parts such as the LIMIT, that's what it does. I think that its willingness to do this is actually kind of a bug, because this query is going to be a total disaster in terms of the number of workers it will try to use --- way more than the user would expect given max_parallel_workers_per_gather = 2. In any case, I now understand David's concern about creating a usable regression test case that produces a plan like this. It seems to depend on a very corner-case-y and possibly buggy set of costing behaviors.
So, while I'd be willing to go ahead and commit the Assert-removal, this actually increases my concern about whether a correct set of pruning instructions get generated in cases like this, because I now feel fairly confident in saying that we haven't tested the logic for such cases *at all*. (Note that Jaime's query doesn't have any pruning steps after the dust settles --- if make_partition_pruneinfo doesn't crash, it eventually figures out that there are no quals usable for pruning. So the fact that it goes through without the Assert does nothing to assuage my fear.) I think we'd be well advised to either change the logic to prohibit multi-level pruning plans from being generated, or do something to allow forcing them to be generated for testing purposes. regards, tom lane
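[ As a partial workaround for EXPLAIN's silence about the pruning tests themselves, EXPLAIN ANALYZE on an Append that does run-time pruning should report how many subplans were removed, though not which pruning steps produced that result. Using the hypothetical hierarchy sketched earlier: ]

explain (analyze, costs off, timing off, summary off)
select * from rt where a = (select 2);
-- the Append node's output should include a "Subplans Removed: N" line
-- when startup-time pruning has discarded some children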
I wrote: > Now that seems to me to be a rather weird plan: why doesn't it prefer > to flatten everything into one parallel append? Indeed, if you take > out any of the remaining query parts such as the LIMIT, that's what > it does. I think that its willingness to do this is actually kind > of a bug, because this query is going to be a total disaster in terms > of the number of workers it will try to use --- way more than the > user would expect given max_parallel_workers_per_gather = 2. Oh ... never mind that last. The parent Append will run its children sequentially, so that the Gathers execute one at a time, and at no point will more than 2 workers be active. Nonetheless, it's a damfool query plan, because we'll be going through parallel worker startup/shutdown overhead 4 times for no benefit. We really should put in some kind of logic to force those sibling Gathers to be aggregated into one, or else disallow them altogether. regards, tom lane
On 9 August 2018 at 15:33, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Nonetheless, it's a damfool query plan, because we'll be going through > parallel worker startup/shutdown overhead 4 times for no benefit. > We really should put in some kind of logic to force those sibling > Gathers to be aggregated into one, or else disallow them altogether. I started debugging this to see where things go wrong. I discovered that add_paths_to_append_rel() is called yet again from apply_scanjoin_target_to_paths(), and this is where it's all going wrong. The problem is that the gather paths have been tagged onto the partial paths by this time, so accumulate_append_subpath() has no code to look through those to fetch the Append/MergeAppend subpaths, so it just appends the entire path to the subpaths List. This all worked before 96030f9a481. This commit moved where generate_gather_paths() is called. I'm having a hard time understanding why we need to call add_paths_to_append_rel() from apply_scanjoin_target_to_paths(). The name of the function does not seem to imply that we'd do this. The comment just explains "what" rather than "why", making it a bit useless.

/* Build new paths for this relation by appending child paths. */
if (live_children != NIL)
    add_paths_to_append_rel(root, rel, live_children);

I also think that accumulate_append_subpath() is a bit of a fragile way of flattening the append rel's paths, but I feel it's probably apply_scanjoin_target_to_paths() that's at fault here. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Aug 8, 2018 at 11:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Oh ... never mind that last. The parent Append will run its children > sequentially, so that the Gathers execute one at a time, and at no > point will more than 2 workers be active. Yep. > Nonetheless, it's a damfool query plan, because we'll be going through > parallel worker startup/shutdown overhead 4 times for no benefit. > We really should put in some kind of logic to force those sibling > Gathers to be aggregated into one, or else disallow them altogether. Disallowing them could be a big performance regression. Combining them is reasonable, but it's not clear to me how you'd fit that into the planner structure. There's no convenient RelOptInfo to associate with the aggregated-Gather. It can't use the parent RelOptInfo unless all of the children have a partial path, and in that case this plan never would have been generated in the first place. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Aug 11, 2018 at 9:16 AM, David Rowley <david.rowley@2ndquadrant.com> wrote: > I started debugging this to see where things go wrong. I discovered > that add_paths_to_append_rel() is called yet again from > apply_scanjoin_target_to_paths() and this is where it's all going > wrong. The problem is that the gather paths have been tagged onto the > partial paths by this time, so accumulate_append_subpath() has no code > to look through those to fetch the Append/MergeAppend subpaths, so it > just appends the entire path to the subpaths List. This all worked > before 96030f9a481. This commit moved where generate_gather_paths() is > called. I'm baffled as to why looking through Gather to find Append/MergeAppend subpaths would ever be a sane thing to do. > I'm having a hard time understanding why we need to call > add_paths_to_append_rel() from apply_scanjoin_target_to_paths(). The > name of the function does not seem to imply that we'd do this. The > comment just explains "what" rather than "why" making it a bit > useless. > > /* Build new paths for this relation by appending child paths. */ > if (live_children != NIL) > add_paths_to_append_rel(root, rel, live_children); > > I also think that accumulate_append_subpath() is a bit of a fragile > way of flattening the append rel's paths, but I feel it's probably > apply_scanjoin_target_to_paths() that's at fault here. In my original design, apply_scanjoin_target_to_paths() -- or whatever I called it in the original patch versions -- built an entirely new RelOptInfo, so that we ended up with one RelOptInfo representing the unvarnished result of scan/join planning, and a second one representing the result of applying the scan/join target to that result. However, Amit Kapila was able to demonstrate that this caused a small but noticeable regression in planning speed, which seemed like it might plausibly cause some issues for people running very simple queries. In that original design, if the topmost scan/join rel was partitioned, the new "tlist" upper rel was also partitioned, and in the same way. In short, this was a kind of "partitionwise tlist" feature. For each child of the topmost scan/join rel, we built a child "tlist" rel, which got the same paths but with the correct tlist applied to each one. The path for the toplevel "tlist" upper rel could then be built by appending the children from each child "tlist" rel, or else by the usual method of sticking a Result node over the cheapest path for the topmost scan/join rel. In general, the first method is superior. Instead of a plan like Result -> Append -> (N children each producing the unprojected tlist), you get Append -> (N children each producing the projected tlist), which is cheaper. To avoid the performance overhead of creating a whole new tree of upper rels, I rewrote the code so that it modifies the RelOptInfos in place. That unfortunately makes the logic a bit more confusing, and it sounds from your remarks like I didn't comment it as clearly as might have been possible. But the basic idea is the same: we want the projection to be passed down to the child nodes rather than getting a Result node on top. The commit that did most of this -- 11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5 -- also solved a few other problems, as noted in the commit message, so I don't think we want to rip it out wholesale. There might be better ways to do some of it, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2018-Aug-13, Robert Haas wrote: > In my original design, apply_scanjoin_target_to_paths() -- or whatever > I called it in the original patch versions -- built an entirely new > RelOptInfo, so that we ended up with one RelOptInfo representing the > unvarnished result of scan/join planning, and a second one > representing the result of applying the scan/join target to that > result. However, Amit Kapila was able to demonstrate that this caused > a small but noticeable regression in planning speed, which seemed like > it might plausibly cause some issues for people running very simple > queries. > > In that original design, if the topmost scan/join rel was partitioned, > the new "tlist" upper rel was also partitioned, and in the same way. > In short, this was a kind of "partitionwise tlist" feature. For each > child of the topmost scan/join rel, we built a child "tlist" rel, > which got the same paths but with the correct tlist applied to each > one. The path for the toplevel "tlist" upper rel could then be built > by appending the children from each child "tlist" rel, or else by the > usual method of sticking a Result node over the cheapest path for the > topmost scan/join rel. In general, the first method is superior. > Instead of a plan like Result -> Append -> (N children each producing > the unprojected tlist), you get Append -> (N children each producing > the projected tlist), which is cheaper. I have to admit I have a hard time understanding this stuff. I don't know what a "scan/join target" is (or scan/join relation for that matter) ... and this "tlist relation" thing is an entirely new concept to me. I agree that pushing down the projection looks worthwhile. > To avoid the performance overhead of creating a whole new tree of > upper rels, I rewrote the code so that it modifies the RelOptInfos in > place. That unfortunately makes the logic a bit more confusing, and > it sounds from your remarks like I didn't comment it as clearly as > might have been possible. But the basic idea is the same: we want the > projection to be passed down to the child nodes rather than getting a > Result node on top. The commit that did most of this -- > 11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5 -- also solved a few other > problems, as noted in the commit message, so I don't think we want to > rip it out wholesale. There might be better ways to do some of it, > though. So, 1. I'm not sure that partition pruning devs are on the hook for producing a test case for the problem originally reported, since we're clearly dealing with a wacko plan, 2. I wonder if this should be considered an open item related to commit 11cf92f6e2e1 instead of partprune. The other option is to ignore this whole affair, mark this as solved in the open items list, and move on. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-Jul-23, Jaime Casanova wrote: > I was trying sqlsmith on REL_11_STABLE (commit > 1b957e59b92dc44c14708762f882d7910463a9ac) with a database i have at > hand, and got an assertion failure. > It seems to happen during planning on prunning time but only when > tables get bigger than certain size. Hi Jaime Did you try your luck running sqlsmith after this fix? It'd be interesting to know if there are further failures. Thanks for the report, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Aug 16, 2018 at 12:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > I have to admit I have a hard time understanding this stuff. I don't > know what a "scan/join target" is (or scan/join relation for that > matter) ... and this "tlist relation" thing is an entirely new concept > to me. Well, I invented the "tlist relation" concept, and it never actually got committed, so it's not surprising that it's new to you. The point is that if you do something like SELECT x.a + y.b FROM x, y WHERE <stuff>, the planner will initially produce a plan that emits x.a and y.b. But that's not actually what we want to produce: we want to compute the expression x.a + y.b. So that expression is the scan/join target. After getting the initial plan that produces x.a and y.b as separate columns, we perform surgery on that plan to make it return the computed expression instead. If x and y are partitioned and the join is partition-wise, it's better to perform this transformation recursively, getting each child-join to return x.a + y.b; the alternative is for each child-join to return x.a and y.b as separate columns and then perform the projection at the top level. > So, > 1. I'm not sure that partition pruning devs are on the hook for > producing a test case for the problem originally reported, > since we're clearly dealing with a wacko plan, > 2. I wonder if this should be considered an open item related to commit > 11cf92f6e2e1 instead of partprune. > > The other option is to ignore this whole affair, mark this as solved in > the open items list, and move on. I don't know whether there's actually a defect here any more. I was trying to dispel some perceived confusion on the part of David and Tom about what this code was trying to accomplish, but the fact that the code caused some confusion does not necessarily mean that there's anything wrong with it. On the other hand, perhaps it does have functional deficiencies, or perhaps the comments need work. Or maybe this code is fine taken in isolation but there are still problems in how it interacts with partition pruning. Let's start by deciding what we think the problems are, and then we can decide who should fix them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
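[ A concrete, if contrived, way to see the pushed-down projection; the table definitions are invented for illustration, and enable_partitionwise_join (off by default) is assumed to be available: ]

set enable_partitionwise_join = on;

create table x (k int, a int) partition by list (k);
create table x1 partition of x for values in (1);
create table y (k int, b int) partition by list (k);
create table y1 partition of y for values in (1);

-- With the scan/join target applied recursively, each child join can emit the
-- computed expression x.a + y.b itself, so the Append's children already
-- produce the final target list and no Result node is needed on top.
explain (verbose, costs off)
select x.a + y.b from x join y using (k);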
On 17 August 2018 at 06:52, Robert Haas <robertmhaas@gmail.com> wrote: > I don't know whether there's actually a defect here any more. I was > trying to dispel some perceived confusion on the part of David and Tom > about what this code was trying to accomplish, but the fact that the > code caused some confusion does not necessarily mean that there's > anything wrong with it. On the other hand, perhaps it does have > functional deficiencies, or perhaps the comments need work. Or maybe > this code is fine taken in isolation but there are still problems in > how it interacts with partition pruning. Let's start by deciding what > we think the problems are, and then we can decide who should fix them. I think Tom and I both agree that the plan mentioned in [1] and [2] is not quite as optimal as it could be. The investigation I did in [3] highlights where this got broken and the reason why it got broken. Tom removed the offending Assert() in 59ef49d26d2, but mentioned there's no test case which tries to ensure having a partitioned table as an Append child works correctly. I agree that it would be nice to have a test for this; I did try a couple of times to come up with a test case, but I was unable to come up with anything suitably compact for the regression tests, and even at that, given how difficult it is to get a plan into this shape, I didn't hold up much hope of the test's plan remaining stable. Tom seems to agree with this, as he mentioned in [2]. I think the best path forward is if you (Robert) could verify that the behaviour I mentioned in [3] is expected behaviour. If it is, then perhaps the path forward is for me to try harder to write a compact test for this; but if you discover that the behaviour is unexpected, then I think the best course of action is to fix it to properly flatten the Append, which will mean there's no need for a test case, and perhaps the Assert that was removed can be put back again. [1] https://www.postgresql.org/message-id/CAKJS1f9MQd5ntg6xXg7jJe0jB_bWvCDRmaH6yNzZGF%2BTVa5M%3DA%40mail.gmail.com [2] https://www.postgresql.org/message-id/8600.1533765148%40sss.pgh.pa.us [3] https://www.postgresql.org/message-id/CAKJS1f_j9BSJ7svDF7uid6EMm%2Bfu%2BcAhonZ7hcqiYU4ssv3rtg%40mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 14 August 2018 at 09:23, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Aug 11, 2018 at 9:16 AM, David Rowley > <david.rowley@2ndquadrant.com> wrote: >> I started debugging this to see where things go wrong. I discovered >> that add_paths_to_append_rel() is called yet again from >> apply_scanjoin_target_to_paths() and this is where it's all going >> wrong. The problem is that the gather paths have been tagged onto the >> partial paths by this time, so accumulate_append_subpath() has no code >> to look through those to fetch the Append/MergeAppend subpaths, so it >> just appends the entire path to the subpaths List. This all worked >> before 96030f9a481. This commit moved where generate_gather_paths() is >> called. > > I'm baffled as to why looking through Gather to find > Append/MergeAppend subpaths would ever be a sane thing to do. Can you explain why it's less sane than what the current code is doing? Below a Gather there will be partial paths, but we can also get those in a Parallel Append, which the accumulate_append_subpath() code already attempts to handle. If the Gather Path is there already then I guess one difference would be that the caller would need to ensure that another Gather path is placed below the Parallel Append again. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Aug 17, 2018, at 2:49 AM, David Rowley <david.rowley@2ndquadrant.com> wrote: > On 17 August 2018 at 06:52, Robert Haas <robertmhaas@gmail.com> wrote: >> I don't know whether there's actually a defect here any more. I was >> trying to dispel some perceived confusion on the part of David and Tom >> about what this code was trying to accomplish, but the fact that the >> code caused some confusion does not necessarily mean that there's >> anything wrong with it. On the other hand, perhaps it does have >> functional deficiencies, or perhaps the comments need work. Or maybe >> this code is fine taken in isolation but there are still problems in >> how it interacts with partition pruning. Let's start by deciding what >> we think the problems are, and then we can decide who should fix them. > I think Tom and I both agree that the plan mentioned in [1] and [2] is > not quite as optimal as it could be. The investigation I did in [3] > highlights where this got broken and the reason why it got broken. > Tom removed the offending Assert() in 59ef49d26d2, but mentioned > there's no test case which tries to ensure having a partitioned table > as an Append child works correctly. I agree that it would be nice to > have a test for this, but I did try a couple of times to come up with > a test case to do this, but I was unable to come up with anything > suitable compact for the regression tests, and even at that, given how > difficult it is to get a plan into this shape, I didn't hold up much > hope of the test's plan remaining stable. Tom seems to agree with this > as he mentioned in [2]. > I think the best path forward is if you (Robert) could verify the > behaviour that I mentioned in [3] is expected behaviour, if it is then > perhaps the path forward is for me to try harder to write a compact > test for this, but if you discover that the behaviour is unexpected > then I think the best course of action is to fix it to properly > flatten the Append which will mean there's no need for a test case, and perhaps > the Assert that was removed can be put back again. > [1] https://www.postgresql.org/message-id/CAKJS1f9MQd5ntg6xXg7jJe0jB_bWvCDRmaH6yNzZGF%2BTVa5M%3DA%40mail.gmail.com > [2] https://www.postgresql.org/message-id/8600.1533765148%40sss.pgh.pa.us > [3] https://www.postgresql.org/message-id/CAKJS1f_j9BSJ7svDF7uid6EMm%2Bfu%2BcAhonZ7hcqiYU4ssv3rtg%40mail.gmail.com On behalf of the RMT, I just want to make sure this keeps moving along. It sounds like the next step is for Robert to verify that [3] is the expected behavior and then David can decide what to do from there. Thanks! Jonathan
Attachment
On Mon, Aug 27, 2018 at 6:05 PM, Jonathan S. Katz <jkatz@postgresql.org> wrote: > On behalf of the RMT, I just want to make sure this keeps moving along. > It sounds like the next step is for Robert to verify that [3] is the > expected > behavior and then David can decide what to do from there. Yes, that's the expected behavior. If we didn't regenerate the paths there, we'd end up with

Result
  -> Append
       -> [various paths that produce a tlist which needs to be adjusted later by the result node]

Instead of:

Append
  -> [various paths that produce an adjusted tlist]

Paths of the former kind have already been generated; we regenerate paths here to get the latter kind as well, which end up displacing the old ones on cost. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Aug 17, 2018 at 2:58 AM, David Rowley <david.rowley@2ndquadrant.com> wrote: >> I'm baffled as to why looking through Gather to find >> Append/MergeAppend subpaths would ever be a sane thing to do. > > Can you explain why it's less sane than what the current code is > doing? Below a Gather there will be partial paths, but we can also > get those in a Parallel Append, which the accumulate_append_subpath() > code already attempts to handle. Sorry, I lost track of this email. accumulate_append_subpath's job is to flatten stacked Append nodes into a single Append node. But you can't in general flatten two Append nodes that are separated by something else in the middle, because that other node probably does something. To take a really simple example, consider:

Append
  -> Result
       -> Append
            -> ...
            -> ...

Well, the Result node computes a new target list, so the inner Append node can't just blindly be combined with the outer Append node. You might be able to make it work if you adjusted the inner Append node's children to produce the same target list that the Result node produces, but you can't just flatten it blindly. The same thing is true of Gather:

Append
  -> Gather
       -> Append
            -> ...
            -> ...

The child paths of the inner Append node are known to be parallel-safe and are necessarily partial paths unless the inner Append is a Parallel Append; the paths under the outer Append are definitely not partial and might not even be parallel-safe (they can't be parallel-unsafe, or we couldn't have a Gather anywhere in the plan, but they *could* be parallel-restricted). If you tried to flatten the inner Append into the outer one, you'd no longer have anything like a valid plan, because you can't have a partial path anywhere in the tree without a Gather someplace higher up, which wouldn't be true here any more, which I guess is what you meant by ... > If the Gather Path is there already > then I guess one difference would be that the caller would need to > ensure that another Gather path is placed below the Parallel Append > again. ...this, but that's not right. The Gather would have to go ABOVE the Parallel Append. The broader point here, though, is that even if you could do this kind of surgery, it's the wrong way of going about the problem. In any case where Append-Gather-Append could be flattened to Gather-Append, we should have just generated Gather-Append initially, rather than trying to rejigger things that way later. And I think that's what the code does. I'm not sure exactly what's going wrong with runtime partition pruning in this case, but ISTM that if runtime partition pruning doesn't work in some obscure case, that's just a limitation we should accept until somebody gets around to working on it. The key thing right now is not to have crashes or wrong behavior, so that we can ship. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, 16 Aug 2018 at 11:38, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > On 2018-Jul-23, Jaime Casanova wrote: > > > > I was trying sqlsmith on REL_11_STABLE (commit > > 1b957e59b92dc44c14708762f882d7910463a9ac) with a database i have at > > hand, and got an assertion failure. > > It seems to happen during planning on prunning time but only when > > tables get bigger than certain size. > > Hi Jaime > > Did you try your luck running sqlsmith after this fix? It'd be > interesting to know if there are further failures. > Intermittently... so far so good! I'm adding things (taken from the release notes) slowly, every so often. -- Jaime Casanova www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 8/29/18 1:38 PM, Robert Haas wrote: > On Mon, Aug 27, 2018 at 6:05 PM, Jonathan S. Katz <jkatz@postgresql.org> wrote: >> On behalf of the RMT, I just want to make sure this keeps moving along. >> It sounds like the next step is for Robert to verify that [3] is the >> expected >> behavior and then David can decide what to do from there. > > Yes, that's the expected behavior. If we didn't regenerate the paths > there, we'd end up with > > Result > -> Append > -> [various paths that produce a tlist which needs to be adjusted > later by the result node] > > Instead of: > > Append > -> [various paths that produce an adjusted tlist] > > Paths of the former kind have already been generated; we regenerate > paths here to get the latter kind as well, which end up displacing the > old ones on cost. > An update from the RMT: after our meeting today, we decided that we would drop this from the list of open items for the v11 major release. The initial issue was already fixed, and we understand that developing the test for this will take some work and we do not think it is needed for the v11 release. Thanks again for working on this! Jonathan