Re: plan shape work - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: plan shape work |
Date | |
Msg-id | 2939541.1758564922@sss.pgh.pa.us Whole thread Raw |
In response to | Re: plan shape work (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> writes: > Done in this version. I've also now gone back and rebased the rest of > the patches as well, so this email includes all 7 patches instead of > just the first 3. To recall, my goal for this CF was to get 1-4 > committed. I found time finally to look through all of these. Some notes: 0001: committable, no further comments. 0002: Yeah, this does seem like an improvement. There is something faintly weird about output like -> Result - Output: i3 + Output: t3.i3 Replaces: Scan on t3 One-Time Filter: false because the whole point of this construct is that we're *not* scanning t3. However, failing to supply the Var prefix is surely not better. I like the fact that the output in upper plan levels is now just like what you'd see with a non-phony t3 scan. I think you could commit this too. Merging it with 0001 is reasonable, but if you'd rather keep them separate that's okay too. 0003: doesn't feel ready for commit. In the first place, you've used the "completed an outer join" terminology in the commit message and a couple of comments, but defined it nowhere. I think a para or two in optimizer/README to define "starting" and "completing" outer joins is essential. (I'm also still quite bemused by marking nodes that complete outer joins but not those that start them.) In the second place, we should not need to add two hundred lines of new code to createplan.c to accomplish this. Why not simply bms_difference the joinrel's relids from the union of the inputs' relids? (And no, I do not believe that computing the value two different ways so you can assert they're the same is adding anything whatsoever except wasted cycles.) By and large, I don't believe that it's necessary for 0003 to depend on 0001 either. We already have the information available from the paths' parent RelOptInfos. 0004: commit msg is not very convincing about why this is a good idea. It really looks like change for the sake of change, so you need to make a better case for it. Also, this output seems inconsistent: Function Scan on pg_catalog.generate_series x - Output: ARRAY(SubPlan 1) + Output: ARRAY(array_1) Function Call: generate_series(1, 3) - SubPlan 1 + SubPlan array_1 Why isn't it now "ARRAY(SubPlan array_1)"? The previous notation was chosen to be not-confusable with a plain function call, but you've lost that. Likewise for cases like - Group Key: (InitPlan 1).col1 + Group Key: (minmax_1).col1 which now looks exactly like an ordinary Var. nitpick: sublinktype_to_string() should return const char *. 0005: I'm even less convinced about there being a need for this. It's not that hard to see which RTIs are in which subplan, especially after the other changes in this patchset. 0006: I agree with the need for this, but the details seem messy, and it's not very clear that the proposed data structure would be convenient to use. Do we really need to rely on plan_node_id? Why haven't you integrated record_elided_node into clean_up_removed_plan_level? An idea maybe worth thinking about is that instead of completely eliding the plan node, we could replace it with a "no-op" plan node that both EXPLAIN and the executor will look right through. That node could carry the relid(s) that we lost. Not sure how messy this'd be to integrate, though. 0007: not sure about this either. Why not simply add those relid(s) to the surviving node's apprelids? Again, I don't love the amount of code and data structure that's being added for a hypothetical use-case. It's not even clear that this form of the data structure would be helpful to anyone. regards, tom lane
pgsql-hackers by date: