On Fri, Feb 28, 2020 at 11:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes: > On Fri, Feb 28, 2020 at 3:02 PM Andy Fan <zhihui.fan1213@gmail.com> wrote: >> Glad to see this. I think the hard part is this transform is not *always* >> good. for example foo.a only has 1 rows, but bar has a lot of rows, if >> so the original would be the better one.
> Yes exactly. TBH I'm not sure how to achieve that.
Yeah, I was about to make the same objection when I saw Andy already had. Without some moderately-reliable way of estimating whether the change is actually a win, I think we're better off leaving it out. The user can always rewrite the query for themselves if the grouped implementation would be better -- but if the planner just does it blindly, there's no recourse when it's worse.
Yes, that makes sense.
> Any ideas on this part?
I wonder whether it'd be possible to rewrite the query, but then consider two implementations, one where the equality clause is pushed down into the aggregating subquery as though it were LATERAL. You'd want to be able to figure out that the presence of that clause made it unnecessary to do the GROUP BY ... but having done so, a plan treating the aggregating subquery as LATERAL ought to be pretty nearly performance-equivalent to the current way. So this could be mechanized in the current planner structure by treating that as a parameterized path for the subquery, and comparing it to unparameterized paths that calculate the full grouped output.
I suppose this would happen in/around function set_subquery_pathlist. When we generate access paths for the subquery, we try to push down the equality clause into subquery, remove the unnecessary GROUP BY, etc. and then perform another run of subquery_planner to generate the parameterized path, and add it to the RelOptInfo for the subquery. So that we can do comparison to unparameterized paths.
Am I understanding it correctly?
Obviously it'd be a long slog from here to there, but it seems like maybe that could be made to work. There's a separate question about whether it's really worth the trouble, seeing that the optimization is available today to people who rewrite their queries.
If I understand correctly as above, yes, this would take quite a lot of