Re: Trying to pull up EXPR SubLinks - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Trying to pull up EXPR SubLinks
Date
Msg-id CAMbWs4-Je0y28o29ExX9qMovOO4xXmuAWJSDwphkpeXXX+cv-Q@mail.gmail.com
Whole thread Raw
In response to Re: Trying to pull up EXPR SubLinks  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Trying to pull up EXPR SubLinks
List pgsql-hackers
On Fri, Feb 28, 2020 at 11:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
> On Fri, Feb 28, 2020 at 3:02 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>> Glad to see this.  I think the hard part is this transform is not *always*
>> good.  for example foo.a only has 1 rows, but bar has a lot  of rows, if
>> so the original would be the better one.

> Yes exactly. TBH I'm not sure how to achieve that.

Yeah, I was about to make the same objection when I saw Andy already had.
Without some moderately-reliable way of estimating whether the change
is actually a win, I think we're better off leaving it out.  The user
can always rewrite the query for themselves if the grouped implementation
would be better -- but if the planner just does it blindly, there's no
recourse when it's worse.

Yes, that makes sense.
 

> Any ideas on this part?

I wonder whether it'd be possible to rewrite the query, but then
consider two implementations, one where the equality clause is
pushed down into the aggregating subquery as though it were LATERAL.
You'd want to be able to figure out that the presence of that clause
made it unnecessary to do the GROUP BY ... but having done so, a
plan treating the aggregating subquery as LATERAL ought to be pretty
nearly performance-equivalent to the current way.  So this could be
mechanized in the current planner structure by treating that as a
parameterized path for the subquery, and comparing it to unparameterized
paths that calculate the full grouped output.

I suppose this would happen in/around function set_subquery_pathlist.
When we generate access paths for the subquery, we try to push down the
equality clause into subquery, remove the unnecessary GROUP BY, etc.
and then perform another run of subquery_planner to generate the
parameterized path, and add it to the RelOptInfo for the subquery. So
that we can do comparison to unparameterized paths.

Am I understanding it correctly?
 

Obviously it'd be a long slog from here to there, but it seems like
maybe that could be made to work.  There's a separate question about
whether it's really worth the trouble, seeing that the optimization
is available today to people who rewrite their queries.

If I understand correctly as above, yes, this would take quite a lot of
effort. Not sure if it's still worth doing.

Thanks
Richard

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Cleanup - Removal of apply_typmod function in #if 0
Next
From: Fujii Masao
Date:
Subject: Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side