Re: Pull up aggregate subquery - Mailing list pgsql-hackers

From Hitoshi Harada
Subject Re: Pull up aggregate subquery
Date
Msg-id BANLkTi=+JV6YYafV6P3nsn1Zi2xCYUpG5w@mail.gmail.com
Whole thread Raw
In response to Re: Pull up aggregate subquery  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Pull up aggregate subquery  (Robert Haas <robertmhaas@gmail.com>)
Re: Pull up aggregate subquery  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
2011/5/24 Robert Haas <robertmhaas@gmail.com>:
> On Sat, May 21, 2011 at 12:49 PM, Hitoshi Harada <umi.tanuki@gmail.com> wrote:
>> 2011/5/5 Hitoshi Harada <umi.tanuki@gmail.com>:
>> Do I understand correctly? If so, could someone explain more detail of
>> how to get Parameterized Scan in the planner?
>
> I think we're going to need Tom to give the definitive word on this,
> but I believe that the current situation is that the executor is
> capable of handling a parameterized scan (yeah!)

Ah, greping git log pointed me to
http://git.postgresql.org/gitweb?p=postgresql.git;a=commit;h=53e757689ce94520f1c53a89dbaa14ea57b09da7

> The problem is how to figure out whether a parameterized scan is a win
> without expending too much planning time.  For example, in the case
> you mention:
>
> select m_id, sum_len from size_m m inner join (select m_id,
> sum(length(val)) as sum_len from size_l group by m_id) l on m.id =
> l.m_id where val = '1';
>
> ...we'd need to plan the subquery twice, once with a parameterized
> qual m_id = $1 pushed down, and once without that.  We could then
> compare the cost of a nest-loop with the qual to the cost of a merge
> or hash join without it.  But this seems very expensive.  In the
> particular case you have here, the subquery is simple enough that this
> probably wouldn't be any big deal, but in general there's no reason
> why that subquery couldn't be quite complex - or why it couldn't have
> subqueries of its own that would requite the same treatment
> recursively.

That's true. But if the planning cost is an only issue, why not adding
new GUC for user to choose if they prefer it or not? Of course if we
have some method to predict which way to go before proving both ways,
it's great. Do you have some blue picture on it?

In addition, I wonder if the "generalized nestloop" pattern can do
whole outer scan before proving inner? I mean, in my example if the
outer scan qualified more than one tuple I'm afraid that inner Agg and
its underlying SeqScan are re-scanned more than one. That will bloat
performance if the Agg is expensive. It would be great if we can outer
scan can do scan to the end and stores param variables in something
like array and prove once inner Agg, so that we can ensure the Agg is
computed once. To do that, do we need more work on executor?

Regards,



--
Hitoshi Harada


pgsql-hackers by date:

Previous
From: Greg Smith
Date:
Subject: 9.2 schedule
Next
From: Hitoshi Harada
Date:
Subject: Re: Pull up aggregate subquery