Re: Pull up aggregate subquery - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Pull up aggregate subquery
Date
Msg-id BANLkTinYrD+NmuuWeQ7ye06tYj3xEjn4PQ@mail.gmail.com
Whole thread Raw
In response to Re: Pull up aggregate subquery  (Hitoshi Harada <umi.tanuki@gmail.com>)
Responses Re: Pull up aggregate subquery  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Pull up aggregate subquery  (Hitoshi Harada <umi.tanuki@gmail.com>)
List pgsql-hackers
On Sat, May 21, 2011 at 12:49 PM, Hitoshi Harada <umi.tanuki@gmail.com> wrote:
> 2011/5/5 Hitoshi Harada <umi.tanuki@gmail.com>:
>> https://commitfest.postgresql.org/action/patch_view?id=548
>>
>> I'll work further if I find time.
>
> After more thought, pulling up aggregate subquery in narrow
> conditional cases is quite hard path, especially when the joinrel is
> more than 2. It will be hard to check pulling up is safe for other
> relations than the target relation.
>
> It was a big shame I missed Tom Lane's session in PGCon, but finding
> "Parameterized Scan" in his slides, it occurred to me that it might
> help my problem, too. Before hitting the "pull up" idea, I once
> thought if it would be possible to push outer Var of join down to
> Agg's HAVING, which is transferred to underlying SeqScan's filter.
> Resulted in something like:
>
>  NestLoop
>    -> SeqScan M (filter: M.val = '1')
>    -> GroupAggregate
>      -> SeqScan M (filter: L.m_id = M.id)
>
> However, currently we don't have such mechanism to push down Var as a
> qual to non-NestLoop. Yeah, it could be even now, but we should avoid
> N-loop of Agg. We want to scan Agg once, with Param $1 = M.id =
> multiple values. Since I didn't attend his session I'm afraid I don't
> understand "Parameterized Scan" correctly, but once we've got such
> mechanism, one example introduced in Robert Haas's blog[1] (originally
> shown by Andrew Gierth[2])  and LATERAL maybe.
>
> Do I understand correctly? If so, could someone explain more detail of
> how to get Parameterized Scan in the planner?

I think we're going to need Tom to give the definitive word on this,
but I believe that the current situation is that the executor is
capable of handling a parameterized scan (yeah!) but the planner
doesn't know how to generate them (boo!).   This is an improvement of
a sort over the 9.0 code base, where neither the planner nor the
executor could handle this case, but we need planner to support in
order to get anywhere useful with it.

The problem is how to figure out whether a parameterized scan is a win
without expending too much planning time.  For example, in the case
you mention:

select m_id, sum_len from size_m m inner join (select m_id,
sum(length(val)) as sum_len from size_l group by m_id) l on m.id =
l.m_id where val = '1';

...we'd need to plan the subquery twice, once with a parameterized
qual m_id = $1 pushed down, and once without that.  We could then
compare the cost of a nest-loop with the qual to the cost of a merge
or hash join without it.  But this seems very expensive.  In the
particular case you have here, the subquery is simple enough that this
probably wouldn't be any big deal, but in general there's no reason
why that subquery couldn't be quite complex - or why it couldn't have
subqueries of its own that would requite the same treatment
recursively.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: Identifying no-op length coercions
Next
From: Robert Haas
Date:
Subject: Re: Identifying no-op length coercions