Re: Pull up aggregate subquery - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Pull up aggregate subquery
Date
Msg-id BANLkTimURp+3_LZuaOyP0GSMOeZ7qNNdbw@mail.gmail.com
Whole thread Raw
In response to Re: Pull up aggregate subquery  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, May 24, 2011 at 12:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Oh, I see.  I have a general gripe with nested loop plans: we already
>> consider too many of them.  IIRC, when I last fooled around with this,
>> the number of nested loop paths that we generate far exceeded the
>> number of merge or hash join paths, and most of those paths suck and
>> are a complete waste of time.
>
> Hm, really?  My experience is that it's the mergejoin paths that breed
> like rabbits, because there are so many potential sort orders.

*scratches head*

Well, I'm pretty sure that's how it looked when I was testing it.  I
wonder how this could be different for the two of us.  Or maybe one of
us is confused.  Admittedly, I haven't looked at it in a while.

>>> But I think this is all fairly unrelated to the case that Hitoshi is on
>>> about.  As you said earlier, it seems like we'd have to derive both
>>> parameterized and unparameterized plans for the subquery, which seems
>>> mighty expensive.
>
>> That was my first thought, too, but then I wondered if I was getting
>> cheap.
>
> Yeah, it's certainly possible that we're worrying too much.  Usually
> I only get concerned about added planner logic if it will impact the
> planning time for simple queries.  "Simple" tends to be in the eye of
> the beholder, but something with a complicated aggregate subquery is
> probably not simple by anyone's definition.
>
> In this case the sticky point is that there could be multiple possible
> sets of clauses available to be pushed down, depending on what you
> assume is the outer relation for the eventual upper-level nestloop.
> So worst case, you could have not just one parameterized plan to
> generate in addition to the regular kind, but 2^N of them ...

Hmm.  Well, 2^N is more than 2.  But I bet most of them are boring.
Judging by his followup email, Hitoshi Harada seems to think we can
just look at the case where we can parameterize on all of the grouping
columns.  The only other case that seems like it might be interesting
is parameterizing on any single one of the grouping columns.  I can't
get excited about pushing down arbitrary subsets.  Of course, even
O(N) in the number of grouping columns might be too much, but then we
could fall back to just all or nothing.  I think the "all" case by
itself would probably extract 90%+ of the benefit, especially since
"all" will often mean "the only one there is".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: 9.2 schedule
Next
From: Andrew Dunstan
Date:
Subject: Re: "errno" not set in case of "libm" functions (HPUX)