Re: Pull up aggregate subquery - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Pull up aggregate subquery
Date
Msg-id BANLkTi=iTUfvYgL5=WXfBK+xe0=Lh7MG9Q@mail.gmail.com
Whole thread Raw
In response to Re: Pull up aggregate subquery  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Pull up aggregate subquery  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, May 24, 2011 at 11:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I must be misunderstanding you, because index scans are the thing we
>> already *do* parameterize; and what else would make any sense?
>
> The point I was trying to make is that the ultimate reason for having a
> parameterized portion-of-a-plan will be that there's a parameterized
> indexscan somewhere down at the bottom.  I had originally imagined that
> we might parameterize any old scan; for example consider replacing
>
>        Nestloop
>          Join Filter: a.x = b.y
>          -> Seqscan on a
>          -> Seqscan on b
>
> with
>
>        Nestloop
>          -> Seqscan on a
>          -> Seqscan on b
>               Filter: b.y = a.x

Oh, I see.  I have a general gripe with nested loop plans: we already
consider too many of them.  IIRC, when I last fooled around with this,
the number of nested loop paths that we generate far exceeded the
number of merge or hash join paths, and most of those paths suck and
are a complete waste of time.  It strikes me that we ought to be
trying to find ways to get rid of some of the paths we're already
considering, rather than adding any more.  In this particular case, if
the second plan is actually faster, it probably won't be by much; we
could think about trying to make some kind of ex-post-facto
transformation instead of throwing everything into the path machinery.

> Although this isn't nearly as useful as if we could be using an index on
> b.y, there would still be some marginal gain to be had, because we'd be
> able to reject rows of b without first passing them up to the join node.
> But I'm afraid that going all-out like that would slow the planner down
> far too much (too many Paths to consider) to be justified by a marginal
> runtime gain.

Agreed.

> So the idea I have at the moment is that we'll still only parameterize
> indexscans, but then allow those to be joined to unrelated relations
> while still remaining parameterized.  That should reduce the number
> of parameterized Paths hanging around, because only joinclauses that
> match indexes will give rise to such Paths.

That seems fine, yeah.  If anything, we might want to limit it even
more, but certainly that's a good place to start, and see how it
shakes out.

> But I think this is all fairly unrelated to the case that Hitoshi is on
> about.  As you said earlier, it seems like we'd have to derive both
> parameterized and unparameterized plans for the subquery, which seems
> mighty expensive.

That was my first thought, too, but then I wondered if I was getting
cheap.  Most of the time, the subquery will be something simple, and
replanning it twice won't really matter much.  If it happens to be
something complicated, then it will take longer, but on the other hand
that's exactly the sort of byzantine query where you probably want the
planner to pull out all the stops.  Aggregates tend to "feel" slow
almost invariably, because the amount of data being processed under
the hood is much larger than what actually gets emitted, so I think we
should at least consider the possibility that users really won't care
about a bit of extra work.  The case I'm concerned about is where you
have several levels of nested aggregates, and the effect starts to
multiply.  But even if that turns out to be a problem, we could handle
it by limiting consideration of the alternate path to the top 1 or 2
levels and handle the rest as we do now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: Reducing overhead of frequent table locks
Next
From: Andrew Dunstan
Date:
Subject: Re: Cannot build docs of 9.1 on Windows