Re: Pull up aggregate subquery - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Pull up aggregate subquery
Date
Msg-id BANLkTinsKqGLt3h56DumkVJxDFyBOoOQzw@mail.gmail.com
Whole thread Raw
In response to Re: Pull up aggregate subquery  (Hitoshi Harada <umi.tanuki@gmail.com>)
Responses Re: Pull up aggregate subquery
List pgsql-hackers
On Mon, May 23, 2011 at 10:47 PM, Hitoshi Harada <umi.tanuki@gmail.com> wrote:
> That's true. But if the planning cost is an only issue, why not adding
> new GUC for user to choose if they prefer it or not? Of course if we
> have some method to predict which way to go before proving both ways,
> it's great. Do you have some blue picture on it?

I don't really like the idea of adding a GUC for this, unless we
convince ourselves that nothing else is sensible.  I mean, that leads
to conversations like this:

Newbie: My query is slow.
Hacker: Turn on enable_magic_pixie_dust and it'll get better.
Newbie: Oh, yeah, that is better.  Why isn't this turned on by default, anyway?
Hacker: Well, on pathological queries, it makes planning take way too
long, so we think it's not really safe to enable it by default.
Newbie: Wait... I thought you just told me to enable it.  It's not safe?
Hacker: Well, sort of.  I mean, uh... hey, look, an elephant!

I think that if you're interested in working on this, the first step
would be to get it to work at all, even if it's a bit inefficient.
Even though in theory you could get exponential planning time growth
if you have an aggregate inside of an aggregate inside of an aggregate
inside of an aggregate, very few people are going to do that.  If it
turns out to be a problem, we can dream up some suitable defense; as I
think about it a little more, I don't think that would be terribly
difficult to work out.  The hard part is probably to get this to work
in the first place.

> In addition, I wonder if the "generalized nestloop" pattern can do
> whole outer scan before proving inner? I mean, in my example if the
> outer scan qualified more than one tuple I'm afraid that inner Agg and
> its underlying SeqScan are re-scanned more than one. That will bloat
> performance if the Agg is expensive. It would be great if we can outer
> scan can do scan to the end and stores param variables in something
> like array and prove once inner Agg, so that we can ensure the Agg is
> computed once. To do that, do we need more work on executor?

That sounds to me like way too much complexity for the first
go-around.  We should either push the agg down or not; it sounds like
you're proposing some kind of half-way state.  That might not be
useful in practice, and it will certainly be a lot more work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Pull up aggregate subquery
Next
From: Robert Haas
Date:
Subject: Re: [BUGS] BUG #6034: pg_upgrade fails when it should not.