Re: Pull up aggregate sublink (was: Parameterized aggregate subquery (was: Pull up aggregate subquery)) - Mailing list pgsql-hackers

From Yeb Havinga
Subject Re: Pull up aggregate sublink (was: Parameterized aggregate subquery (was: Pull up aggregate subquery))
Msg-id 4E302357.90704@gmail.com
In response to Re: Pull up aggregate sublink (was: Parameterized aggregate subquery (was: Pull up aggregate subquery))  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2011-07-27 16:16, Robert Haas wrote:
> On Tue, Jul 26, 2011 at 5:37 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>> Yeb Havinga<yebhavinga@gmail.com>  writes:
>>> A few days ago I read Tomas Vondra's blog post about dss tpc-h queries
>>> on PostgreSQL at
>>> http://fuzzy.cz/en/articles/dss-tpc-h-benchmark-with-postgresql/ - in
>>> which he showed how to manually pull up a dss subquery to get a large
>>> speed up. Initially I thought: cool, this is probably now handled by
>>> Hitoshi's patch, but it turns out the subquery type in the dss query is
>>> different.
>> Actually, I believe this example is the exact opposite of the
>> transformation Hitoshi proposes.  Tomas was manually replacing an
>> aggregated subquery by a reference to a grouped table, which can be
>> a win if the subquery would be executed enough times to amortize
>> calculation of the grouped table over all the groups (some of which
>> might never be demanded by the outer query).  Hitoshi was talking about
>> avoiding calculations of grouped-table elements that we don't need,
>> which would be a win in different cases.  Or at least that was the
>> thrust of his original proposal; I'm not sure where the patch went since
>> then.
>>
>> This leads me to think that we need to represent both cases as the same
>> sort of query and make a cost-based decision as to which way to go.
>> Thinking of it as a pull-up or push-down transformation is the wrong
>> approach because those sorts of transformations are done too early to
>> be able to use cost comparisons.
> I think you're right.  OTOH, our estimates of what will pop out of an
> aggregate are so poor that denying the user to control the plan on the
> basis of how they write the query might be a net negative.  :-(
>

Tom and Robert, thank you both for your replies. I think I have some
blind spots, and maybe false assumptions, regarding the overall work in
the optimizer, as it is not clear to me what 'the same sort of query'
would look like. I was under the impression that cost-based selection of
the best paths is done only per simple query, so I fail to see how a
total combined plan with a pulled-up subquery could be compared on cost
with a total plan where the subquery is still a separate subplan, since
the range tables / simple queries being compared are different.
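
To make the two query shapes concrete, here is a minimal sketch. It assumes
a hypothetical two-table schema loosely modeled on TPC-H (the table names,
column names, the 0.5 factor and the data are all made up for illustration,
and are not taken from Tomas's post). It runs on SQLite through Python's
sqlite3 module, so it only shows that the correlated aggregate sublink and
the join against a grouped table return the same answer; it says nothing
about how the PostgreSQL planner would cost either form.

```python
import sqlite3

# Hypothetical schema and data, loosely modeled on TPC-H (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE part (p_partkey INTEGER PRIMARY KEY, p_brand TEXT);
CREATE TABLE lineitem (l_partkey INTEGER, l_quantity REAL, l_extendedprice REAL);
INSERT INTO part VALUES (1, 'Brand#1'), (2, 'Brand#2'), (3, 'Brand#1');
INSERT INTO lineitem VALUES
  (1, 1, 10.0), (1, 10, 100.0), (1, 19, 190.0),
  (2, 5, 50.0), (2, 5, 50.0),
  (3, 1, 7.0), (3, 2, 14.0), (3, 30, 210.0);
""")

# Form A: aggregate sublink, evaluated once per qualifying outer row.
# Only the groups the outer query actually asks for are computed.
SUBLINK_SQL = """
SELECT SUM(l.l_extendedprice)
FROM lineitem l
JOIN part p ON p.p_partkey = l.l_partkey
WHERE p.p_brand = 'Brand#1'
  AND l.l_quantity < (SELECT 0.5 * AVG(l2.l_quantity)
                      FROM lineitem l2
                      WHERE l2.l_partkey = p.p_partkey)
"""

# Form B: the manual rewrite, joining against a grouped table.
# All groups are computed once, including ones never demanded (part 2 here).
GROUPED_SQL = """
SELECT SUM(l.l_extendedprice)
FROM lineitem l
JOIN part p ON p.p_partkey = l.l_partkey
JOIN (SELECT l_partkey, 0.5 * AVG(l_quantity) AS threshold
      FROM lineitem
      GROUP BY l_partkey) g ON g.l_partkey = l.l_partkey
WHERE p.p_brand = 'Brand#1'
  AND l.l_quantity < g.threshold
"""

sublink_total = conn.execute(SUBLINK_SQL).fetchone()[0]
grouped_total = conn.execute(GROUPED_SQL).fetchone()[0]

# Both forms are semantically equivalent on this data.
print(sublink_total, grouped_total)
```

Which form is cheaper depends on how many outer rows reach the sublink
versus how many groups the grouped table has to compute, which is the
trade-off Tom describes above.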

regards,
Yeb


