Re: Parameterized aggregate subquery (was: Pull up aggregate subquery) - Mailing list pgsql-hackers

From Hitoshi Harada
Subject Re: Parameterized aggregate subquery (was: Pull up aggregate subquery)
Msg-id BANLkTinAFrutO_cd9hsTb_5C5u5kMYcxxQ@mail.gmail.com
In response to Re: Parameterized aggregate subquery (was: Pull up aggregate subquery)  (Yeb Havinga <yebhavinga@gmail.com>)
List pgsql-hackers
2011/6/30 Yeb Havinga <yebhavinga@gmail.com>:
> On 2011-06-29 19:22, Hitoshi Harada wrote:
>>
>> Other things are all good points. Thanks for elaborate review!
>> More than anything, I'm going to fix the 6) issue, at least to find the
>> cause.
>>
> Some more questions:
> 8) why are cheapest start path and cheapest total path in
> best_inner_subqueryscan the same?

Because best_inner_indexscan returns both. Actually, one of them is
enough so far. I followed the existing interface, but the two could
be merged into one.

> 10) I have a hard time imagining use cases that will actually result in a
> alternative plan, especially since not all subqueries are allowed to have
> quals pushed down into, and like Simon Riggs pointed out that many users
> will write queries like this with the subqueries pulled up. If it is the
> case that the subqueries that can't be pulled up have a large overlap with
> the ones that are not pushdown safe (limit, set operations etc), there might
> be little actual use cases for this patch.

I have seen many cases where this planner hack would help
significantly and which were difficult to rewrite. Why were they
difficult to rewrite? Because the quals on size_m (and in fact they
have quals on size_l too) are usually very complicated (5-10 op
clauses), and the join+agg part is itself a kind of subquery inside
another big query. Of course there were workarounds, such as splitting
the statement into two: filter size_m first, then aggregate size_l
using the result of the first statement. However, that goes against
instinct. The reason an RDBMS has a planner is to let users write
statements as simple as they need. I don't know whether the example I
raise here is common or not, but I believe it is a simple
representation of a "one to many" relation, so there should be many
users who just haven't noticed that they are suffering from this slow
query performance.
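
To make the workaround concrete, here is a minimal sketch of the
single statement next to the two-statement split. It is only an
illustration: the column names (id, m_id, val), the sample rows, and
the filter val = 'a' are assumptions rather than the actual test case,
and SQLite (through Python's sqlite3 module) stands in for PostgreSQL.
It shows that the two forms return the same rows; it says nothing
about how either planner costs them.

```python
import sqlite3

# Hypothetical schema: size_m is the "one" side, size_l the "many" side.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE size_m (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE size_l (id INTEGER PRIMARY KEY, m_id INTEGER, val TEXT);
""")
conn.executemany("INSERT INTO size_m (id, val) VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "a")])
conn.executemany("INSERT INTO size_l (m_id, val) VALUES (?, ?)",
                 [(1, "xx"), (1, "yyy"), (2, "z"), (3, "wwww")])

# Single statement: join size_m against an aggregate subquery on size_l.
# The qual on size_m is a simple stand-in for the complicated real ones.
single = conn.execute("""
    SELECT m.id, l.sum_len
    FROM size_m AS m
    JOIN (SELECT m_id, SUM(LENGTH(val)) AS sum_len
          FROM size_l
          GROUP BY m_id) AS l ON l.m_id = m.id
    WHERE m.val = 'a'
    ORDER BY m.id
""").fetchall()

# Workaround, statement 1: filter size_m on its own.
ids = [row[0] for row in
       conn.execute("SELECT id FROM size_m WHERE val = 'a' ORDER BY id")]

# Workaround, statement 2: aggregate size_l only for the surviving ids.
placeholders = ",".join("?" for _ in ids)
two_step = conn.execute(
    "SELECT m_id, SUM(LENGTH(val)) FROM size_l "
    f"WHERE m_id IN ({placeholders}) GROUP BY m_id ORDER BY m_id",
    ids,
).fetchall()

print(single)
print(two_step)
```

The split version does by hand what the single statement leaves to the
planner: it restricts the aggregate to the m_id values that survive the
filter on size_m.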

> I think the most important thing for this patch to go forward is to have a
> few examples, from which it's clear that the patch is beneficial.

What would be good examples to show the benefit of the patch? I guess
the size_m/size_l test case shows it. What do you think that case
lacks?

Regards,


-- 
Hitoshi Harada

