Re: window function induces full table scan - Mailing list pgsql-performance
From | Thomas Mayer |
---|---|
Subject | Re: window function induces full table scan |
Date | |
Msg-id | 52C6F712.6040804@student.kit.edu Whole thread Raw |
In response to | Re: window function induces full table scan (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: window function induces full table scan
|
List | pgsql-performance |
Am 03.01.2014 15:54, schrieb Tom Lane: > Thomas Mayer <thomas.mayer@student.kit.edu> writes: >> To implement the optimization, subquery_is_pushdown_safe() needs to >> return true if pushing down the quals to a subquery which has window >> functions is in fact safe ("quals that only reference subquery >> outputs that are listed in the PARTITION clauses of all window functions >> in the subquery"). > > I'd just remove that check. > >> Plus, there is a function qual_is_pushdown_safe(...) which contains an >> assertion, which might possibly become obsolete: > > No, that should stay. There are no window functions in the upper query's > WHERE, there will be none pushed into the lower's WHERE, and that's as it > must be. > >> Tom, do you think that these two changes could be sufficient? > > Certainly not. What you'd need to do is include the > is-it-listed-in-all-PARTITION-clauses consideration in the code that marks > "unsafe" subquery output columns. Clear. That's what I intended to write ;) For better performance, we could first check subquery->hasWindowFuncs and only if this evaluates to true, check if the attribute is marked as a "unsafe subquery output column". If it's unsafe subquery_is_pushdown_safe() needs to return false. I was first thinking to do the decision safe/unsafe in the subquery_is_pushdown_safe() function or in a new function that is called by subquery_is_pushdown_safe(). ... "mark" ...: Do I understand you correctly, that you prefer doing the decision elsewhere and store the result (safe/unsafe) boolean value besides to the subquery output fields? For the push-down, a subquery output field must be available anyways. More in general, one could possibly "mark" safe subquery output fields for all the tests in subquery_is_pushdown_safe(). So this function would not necessarily have to be in allpaths.c at all. But that kind of refactoring would be a different issue which also could be implemented separately. > And update all the relevant comments. > And maybe add a couple of regression test cases. > Indeed, that would be nice ;) Straightforward, there should be - A test case with one window function (in the subquery) and one condition, which is safe to be pushed down. Then, assert that subquery_is_pushdown_safe() returns true - A test case with one window function and one condition, which is unsafe to be pushed down. Then, assert that subquery_is_pushdown_safe() returns false - A test case with multiple window functions and multiple conditions, all safe to be pushed down. Then, assert that subquery_is_pushdown_safe() returns true - A test case with multiple window functions and multiple conditions, all except one safe to be pushed down. Then, assert that subquery_is_pushdown_safe() returns true for the safe ones and false for the unsafe ones - a test case that ensures that, after the change, the right query plan is chosen (integration test). - execute some example queries (integration test). What do you think about it? What else needs to be tested? > Offhand I think the details of testing whether a given output column > appears in a given partition clause are identical to testing whether > it appears in the distinctClause. So you'd just be mechanizing running > through the windowClause list to verify whether this holds for all > the WINDOW clauses. When a field is element of all PARTITION BY clauses of all window functions, it does not mean that this field is distinct. Think of a query like SELECT * FROM ( SELECT b, rank() OVER (PARTITION BY b ORDER BY c DESC) AS rankc FROM tbl; ) as tmp WHERE tmp.b=1 In that case, the field b is not distinct (as there is no GROUP BY b). Anyways, tmp.b=1 would be safe to to be pushed down. Do you mean that a GROUP BY b is done implicitly (and internally) at a deeper level just for the window function and _therefore_ is distinct at that point? Does the safe/unsafe marking survive recursiveness (over subquery levels) then? > > Note that if you just look at the windowClause list, then you might > be filtering by named window definitions that appeared in the WINDOW > clause but were never actually referenced by any window function. > I don't have a problem with blowing off the optimization in such cases. > I don't think it's appropriate to expend the cycles that would be needed > to discover whether they're all referenced at this point. (If anyone ever > complains, it'd be much cheaper to modify the parser to get rid of > unreferenced window definitions.) > > regards, tom lane > . > Best regards Thomas
pgsql-performance by date: