Thread: window function induces full table scan
When querying a view with a WHERE condition, postgresql normally is able to perform an index scan which reduces time for evaluation dramatically. However, if a window function is evaluated in the view, postgresql is evaluating the window function before the WHERE condition is applied. This induces a full table scan. These are the results of EXPLAIN: -- without window function (non-equivalent) explain select * from without_window_function where user_id = 43; QUERY PLAN ----------------------------------------------------------------------------------------------- Index Scan using idx_checkin_node_user_id on checkin_node (cost=0.43..26.06 rows=2 width=20) Index Cond: (user_id = 43) Filter: (((id % 1000) + 1) = 1) -- with window function explain select * from last_position where user_id = 43; QUERY PLAN ------------------------------------------------------------------------------------------ Subquery Scan on tmp_last_position (cost=973803.66..1151820.09 rows=2 width=20) Filter: ((tmp_last_position.datepos = 1) AND (tmp_last_position.user_id = 43)) -> WindowAgg (cost=973803.66..1080613.52 rows=4747105 width=32) -> Sort (cost=973803.66..985671.42 rows=4747105 width=32) Sort Key: checkin_node.user_id, checkin_node.date, checkin_node.id -> Seq Scan on checkin_node (cost=0.00..106647.05 rows=4747105 width=32) To work around this, I avoid using a view for that (equivalent): EXPLAIN SELECT user_id, latitude, longitude FROM ( SELECT user_id, latitude, longitude, rank() OVER (PARTITION BY user_id ORDER BY date DESC, id DESC) AS datepos FROM checkin_node WHERE user_id = 43 ) AS tmp_last_position WHERE datepos = 1; -- takes 2 ms QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Subquery Scan on tmp_last_position (cost=39.70..52.22 rows=2 width=20) Filter: (tmp_last_position.datepos = 1) -> WindowAgg (cost=39.70..47.40 rows=385 width=32) -> Sort (cost=39.70..40.67 rows=385 width=32) Sort Key: checkin_node.date, checkin_node.id -> Index Scan using idx_checkin_node_user_id on checkin_node (cost=0.43..23.17 rows=385 width=32) Index Cond: (user_id = 43) I would expect postgresql to apply this query plan also for the view last_position. It's 6621ms vs. 2ms, so the speedup is 3310! Is it a bug in the optimizer? How to reproduce: ================= OS: ubuntu 12.04 Postgresql v9.3.2 get some sample data: wget -qO- http://snap.stanford.edu/data/loc-brightkite_totalCheckins.txt.gz|gunzip -c|dos2unix|awk '{ if (length($0) > 20) print }'>test.csv execute psql script: \timing on BEGIN; DROP TABLE IF EXISTS checkin_node CASCADE; CREATE TABLE checkin_node ( id SERIAL NOT NULL PRIMARY KEY, user_id INTEGER NOT NULL, date TIMESTAMP NOT NULL, latitude DOUBLE PRECISION NOT NULL, longitude DOUBLE PRECISION NOT NULL, original_id VARCHAR NOT NULL ); \COPY checkin_node (user_id, date, latitude, longitude, original_id) FROM 'test.csv' WITH DELIMITER E'\t'; ALTER TABLE checkin_node DROP COLUMN original_id; CREATE INDEX idx_checkin_node_user_id ON checkin_node(user_id); CREATE INDEX idx_checkin_node_date ON checkin_node(date); COMMIT; VACUUM ANALYZE checkin_node; -- doing window function in a view DROP VIEW IF EXISTS last_position CASCADE; CREATE VIEW last_position (user_id, latitude, longitude) AS ( SELECT user_id, latitude, longitude FROM ( SELECT user_id, latitude, longitude, rank() OVER (PARTITION BY user_id ORDER BY date DESC, id DESC) AS datepos FROM checkin_node ) AS tmp_last_position WHERE datepos = 1 ); select * from last_position where user_id = 43; -- takes 6621ms -- similar view but without window function (non-equivalent) DROP VIEW IF EXISTS without_window_function CASCADE; CREATE VIEW without_window_function (user_id, latitude, longitude) AS ( SELECT user_id, latitude, longitude FROM ( SELECT user_id, latitude, longitude, (id % 1000)+1 AS datepos --to not use a constant here FROM checkin_node ) AS tmp_last_position WHERE datepos = 1 ); select * from without_window_function where user_id = 43; -- takes 10ms -- workaround: avoid using views (equivalent) SELECT user_id, latitude, longitude FROM ( SELECT user_id, latitude, longitude, rank() OVER (PARTITION BY user_id ORDER BY date DESC, id DESC) AS datepos FROM checkin_node WHERE user_id = 43 ) AS tmp_last_position WHERE datepos = 1; -- takes 2 ms
Thomas Mayer <thomas.mayer@student.kit.edu> writes: > When querying a view with a WHERE condition, postgresql normally is able > to perform an index scan which reduces time for evaluation dramatically. > However, if a window function is evaluated in the view, postgresql is > evaluating the window function before the WHERE condition is applied. > This induces a full table scan. You haven't exactly provided full details, but it looks like you are thinking that WHERE clauses applied above a window function should be pushed to below it. A moment's thought about the semantics should convince you that such an optimization would be incorrect: the window function would see fewer input rows than it should, and therefore would (in general) return the wrong values for the selected rows. It's possible that in the specific case you exhibit here, pushing down the clause wouldn't result in changes in the window function's output for the selected rows, but the optimizer doesn't have enough knowledge about window functions to determine that. regards, tom lane
On Thu, Jan 2, 2014 at 1:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Mayer <thomas.mayer@student.kit.edu> writes:You haven't exactly provided full details, but it looks like you are
> When querying a view with a WHERE condition, postgresql normally is able
> to perform an index scan which reduces time for evaluation dramatically.
> However, if a window function is evaluated in the view, postgresql is
> evaluating the window function before the WHERE condition is applied.
> This induces a full table scan.
thinking that WHERE clauses applied above a window function should
be pushed to below it. A moment's thought about the semantics should
convince you that such an optimization would be incorrect: the window
function would see fewer input rows than it should, and therefore would
(in general) return the wrong values for the selected rows.
It's possible that in the specific case you exhibit here, pushing down
the clause wouldn't result in changes in the window function's output for
the selected rows, but the optimizer doesn't have enough knowledge about
window functions to determine that.
A restriction in the WHERE clause which corresponds to the PARTITION BY should be pushable, no? I think it doesn't need to understand the internal semantics of the window function itself, just of the PARTITION BY, which should be doable, at least in principle.
Cheers,
Jeff
Jeff Janes <jeff.janes@gmail.com> writes: > On Thu, Jan 2, 2014 at 1:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It's possible that in the specific case you exhibit here, pushing down >> the clause wouldn't result in changes in the window function's output for >> the selected rows, but the optimizer doesn't have enough knowledge about >> window functions to determine that. > A restriction in the WHERE clause which corresponds to the PARTITION BY > should be pushable, no? I think it doesn't need to understand the internal > semantics of the window function itself, just of the PARTITION BY, which > should be doable, at least in principle. If the restriction clause must give the same answer for any two rows of the same partition, then yeah, we could in principle push it down without knowing anything about the specific window function. It'd be a less than trivial test to make, I think. In any case, it's not a "bug" that the optimizer doesn't do this currently. regards, tom lane
You understood me correctly, Tom. As you mention, the result would be correct in my case: - The window function is performing a "PARTITION BY user_id". - user_id is used for the WHERE condition. I agree, that in general (PARTITION BY and WHERE don't use the same set of attributes), incorrect results could occur when performing the WHERE condition before performing the window function. However, in this special case, PARTITION BY and WHERE use the same set of attributes which safely allows some optimization. In fact, this is getting complicated when multiple window functions with different PARTITION BY's are used in one statement. I think, performing the WHERE condition before performing the window function would be safe if the WHERE condition attribute is element of the PARTITION-BY-set of attributes of _every_ window function of the statement. To ensure correctness, WHERE condition attributes which are _not_ element of the PARTITION-BY-set of attributes of _every_ window function of the statement need to be performed after performing the window function. So, the optimizer could check if it's safe or not. Regards, Thomas Am 02.01.2014 22:52, schrieb Tom Lane: > Thomas Mayer <thomas.mayer@student.kit.edu> writes: >> When querying a view with a WHERE condition, postgresql normally is able >> to perform an index scan which reduces time for evaluation dramatically. > >> However, if a window function is evaluated in the view, postgresql is >> evaluating the window function before the WHERE condition is applied. >> This induces a full table scan. > > You haven't exactly provided full details, but it looks like you are > thinking that WHERE clauses applied above a window function should > be pushed to below it. A moment's thought about the semantics should > convince you that such an optimization would be incorrect: the window > function would see fewer input rows than it should, and therefore would > (in general) return the wrong values for the selected rows. > > It's possible that in the specific case you exhibit here, pushing down > the clause wouldn't result in changes in the window function's output for > the selected rows, but the optimizer doesn't have enough knowledge about > window functions to determine that. > > regards, tom lane > . > -- ====================================== Thomas Mayer Durlacher Allee 61 D-76131 Karlsruhe Telefon: +49-721-2081661 Fax: +49-721-72380001 Mobil: +49-174-2152332 E-Mail: thomas.mayer@student.kit.edu =======================================
Am 02.01.2014 23:43, schrieb Tom Lane: > Jeff Janes <jeff.janes@gmail.com> writes: >> On Thu, Jan 2, 2014 at 1:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> It's possible that in the specific case you exhibit here, pushing down >>> the clause wouldn't result in changes in the window function's output for >>> the selected rows, but the optimizer doesn't have enough knowledge about >>> window functions to determine that. > >> A restriction in the WHERE clause which corresponds to the PARTITION BY >> should be pushable, no? I think it doesn't need to understand the internal >> semantics of the window function itself, just of the PARTITION BY, which >> should be doable, at least in principle. > > If the restriction clause must give the same answer for any two rows of > the same partition, then yeah, we could in principle push it down without > knowing anything about the specific window function. It'd be a less than > trivial test to make, I think. In any case, it's not a "bug" that the > optimizer doesn't do this currently. I agree, this is not a "bug" in v9.3.2 in terms of correctness. But it's a limitation, because the query plan is by far not optimal. You may consider this report as a feature request then. The optimization I suggested is normally performed, when no window function occurs in the statement. It seems like the optimizer is already capable of doing a check if the WHERE can be done first. However, this check seems to be done too conservative: I guess, the check is ignoring the PARTITION-BY-sets of attributes completely. > > regards, tom lane > . > Best regards Thomas
I wrote: > If the restriction clause must give the same answer for any two rows of > the same partition, then yeah, we could in principle push it down without > knowing anything about the specific window function. It'd be a less than > trivial test to make, I think. On reflection, really this concern is isomorphic to whether or not it is safe to push quals down into a SELECT DISTINCT. In principle, we should only do that for quals that cannot distinguish values that are seen as equal by the equality operator used for DISTINCT. For instance, the float8 equality operator treats IEEE minus zero and plus zero as "equal", but it's not hard to contrive a WHERE clause that can tell the difference. Pushing such a clause down into a SELECT DISTINCT can change the results; but we do it anyway, and have done so since the nineties, and I don't recall hearing complaints about this. If we wanted to be really principled about it, I think we'd have to restrict pushdown to quals that tested subquery outputs with operators that are members of the relevant equality operator's btree opclass. Which would cause a lot of howls of anguish, while making things better for a set of users that seems to be about empty. So maybe it'd be all right to push down quals that only reference subquery outputs that are listed in the PARTITION clauses of all window functions in the subquery. I think that'd be a reasonably straightforward extension of the existing tests in allpaths.c. regards, tom lane
Just to track it down: The limitation can also be reproduced without using views. Using views is just a use case where the suggested optimization is actually needed. Plus, when I remove the condition "WHERE datepos = 1", the same behaviour still occurs. Here, I wanted to see if postgresql is preferring the condition "WHERE datepos = 1" (datepos is the result of the window function) over the condition "user_id = 43" for optimization. But this is not the case. -- workaround example: "WHERE user_id = 43" condition in subselect SELECT user_id, latitude, longitude FROM ( SELECT user_id, latitude, longitude, rank() OVER (PARTITION BY user_id ORDER BY date DESC, id DESC) AS datepos FROM checkin_node WHERE user_id = 43 ) AS tmp_last_position WHERE datepos = 1; -- takes 2 ms -- track it down: reproduce limitation without a view: SELECT user_id, latitude, longitude FROM ( SELECT user_id, latitude, longitude, rank() OVER (PARTITION BY user_id ORDER BY date DESC, id DESC) AS datepos FROM checkin_node ) AS tmp_last_position WHERE datepos = 1 AND user_id = 43; -- takes 6621 ms -- without datepos condition SELECT user_id, latitude, longitude FROM ( SELECT user_id, latitude, longitude, rank() OVER (PARTITION BY user_id ORDER BY date DESC, id DESC) AS datepos FROM checkin_node ) AS tmp_last_position WHERE user_id = 43; -- takes 6574 ms Best regards, Thomas Am 03.01.2014 00:12, schrieb Thomas Mayer: > > Am 02.01.2014 23:43, schrieb Tom Lane: >> Jeff Janes <jeff.janes@gmail.com> writes: >>> On Thu, Jan 2, 2014 at 1:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> It's possible that in the specific case you exhibit here, pushing down >>>> the clause wouldn't result in changes in the window function's output for >>>> the selected rows, but the optimizer doesn't have enough knowledge about >>>> window functions to determine that. >> >>> A restriction in the WHERE clause which corresponds to the PARTITION BY >>> should be pushable, no? I think it doesn't need to understand the internal >>> semantics of the window function itself, just of the PARTITION BY, which >>> should be doable, at least in principle. >> >> If the restriction clause must give the same answer for any two rows of >> the same partition, then yeah, we could in principle push it down without >> knowing anything about the specific window function. It'd be a less than >> trivial test to make, I think. In any case, it's not a "bug" that the >> optimizer doesn't do this currently. > > I agree, this is not a "bug" in v9.3.2 in terms of correctness. > > But it's a limitation, because the query plan is by far not optimal. You > may consider this report as a feature request then. > > The optimization I suggested is normally performed, when no window > function occurs in the statement. > > It seems like the optimizer is already capable of doing a check if the > WHERE can be done first. > > However, this check seems to be done too conservative: I guess, the > check is ignoring the PARTITION-BY-sets of attributes completely. > >> >> regards, tom lane >> . >> > > Best regards > Thomas > > -- ====================================== Thomas Mayer Durlacher Allee 61 D-76131 Karlsruhe Telefon: +49-721-2081661 Fax: +49-721-72380001 Mobil: +49-174-2152332 E-Mail: thomas.mayer@student.kit.edu =======================================
I have just cloned the postgresql git repository and checked out the REL9_3_2 tagged version to have a look at the src/backend/optimizer/path/allpaths.c file. As Tom already mentioned, quals are currently not pushed down when subqueries with window functions occur: There is a function subquery_is_pushdown_safe(...) which is asserting that quals are not pushed down when window functions occur: " * 2. If the subquery contains any window functions, we can't push quals * into it, because that could change the results. [...] /* Check point 2 */ if (subquery->hasWindowFuncs) return false; " To implement the optimization, subquery_is_pushdown_safe() needs to return true if pushing down the quals to a subquery which has window functions is in fact safe ("quals that only reference subquery outputs that are listed in the PARTITION clauses of all window functions in the subquery"). Plus, there is a function qual_is_pushdown_safe(...) which contains an assertion, which might possibly become obsolete: " /* * It would be unsafe to push down window function calls, but at least for * the moment we could never see any in a qual anyhow. (The same applies * to aggregates, which we check for in pull_var_clause below.) */ Assert(!contain_window_function(qual)); " Tom, do you think that these two changes could be sufficient? Do you have a more general aproach in mind? Best regards Thomas Am 03.01.2014 00:55, schrieb Tom Lane: > I wrote: >> If the restriction clause must give the same answer for any two rows of >> the same partition, then yeah, we could in principle push it down without >> knowing anything about the specific window function. It'd be a less than >> trivial test to make, I think. > > On reflection, really this concern is isomorphic to whether or not it is > safe to push quals down into a SELECT DISTINCT. In principle, we should > only do that for quals that cannot distinguish values that are seen as > equal by the equality operator used for DISTINCT. For instance, the > float8 equality operator treats IEEE minus zero and plus zero as "equal", > but it's not hard to contrive a WHERE clause that can tell the difference. > Pushing such a clause down into a SELECT DISTINCT can change the results; > but we do it anyway, and have done so since the nineties, and I don't > recall hearing complaints about this. > > If we wanted to be really principled about it, I think we'd have to > restrict pushdown to quals that tested subquery outputs with operators > that are members of the relevant equality operator's btree opclass. > Which would cause a lot of howls of anguish, while making things better > for a set of users that seems to be about empty. > > So maybe it'd be all right to push down quals that only reference subquery > outputs that are listed in the PARTITION clauses of all window functions > in the subquery. I think that'd be a reasonably straightforward extension > of the existing tests in allpaths.c. > > regards, tom lane > . >
Thomas Mayer <thomas.mayer@student.kit.edu> writes: > To implement the optimization, subquery_is_pushdown_safe() needs to > return true if pushing down the quals to a subquery which has window > functions is in fact safe ("quals that only reference subquery > outputs that are listed in the PARTITION clauses of all window functions > in the subquery"). I'd just remove that check. > Plus, there is a function qual_is_pushdown_safe(...) which contains an > assertion, which might possibly become obsolete: No, that should stay. There are no window functions in the upper query's WHERE, there will be none pushed into the lower's WHERE, and that's as it must be. > Tom, do you think that these two changes could be sufficient? Certainly not. What you'd need to do is include the is-it-listed-in-all-PARTITION-clauses consideration in the code that marks "unsafe" subquery output columns. And update all the relevant comments. And maybe add a couple of regression test cases. Offhand I think the details of testing whether a given output column appears in a given partition clause are identical to testing whether it appears in the distinctClause. So you'd just be mechanizing running through the windowClause list to verify whether this holds for all the WINDOW clauses. Note that if you just look at the windowClause list, then you might be filtering by named window definitions that appeared in the WINDOW clause but were never actually referenced by any window function. I don't have a problem with blowing off the optimization in such cases. I don't think it's appropriate to expend the cycles that would be needed to discover whether they're all referenced at this point. (If anyone ever complains, it'd be much cheaper to modify the parser to get rid of unreferenced window definitions.) regards, tom lane
Am 03.01.2014 15:54, schrieb Tom Lane: > Thomas Mayer <thomas.mayer@student.kit.edu> writes: >> To implement the optimization, subquery_is_pushdown_safe() needs to >> return true if pushing down the quals to a subquery which has window >> functions is in fact safe ("quals that only reference subquery >> outputs that are listed in the PARTITION clauses of all window functions >> in the subquery"). > > I'd just remove that check. > >> Plus, there is a function qual_is_pushdown_safe(...) which contains an >> assertion, which might possibly become obsolete: > > No, that should stay. There are no window functions in the upper query's > WHERE, there will be none pushed into the lower's WHERE, and that's as it > must be. > >> Tom, do you think that these two changes could be sufficient? > > Certainly not. What you'd need to do is include the > is-it-listed-in-all-PARTITION-clauses consideration in the code that marks > "unsafe" subquery output columns. Clear. That's what I intended to write ;) For better performance, we could first check subquery->hasWindowFuncs and only if this evaluates to true, check if the attribute is marked as a "unsafe subquery output column". If it's unsafe subquery_is_pushdown_safe() needs to return false. I was first thinking to do the decision safe/unsafe in the subquery_is_pushdown_safe() function or in a new function that is called by subquery_is_pushdown_safe(). ... "mark" ...: Do I understand you correctly, that you prefer doing the decision elsewhere and store the result (safe/unsafe) boolean value besides to the subquery output fields? For the push-down, a subquery output field must be available anyways. More in general, one could possibly "mark" safe subquery output fields for all the tests in subquery_is_pushdown_safe(). So this function would not necessarily have to be in allpaths.c at all. But that kind of refactoring would be a different issue which also could be implemented separately. > And update all the relevant comments. > And maybe add a couple of regression test cases. > Indeed, that would be nice ;) Straightforward, there should be - A test case with one window function (in the subquery) and one condition, which is safe to be pushed down. Then, assert that subquery_is_pushdown_safe() returns true - A test case with one window function and one condition, which is unsafe to be pushed down. Then, assert that subquery_is_pushdown_safe() returns false - A test case with multiple window functions and multiple conditions, all safe to be pushed down. Then, assert that subquery_is_pushdown_safe() returns true - A test case with multiple window functions and multiple conditions, all except one safe to be pushed down. Then, assert that subquery_is_pushdown_safe() returns true for the safe ones and false for the unsafe ones - a test case that ensures that, after the change, the right query plan is chosen (integration test). - execute some example queries (integration test). What do you think about it? What else needs to be tested? > Offhand I think the details of testing whether a given output column > appears in a given partition clause are identical to testing whether > it appears in the distinctClause. So you'd just be mechanizing running > through the windowClause list to verify whether this holds for all > the WINDOW clauses. When a field is element of all PARTITION BY clauses of all window functions, it does not mean that this field is distinct. Think of a query like SELECT * FROM ( SELECT b, rank() OVER (PARTITION BY b ORDER BY c DESC) AS rankc FROM tbl; ) as tmp WHERE tmp.b=1 In that case, the field b is not distinct (as there is no GROUP BY b). Anyways, tmp.b=1 would be safe to to be pushed down. Do you mean that a GROUP BY b is done implicitly (and internally) at a deeper level just for the window function and _therefore_ is distinct at that point? Does the safe/unsafe marking survive recursiveness (over subquery levels) then? > > Note that if you just look at the windowClause list, then you might > be filtering by named window definitions that appeared in the WINDOW > clause but were never actually referenced by any window function. > I don't have a problem with blowing off the optimization in such cases. > I don't think it's appropriate to expend the cycles that would be needed > to discover whether they're all referenced at this point. (If anyone ever > complains, it'd be much cheaper to modify the parser to get rid of > unreferenced window definitions.) > > regards, tom lane > . > Best regards Thomas
Thomas Mayer <thomas.mayer@student.kit.edu> writes: > ... "mark" ...: Do I understand you correctly, that you prefer doing the > decision elsewhere and store the result (safe/unsafe) boolean value > besides to the subquery output fields? For the push-down, a subquery > output field must be available anyways. See check_output_columns(). The infrastructure for deciding whether a potentially-pushable qual refers to any unsafe subquery outputs already exists; we just need to extend it to consider outputs unsafe if they don't appear in all PARTITION BY lists. >> Offhand I think the details of testing whether a given output column >> appears in a given partition clause are identical to testing whether >> it appears in the distinctClause. So you'd just be mechanizing running >> through the windowClause list to verify whether this holds for all >> the WINDOW clauses. > When a field is element of all PARTITION BY clauses of all window > functions, it does not mean that this field is distinct. No, I didn't say that. What I meant was that (a) the is_pushdown_safe logic can treat non-partitioning subquery outputs much like non-DISTINCT outputs, and (b) the parsetree representation of PARTITION BY is enough like DISTINCT ON that the same kind of test (viz, a targetIsInSortList call) will serve. I think you need to read the code around subquery_is_pushdown_safe and qual_is_pushdown_safe some more. regards, tom lane
Am 03.01.2014 19:04, schrieb Tom Lane: > I think you need to read the code around subquery_is_pushdown_safe and > qual_is_pushdown_safe some more. > > regards, tom lane > . > In general, I'd need to go throught the pg source code which will take some time. For instance, I wanted to see which are the requirements for a patch to be accepted. Currently, I can't provide you with a patch anyways (lack of time and knowledge). However, I can possibly give it a try in a few weeks. If someone is working on that in the meantime, please let me know.