Thread: window function induces full table scan

window function induces full table scan

From
Thomas Mayer
Date:
When querying a view with a WHERE condition, postgresql normally is able
to perform an index scan which reduces time for evaluation dramatically.

However, if a window function is evaluated in the view, postgresql is
evaluating the window function before the WHERE condition is applied.
This induces a full table scan.

These are the results of EXPLAIN:
-- without window function (non-equivalent)
explain select * from without_window_function where user_id = 43;
           QUERY PLAN
-----------------------------------------------------------------------------------------------
  Index Scan using idx_checkin_node_user_id on checkin_node
(cost=0.43..26.06 rows=2 width=20)
    Index Cond: (user_id = 43)
    Filter: (((id % 1000) + 1) = 1)

-- with window function
  explain select * from last_position where user_id = 43;
           QUERY PLAN
------------------------------------------------------------------------------------------
  Subquery Scan on tmp_last_position  (cost=973803.66..1151820.09 rows=2
width=20)
    Filter: ((tmp_last_position.datepos = 1) AND
(tmp_last_position.user_id = 43))
    ->  WindowAgg  (cost=973803.66..1080613.52 rows=4747105 width=32)
          ->  Sort  (cost=973803.66..985671.42 rows=4747105 width=32)
                Sort Key: checkin_node.user_id, checkin_node.date,
checkin_node.id
                ->  Seq Scan on checkin_node  (cost=0.00..106647.05
rows=4747105 width=32)

To work around this, I avoid using a view for that (equivalent):
EXPLAIN SELECT user_id, latitude, longitude
  FROM (
         SELECT
           user_id,
           latitude,
           longitude,
           rank() OVER (PARTITION BY user_id ORDER BY date DESC, id
DESC) AS datepos
         FROM checkin_node
         WHERE user_id = 43
       ) AS tmp_last_position
  WHERE datepos = 1; -- takes 2 ms
         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
  Subquery Scan on tmp_last_position  (cost=39.70..52.22 rows=2 width=20)
    Filter: (tmp_last_position.datepos = 1)
    ->  WindowAgg  (cost=39.70..47.40 rows=385 width=32)
          ->  Sort  (cost=39.70..40.67 rows=385 width=32)
                Sort Key: checkin_node.date, checkin_node.id
                ->  Index Scan using idx_checkin_node_user_id on
checkin_node  (cost=0.43..23.17 rows=385 width=32)
                      Index Cond: (user_id = 43)


I would expect postgresql to apply this query plan also for the view
last_position. It's 6621ms vs. 2ms, so the speedup is 3310!

Is it a bug in the optimizer?

How to reproduce:
=================
OS: ubuntu 12.04
Postgresql v9.3.2

get some sample data:
wget -qO-
http://snap.stanford.edu/data/loc-brightkite_totalCheckins.txt.gz|gunzip
-c|dos2unix|awk '{ if (length($0) > 20) print }'>test.csv

execute psql script:


\timing on
BEGIN;
DROP TABLE IF EXISTS checkin_node CASCADE;
CREATE TABLE checkin_node (
   id SERIAL NOT NULL PRIMARY KEY,
   user_id INTEGER NOT NULL,
   date TIMESTAMP NOT NULL,
   latitude DOUBLE PRECISION NOT NULL,
   longitude DOUBLE PRECISION NOT NULL,
   original_id VARCHAR NOT NULL
);
\COPY checkin_node (user_id, date, latitude, longitude, original_id)
FROM 'test.csv' WITH DELIMITER E'\t';

ALTER TABLE checkin_node DROP COLUMN original_id;
CREATE INDEX idx_checkin_node_user_id ON checkin_node(user_id);
CREATE INDEX idx_checkin_node_date ON checkin_node(date);

COMMIT;

VACUUM ANALYZE checkin_node;

-- doing window function in a view

DROP VIEW IF EXISTS last_position CASCADE;
CREATE VIEW last_position (user_id, latitude, longitude) AS (
   SELECT user_id, latitude, longitude
   FROM (
          SELECT
            user_id,
            latitude,
            longitude,
            rank() OVER (PARTITION BY user_id ORDER BY date DESC, id
DESC) AS datepos
          FROM checkin_node
        ) AS tmp_last_position
   WHERE datepos = 1
);

select * from last_position where user_id = 43; -- takes 6621ms

-- similar view but without window function (non-equivalent)

DROP VIEW IF EXISTS without_window_function CASCADE;
CREATE VIEW without_window_function (user_id, latitude, longitude) AS (
   SELECT user_id, latitude, longitude
   FROM (
          SELECT
            user_id,
            latitude,
            longitude,
            (id % 1000)+1 AS datepos --to not use a constant here
          FROM checkin_node
        ) AS tmp_last_position
   WHERE datepos = 1
);
select * from without_window_function where user_id = 43; -- takes 10ms


-- workaround: avoid using views (equivalent)

SELECT user_id, latitude, longitude
FROM (
        SELECT
          user_id,
          latitude,
          longitude,
          rank() OVER (PARTITION BY user_id ORDER BY date DESC, id DESC)
AS datepos
        FROM checkin_node
        WHERE user_id = 43
      ) AS tmp_last_position
WHERE datepos = 1; -- takes 2 ms


Re: window function induces full table scan

From
Tom Lane
Date:
Thomas Mayer <thomas.mayer@student.kit.edu> writes:
> When querying a view with a WHERE condition, postgresql normally is able
> to perform an index scan which reduces time for evaluation dramatically.

> However, if a window function is evaluated in the view, postgresql is
> evaluating the window function before the WHERE condition is applied.
> This induces a full table scan.

You haven't exactly provided full details, but it looks like you are
thinking that WHERE clauses applied above a window function should
be pushed to below it.  A moment's thought about the semantics should
convince you that such an optimization would be incorrect: the window
function would see fewer input rows than it should, and therefore would
(in general) return the wrong values for the selected rows.

It's possible that in the specific case you exhibit here, pushing down
the clause wouldn't result in changes in the window function's output for
the selected rows, but the optimizer doesn't have enough knowledge about
window functions to determine that.

            regards, tom lane


Re: window function induces full table scan

From
Jeff Janes
Date:
On Thu, Jan 2, 2014 at 1:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Mayer <thomas.mayer@student.kit.edu> writes:
> When querying a view with a WHERE condition, postgresql normally is able
> to perform an index scan which reduces time for evaluation dramatically.

> However, if a window function is evaluated in the view, postgresql is
> evaluating the window function before the WHERE condition is applied.
> This induces a full table scan.

You haven't exactly provided full details, but it looks like you are
thinking that WHERE clauses applied above a window function should
be pushed to below it.  A moment's thought about the semantics should
convince you that such an optimization would be incorrect: the window
function would see fewer input rows than it should, and therefore would
(in general) return the wrong values for the selected rows.

It's possible that in the specific case you exhibit here, pushing down
the clause wouldn't result in changes in the window function's output for
the selected rows, but the optimizer doesn't have enough knowledge about
window functions to determine that.

A restriction in the WHERE clause which corresponds to the PARTITION BY should be pushable, no?  I think it doesn't need to understand the internal semantics of the window function itself, just of the PARTITION BY, which should be doable, at least in principle.

Cheers,

Jeff

Re: window function induces full table scan

From
Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> On Thu, Jan 2, 2014 at 1:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It's possible that in the specific case you exhibit here, pushing down
>> the clause wouldn't result in changes in the window function's output for
>> the selected rows, but the optimizer doesn't have enough knowledge about
>> window functions to determine that.

> A restriction in the WHERE clause which corresponds to the PARTITION BY
> should be pushable, no?  I think it doesn't need to understand the internal
> semantics of the window function itself, just of the PARTITION BY, which
> should be doable, at least in principle.

If the restriction clause must give the same answer for any two rows of
the same partition, then yeah, we could in principle push it down without
knowing anything about the specific window function.  It'd be a less than
trivial test to make, I think.  In any case, it's not a "bug" that the
optimizer doesn't do this currently.

            regards, tom lane


Re: window function induces full table scan

From
Thomas Mayer
Date:
You understood me correctly, Tom.

As you mention, the result would be correct in my case:
- The window function is performing a "PARTITION BY user_id".
- user_id is used for the WHERE condition.

I agree, that in general (PARTITION BY and WHERE don't use the same set
of attributes), incorrect results could occur when performing the WHERE
condition before performing the window function.

However, in this special case, PARTITION BY and WHERE use the same set
of attributes which safely allows some optimization.

In fact, this is getting complicated when multiple window functions with
different PARTITION BY's are used in one statement.

I think, performing the WHERE condition before performing the window
function would be safe if the WHERE condition attribute is element of
the PARTITION-BY-set of attributes of _every_ window function of the
statement.

To ensure correctness, WHERE condition attributes which are _not_
element of the PARTITION-BY-set of attributes of _every_ window function
of the statement need to be performed after performing the window function.

So, the optimizer could check if it's safe or not.

Regards,
Thomas

Am 02.01.2014 22:52, schrieb Tom Lane:
> Thomas Mayer <thomas.mayer@student.kit.edu> writes:
>> When querying a view with a WHERE condition, postgresql normally is able
>> to perform an index scan which reduces time for evaluation dramatically.
>
>> However, if a window function is evaluated in the view, postgresql is
>> evaluating the window function before the WHERE condition is applied.
>> This induces a full table scan.
>
> You haven't exactly provided full details, but it looks like you are
> thinking that WHERE clauses applied above a window function should
> be pushed to below it.  A moment's thought about the semantics should
> convince you that such an optimization would be incorrect: the window
> function would see fewer input rows than it should, and therefore would
> (in general) return the wrong values for the selected rows.
>
> It's possible that in the specific case you exhibit here, pushing down
> the clause wouldn't result in changes in the window function's output for
> the selected rows, but the optimizer doesn't have enough knowledge about
> window functions to determine that.
>
>             regards, tom lane
> .
>

--
======================================
Thomas Mayer
Durlacher Allee 61
D-76131 Karlsruhe
Telefon: +49-721-2081661
Fax:     +49-721-72380001
Mobil:   +49-174-2152332
E-Mail:  thomas.mayer@student.kit.edu
=======================================



Re: window function induces full table scan

From
Thomas Mayer
Date:
Am 02.01.2014 23:43, schrieb Tom Lane:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> On Thu, Jan 2, 2014 at 1:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> It's possible that in the specific case you exhibit here, pushing down
>>> the clause wouldn't result in changes in the window function's output for
>>> the selected rows, but the optimizer doesn't have enough knowledge about
>>> window functions to determine that.
>
>> A restriction in the WHERE clause which corresponds to the PARTITION BY
>> should be pushable, no?  I think it doesn't need to understand the internal
>> semantics of the window function itself, just of the PARTITION BY, which
>> should be doable, at least in principle.
>
> If the restriction clause must give the same answer for any two rows of
> the same partition, then yeah, we could in principle push it down without
> knowing anything about the specific window function.  It'd be a less than
> trivial test to make, I think.  In any case, it's not a "bug" that the
> optimizer doesn't do this currently.

I agree, this is not a "bug" in v9.3.2 in terms of correctness.

But it's a limitation, because the query plan is by far not optimal. You
may consider this report as a feature request then.

The optimization I suggested is normally performed, when no window
function occurs in the statement.

It seems like the optimizer is already capable of doing a check if the
WHERE can be done first.

However, this check seems to be done too conservative: I guess, the
check is ignoring the PARTITION-BY-sets of attributes completely.

>
>             regards, tom lane
> .
>

Best regards
Thomas


Re: window function induces full table scan

From
Tom Lane
Date:
I wrote:
> If the restriction clause must give the same answer for any two rows of
> the same partition, then yeah, we could in principle push it down without
> knowing anything about the specific window function.  It'd be a less than
> trivial test to make, I think.

On reflection, really this concern is isomorphic to whether or not it is
safe to push quals down into a SELECT DISTINCT.  In principle, we should
only do that for quals that cannot distinguish values that are seen as
equal by the equality operator used for DISTINCT.  For instance, the
float8 equality operator treats IEEE minus zero and plus zero as "equal",
but it's not hard to contrive a WHERE clause that can tell the difference.
Pushing such a clause down into a SELECT DISTINCT can change the results;
but we do it anyway, and have done so since the nineties, and I don't
recall hearing complaints about this.

If we wanted to be really principled about it, I think we'd have to
restrict pushdown to quals that tested subquery outputs with operators
that are members of the relevant equality operator's btree opclass.
Which would cause a lot of howls of anguish, while making things better
for a set of users that seems to be about empty.

So maybe it'd be all right to push down quals that only reference subquery
outputs that are listed in the PARTITION clauses of all window functions
in the subquery.  I think that'd be a reasonably straightforward extension
of the existing tests in allpaths.c.

            regards, tom lane


Re: window function induces full table scan

From
Thomas Mayer
Date:
Just to track it down: The limitation can also be reproduced without
using views. Using views is just a use case where the suggested
optimization is actually needed.

Plus, when I remove the condition "WHERE datepos = 1", the same
behaviour still occurs. Here, I wanted to see if postgresql is
preferring the condition "WHERE datepos = 1" (datepos is the result of
the window function) over the condition "user_id = 43" for optimization.
But this is not the case.

-- workaround example: "WHERE user_id = 43" condition in subselect

SELECT user_id, latitude, longitude
FROM (
        SELECT
          user_id,
          latitude,
          longitude,
          rank() OVER (PARTITION BY user_id ORDER BY date DESC, id DESC)
AS datepos
        FROM checkin_node
        WHERE user_id = 43
      ) AS tmp_last_position
WHERE datepos = 1; -- takes 2 ms

-- track it down: reproduce limitation without a view:

SELECT user_id, latitude, longitude
FROM (
        SELECT
          user_id,
          latitude,
          longitude,
          rank() OVER (PARTITION BY user_id ORDER BY date DESC, id DESC)
            AS datepos
        FROM checkin_node
      ) AS tmp_last_position
WHERE datepos = 1
AND user_id = 43; -- takes 6621 ms

-- without datepos condition

SELECT user_id, latitude, longitude
FROM (
        SELECT
          user_id,
          latitude,
          longitude,
          rank() OVER (PARTITION BY user_id ORDER BY date DESC, id DESC)
            AS datepos
        FROM checkin_node
      ) AS tmp_last_position
WHERE user_id = 43; -- takes 6574 ms

Best regards,
Thomas


Am 03.01.2014 00:12, schrieb Thomas Mayer:
>
> Am 02.01.2014 23:43, schrieb Tom Lane:
>> Jeff Janes <jeff.janes@gmail.com> writes:
>>> On Thu, Jan 2, 2014 at 1:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> It's possible that in the specific case you exhibit here, pushing down
>>>> the clause wouldn't result in changes in the window function's output for
>>>> the selected rows, but the optimizer doesn't have enough knowledge about
>>>> window functions to determine that.
>>
>>> A restriction in the WHERE clause which corresponds to the PARTITION BY
>>> should be pushable, no?  I think it doesn't need to understand the internal
>>> semantics of the window function itself, just of the PARTITION BY, which
>>> should be doable, at least in principle.
>>
>> If the restriction clause must give the same answer for any two rows of
>> the same partition, then yeah, we could in principle push it down without
>> knowing anything about the specific window function.  It'd be a less than
>> trivial test to make, I think.  In any case, it's not a "bug" that the
>> optimizer doesn't do this currently.
>
> I agree, this is not a "bug" in v9.3.2 in terms of correctness.
>
> But it's a limitation, because the query plan is by far not optimal. You
> may consider this report as a feature request then.
>
> The optimization I suggested is normally performed, when no window
> function occurs in the statement.
>
> It seems like the optimizer is already capable of doing a check if the
> WHERE can be done first.
>
> However, this check seems to be done too conservative: I guess, the
> check is ignoring the PARTITION-BY-sets of attributes completely.
>
>>
>>             regards, tom lane
>> .
>>
>
> Best regards
> Thomas
>
>

--
======================================
Thomas Mayer
Durlacher Allee 61
D-76131 Karlsruhe
Telefon: +49-721-2081661
Fax:     +49-721-72380001
Mobil:   +49-174-2152332
E-Mail:  thomas.mayer@student.kit.edu
=======================================



Re: window function induces full table scan

From
Thomas Mayer
Date:
I have just cloned the postgresql git repository and checked out the
REL9_3_2 tagged version to have a look at the
src/backend/optimizer/path/allpaths.c file.

As Tom already mentioned, quals are currently not pushed down when
subqueries with window functions occur:

There is a function subquery_is_pushdown_safe(...) which is asserting
that quals are not pushed down when window functions occur:

"
  * 2. If the subquery contains any window functions, we can't push quals
  * into it, because that could change the results.
[...]
/* Check point 2 */
if (subquery->hasWindowFuncs)
    return false;
"

To implement the optimization, subquery_is_pushdown_safe() needs to
return true if pushing down the quals to a subquery which has window
functions is in fact safe ("quals that only reference subquery
outputs that are listed in the PARTITION clauses of all window functions
in the subquery").

Plus, there is a function qual_is_pushdown_safe(...) which contains an
assertion, which might possibly become obsolete:

"
/*
  * It would be unsafe to push down window function calls, but at least for
  * the moment we could never see any in a qual anyhow.    (The same applies
  * to aggregates, which we check for in pull_var_clause below.)
  */
Assert(!contain_window_function(qual));
"

Tom, do you think that these two changes could be sufficient? Do you
have a more general aproach in mind?

Best regards
Thomas

Am 03.01.2014 00:55, schrieb Tom Lane:
> I wrote:
>> If the restriction clause must give the same answer for any two rows of
>> the same partition, then yeah, we could in principle push it down without
>> knowing anything about the specific window function.  It'd be a less than
>> trivial test to make, I think.
>
> On reflection, really this concern is isomorphic to whether or not it is
> safe to push quals down into a SELECT DISTINCT.  In principle, we should
> only do that for quals that cannot distinguish values that are seen as
> equal by the equality operator used for DISTINCT.  For instance, the
> float8 equality operator treats IEEE minus zero and plus zero as "equal",
> but it's not hard to contrive a WHERE clause that can tell the difference.
> Pushing such a clause down into a SELECT DISTINCT can change the results;
> but we do it anyway, and have done so since the nineties, and I don't
> recall hearing complaints about this.
>
> If we wanted to be really principled about it, I think we'd have to
> restrict pushdown to quals that tested subquery outputs with operators
> that are members of the relevant equality operator's btree opclass.
> Which would cause a lot of howls of anguish, while making things better
> for a set of users that seems to be about empty.
>
> So maybe it'd be all right to push down quals that only reference subquery
> outputs that are listed in the PARTITION clauses of all window functions
> in the subquery.  I think that'd be a reasonably straightforward extension
> of the existing tests in allpaths.c.
>
>             regards, tom lane
> .
>


Re: window function induces full table scan

From
Tom Lane
Date:
Thomas Mayer <thomas.mayer@student.kit.edu> writes:
> To implement the optimization, subquery_is_pushdown_safe() needs to
> return true if pushing down the quals to a subquery which has window
> functions is in fact safe ("quals that only reference subquery
> outputs that are listed in the PARTITION clauses of all window functions
> in the subquery").

I'd just remove that check.

> Plus, there is a function qual_is_pushdown_safe(...) which contains an
> assertion, which might possibly become obsolete:

No, that should stay.  There are no window functions in the upper query's
WHERE, there will be none pushed into the lower's WHERE, and that's as it
must be.

> Tom, do you think that these two changes could be sufficient?

Certainly not.  What you'd need to do is include the
is-it-listed-in-all-PARTITION-clauses consideration in the code that marks
"unsafe" subquery output columns.  And update all the relevant comments.
And maybe add a couple of regression test cases.

Offhand I think the details of testing whether a given output column
appears in a given partition clause are identical to testing whether
it appears in the distinctClause.  So you'd just be mechanizing running
through the windowClause list to verify whether this holds for all
the WINDOW clauses.

Note that if you just look at the windowClause list, then you might
be filtering by named window definitions that appeared in the WINDOW
clause but were never actually referenced by any window function.
I don't have a problem with blowing off the optimization in such cases.
I don't think it's appropriate to expend the cycles that would be needed
to discover whether they're all referenced at this point.  (If anyone ever
complains, it'd be much cheaper to modify the parser to get rid of
unreferenced window definitions.)

            regards, tom lane


Re: window function induces full table scan

From
Thomas Mayer
Date:

Am 03.01.2014 15:54, schrieb Tom Lane:
> Thomas Mayer <thomas.mayer@student.kit.edu> writes:
>> To implement the optimization, subquery_is_pushdown_safe() needs to
>> return true if pushing down the quals to a subquery which has window
>> functions is in fact safe ("quals that only reference subquery
>> outputs that are listed in the PARTITION clauses of all window functions
>> in the subquery").
>
> I'd just remove that check.
>
>> Plus, there is a function qual_is_pushdown_safe(...) which contains an
>> assertion, which might possibly become obsolete:
>
> No, that should stay.  There are no window functions in the upper query's
> WHERE, there will be none pushed into the lower's WHERE, and that's as it
> must be.
>
>> Tom, do you think that these two changes could be sufficient?
>
> Certainly not.  What you'd need to do is include the
> is-it-listed-in-all-PARTITION-clauses consideration in the code that marks
> "unsafe" subquery output columns.

Clear. That's what I intended to write ;)

For better performance, we could first check subquery->hasWindowFuncs
and only if this evaluates to true, check if the attribute is marked as
a "unsafe subquery output column". If it's unsafe
subquery_is_pushdown_safe() needs to return false.

I was first thinking to do the decision safe/unsafe in the
subquery_is_pushdown_safe() function or in a new function that is called
by subquery_is_pushdown_safe().

... "mark" ...: Do I understand you correctly, that you prefer doing the
decision elsewhere and store the result (safe/unsafe) boolean value
besides to the subquery output fields? For the push-down, a subquery
output field must be available anyways.

More in general, one could possibly "mark" safe subquery output fields
for all the tests in subquery_is_pushdown_safe(). So this function would
not necessarily have to be in allpaths.c at all. But that kind of
refactoring would be a different issue which also could be implemented
separately.

> And update all the relevant comments.
> And maybe add a couple of regression test cases.
>

Indeed, that would be nice ;) Straightforward, there should be
- A test case with one window function (in the subquery) and one
condition, which is safe to be pushed down. Then, assert that
subquery_is_pushdown_safe() returns true
- A test case with one window function and one condition, which is
unsafe to be pushed down. Then, assert that subquery_is_pushdown_safe()
returns false
- A test case with multiple window functions and multiple conditions,
all safe to be pushed down. Then, assert that
subquery_is_pushdown_safe() returns true
- A test case with multiple window functions and multiple conditions,
all except one safe to be pushed down. Then, assert that
subquery_is_pushdown_safe() returns true for the safe ones and false for
the unsafe ones
- a test case that ensures that, after the change, the right query plan
is chosen (integration test).
- execute some example queries (integration test).

What do you think about it? What else needs to be tested?

> Offhand I think the details of testing whether a given output column
> appears in a given partition clause are identical to testing whether
> it appears in the distinctClause.  So you'd just be mechanizing running
> through the windowClause list to verify whether this holds for all
> the WINDOW clauses.

When a field is element of all PARTITION BY clauses of all window
functions, it does not mean that this field is distinct. Think of a
query like

SELECT * FROM (
   SELECT
   b,
   rank() OVER (PARTITION BY b ORDER BY c DESC) AS rankc
   FROM tbl;
) as tmp
WHERE tmp.b=1

In that case, the field b is not distinct (as there is no GROUP BY b).
Anyways, tmp.b=1 would be safe to to be pushed down.

Do you mean that a GROUP BY b is done implicitly (and internally) at a
deeper level just for the window function and _therefore_ is distinct at
that point?

Does the safe/unsafe marking survive recursiveness (over subquery
levels) then?

>
> Note that if you just look at the windowClause list, then you might
> be filtering by named window definitions that appeared in the WINDOW
> clause but were never actually referenced by any window function.
> I don't have a problem with blowing off the optimization in such cases.
> I don't think it's appropriate to expend the cycles that would be needed
> to discover whether they're all referenced at this point.  (If anyone ever
> complains, it'd be much cheaper to modify the parser to get rid of
> unreferenced window definitions.)
>
>             regards, tom lane
> .
>

Best regards
Thomas


Re: window function induces full table scan

From
Tom Lane
Date:
Thomas Mayer <thomas.mayer@student.kit.edu> writes:
> ... "mark" ...: Do I understand you correctly, that you prefer doing the
> decision elsewhere and store the result (safe/unsafe) boolean value
> besides to the subquery output fields? For the push-down, a subquery
> output field must be available anyways.

See check_output_columns().  The infrastructure for deciding whether
a potentially-pushable qual refers to any unsafe subquery outputs already
exists; we just need to extend it to consider outputs unsafe if they
don't appear in all PARTITION BY lists.

>> Offhand I think the details of testing whether a given output column
>> appears in a given partition clause are identical to testing whether
>> it appears in the distinctClause.  So you'd just be mechanizing running
>> through the windowClause list to verify whether this holds for all
>> the WINDOW clauses.

> When a field is element of all PARTITION BY clauses of all window
> functions, it does not mean that this field is distinct.

No, I didn't say that.  What I meant was that (a) the is_pushdown_safe
logic can treat non-partitioning subquery outputs much like non-DISTINCT
outputs, and (b) the parsetree representation of PARTITION BY is enough
like DISTINCT ON that the same kind of test (viz, a targetIsInSortList
call) will serve.

I think you need to read the code around subquery_is_pushdown_safe and
qual_is_pushdown_safe some more.

            regards, tom lane


Re: window function induces full table scan

From
Thomas Mayer
Date:
Am 03.01.2014 19:04, schrieb Tom Lane:

> I think you need to read the code around subquery_is_pushdown_safe and
> qual_is_pushdown_safe some more.
>
>             regards, tom lane
> .
>

In general, I'd need to go throught the pg source code which will take
some time. For instance, I wanted to see which are the requirements for
a patch to be accepted.

Currently, I can't provide you with a patch anyways (lack of time and
knowledge). However, I can possibly give it a try in a few weeks.

If someone is working on that in the meantime, please let me know.