Thread: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses

On this thread http://www.postgresql.org/message-id/52C6F712.6040804@student.kit.edu there was some discussion around allowing push downs of quals that happen to be in every window clause of the sub query. I've quickly put together a patch which does this (see attached)

I'm posting this just mainly to let Thomas know that I'm working on it, per his request on the other thread.

The patch seems to work with all my test cases, and I've not quite gotten around to thinking of any more good cases to throw at it.

Oh and I know that my function var_exists_in_all_query_partition_by_clauses has no business in allpaths.c, I'll move it out as soon as I find a better home for it.

Here's my test case:

drop table if exists winagg;

create table winagg (
  id serial not null primary key,
  partid int not null
);

insert into winagg (partid) select x.x % 100000 from generate_series(1,2000000) x(x);


create index winagg_partid_idx on winagg(partid);


-- Should push: this should push WHERE partid=1 to the inner query as partid is in the only parition by clause in the query.
explain analyze select partid,n from (select partid,count(*) over (partition by partid) n from winagg) winagg where partid=1;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=4.58..82.23 rows=20 width=4) (actual time=0.196..0.207 rows=20 loops=1)
   ->  Bitmap Heap Scan on winagg  (cost=4.58..81.98 rows=20 width=4) (actual time=0.102..0.170 rows=20 loops=1)
         Recheck Cond: (partid = 1)
         Heap Blocks: exact=20
         ->  Bitmap Index Scan on winagg_partid_idx  (cost=0.00..4.58 rows=20 width=0) (actual time=0.084..0.084 rows=20 loops=1)
               Index Cond: (partid = 1)
 Planning time: 0.208 ms
 Total runtime: 0.276 ms
(8 rows)

-- Should not push: Added a +0 to partition by clause.
explain analyze select partid,n from (select partid,count(*) over (partition by partid + 0) n from winagg) winagg where partid=1;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on winagg  (cost=265511.19..330511.19 rows=20 width=12) (actual time=2146.642..4257.267 rows=20 loops=1)
   Filter: (winagg.partid = 1)
   Rows Removed by Filter: 1999980
   ->  WindowAgg  (cost=265511.19..305511.19 rows=2000000 width=4) (actual time=2146.614..4099.169 rows=2000000 loops=1)
         ->  Sort  (cost=265511.19..270511.19 rows=2000000 width=4) (actual time=2146.587..2994.993 rows=2000000 loops=1)
               Sort Key: ((winagg_1.partid + 0))
               Sort Method: external merge  Disk: 35136kB
               ->  Seq Scan on winagg winagg_1  (cost=0.00..28850.00 rows=2000000 width=4) (actual time=0.025..418.306 rows=2000000 loops=1)
 Planning time: 0.249 ms
 Total runtime: 4263.933 ms
(10 rows)


-- Should not push: Add a window clause (which is not used) that has a partition by clause that does not have partid
explain analyze select partid,n from (select partid,count(*) over (partition by partid) n from winagg window stopPushDown as (partition by id)) winagg where partid=1;

-- Should not push: 1 window clause does not have partid
explain analyze select partid,n from (select partid,count(*) over (partition by partid) n from winagg window stopPushDown as (order id)) winagg where partid=1;

-- Should not push: 1 window clause does not have partid
explain analyze select partid,n from (select partid,count(*) over (partition by partid) n from winagg window stopPushDown as ()) winagg where partid=1;

As of now the patch is a couple of hours old, I've not even bothered to run the regression tests yet, let alone add any new ones.

Comments are welcome...

Regards

David Rowley

Attachment
Hello David,

thanks for your work. The results look promising.

What I'm missing is a test case with multiple fields in the partition by 
clauses:

-- should push down, because partid is part of all PARTITION BY clauses
explain analyze select partid,n,m from (  select partid,  count(*) over (partition by partid) n,  count(*) over
(partitionby partid, partid+0) m  from winagg
 
) winagg
where partid=1;

current production 9.3.4 is returning                          QUERY PLAN 


----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SubqueryScan on winagg  (cost=350955.11..420955.11 rows=20 width=20) 
 
(actual time=2564.360..3802.413 rows=20 loops=1)   Filter: (winagg.partid = 1)   Rows Removed by Filter: 1999980   ->
WindowAgg (cost=350955.11..395955.11 rows=2000000 width=4) 
 
(actual time=2564.332..3657.051 rows=2000000 loops=1)         ->  Sort  (cost=350955.11..355955.11 rows=2000000
width=4)
 
(actual time=2564.320..2802.444 rows=2000000 loops=1)               Sort Key: winagg_1.partid, ((winagg_1.partid + 0))
            Sort Method: external sort  Disk: 50840kB               ->  WindowAgg  (cost=0.43..86948.43 rows=2000000 
 
width=4) (actual time=0.084..1335.081 rows=2000000 loops=1)                     ->  Index Only Scan using
winagg_partid_idxon 
 
winagg winagg_1  (cost=0.43..51948.43 rows=2000000 width=4) (actual 
time=0.051..378.232 rows=2000000 loops=1)                           Heap Fetches: 0

"Index Only Scan" currently returns all rows (without pushdown) on 
current production 9.3.4. What happens with the patch you provided?

-- Already Part of your tests:
-- should NOT push down, because partid is NOT part of all PARTITION BY 
clauses
explain analyze select partid,n,m from (  select partid,  count(*) over (partition by partid) n,  count(*) over
(partitionby partid+0) m  from winagg
 
) winagg
where partid=1;

Reordering the fields should also be tested:
-- should push down, because partid is part of all PARTITION BY clauses
-- here: partid at the end
explain analyze select partid,n,m from (  select partid,  count(*) over (partition by partid) n,  count(*) over
(partitionby partid+0, partid) m  from winagg
 
) winagg
where partid=1;

-- should push down, because partid is part of all PARTITION BY clauses
-- here: partid in the middle
explain analyze select partid,n,m from (  select partid,  count(*) over (partition by partid) n,  count(*) over
(partitionby partid+0, partid, partid+1) m  from winagg
 
) winagg
where partid=1;


Best regards
Thomas


Am 13.04.2014 13:32, schrieb David Rowley:
> On this thread
> http://www.postgresql.org/message-id/52C6F712.6040804@student.kit.edu
> there was some discussion around allowing push downs of quals that
> happen to be in every window clause of the sub query. I've quickly put
> together a patch which does this (see attached)
>
> I'm posting this just mainly to let Thomas know that I'm working on it,
> per his request on the other thread.
>
> The patch seems to work with all my test cases, and I've not quite
> gotten around to thinking of any more good cases to throw at it.
>
> Oh and I know that my
> function var_exists_in_all_query_partition_by_clauses has no business in
> allpaths.c, I'll move it out as soon as I find a better home for it.
>
> Here's my test case:
>
> drop table if exists winagg;
>
> create table winagg (
>    id serial not null primary key,
>    partid int not null
> );
>
> insert into winagg (partid) select x.x % 100000 from
> generate_series(1,2000000) x(x);
>
>
> create index winagg_partid_idx on winagg(partid);
>
>
> -- Should push: this should push WHERE partid=1 to the inner query as
> partid is in the only parition by clause in the query.
> explain analyze select partid,n from (select partid,count(*) over
> (partition by partid) n from winagg) winagg where partid=1;
>                                                              QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------
>   WindowAgg  (cost=4.58..82.23 rows=20 width=4) (actual
> time=0.196..0.207 rows=20 loops=1)
>     ->  Bitmap Heap Scan on winagg  (cost=4.58..81.98 rows=20 width=4)
> (actual time=0.102..0.170 rows=20 loops=1)
>           Recheck Cond: (partid = 1)
>           Heap Blocks: exact=20
>           ->  Bitmap Index Scan on winagg_partid_idx  (cost=0.00..4.58
> rows=20 width=0) (actual time=0.084..0.084 rows=20 loops=1)
>                 Index Cond: (partid = 1)
>   Planning time: 0.208 ms
>   Total runtime: 0.276 ms
> (8 rows)
>
> -- Should not push: Added a +0 to partition by clause.
> explain analyze select partid,n from (select partid,count(*) over
> (partition by partid + 0) n from winagg) winagg where partid=1;
>                                                                   QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------------------------
>   Subquery Scan on winagg  (cost=265511.19..330511.19 rows=20 width=12)
> (actual time=2146.642..4257.267 rows=20 loops=1)
>     Filter: (winagg.partid = 1)
>     Rows Removed by Filter: 1999980
>     ->  WindowAgg  (cost=265511.19..305511.19 rows=2000000 width=4)
> (actual time=2146.614..4099.169 rows=2000000 loops=1)
>           ->  Sort  (cost=265511.19..270511.19 rows=2000000 width=4)
> (actual time=2146.587..2994.993 rows=2000000 loops=1)
>                 Sort Key: ((winagg_1.partid + 0))
>                 Sort Method: external merge  Disk: 35136kB
>                 ->  Seq Scan on winagg winagg_1  (cost=0.00..28850.00
> rows=2000000 width=4) (actual time=0.025..418.306 rows=2000000 loops=1)
>   Planning time: 0.249 ms
>   Total runtime: 4263.933 ms
> (10 rows)
>
>
> -- Should not push: Add a window clause (which is not used) that has a
> partition by clause that does not have partid
> explain analyze select partid,n from (select partid,count(*) over
> (partition by partid) n from winagg window stopPushDown as (partition by
> id)) winagg where partid=1;
>
> -- Should not push: 1 window clause does not have partid
> explain analyze select partid,n from (select partid,count(*) over
> (partition by partid) n from winagg window stopPushDown as (order id))
> winagg where partid=1;
>
> -- Should not push: 1 window clause does not have partid
> explain analyze select partid,n from (select partid,count(*) over
> (partition by partid) n from winagg window stopPushDown as ()) winagg
> where partid=1;
>
> As of now the patch is a couple of hours old, I've not even bothered to
> run the regression tests yet, let alone add any new ones.
>
> Comments are welcome...
>
> Regards
>
> David Rowley
>

-- 
======================================
Thomas Mayer
Durlacher Allee 61
D-76131 Karlsruhe
Telefon: +49-721-2081661
Fax:     +49-721-72380001
Mobil:   +49-174-2152332
E-Mail:  thomas.mayer@student.kit.edu
=======================================




David Rowley <dgrowley@gmail.com> writes:
> On this thread
> http://www.postgresql.org/message-id/52C6F712.6040804@student.kit.edu there
> was some discussion around allowing push downs of quals that happen to be
> in every window clause of the sub query. I've quickly put together a patch
> which does this (see attached)

I think you should have check_output_expressions deal with this, instead.
Compare the existing test there for non-DISTINCT output columns.

> Oh and I know that my function var_exists_in_all_query_partition_by_clauses
> has no business in allpaths.c, I'll move it out as soon as I find a better
> home for it.

I might be wrong, but I think you could just embed that search loop in
check_output_expressions, and it wouldn't be too ugly.
        regards, tom lane



On 14 April 2014 02:50, Thomas Mayer <thomas.mayer@student.kit.edu> wrote:
Hello David,

thanks for your work. The results look promising.

Thanks
 

What I'm missing is a test case with multiple fields in the partition by clauses:


I've modified the patch and added some regression tests that I think cover all of your cases, but please let me know if I've missed any. The patch will follow very soon.
 
-- should push down, because partid is part of all PARTITION BY clauses
explain analyze select partid,n,m from (
  select partid,
  count(*) over (partition by partid) n,
  count(*) over (partition by partid, partid+0) m
  from winagg
) winagg
where partid=1;

current production 9.3.4 is returning


                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on winagg  (cost=350955.11..420955.11 rows=20 width=20) (actual time=2564.360..3802.413 rows=20 loops=1)

   Filter: (winagg.partid = 1)
   Rows Removed by Filter: 1999980
   ->  WindowAgg  (cost=350955.11..395955.11 rows=2000000 width=4) (actual time=2564.332..3657.051 rows=2000000 loops=1)
         ->  Sort  (cost=350955.11..355955.11 rows=2000000 width=4) (actual time=2564.320..2802.444 rows=2000000 loops=1)
               Sort Key: winagg_1.partid, ((winagg_1.partid + 0))
               Sort Method: external sort  Disk: 50840kB
               ->  WindowAgg  (cost=0.43..86948.43 rows=2000000 width=4) (actual time=0.084..1335.081 rows=2000000 loops=1)
                     ->  Index Only Scan using winagg_partid_idx on winagg winagg_1  (cost=0.43..51948.43 rows=2000000 width=4) (actual time=0.051..378.232 rows=2000000 loops=1)
                           Heap Fetches: 0

"Index Only Scan" currently returns all rows (without pushdown) on current production 9.3.4. What happens with the patch you provided?


I get a push down as expected.

                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on winagg  (cost=82.71..83.31 rows=20 width=20) (actual time=0.168..0.179 rows=20 loops=1)
   ->  WindowAgg  (cost=82.71..83.11 rows=20 width=4) (actual time=0.166..0.174 rows=20 loops=1)
         ->  Sort  (cost=82.71..82.76 rows=20 width=4) (actual time=0.151..0.154 rows=20 loops=1)
               Sort Key: ((winagg_1.partid + 0))
               Sort Method: quicksort  Memory: 17kB
               ->  WindowAgg  (cost=4.58..82.28 rows=20 width=4) (actual time=0.127..0.135 rows=20 loops=1)
                     ->  Bitmap Heap Scan on winagg winagg_1  (cost=4.58..81.98 rows=20 width=4) (actual time=0.058..0.104 rows=20 loops=1)
                           Recheck Cond: (partid = 1)
                           Heap Blocks: exact=20
                           ->  Bitmap Index Scan on winagg_partid_idx  (cost=0.00..4.58 rows=20 width=0) (actual time=0.037..0.037 rows=20 loops=1)
                                 Index Cond: (partid = 1)
 Planning time: 0.235 ms
 Total runtime: 0.280 ms

 
-- Already Part of your tests:
-- should NOT push down, because partid is NOT part of all PARTITION BY clauses
explain analyze select partid,n,m from (
  select partid,
  count(*) over (partition by partid) n,
  count(*) over (partition by partid+0) m
  from winagg
) winagg
where partid=1;

Reordering the fields should also be tested:
-- should push down, because partid is part of all PARTITION BY clauses
-- here: partid at the end
explain analyze select partid,n,m from (
  select partid,
  count(*) over (partition by partid) n,
  count(*) over (partition by partid+0, partid) m
  from winagg
) winagg
where partid=1;


Covered in regression and works as expected.
 
-- should push down, because partid is part of all PARTITION BY clauses
-- here: partid in the middle
explain analyze select partid,n,m from (
  select partid,
  count(*) over (partition by partid) n,
  count(*) over (partition by partid+0, partid, partid+1) m
  from winagg
) winagg
where partid=1;


I covered this in the regression tests too. 
 
Regards

David Rowley

On 14 April 2014 03:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowley@gmail.com> writes:
> On this thread
> http://www.postgresql.org/message-id/52C6F712.6040804@student.kit.edu there
> was some discussion around allowing push downs of quals that happen to be
> in every window clause of the sub query. I've quickly put together a patch
> which does this (see attached)

I think you should have check_output_expressions deal with this, instead.
Compare the existing test there for non-DISTINCT output columns.


I've moved the code there and it seems like a much better place for it. Thanks for the tip.
 
> Oh and I know that my function var_exists_in_all_query_partition_by_clauses
> has no business in allpaths.c, I'll move it out as soon as I find a better
> home for it.

I might be wrong, but I think you could just embed that search loop in
check_output_expressions, and it wouldn't be too ugly.


I've changed the helper function to take the TargetEntry now the same as targetIsInSortList does for the hasDistinctOn test and I renamed the helper function to targetExistsInAllQueryPartitionByClauses. It seems much better, but there's probably a bit of room for improving on the names some more.

I've included the updated patch with some regression tests.
The final test I added tests that an unused window which would, if used, cause the pushdown not to take place still stops the pushdownf from happening... I know you both talked about this case in the other thread, but I've done nothing for it as Tom mentioned that this should likely be fixed elsewhere, if it's even worth worrying about at all. I'm not quite sure if I needed to bother including this test for it, but I did anyway, it can be removed if it is deemed unneeded.

Per Thomas' comments, I added a couple of tests that ensure that the order of the columns in the partition by clause don't change the behaviour. Looking at the implementation of the actual code makes this test seems a bit unneeded as really but I thought that the test should not assume anything about the implementation so from that point of view the extra test seems like a good idea.

For now I can't think of much else to do for the patch, but I'd really appreciate it Thomas if you could run it through the paces if you can find any time for it, or maybe just give my regression tests a once over to see if you can think of any other cases that should be covered.

Regards

David Rowley
Attachment
Hello David,

sorry for the late response. I will try out your changes from the view 
of a user in mid-June. However, I can't do a trustworthy code review as 
I'm not an experienced postgre-hacker (yet).

Best Regards
Thomas


Am 14.04.2014 13:19, schrieb David Rowley:
> On 14 April 2014 03:31, Tom Lane <tgl@sss.pgh.pa.us
> <mailto:tgl@sss.pgh.pa.us>> wrote:
>
>     David Rowley <dgrowley@gmail.com <mailto:dgrowley@gmail.com>> writes:
>      > On this thread
>      >
>     http://www.postgresql.org/message-id/52C6F712.6040804@student.kit.edu there
>      > was some discussion around allowing push downs of quals that
>     happen to be
>      > in every window clause of the sub query. I've quickly put
>     together a patch
>      > which does this (see attached)
>
>     I think you should have check_output_expressions deal with this,
>     instead.
>     Compare the existing test there for non-DISTINCT output columns.
>
>
> I've moved the code there and it seems like a much better place for it.
> Thanks for the tip.
>
>      > Oh and I know that my function
>     var_exists_in_all_query_partition_by_clauses
>      > has no business in allpaths.c, I'll move it out as soon as I find
>     a better
>      > home for it.
>
>     I might be wrong, but I think you could just embed that search loop in
>     check_output_expressions, and it wouldn't be too ugly.
>
>
> I've changed the helper function to take the TargetEntry now the same
> as targetIsInSortList does for the hasDistinctOn test and I renamed the
> helper function to targetExistsInAllQueryPartitionByClauses. It seems
> much better, but there's probably a bit of room for improving on the
> names some more.
>
> I've included the updated patch with some regression tests.
> The final test I added tests that an unused window which would, if used,
> cause the pushdown not to take place still stops the pushdownf from
> happening... I know you both talked about this case in the other thread,
> but I've done nothing for it as Tom mentioned that this should likely be
> fixed elsewhere, if it's even worth worrying about at all. I'm not quite
> sure if I needed to bother including this test for it, but I did anyway,
> it can be removed if it is deemed unneeded.
>
> Per Thomas' comments, I added a couple of tests that ensure that the
> order of the columns in the partition by clause don't change the
> behaviour. Looking at the implementation of the actual code makes this
> test seems a bit unneeded as really but I thought that the test should
> not assume anything about the implementation so from that point of view
> the extra test seems like a good idea.
>
> For now I can't think of much else to do for the patch, but I'd really
> appreciate it Thomas if you could run it through the paces if you can
> find any time for it, or maybe just give my regression tests a once over
> to see if you can think of any other cases that should be covered.
>
> Regards
>
> David Rowley

-- 
======================================
Thomas Mayer
Durlacher Allee 61
D-76131 Karlsruhe
Telefon: +49-721-2081661
Fax:     +49-721-72380001
Mobil:   +49-174-2152332
E-Mail:  thomas.mayer@student.kit.edu
=======================================




On Sun, May 25, 2014 at 2:10 PM, Thomas Mayer <thomas.mayer@student.kit.edu> wrote:
Hello David,

sorry for the late response. I will try out your changes from the view of a user in mid-June. However, I can't do a trustworthy code review as I'm not an experienced postgre-hacker (yet).


Thanks, that sort of review will be a great start. 

I've attached an updated patch as it seems there's been a change that removes redundant sorts which was breaking one of the regression tests in v0.2 of the patch.

+EXPLAIN (COSTS OFF)
+SELECT depname,depsalary FROM (SELECT depname,
+                      sum(salary) OVER (PARTITION BY depname) depsalary,
+                      rank() OVER (ORDER BY enroll_date) emprank
+  FROM empsalary) emp
+WHERE depname = 'sales';

This used to produced a plan which included a sort by enroll_date, but this is no longer the case. I've updated the expected results to remove the extra sort from the explain output

Regards

David Rowley
Attachment
I've had a chance to look at this and here is my review.

On 04/14/2014 01:19 PM, David Rowley wrote:
> I've included the updated patch with some regression tests.

The first thing I noticed is there is no documentation, but I don't
think we document such things outside of the release notes, so that's
probably normal.

> The final test I added tests that an unused window which would, if used,
> cause the pushdown not to take place still stops the pushdownf from
> happening... I know you both talked about this case in the other thread,
> but I've done nothing for it as Tom mentioned that this should likely be
> fixed elsewhere, if it's even worth worrying about at all. I'm not quite
> sure if I needed to bother including this test for it, but I did anyway,
> it can be removed if it is deemed unneeded.

I don't think this test has any business being in this patch.  Removal
of unused "things" should be a separate patch and once that's done your
patch should work as expected.  This regression test you have here
almost demands that that other removal patch shouldn't happen.

> Per Thomas' comments, I added a couple of tests that ensure that the
> order of the columns in the partition by clause don't change the
> behaviour. Looking at the implementation of the actual code makes this
> test seems a bit unneeded as really but I thought that the test should
> not assume anything about the implementation so from that point of view
> the extra test seems like a good idea.

Every possible permutation of everything is overkill, but I think having
a test that makes sure the pushdown happens when the partition in
question isn't first in the list is a good thing.

> For now I can't think of much else to do for the patch, but I'd really
> appreciate it Thomas if you could run it through the paces if you can
> find any time for it, or maybe just give my regression tests a once over
> to see if you can think of any other cases that should be covered.

I'm not Thomas, but...

This patch is very simple.  There's nothing wrong with that, of course,
but I wonder how hard it would be to push more stuff down.  For example,
you have this case which works perfectly by not pushing down:

SELECT * FROM (
    SELECT partid,
           winagg() OVER (PARTITION BY partid+0)
    FROM t) wa
WHERE partid = 1;

But then you have this case which doesn't push down:

SELECT * FROM (
    SELECT partid,
           winagg() OVER (PARTITION BY partid+0)
    FROM t) wa
WHERE partid+0 = 1;

I don't know what's involved in fixing that, or if we do that sort of
thing elsewhere, but it's something to consider.

Multi-column pushdowns work as expected.


I have an issue with some of the code comments:

In subquery_is_pushdown_safe you reduced the number of "points" from
three to two.  The comments used to say "Check point 1" and "Check point
2" but the third was kind of tested throughout the rest of the code so
didn't have a dedicated comment.  Now that there are only two checks,
the "Check point 1" without a 2 seems a little bit odd.  The attached
patch provides a suggestion for improvement.

The same kind of weirdness is found in check_output_expressions but I
don't really have any suggestions to fix it so I just left it.

The attached patch also fixes some typos.


I'm marking this Waiting on Author pending discussion on pushing down
entire expressions, but on the whole I think this is pretty much ready.
--
Vik

Attachment
On 21 June 2014 01:38, Vik Fearing <vik.fearing@dalibo.com> wrote:
I've had a chance to look at this and here is my review.

On 04/14/2014 01:19 PM, David Rowley wrote:
> I've included the updated patch with some regression tests.

The first thing I noticed is there is no documentation, but I don't
think we document such things outside of the release notes, so that's
probably normal.


This prompted me to have a look to see if there's anything written in the documents about the reasons why we might not push quals down, but I didn't manage to find anything to that affect.
 
> The final test I added tests that an unused window which would, if used,
> cause the pushdown not to take place still stops the pushdownf from
> happening... I know you both talked about this case in the other thread,
> but I've done nothing for it as Tom mentioned that this should likely be
> fixed elsewhere, if it's even worth worrying about at all. I'm not quite
> sure if I needed to bother including this test for it, but I did anyway,
> it can be removed if it is deemed unneeded.

I don't think this test has any business being in this patch.  Removal
of unused "things" should be a separate patch and once that's done your
patch should work as expected.  This regression test you have here
almost demands that that other removal patch shouldn't happen.


Agreed, I've removed that test now.
 
> Per Thomas' comments, I added a couple of tests that ensure that the
> order of the columns in the partition by clause don't change the
> behaviour. Looking at the implementation of the actual code makes this
> test seems a bit unneeded as really but I thought that the test should
> not assume anything about the implementation so from that point of view
> the extra test seems like a good idea.

Every possible permutation of everything is overkill, but I think having
a test that makes sure the pushdown happens when the partition in
question isn't first in the list is a good thing.


In the updated patch I've removed some a few extra tests that were just testing the same clauses in the PARTITION BY but in a different order.
 
> For now I can't think of much else to do for the patch, but I'd really
> appreciate it Thomas if you could run it through the paces if you can
> find any time for it, or maybe just give my regression tests a once over
> to see if you can think of any other cases that should be covered.

I'm not Thomas, but...

This patch is very simple.  There's nothing wrong with that, of course,
but I wonder how hard it would be to push more stuff down.  For example,
you have this case which works perfectly by not pushing down:

SELECT * FROM (
    SELECT partid,
           winagg() OVER (PARTITION BY partid+0)
    FROM t) wa
WHERE partid = 1;

But then you have this case which doesn't push down:

SELECT * FROM (
    SELECT partid,
           winagg() OVER (PARTITION BY partid+0)
    FROM t) wa
WHERE partid+0 = 1;

I don't know what's involved in fixing that, or if we do that sort of
thing elsewhere, but it's something to consider.


I had a look at this and at first I thought it was just as simple as removing the if (tle->resjunk) continue; from check_output_expressions, but after removing that I see that it's a bit more complex. In qual_is_pushdown_safe it pulls (using pull_var_clause) the Vars from the outer WHERE clause and not the whole expression, it then checks, in your case if "partid" (not partid+0) is safe to push down, and sees that it's not, due to this patches code marking it as not because partid is not in the PARTITION BY clause.

I really don't think it's the business of this patch to make changes around here. Perhaps it would be worth looking into in more detail in the future. Although, you can probably get what you want by just writing the query as:

SELECT * FROM (
    SELECT partid+0 as partid,
           count(*) OVER (PARTITION BY partid+0)
    FROM t) wa
WHERE partid = 1;

Or if you really need the unmodified partid, then you could add another column to the target list and just not do select * in the outer query.


Multi-column pushdowns work as expected.


I have an issue with some of the code comments:

In subquery_is_pushdown_safe you reduced the number of "points" from
three to two.  The comments used to say "Check point 1" and "Check point
2" but the third was kind of tested throughout the rest of the code so
didn't have a dedicated comment.  Now that there are only two checks,
the "Check point 1" without a 2 seems a little bit odd.  The attached
patch provides a suggestion for improvement.

The same kind of weirdness is found in check_output_expressions but I
don't really have any suggestions to fix it so I just left it.

The attached patch also fixes some typos.


That seems like a good change, oh and well spotted on the typo.
I've applied your patch into my local copy of the code here. Thanks 

I'm marking this Waiting on Author pending discussion on pushing down
entire expressions, but on the whole I think this is pretty much ready.

As I said above, I don't think playing around with that code is really work for this patch. It can be done later in another patch if people think it would be useful.

I've attached a delta patch with just the changes from v0.3 and a complete updated patch.

Thanks for taking the time to look at this.

Regards

David Rowley
 
--
Vik

Attachment
On 06/21/2014 02:03 PM, David Rowley wrote:
>     I'm marking this Waiting on Author pending discussion on pushing down
>     entire expressions, but on the whole I think this is pretty much ready.
> 
> 
> As I said above, I don't think playing around with that code is really
> work for this patch. It can be done later in another patch if people
> think it would be useful.

I tend to agree.

This latest patch is ready for a committer to look at now.  The weird
comments have been changed, superfluous regression tests removed, and
nothing done about expression pushdown per (brief) discussion.
-- 
Vik



Vik Fearing <vik.fearing@dalibo.com> writes:
> This latest patch is ready for a committer to look at now.  The weird
> comments have been changed, superfluous regression tests removed, and
> nothing done about expression pushdown per (brief) discussion.

I started to look at this patch and realized that there's an issue that
isn't covered, which is not too surprising because the existing code fails
to cover it too.  Remember that the argument for pushing down being safe
at all is that we expect the pushed-down qual to yield the same result at
all rows of a given partition, so that we either include or exclude the
whole partition and thereby don't change window function results.  This
means that not only must the qual expression depend only on partitioning
columns, but *it had better not be volatile*.

In exactly the same way, it isn't safe to push down quals into
subqueries that use DISTINCT unless the quals are non-volatile.  This
consideration is missed by the current code, and I think that's a bug.

(Pushing down volatile quals would also be unsafe in subqueries involving
aggregation, except that we put them into HAVING so that they're executed
only once per subquery output row anyway.)

Given the lack of prior complaints, I'm not excited about back-patching a
change to prevent pushing down volatile quals in the presence of DISTINCT;
but I think we probably ought to fix it in 9.5, and maybe 9.4 too.

Thoughts?
        regards, tom lane



On 06/27/2014 02:49 AM, Tom Lane wrote:
> Vik Fearing <vik.fearing@dalibo.com> writes:
>> This latest patch is ready for a committer to look at now.  The weird
>> comments have been changed, superfluous regression tests removed, and
>> nothing done about expression pushdown per (brief) discussion.
> 
> I started to look at this patch and realized that there's an issue that
> isn't covered, which is not too surprising because the existing code fails
> to cover it too.  Remember that the argument for pushing down being safe
> at all is that we expect the pushed-down qual to yield the same result at
> all rows of a given partition, so that we either include or exclude the
> whole partition and thereby don't change window function results.  This
> means that not only must the qual expression depend only on partitioning
> columns, but *it had better not be volatile*.
> 
> In exactly the same way, it isn't safe to push down quals into
> subqueries that use DISTINCT unless the quals are non-volatile.  This
> consideration is missed by the current code, and I think that's a bug.
> 
> (Pushing down volatile quals would also be unsafe in subqueries involving
> aggregation, except that we put them into HAVING so that they're executed
> only once per subquery output row anyway.)

Are you going to take care of all this, or should David or I take a
crack at it?  The commitfest app still shows Ready for Committer.

> Given the lack of prior complaints, I'm not excited about back-patching a
> change to prevent pushing down volatile quals in the presence of DISTINCT;
> but I think we probably ought to fix it in 9.5, and maybe 9.4 too.
> 
> Thoughts?

I didn't test it myself, I'm just taking your word on it.

If it's a bug, it should obviously be fixed in 9.5.  As for 9.4, I have
always viewed a beta as a time to fix bugs so I vote to backpatch it at
least that far.

Why wouldn't it go back all the way to 9.0?  (assuming 8.4 is dead)
-- 
Vik



Vik Fearing <vik.fearing@dalibo.com> writes:
> On 06/27/2014 02:49 AM, Tom Lane wrote:
>> In exactly the same way, it isn't safe to push down quals into
>> subqueries that use DISTINCT unless the quals are non-volatile.  This
>> consideration is missed by the current code, and I think that's a bug.

>> Given the lack of prior complaints, I'm not excited about back-patching a
>> change to prevent pushing down volatile quals in the presence of DISTINCT;
>> but I think we probably ought to fix it in 9.5, and maybe 9.4 too.

> Why wouldn't it go back all the way to 9.0?  (assuming 8.4 is dead)

People get unhappy when minor releases de-optimize queries that had
been working for them.  It's not too late to change the behavior of
9.4, but I'm hesitant to do it in already-released branches, especially
in the absence of any complaints from the field.

> Are you going to take care of all this, or should David or I take a
> crack at it?  The commitfest app still shows Ready for Committer.

I can deal with it.
        regards, tom lane



* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > Why wouldn't it go back all the way to 9.0?  (assuming 8.4 is dead)
>
> People get unhappy when minor releases de-optimize queries that had
> been working for them.  It's not too late to change the behavior of
> 9.4, but I'm hesitant to do it in already-released branches, especially
> in the absence of any complaints from the field.

Agreed.  Back-patching to released versions would likely end up making
for a nasty surprise in a minor point release for some users.
Thanks,
    Stephen

David Rowley <dgrowley@gmail.com> writes:
> [ wfunc_pushdown_partitionby_v0.4.patch ]

I've committed this with the addition of a volatility check and some
other basically-cosmetic adjustments.
        regards, tom lane



On 28 June 2014 18:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowley@gmail.com> writes:
> [ wfunc_pushdown_partitionby_v0.4.patch ]

I've committed this with the addition of a volatility check and some
other basically-cosmetic adjustments.


Great, thank you for making the required changes.

Vik, thank you for reviewing the patch.

Regards

David Rowley 

                        regards, tom lane