Thread: bad execution plan for subselects containing windowing-function

bad execution plan for subselects containing windowing-function

From
Andreas Kretschmer
Date:
Hi,

version: 8.4.2


I have a table called values:

test=*# \d values
    Table "public.values"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 value  | real    |
Indexes:
    "idx_id" btree (id)

The table contains 100000 random rows and is analysed.



And i have 2 queries, both returns the same result:

test=*# explain analyse select id, avg(value) over (partition by value) from values where id = 50 order by id;
                                                           QUERY PLAN
         

---------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=531.12..549.02 rows=1023 width=8) (actual time=2.032..4.165 rows=942 loops=1)
         
   ->  Sort  (cost=531.12..533.68 rows=1023 width=8) (actual time=2.021..2.270 rows=942 loops=1)
         
         Sort Key: value
         
         Sort Method:  quicksort  Memory: 53kB
         
         ->  Bitmap Heap Scan on "values"  (cost=24.19..479.98 rows=1023 width=8) (actual time=0.269..1.167 rows=942
loops=1)    
               Recheck Cond: (id = 50)
         
               ->  Bitmap Index Scan on idx_id  (cost=0.00..23.93 rows=1023 width=0) (actual time=0.202..0.202 rows=942
loops=1) 
                     Index Cond: (id = 50)
         
 Total runtime: 4.454 ms
         
(9 rows)
         

Time: 4.859 ms
test=*# explain analyse select * from (select id, avg(value) over (partition by value) from values  order by id) foo
whereid = 50; 
                                                               QUERY PLAN
                

----------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan foo  (cost=22539.64..24039.64 rows=500 width=12) (actual time=677.196..722.975 rows=942 loops=1)
                
   Filter: (foo.id = 50)
                
   ->  Sort  (cost=22539.64..22789.64 rows=100000 width=8) (actual time=631.991..690.411 rows=100000 loops=1)
                
         Sort Key: "values".id
                
         Sort Method:  external merge  Disk: 2528kB
                
         ->  WindowAgg  (cost=11116.32..12866.32 rows=100000 width=8) (actual time=207.462..479.330 rows=100000
loops=1)                
               ->  Sort  (cost=11116.32..11366.32 rows=100000 width=8) (actual time=207.442..281.546 rows=100000
loops=1)               
                     Sort Key: "values".value
                
                     Sort Method:  external merge  Disk: 1752kB
                
                     ->  Seq Scan on "values"  (cost=0.00..1443.00 rows=100000 width=8) (actual time=0.010..29.742
rows=100000loops=1)  
 Total runtime: 725.362 ms
                
(11 rows)
                


No question, this is a silly query, but the problem is the 2nd query: it
is obviously not possible for the planner to put the where-condition
into the subquery. That's bad if i want to create a view:

test=*# create view view_values as select id, avg(value) over (partition by value) from values  order by id;
CREATE VIEW
Time: 41.280 ms
test=*# commit;
COMMIT
Time: 0.514 ms
test=# explain analyse select * from view_values where id=50;

It is the same bad plan with the Seq Scan on "values".


Is this a bug or PEBKAC or something else?




Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: bad execution plan for subselects containing windowing-function

From
Tom Lane
Date:
Andreas Kretschmer <akretschmer@spamfence.net> writes:
> No question, this is a silly query, but the problem is the 2nd query: it
> is obviously not possible for the planner to put the where-condition
> into the subquery.

Well, yeah: it might change the results of the window functions.
I see no bug here.  Your second query asks for a much more complicated
computation, it's not surprising it takes longer.

            regards, tom lane

Re: bad execution plan for subselects containing windowing-function

From
Andreas Kretschmer
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Andreas Kretschmer <akretschmer@spamfence.net> writes:
> > No question, this is a silly query, but the problem is the 2nd query: it
> > is obviously not possible for the planner to put the where-condition
> > into the subquery.
>
> Well, yeah: it might change the results of the window functions.
> I see no bug here.  Your second query asks for a much more complicated
> computation, it's not surprising it takes longer.

Thank you for the fast answer.

But sorry, I disagree. It is the same query with the same result. I can't see
how the queries should return different results.

What have i overlooked?


tia, Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: bad execution plan for subselects containing windowing-function

From
Tom Lane
Date:
Andreas Kretschmer <akretschmer@spamfence.net> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I see no bug here.  Your second query asks for a much more complicated
>> computation, it's not surprising it takes longer.

> But sorry, I disagree. It is the same query with the same result. I can't see
> how the queries should return different results.

In the first query

select id, avg(value) over (partition by value) from values where id = 50 order by id;

the avg() calculations are being done over only rows with id = 50.  In
the second query

select * from (select id, avg(value) over (partition by value) from values  order by id) foo where id = 50;

they are being done over all rows.  In this particular example you
happen to get the same result, but that's just because "avg(foo) over
partition by foo" is a dumb example --- it will necessarily just yield
identically foo.  In more realistic computations the results would be
different.

            regards, tom lane

Re: bad execution plan for subselects containing windowing-function

From
Andreas Kretschmer
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Andreas Kretschmer <akretschmer@spamfence.net> writes:
> > Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I see no bug here.  Your second query asks for a much more complicated
> >> computation, it's not surprising it takes longer.
>
> > But sorry, I disagree. It is the same query with the same result. I can't see
> > how the queries should return different results.
>
> In the first query
>
> select id, avg(value) over (partition by value) from values where id = 50 order by id;
>
> the avg() calculations are being done over only rows with id = 50.  In
> the second query
>
> select * from (select id, avg(value) over (partition by value) from values  order by id) foo where id = 50;
>
> they are being done over all rows.  In this particular example you
> happen to get the same result, but that's just because "avg(foo) over
> partition by foo" is a dumb example --- it will necessarily just yield
> identically foo.  In more realistic computations the results would be
> different.

Okay, i believe you now ;-)

I will try to find a case with different results ...

Thx for your fast help!


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: bad execution plan for subselects containing windowing-function

From
Andreas Kretschmer
Date:
Andreas Kretschmer <akretschmer@spamfence.net> wrote:
> > they are being done over all rows.  In this particular example you
> > happen to get the same result, but that's just because "avg(foo) over
> > partition by foo" is a dumb example --- it will necessarily just yield
> > identically foo.  In more realistic computations the results would be
> > different.
>
> Okay, i believe you now ;-)
>
> I will try to find a case with different results ...

I have got it!


test=# select * from values;
 id | value
----+-------
  1 |    10
  2 |    20
  3 |    30
  4 |    40
  5 |    50
  6 |    60
  7 |    70
  8 |    80
  9 |    90
(9 rows)

Time: 0.240 ms
test=*# select id, sum(value) over (order by id) from values where id = 5;
 id | sum
----+-----
  5 |  50
(1 row)

Time: 0.352 ms
test=*# select * from (select id, sum(value) over (order by id) from values) foo where id = 5;
 id | sum
----+-----
  5 | 150
(1 row)

Time: 0.383 ms



Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°