No optimization with a partition window in a view - Mailing list pgsql-bugs

From Pierre
Subject No optimization with a partition window in a view
Date
Msg-id 6539554.0HoVISsQxY@peanuts2
Whole thread Raw
Responses Re: No optimization with a partition window in a view
List pgsql-bugs
Hello

I tried to avoid implementing window functions inside my ORM by using a=
 view,=20
but it seems the optimizer is missing an obvious optimization and thus =
doing a=20
full table scan.

Affected versions : 9.2.4 and 9.3.2 (9.4 not tested yet)

How to reproduce :

=3D> create table test_history (i serial, piece integer not null, date =
timestamp=20
with time zone default now(), location integer not null);

=3D> create table test_history (i serial, piece integer not null, date =
timestamp=20
with time zone default now(), location integer not null);
CREATE TABLE

=3D> insert into test_history (piece, location) select i, 1 from=20
generate_series(1, 1000000) i;
INSERT 0 1000000
=3D> insert into test_history (piece, location) select i, 2 from=20
generate_series(1, 1000000) i;
INSERT 0 1000000
=3D> insert into test_history (piece, location) select i, 3 from=20
generate_series(1, 1000000) i;
^[[AINSERT 0 1000000
=3D> insert into test_history (piece, location) select i, 4 from=20
generate_series(1, 1000000, 2) i;
INSERT 0 500000

=3D> alter table test_history add primary key(i);
ALTER TABLE
=3D> create index on test_history(piece);
CREATE INDEX

=3D> select * from test_history where piece =3D 42;
    i    | piece |             date              | location=20
=2D--------+-------+-------------------------------+----------
      42 |    42 | 2014-02-15 08:52:50.946586+01 |        1
 1000042 |    42 | 2014-02-15 08:52:56.634685+01 |        2
 2000042 |    42 | 2014-02-15 08:53:00.762706+01 |        3
(3 rows)

Time: 0.158 ms

=3D> select *, lag(location, 1) over w, lead(location, 1) over w from t=
est_history=20
where piece =3D 42 window w as (partition by piece order by date);
    i    | piece |             date              | location | lag | lea=
d=20
=2D--------+-------+-------------------------------+----------+-----+----=
=2D-
      42 |    42 | 2014-02-15 08:52:50.946586+01 |        1 |     |    =
2
 1000042 |    42 | 2014-02-15 08:52:56.634685+01 |        2 |   1 |    =
3
 2000042 |    42 | 2014-02-15 08:53:00.762706+01 |        3 |   2 |    =
=20
(3 rows)

Time: 0.203 ms

=3D> create view test_history_lag_lead as select *, lag(location, 1) ov=
er w,=20
lead(location, 1) over w from test_history window w as (partition by pi=
ece order=20
by date);
CREATE VIEW
=3D> select * from test_history_lag_lead where piece =3D 42;
    i    | piece |             date              | location | lag | lea=
d=20
=2D--------+-------+-------------------------------+----------+-----+----=
=2D-
      42 |    42 | 2014-02-15 08:52:50.946586+01 |        1 |     |    =
2
 1000042 |    42 | 2014-02-15 08:52:56.634685+01 |        2 |   1 |    =
3
 2000042 |    42 | 2014-02-15 08:53:00.762706+01 |        3 |   2 |    =
=20
(3 rows)

Time: 2915.756 ms

=3D> explain analyze select * from test_history_lag_lead where piece =3D=
 42;
                                                                QUERY P=
LAN                                                                =20
=2D----------------------------------------------------------------------=
=2D-------------------------------------------------------------------
 Subquery Scan on test_history_lag_lead  (cost=3D653058.16..775558.16 r=
ows=3D4=20
width=3D28) (actual time=3D1411.423..3375.794 rows=3D3 loops=3D1)
   Filter: (test_history_lag_lead.piece =3D 42)
   Rows Removed by Filter: 3499997
   ->  WindowAgg  (cost=3D653058.16..731808.16 rows=3D3500000 width=3D2=
0) (actual=20
time=3D1411.343..3206.959 rows=3D3500000 loops=3D1)
         ->  Sort  (cost=3D653058.16..661808.16 rows=3D3500000 width=3D=
20) (actual=20
time=3D1411.337..1867.895 rows=3D3500000 loops=3D1)
               Sort Key: test_history.piece, test_history.date
               Sort Method: external merge  Disk: 116328kB
               ->  Seq Scan on test_history  (cost=3D0.00..57293.00 row=
s=3D3500000=20
width=3D20) (actual time=3D0.004..310.558 rows=3D3500000 loops=3D1)
 Total runtime: 3386.455 ms
(9 rows)

=3D> explain analyze select *, lag(location, 1) over w, lead(location, =
1) over w=20
from=20test_history where piece =3D 42 window w as (partition by piece or=
der by=20
date);
                                                                    QUE=
RY PLAN                                                                =
    =20
=2D----------------------------------------------------------------------=
=2D----------------------------------------------------------------------=
=2D----
 WindowAgg  (cost=3D19.70..19.78 rows=3D4 width=3D20) (actual time=3D0.=
018..0.019 rows=3D3=20
loops=3D1)
   ->  Sort  (cost=3D19.70..19.71 rows=3D4 width=3D20) (actual time=3D0=
.015..0.015=20
rows=3D3 loops=3D1)
         Sort Key: date
         Sort Method: quicksort  Memory: 25kB
         ->  Index Scan using test_history_piece_idx on test_history =20=

(cost=3D0.43..19.66 rows=3D4 width=3D20) (actual time=3D0.009..0.010 ro=
ws=3D3 loops=3D1)
               Index Cond: (piece =3D 42)
 Total runtime: 0.037 ms
(7 rows)





As you can see, the optimizer decided to do the WindowAgg first on the =
3,5M=20
lines before filtering, instead of filtering using an Index scan and th=
en doing=20
the aggregation.
IMHO, that optimization would be a special case possible only with wind=
ows on a=20
partition that is contained in the filtering fields, but I aint no expe=
rt here=20
:)


Thanks

 Pierre Ducroquet

pgsql-bugs by date:

Previous
From: pythonesque@gmail.com
Date:
Subject: BUG #9227: Error on SELECT ROW OVERLAPS ROW with single ROW argument
Next
From: Joshua Yanovski
Date:
Subject: Re: BUG #9227: Error on SELECT ROW OVERLAPS ROW with single ROW argument