Re: limit is sometimes not pushed in view with order - Mailing list pgsql-performance
From: Rikard Pavelic
Subject: Re: limit is sometimes not pushed in view with order
Date:
Msg-id: 20130414123414.00000718@unknown
In response to: Re: limit is sometimes not pushed in view with order (Rikard Pavelic <rikard@ngs.hr>)
List: pgsql-performance
On Sat, 13 Apr 2013 20:08:16 +0200
Rikard Pavelic <rikard@ngs.hr> wrote:

> While one could argue that the optimizer doesn't know how to optimize
> a left join with a group by on its primary key, you can replace that
> join with some other join (i.e. a left join to another table's pk) and
> the same behavior will be displayed (joining all tables and applying
> the limit at the end).
> That's why I asked whether the fence for pushing the limit down is a
> known behavior.

While I can work around that problem by pushing the left join into a
subquery in the select list, this doesn't solve everything, since the
limit is still not pushed down for nontrivial queries.

This is probably the best example:

create table big_table(i serial primary key, delay int);

create function some_calculation(i int) returns int as $$
begin
    perform pg_sleep(i);
    return i*i;
end
$$ language plpgsql stable cost 100000;

create view big_view as
select t.i, some_calculation(t.delay) as calc, s.delay as d2
from big_table t
left join big_table s on t.i = s.i + s.i
order by t.i asc;

insert into big_table select i, i%5 from generate_series(1, 100000) i;
analyze big_table;

explain analyze select * from big_view v where i >= 100 and i <= 105 limit 1;

Limit  (cost=3201.63..3201.64 rows=1 width=12) (actual time=10017.471..10017.471 rows=1 loops=1)
  ->  Sort  (cost=3201.63..3201.64 rows=5 width=12) (actual time=10017.469..10017.469 rows=1 loops=1)
        Sort Key: t.i
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Hash Right Join  (cost=8.52..3201.57 rows=5 width=12) (actual time=0.078..10017.436 rows=6 loops=1)
              Hash Cond: ((s.i + s.i) = t.i)
              ->  Seq Scan on big_table s  (cost=0.00..1443.00 rows=100000 width=8) (actual time=0.005..6.294 rows=100000 loops=1)
              ->  Hash  (cost=8.46..8.46 rows=5 width=8) (actual time=0.012..0.012 rows=6 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 1kB
                    ->  Index Scan using big_table_pkey on big_table t  (cost=0.00..8.46 rows=5 width=8) (actual time=0.007..0.008 rows=6 loops=1)
                          Index Cond: ((i >= 100) AND (i <= 105))
Total runtime: 10017.514 ms

explain analyze select * from big_view v where i >= 100 and i <= 10005 limit 1;

Limit  (cost=0.00..2391.22 rows=1 width=12) (actual time=0.088..0.088 rows=1 loops=1)
  ->  Nested Loop Left Join  (cost=0.00..23780547.26 rows=9945 width=12) (actual time=0.087..0.087 rows=1 loops=1)
        Join Filter: (t.i = (s.i + s.i))
        Rows Removed by Join Filter: 49
        ->  Index Scan using big_table_pkey on big_table t  (cost=0.00..359.26 rows=9945 width=8) (actual time=0.014..0.014 rows=1 loops=1)
              Index Cond: ((i >= 100) AND (i <= 10005))
        ->  Materialize  (cost=0.00..2334.00 rows=100000 width=8) (actual time=0.009..0.020 rows=50 loops=1)
              ->  Seq Scan on big_table s  (cost=0.00..1443.00 rows=100000 width=8) (actual time=0.005..0.010 rows=50 loops=1)
Total runtime: 0.122 ms

explain analyze select * from big_view v where i >= 100 and i <= 10005 limit 10;

takes too long...

To me this looks fixable, if the limit were applied before all the
targets are evaluated. If I remove the left join from the view,
Postgres works as expected, so I guess it already knows how to apply
the limit before evaluating the select targets; it's just that this is
missing for subqueries whose targets are pulled up out of them. Maybe
that's hard because you would probably need closures to pull off a
general solution, but there are plenty of use cases without group by
that would benefit from applying the limit before evaluating targets
that are used only in the topmost result.

So, I was wondering: is this a known problem, and is there any
interest in tackling it?

Regards,
Rikard
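P.S. For reference, a minimal sketch of the select-subquery workaround
I mention above (the view name big_view2 is made up for illustration;
it relies on the join matching at most one s row per t.i, which holds
here because i is the primary key):

    create view big_view2 as
    select t.i,
           some_calculation(t.delay) as calc,
           -- the left join becomes a correlated scalar subquery, so it
           -- no longer fences the limit pushdown through the order by
           (select s.delay from big_table s where s.i + s.i = t.i) as d2
    from big_table t
    order by t.i asc;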
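P.P.S. The rewrite I'd like the planner to do automatically can be
spelled out by hand for the slow limit 10 case. A sketch, assuming the
expensive target depends only on columns available in the inner query:
apply the limit first, then evaluate the target only for the surviving
rows.

    select sub.i, some_calculation(sub.delay) as calc, sub.d2
    from (
        -- cheap part: join, filter, order, limit
        select t.i, t.delay, s.delay as d2
        from big_table t
        left join big_table s on t.i = s.i + s.i
        where t.i >= 100 and t.i <= 10005
        order by t.i asc
        limit 10
    ) sub
    order by sub.i asc;
    -- some_calculation now runs 10 times instead of once per joined row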