Re: limit is sometimes not pushed in view with order - Mailing list pgsql-performance

From Rikard Pavelic
Subject Re: limit is sometimes not pushed in view with order
Date
Msg-id 20130414123414.00000718@unknown
Whole thread Raw
In response to Re: limit is sometimes not pushed in view with order  (Rikard Pavelic <rikard@ngs.hr>)
List pgsql-performance
On Sat, 13 Apr 2013 20:08:16 +0200
Rikard Pavelic <rikard@ngs.hr> wrote:

> While one could argue that optimizer doesn't know to optimize left
> join with group by its primary key, you can replace that join with
> some other joins (ie left join to another table pk) and the same
> behavior will be displayed (joining all tables and applying limit at
> the end).
> That's why I asked if fence for pushing limit is a known behavior.

While I can work around that problem by pushing left join in select
subquery this doesn't solve all problems since limit is not pushed
down on nontrivial queries.

This is probably the best example:

create table big_table(i serial primary key, delay int);
create function some_calculation(i int) returns int as $$
begin
  perform pg_sleep(i);
  return i*i;
end $$ language plpgsql stable cost 100000;

create view big_view
as select t.i, some_calculation(t.delay) as calc, s.delay as d2
from big_table t
left join big_table s on t.i = s.i + s.i
order by t.i asc;

insert into big_table select i, i%5 from generate_series(1, 100000) i;
analyze big_table;

explain analyze select * from big_view v where i >= 100 and i <= 105 limit 1;

Limit  (cost=3201.63..3201.64 rows=1 width=12) (actual
time=10017.471..10017.471 rows=1 loops=1) ->  Sort  (cost=3201.63..3201.64 rows=5 width=12) (actual
time=10017.469..10017.469rows=1 loops=1) 
        Sort Key: t.i
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Hash Right Join  (cost=8.52..3201.57 rows=5 width=12) (actual time=0.078..10017.436 rows=6 loops=1)
              Hash Cond: ((s.i + s.i) = t.i)
              ->  Seq Scan on big_table s  (cost=0.00..1443.00 rows=100000 width=8) (actual time=0.005..6.294
rows=100000loops=1) 
              ->  Hash  (cost=8.46..8.46 rows=5 width=8) (actual time=0.012..0.012 rows=6 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 1kB
                    ->  Index Scan using big_table_pkey on big_table t  (cost=0.00..8.46 rows=5 width=8) (actual
time=0.007..0.008rows=6 loops=1) 
                          Index Cond: ((i >= 100) AND (i <= 105))
Total runtime: 10017.514 ms

explain analyze select * from big_view v where i >= 100 and i <= 10005 limit 1;

Limit  (cost=0.00..2391.22 rows=1 width=12) (actual time=0.088..0.088 rows=1 loops=1)
  ->  Nested Loop Left Join  (cost=0.00..23780547.26 rows=9945 width=12) (actual time=0.087..0.087 rows=1 loops=1)
        Join Filter: (t.i = (s.i + s.i))
        Rows Removed by Join Filter: 49
        ->  Index Scan using big_table_pkey on big_table t  (cost=0.00..359.26 rows=9945 width=8) (actual
time=0.014..0.014rows=1 loops=1) 
              Index Cond: ((i >= 100) AND (i <= 10005))
        ->  Materialize  (cost=0.00..2334.00 rows=100000 width=8) (actual time=0.009..0.020 rows=50 loops=1)
              ->  Seq Scan on big_table s  (cost=0.00..1443.00 rows=100000 width=8) (actual time=0.005..0.010 rows=50
loops=1)
Total runtime: 0.122 ms

explain analyze select * from big_view v where i >= 100 and i <= 10005 limit 10;

takes too long...

To me this looks like it should be fixable if limit is applied before
all targets are evaluated.
If I remove the left join from the view, Postgres works as expected,
so I guess it already knows how to apply limit before selects, but this
is probably missing for subqueries where targets are pulled out of it.

Maybe this is a problem since you would probably need closures to pull
of a general solution, but there are plenty of use cases without group
by that would benefit from applying limit before evaluating targets
that are used only in topmost result.

So, I was wondering if this is a known problem and is there any
interest in tackling it?

Regards,
Rikard

pgsql-performance by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: Segment best size
Next
From: Rikard Pavelic
Date:
Subject: Re: limit is sometimes not pushed in view with order