Re: Optimizer questions - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: Optimizer questions
Date
Msg-id 56ADBDBB.8070102@postgrespro.ru
Whole thread Raw
In response to Re: Optimizer questions  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
List pgsql-hackers
Unfortunately this two statements are not equivalent: second one can (in theory, but not for this particular data set) return more than 10 result records.
Such optimization will be correct if t2.i is declared as unique.

But the most efficient plan for this query will be generated if there is index for t1.v.
In this case no explicit sot is needed. Limit is still not pushed down, but it is not a problem because nested join is used which is not materializing its result
(produces records on demand):

# explain analyze select * from t1 left outer join t2 on t1.k=t2.k order by t1.v limit 10;
                                                          QUERY PLAN                                                         
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.58..4.38 rows=10 width=16) (actual time=0.069..0.157 rows=10 loops=1)
   ->  Nested Loop Left Join  (cost=0.58..37926.63 rows=100001 width=16) (actual time=0.067..0.154 rows=10 loops=1)
         ->  Index Scan using idxv on t1  (cost=0.29..3050.31 rows=100001 width=8) (actual time=0.046..0.053 rows=10 loops=1)
         ->  Index Scan using t2_pkey on t2  (cost=0.29..0.34 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=10)
               Index Cond: (t1.k = k)
 Planning time: 0.537 ms
 Execution time: 0.241 ms
(7 rows)


On 01/30/2016 01:01 AM, Alexander Korotkov wrote:
On Fri, Jan 8, 2016 at 11:58 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
Attached please find improved version of the optimizer patch for LIMIT clause.
Now I try to apply this optimization only for non-trivial columns requiring evaluation.
May be it will be better to take in account cost of this columns evaluation but right now I just detect non-variable columns.

We may think about more general feature: LIMIT pushdown. In the Konstantin's patch planner push LIMIT before tlist calculation.
But there are other cases when calculating LIMIT first would be beneficial. For instance, we can do LIMIT before JOINs. That is possible only for LEFT JOIN which is not used in filter and ordering clauses. See the example below.

# create table t1 as (select i, random() v from generate_series(1,1000000) i);
SELECT 1000000

# create table t2 as (select i, random() v from generate_series(1,1000000) i);
SELECT 1000000

# explain analyze select * from t1 left join t2 on t1.i = t2.i order by t1.v limit 10;
                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=87421.64..87421.67 rows=10 width=24) (actual time=1486.276..1486.278 rows=10 loops=1)
   ->  Sort  (cost=87421.64..89921.64 rows=1000000 width=24) (actual time=1486.275..1486.275 rows=10 loops=1)
         Sort Key: t1.v
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Hash Left Join  (cost=27906.00..65812.00 rows=1000000 width=24) (actual time=226.180..1366.238 rows=1000000
               Hash Cond: (t1.i = t2.i)
               ->  Seq Scan on t1  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.010..77.041 rows=1000000 l
               ->  Hash  (cost=15406.00..15406.00 rows=1000000 width=12) (actual time=226.066..226.066 rows=1000000 loop
                     Buckets: 131072  Batches: 1  Memory Usage: 46875kB
                     ->  Seq Scan on t2  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.007..89.002 rows=100
 Planning time: 0.123 ms
 Execution time: 1492.118 ms
(12 rows)

# explain analyze select * from (select * from t1 order by t1.v limit 10) t1 left join t2 on t1.i = t2.i;
                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=37015.89..56171.99 rows=10 width=24) (actual time=198.478..301.278 rows=10 loops=1)
   Hash Cond: (t2.i = t1.i)
   ->  Seq Scan on t2  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.005..74.303 rows=1000000 loops=1)
   ->  Hash  (cost=37015.77..37015.77 rows=10 width=12) (actual time=153.584..153.584 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Limit  (cost=37015.64..37015.67 rows=10 width=12) (actual time=153.579..153.580 rows=10 loops=1)
               ->  Sort  (cost=37015.64..39515.64 rows=1000000 width=12) (actual time=153.578..153.578 rows=10 loops=1)
                     Sort Key: t1.v
                     Sort Method: top-N heapsort  Memory: 25kB
                     ->  Seq Scan on t1  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.012..78.828 rows=100
 Planning time: 0.132 ms
 Execution time: 301.308 ms
(12 rows)

In this example LIMIT pushdown makes query 5 times faster. It would be very nice if optimizer make this automatically.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

pgsql-hackers by date:

Previous
From: Vitaly Burovoy
Date:
Subject:
Next
From: David Rowley
Date:
Subject: Re: WIP: Covering + unique indexes.