Sort and limit push down - Mailing list pgsql-general

From Adrien Nayrat
Subject Sort and limit push down
Date
Msg-id 2a6b4c49-140b-48b1-ba68-b92857ca76c9@anayrat.info
Whole thread Raw
List pgsql-general
Hello,

While working on a query, I found a possible optimization by pushing 
down sort and limit :



create table t1 (id int primary key) ;
create table t2 (c1_id int references t1(id), c2 int, primary 
key(c1_id,c2) );


insert into t1 values (1),(2);
insert into t2 (c1_id,c2) select 1,i from generate_series(1,10000) i;
insert into t2 (c1_id,c2) select 2,i from generate_series(1,10000) i;

create index on t2 (c1_id,c2);

vacuum analyze t1;
vacuum analyze t2;

Here is a simple query :

EXPLAIN SELECT * FROM t1
   JOIN t2 ON t1.id = t2.c1_id
WHERE t1.id = 1
ORDER BY c2 LIMIT 50;

It is quite straightforward, Postgres is able to read the index in the 
order (id,c2) :

  Limit  (cost=0.29..2.08 rows=50 width=12)
    ->  Nested Loop  (cost=0.29..359.32 rows=10000 width=12)
          ->  Index Only Scan using t2_c1_id_c2_idx on t2 
(cost=0.29..233.29 rows=10000 width=8)
                Index Cond: (c1_id = 1)
          ->  Materialize  (cost=0.00..1.03 rows=1 width=4)
                ->  Seq Scan on t1  (cost=0.00..1.02 rows=1 width=4)
                      Filter: (id = 1)


(I am surprised by this choice. I would rather first do a seqscan on t1 
then read t2 by traversing index)


But if I want two t1 ids :

EXPLAIN SELECT * FROM t1
   JOIN t2 ON t1.id = t2.c1_id
WHERE t1.id IN (1,2)
ORDER BY c2 LIMIT 50;

Postgres has no choice to read all t2 records, join them, sort, then limit :

  Limit  (cost=1118.19..1118.31 rows=50 width=12)
    ->  Sort  (cost=1118.19..1168.19 rows=20000 width=12)
          Sort Key: t2.c2
          ->  Hash Join  (cost=1.05..453.80 rows=20000 width=12)
                Hash Cond: (t2.c1_id = t1.id)
                ->  Seq Scan on t2  (cost=0.00..289.00 rows=20000 width=8)
                ->  Hash  (cost=1.02..1.02 rows=2 width=4)
                      ->  Seq Scan on t1  (cost=0.00..1.02 rows=2 width=4)
                            Filter: (id = ANY ('{1,2}'::integer[]))


We could push the order and limit farther in the plan :

EXPLAIN SELECT * FROM (
   SELECT * FROM t1 WHERE id IN (1, 2)) t1,
   LATERAL (
     SELECT * FROM t2
     WHERE c1_id = t1.id
     ORDER BY c2
     LIMIT 50) lat
ORDER BY lat.c2
LIMIT 50;

  Limit  (cost=8.25..8.38 rows=50 width=12)
    ->  Sort  (cost=8.25..8.50 rows=100 width=12)
          Sort Key: t2.c2
          ->  Nested Loop  (cost=0.29..4.93 rows=100 width=12)
                ->  Seq Scan on t1  (cost=0.00..1.02 rows=2 width=4)
                      Filter: (id = ANY ('{1,2}'::integer[]))
                ->  Limit  (cost=0.29..1.45 rows=50 width=8)
                      ->  Index Only Scan using t2_c1_id_c2_idx on t2 
(cost=0.29..233.29 rows=10000 width=8)
                            Index Cond: (c1_id = t1.id)



Difference can be huge, 90 blocks 11ms vs 6 blocks and 0.6ms on this 
simple example.

Am I wrong ? It also looks like close to Index skip scan work.

Thanks

-- 
Adrien NAYRAT





pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: Index (primary key) corrupt?
Next
From: Igor Korot
Date:
Subject: How to properly use TRIM()?