Re: ORDER BY, LIMIT and indexes - Mailing list pgsql-performance

From Mark Kirkwood
Subject Re: ORDER BY, LIMIT and indexes
Date
Msg-id 52017F1F.7020700@catalyst.net.nz
Whole thread Raw
In response to Re: ORDER BY, LIMIT and indexes  (Ivan Voras <ivoras@freebsd.org>)
Responses Re: ORDER BY, LIMIT and indexes  (Claudio Freire <klaussfreire@gmail.com>)
List pgsql-performance
On 06/08/13 22:46, Ivan Voras wrote:
> Here are two more unexpected results. Same test table (1 mil. records,
> "id" is SERIAL PRIMARY KEY, PostgreSQL 9.1, VACUUM ANALYZE performed
> before the experiments):
>
> ivoras=# explain analyze select * from lt where id > 900000 limit 10;
>                                                     QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
>   Limit  (cost=0.00..1.71 rows=10 width=9) (actual
> time=142.669..142.680 rows=10 loops=1)
>     ->  Seq Scan on lt  (cost=0.00..17402.00 rows=101630 width=9)
> (actual time=142.665..142.672 rows=10 loops=1)
>           Filter: (id > 900000)
>   Total runtime: 142.735 ms
> (4 rows)
>
> Note the Seq Scan.
>
> ivoras=# explain analyze select * from lt where id > 900000;
>                                                            QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------
>   Bitmap Heap Scan on lt  (cost=1683.97..7856.35 rows=101630 width=9)
> (actual time=38.462..85.780 rows=100000 loops=1)
>     Recheck Cond: (id > 900000)
>     ->  Bitmap Index Scan on lt_pkey  (cost=0.00..1658.56 rows=101630
> width=0) (actual time=38.310..38.310 rows=100000 loops=1)
>           Index Cond: (id > 900000)
>   Total runtime: 115.674 ms
> (5 rows)
>
> This somewhat explains the above case - we are simply fetching 100,000
> records here, and it's slow enough even with the index scan, so
> planner skips the index in the former case. BUT, if it did use the
> index, it would have been expectedly fast:
>
> ivoras=# set enable_seqscan to off;
> SET
> ivoras=# explain analyze select * from lt where id > 900000 limit 10;
>                                                           QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------
>   Limit  (cost=0.00..1.74 rows=10 width=9) (actual time=0.081..0.112
> rows=10 loops=1)
>     ->  Index Scan using lt_pkey on lt  (cost=0.00..17644.17
> rows=101630 width=9) (actual time=0.078..0.100 rows=10 loops=1)
>           Index Cond: (id > 900000)
>   Total runtime: 0.175 ms
> (4 rows)
>
> It looks like the problem is in the difference between what the
> planner expects and what the Filter or Index operations deliver:
> (cost=0.00..17402.00 rows=101630 width=9) (actual
> time=142.665..142.672 rows=10 loops=1).
>
>

Hmm - I wonder if the lack or ORDER BY is part of the problem here.
Consider a similar query on pgbench_accounts:

bench=# explain analyze select aid from pgbench_accounts where aid >
100000 limit 20;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..0.91 rows=20 width=4) (actual time=0.005..0.464
rows=20 loops=1)
    ->  Seq Scan on pgbench_accounts (cost=0.00..499187.31 rows=10994846
width=4) (actual time=0.005..0.463 rows=20 loops=1)
          Filter: (aid > 100000)
  Total runtime: 0.474 ms
(4 rows)

bench=# explain analyze select aid from pgbench_accounts where aid >
10000000 limit 20;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..2.25 rows=20 width=4) (actual time=0.014..0.018
rows=20 loops=1)
    ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts
(cost=0.00..207204.06 rows=1844004 width=4) (actual time=0.014..0.017
rows=20 loops=1)
          Index Cond: (aid > 10000000)
  Total runtime: 0.030 ms
(4 rows)


So at some point you get index scans. Now add an ORDER BY:

bench=# explain analyze select aid from pgbench_accounts where aid >
100000 order by aid limit 20;
QUERY PLAN


----------------------------------------------------------------------------------------------------------------------------------------------------------
--
  Limit  (cost=0.00..2.25 rows=20 width=4) (actual time=0.008..0.012
rows=20 loops=1)
    ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts
(cost=0.00..1235355.34 rows=10994846 width=4) (actual time=0.008..0.011
rows=20 loops=1
)
          Index Cond: (aid > 100000)
  Total runtime: 0.023 ms
(4 rows)

bench=# explain analyze select aid from pgbench_accounts where aid >
10000000 order by aid limit 20;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..2.25 rows=20 width=4) (actual time=0.014..0.018
rows=20 loops=1)
    ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts
(cost=0.00..207204.06 rows=1844004 width=4) (actual time=0.014..0.016
rows=20 loops=1)
          Index Cond: (aid > 10000000)
  Total runtime: 0.029 ms
(4 rows)


...and we have index scans for both cases.

Cheers

Mark


pgsql-performance by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: subselect requires offset 0 for good performance.
Next
From: Claudio Freire
Date:
Subject: Re: ORDER BY, LIMIT and indexes