Thread: seq-scan or index-scan

seq-scan or index-scan

From
Andreas Kretschmer
Date:
Dear list,

i have a table and i'm selecting all records without a where-condition, and i don't need a ORDER BY:



production=*# explain analyse select * from boxes;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on boxes  (cost=0.00..990783.99 rows=6499 width=581) (actual time=6.514..4588.136 rows=3060 loops=1)
 Total runtime: 4588.729 ms
(2 rows)


It's slow, so i tried with an ORDER BY:



production=# explain analyse select * from boxes order by id;
                                                           QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
 Index Scan using boxes_pkey on boxes  (cost=0.00..162437.00 rows=6499 width=581) (actual time=0.065..55.730 rows=3060
loops=1)
 Total runtime: 56.169 ms
(2 rows)


Why not using the index (it's a primary key) for the first query?

PG 9.1.3



Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: seq-scan or index-scan

From
"Tomas Vondra"
Date:
On 3 Červenec 2012, 17:58, Andreas Kretschmer wrote:
> Dear list,
>
> i have a table and i'm selecting all records without a where-condition,
> and i don't need a ORDER BY:
>
>
>
> production=*# explain analyse select * from boxes;
>                                                   QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------
>  Seq Scan on boxes  (cost=0.00..990783.99 rows=6499 width=581) (actual
> time=6.514..4588.136 rows=3060 loops=1)
>  Total runtime: 4588.729 ms
> (2 rows)
>
>
> It's slow, so i tried with an ORDER BY:
>
>
>
> production=# explain analyse select * from boxes order by id;
>                                                            QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using boxes_pkey on boxes  (cost=0.00..162437.00 rows=6499
> width=581) (actual time=0.065..55.730 rows=3060 loops=1)
>  Total runtime: 56.169 ms
> (2 rows)
>
>
> Why not using the index (it's a primary key) for the first query?

I suppose the second run was cached, i.e. most of the data was in the page
cache. And it's not that PostgreSQL can read "just" the index - it needs
to check tuple visibility which is stored in the table.

Try to do "sync && echo 3 > /proc/sys/vm/drop_caches" and rerun the second
query. How did that work?

How much data are we talking about anyway? How much RAM is in the server?

Tomas


Re: seq-scan or index-scan

From
Tom Lane
Date:
Andreas Kretschmer <akretschmer@spamfence.net> writes:
> production=*# explain analyse select * from boxes;
>                                                   QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------
>  Seq Scan on boxes  (cost=0.00..990783.99 rows=6499 width=581) (actual time=6.514..4588.136 rows=3060 loops=1)
>  Total runtime: 4588.729 ms
> (2 rows)

That cost estimate seems pretty dang large for a table with only 6500
rows.  I suspect this table is horribly bloated, and the indexscan
manages to win because it's not visiting pages that contain only dead
rows.  Try VACUUM FULL, and if that makes things saner, re-examine
your autovacuum settings.

            regards, tom lane

Re: seq-scan or index-scan

From
Andreas Kretschmer
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Andreas Kretschmer <akretschmer@spamfence.net> writes:
> > production=*# explain analyse select * from boxes;
> >                                                   QUERY PLAN
> > ---------------------------------------------------------------------------------------------------------------
> >  Seq Scan on boxes  (cost=0.00..990783.99 rows=6499 width=581) (actual time=6.514..4588.136 rows=3060 loops=1)
> >  Total runtime: 4588.729 ms
> > (2 rows)
>
> That cost estimate seems pretty dang large for a table with only 6500
> rows.  I suspect this table is horribly bloated, and the indexscan
> manages to win because it's not visiting pages that contain only dead

You're right as always ;-)

A LOT of updates and dead rows and the table is bloated.
(some rows contains more than 1MByte of TEXT and some rows updated once
per second or so)


> rows.  Try VACUUM FULL, and if that makes things saner, re-examine
> your autovacuum settings.

I can't run a VACUUM FULL because of the workload, but i have decrease
the fillfactor.


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°