From: Pierre-Frédéric Caillaud
Subject: The usual sequential scan, but with LIMIT !
Date:
Msg-id: opsdwm0iujcq72hf@musicbox
In response to: Re: Table UPDATE is too slow (Kevin Barnard <kevin.barnard@gmail.com>)
List: pgsql-performance
    Hello,

    I have this table:
CREATE TABLE apparts
(
       id          SERIAL NOT NULL PRIMARY KEY,

       price       FLOAT NOT NULL,
       surface     INTEGER NOT NULL,
       price_sq    FLOAT NOT NULL,

       rooms       INTEGER NULL,
       vente       BOOL NOT NULL,
       category    TEXT NOT NULL,
       zipcode     INTEGER NOT NULL,
       departement INTEGER NOT NULL
) WITHOUT OIDS;

There is a BTREE index on 'departement'.
The table fits in RAM.
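For reference, the index is a plain btree created along these lines (the
name apparts_dept is taken from the plans below):

CREATE INDEX apparts_dept ON apparts (departement);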

When I SELECT on the indexed field, Postgres chooses a sequential scan
unless the number of rows to be returned is very, very small:

apparts=> explain analyze select * from apparts where departement=42;
  Seq Scan on apparts  (cost=0.00..853.12 rows=1403 width=47) (actual time=5.094..52.026 rows=1516 loops=1)
    Filter: (departement = 42)
  Total runtime: 52.634 ms

OK, it returns 1516 rows, so maybe the seq scan is right.

apparts=> SET enable_seqscan = 0;
apparts=> explain analyze select * from apparts where departement=42;
  Index Scan using apparts_dept on apparts  (cost=0.00..1514.59 rows=1403 width=47) (actual time=0.045..2.770 rows=1516 loops=1)
    Index Cond: (departement = 42)
  Total runtime: 3.404 ms

Um, 15 times faster...

The index scan is chosen only when there are few rows. For other values of
'departement' that match only a few rows, the index is used automatically.
This is logical, even if I should probably adjust the page costs. I wish I
could tell Postgres "this table will fit in RAM and be accessed often, so
for this table the page seek cost should be very low".
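As far as I know there is no per-table setting for that; the closest thing
is to lower the global planner cost parameters for the session, for example
(the numbers below are only a sketch, not tuned values):

apparts=> SET random_page_cost = 2;           -- default is 4; lower when the data is cached
apparts=> SET effective_cache_size = 50000;   -- in 8 kB pages; the planner's idea of available cache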

Everything has been VACUUM FULL ANALYZEd.

Now, if I LIMIT the query to 10 rows, the index should be used all the
time, because it will always return few rows... well, it doesn't!

apparts=> SET enable_seqscan = 1;
apparts=> explain analyze select * from apparts where departement=42 LIMIT 10;
  Limit  (cost=0.00..6.08 rows=10 width=47) (actual time=5.003..5.023 rows=10 loops=1)
    ->  Seq Scan on apparts  (cost=0.00..853.12 rows=1403 width=47) (actual time=4.998..5.013 rows=10 loops=1)
          Filter: (departement = 42)
  Total runtime: 5.107 ms


Now, let's try:

apparts=> SET enable_seqscan = 0;
apparts=> explain analyze select * from apparts where departement=42 LIMIT 10;
  Limit  (cost=0.00..10.80 rows=10 width=47) (actual time=0.047..0.072 rows=10 loops=1)
    ->  Index Scan using apparts_dept on apparts  (cost=0.00..1514.59 rows=1403 width=47) (actual time=0.044..0.061 rows=10 loops=1)
          Index Cond: (departement = 42)
  Total runtime: 0.157 ms

So, left to itself, Postgres selects a very bad query plan (32x slower) for
a query that would execute very fast using the index. If I use OFFSET +
LIMIT, it only gets worse, because the seq scan has to scan even more rows:

apparts=> SET enable_seqscan = 1;
apparts=> explain analyze select * from apparts where departement=42 LIMIT 10 OFFSET 85;
                                                   QUERY PLAN
---------------------------------------------------------------------------------------------------------------
  Limit  (cost=51.69..57.77 rows=10 width=47) (actual time=10.224..10.246 rows=10 loops=1)
    ->  Seq Scan on apparts  (cost=0.00..853.12 rows=1403 width=47) (actual time=5.254..10.200 rows=95 loops=1)
          Filter: (departement = 42)
  Total runtime: 10.326 ms


apparts=> SET enable_seqscan = 1;
apparts=> explain analyze select * from apparts where departement=42 LIMIT 10 OFFSET 1000;
  Limit  (cost=608.07..614.15 rows=10 width=47) (actual time=43.993..44.047 rows=10 loops=1)
    ->  Seq Scan on apparts  (cost=0.00..853.12 rows=1403 width=47) (actual time=5.328..43.791 rows=1010 loops=1)
          Filter: (departement = 42)
  Total runtime: 44.128 ms

apparts=> SET enable_seqscan = 0;
apparts=> explain analyze select * from apparts where departement=42 LIMIT 10 OFFSET 1000;
  Limit  (cost=1079.54..1090.33 rows=10 width=47) (actual time=2.147..2.170 rows=10 loops=1)
    ->  Index Scan using apparts_dept on apparts  (cost=0.00..1514.59 rows=1403 width=47) (actual time=0.044..1.860 rows=1010 loops=1)
          Index Cond: (departement = 42)
  Total runtime: 2.259 ms


    Why is it that way? The planner should take the LIMIT value into account
when planning the query, should it not?
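
Out of curiosity, here is how I would check whether the page cost estimate
alone explains it (just a sketch reusing the query from above; 1.5 is an
arbitrary experimental value, not a recommendation):

apparts=> SET enable_seqscan = 1;
apparts=> SET random_page_cost = 1.5;
apparts=> explain analyze select * from apparts where departement=42 LIMIT 10;

If the planner now picks the index scan, the cost parameters are the whole
story; if it still picks the seq scan, something else is going on.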
