The usual sequential scan, but with LIMIT ! - Mailing list pgsql-performance
From           | Pierre-Frédéric Caillaud
Subject        | The usual sequential scan, but with LIMIT !
Date           |
Msg-id         | opsdwm0iujcq72hf@musicbox
In response to | Re: Table UPDATE is too slow  (Kevin Barnard <kevin.barnard@gmail.com>)
Responses      | Re: The usual sequential scan, but with LIMIT !
               | Re: The usual sequential scan, but with LIMIT !
               | Re: The usual sequential scan, but with LIMIT !
List           | pgsql-performance
Hello,

I have this table :

CREATE TABLE apparts
(
        id          SERIAL NOT NULL PRIMARY KEY,
        price       FLOAT NOT NULL,
        surface     INTEGER NOT NULL,
        price_sq    FLOAT NOT NULL,
        rooms       INTEGER NULL,
        vente       BOOL NOT NULL,
        category    TEXT NOT NULL,
        zipcode     INTEGER NOT NULL,
        departement INTEGER NOT NULL
) WITHOUT OIDS;

There is a BTREE index on 'departement'. The table fits in RAM.

When I SELECT on my indexed field, postgres chooses a sequential scan unless the number of rows to be returned is very, very small :

apparts=> explain analyze select * from apparts where departement=42;
 Seq Scan on apparts  (cost=0.00..853.12 rows=1403 width=47) (actual time=5.094..52.026 rows=1516 loops=1)
   Filter: (departement = 42)
 Total runtime: 52.634 ms

OK, it returns 1516 rows, so maybe the seq scan is right.

apparts=> SET enable_seqscan = 0;
apparts=> explain analyze select * from apparts where departement=42;
 Index Scan using apparts_dept on apparts  (cost=0.00..1514.59 rows=1403 width=47) (actual time=0.045..2.770 rows=1516 loops=1)
   Index Cond: (departement = 42)
 Total runtime: 3.404 ms

Um, 15 times faster... The index scan is only chosen when there are few rows: for other values of 'departement' that match few rows, the index is used automatically. That is logical, even if it means I should adjust the page costs. I wish I could tell postgres "this table will fit in RAM and be accessed often, so for this table the page seek cost should be very low". Everything has been VACUUM FULL ANALYZE'd.

Now, if I LIMIT the query to 10 rows, the index should be used all the time, because it will always return few rows... well, it doesn't !

apparts=> SET enable_seqscan = 1;
apparts=> explain analyze select * from apparts where departement=42 LIMIT 10;
 Limit  (cost=0.00..6.08 rows=10 width=47) (actual time=5.003..5.023 rows=10 loops=1)
   ->  Seq Scan on apparts  (cost=0.00..853.12 rows=1403 width=47) (actual time=4.998..5.013 rows=10 loops=1)
         Filter: (departement = 42)
 Total runtime: 5.107 ms

Now, let's try :

apparts=> SET enable_seqscan = 0;
apparts=> explain analyze select * from apparts where departement=42 LIMIT 10;
 Limit  (cost=0.00..10.80 rows=10 width=47) (actual time=0.047..0.072 rows=10 loops=1)
   ->  Index Scan using apparts_dept on apparts  (cost=0.00..1514.59 rows=1403 width=47) (actual time=0.044..0.061 rows=10 loops=1)
         Index Cond: (departement = 42)
 Total runtime: 0.157 ms

So, left to itself, Postgres picks a query plan that is about 32 times slower than one which executes very fast using the index.
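Regarding the "page seek cost should be very low" wish above: I don't know of a per-table knob for this, but a rough session-level approximation, with values that are only guesses for a fully cached table, would be to lower the planner's page cost estimates :

apparts=> -- sketch only: session-level planner settings, not a per-table hint; the values are illustrative guesses
apparts=> SET effective_cache_size = 100000;  -- in 8 kB pages; tell the planner plenty of cache is available
apparts=> SET random_page_cost = 1.5;         -- default is 4; on a cached table random reads are nearly as cheap as sequential ones
apparts=> explain analyze select * from apparts where departement=42 LIMIT 10;

With random page fetches estimated as almost as cheap as sequential ones, the index scan's estimated cost drops and the planner is more likely to pick it on its own for these queries.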
If I use OFFSET + LIMIT, it only gets worse because the seq scan has to scan more rows :

apparts=> SET enable_seqscan = 1;
apparts=> explain analyze select * from apparts where departement=42 LIMIT 10 OFFSET 85;
 Limit  (cost=51.69..57.77 rows=10 width=47) (actual time=10.224..10.246 rows=10 loops=1)
   ->  Seq Scan on apparts  (cost=0.00..853.12 rows=1403 width=47) (actual time=5.254..10.200 rows=95 loops=1)
         Filter: (departement = 42)
 Total runtime: 10.326 ms

apparts=> SET enable_seqscan = 1;
apparts=> explain analyze select * from apparts where departement=42 LIMIT 10 OFFSET 1000;
 Limit  (cost=608.07..614.15 rows=10 width=47) (actual time=43.993..44.047 rows=10 loops=1)
   ->  Seq Scan on apparts  (cost=0.00..853.12 rows=1403 width=47) (actual time=5.328..43.791 rows=1010 loops=1)
         Filter: (departement = 42)
 Total runtime: 44.128 ms

apparts=> SET enable_seqscan = 0;
apparts=> explain analyze select * from apparts where departement=42 LIMIT 10 OFFSET 1000;
 Limit  (cost=1079.54..1090.33 rows=10 width=47) (actual time=2.147..2.170 rows=10 loops=1)
   ->  Index Scan using apparts_dept on apparts  (cost=0.00..1514.59 rows=1403 width=47) (actual time=0.044..1.860 rows=1010 loops=1)
         Index Cond: (departement = 42)
 Total runtime: 2.259 ms

Why is it that way ? The planner should use the LIMIT values when planning the query, should it not ?
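For the OFFSET case, one possible workaround (only a sketch: it assumes the schema above plus an extra index that is not in the original post, and 12345 stands for whatever id ended the previous page) is keyset pagination, i.e. remembering where the last page stopped instead of counting rows with OFFSET :

apparts=> -- assumed extra index, not part of the original schema:
apparts=> CREATE INDEX apparts_dept_id ON apparts (departement, id);
apparts=> -- fetch the next page after the last id already seen (12345 is a made-up example value)
apparts=> SELECT * FROM apparts
apparts->   WHERE departement = 42 AND id > 12345
apparts->   ORDER BY id
apparts->   LIMIT 10;

Each page then starts directly at the right spot in the (departement, id) index, so the cost no longer grows with the offset the way it does with LIMIT ... OFFSET.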