Hello Everyone,
I am having some problems getting my queries to use the correct index, and as
a result they use a sequential scan that takes a very long time.
The table in question is used for replicating data between computers and
contains over 7 million records. The table is on a few different linux
computers, some running postgresql version 7.3 and some 7.3.2; my problem is
the same on all. The relevent fields are:
Table "public.replicate"
Column | Type | Modifiers
--------------+-----------------------------+-----------
computer | integer | not null
sequence | integer | not null
The majority of records (about 6.8 million) have computer = 8 with sequence
starting at 2200000 and incrementing by 1.
There are about 497000 records with computer = 3 with the sequence starting at
1 and also incrementing by 1.
There are only a few records with other computer numbers.
Records are inserted (they are never deleted but sometimes updated) in
numerical order by the sequence field for each computer and together these
fields (computer, sequence) are unique.
I have a few queries that attempt to find recently inserted records for a
particular computer. Most of my queries include other terms in the where
clause and sort the results (usually by sequence), however this simple query
can be used as an example of my problem:
select * from replicate where computer = 3 and sequence >= 490000;
I have created several different indexes (always doing a vacuum analyse
afterwards etc), but the explain always reports a sequential scan. If I
force an index scan, it runs very quickly - as it should. Also, it appears
that if a specify an upper limit for sequence (a value which I cannot always
easily predict), it also uses the index.
If my query is for those records with computer = 8, I believe it chooses the
correct index every time.
Here are some examples of the indexes I created and the explains:
=======
This is the original index which works fine until there are lots (several
thousand) of records for a particular computer number:
create unique index replicate_id on replicate (computer, sequence);
explain analyse select * from replicate where computer = 3 and sequence >=
490000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Seq Scan on replicate (cost=0.00..400259.20 rows=300970 width=386) (actual
time=80280.18..80974.41 rows=7459 loops=1)
Filter: ((computer = 3) AND ("sequence" >= 490000))
Total runtime: 80978.67 msec
(3 rows)
But if we put in an upper limit for the sequence we get:
explain analyse select * from replicate where computer = 3 and sequence >=
490000 and sequence < 600000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Index Scan using replicate_id on replicate (cost=0.00..625.99 rows=182
width=388) (actual time=45.00..446.31 rows=7789 loops=1)
Index Cond: ((computer = 3) AND ("sequence" >= 490000) AND ("sequence" <
600000))
Total runtime: 451.18 msec
(3 rows)
set enable_seqscan=off;
explain analyse select * from replicate where computer = 3 and sequence >=
490000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Index Scan using replicate_id on replicate (cost=0.00..991401.16 rows=289949
width=388) (actual time=0.06..47.84 rows=7788 loops=1)
Index Cond: ((computer = 3) AND ("sequence" >= 490000))
Total runtime: 52.48 msec
(3 rows)
======
I tried adding this index, and it seemed to work until (I think) there were
about 400000 records for computer = 3.
create index replicate_computer_3 on replicate (sequence) WHERE computer = 3;
explain analyse select * from replicate where computer = 3 and sequence >=
490000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Seq Scan on replicate (cost=0.00..400262.91 rows=287148 width=386) (actual
time=74371.70..74664.84 rows=7637 loops=1)
Filter: ((computer = 3) AND ("sequence" >= 490000))
Total runtime: 74669.22 msec
(3 rows)
But if we put an upper limit for the sequence we get:
explain analyse select * from replicate where computer = 3 and sequence >=
490000 and sequence < 600000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Index Scan using replicate_computer_3 on replicate (cost=0.00..417.29
rows=180 width=386) (actual time=0.06..54.21 rows=7657 loops=1)
Index Cond: (("sequence" >= 490000) AND ("sequence" < 600000))
Filter: (computer = 3)
Total runtime: 58.86 msec
(4 rows)
And, forcing the index:
set enable_seqscan=off;
explain analyse select * from replicate where computer = 3 and sequence >=
490000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using replicate_computer_3 on replicate (cost=0.00..660538.28
rows=287148 width=386) (actual time=0.06..53.05 rows=7657 loops=1)
Index Cond: ("sequence" >= 490000)
Filter: (computer = 3)
Total runtime: 57.66 msec
(4 rows)
I have tried quiet a few different indexes and they all seem to do the same
sort of thing. Is there anything that I should be doing to make these
queries use the index? I am unable to easily predict what the upper limit
for the sequence should be in all cases, so I would rather a solution that
didn't require specifying it. Am I missing something?
Is there some other information that I should have provided?
Thanks
Mark Halliwell