Thread: Query not using the index

Query not using the index

From
Mark Halliwell
Date:
Hello Everyone,

I am having some problems getting my queries to use the correct index, and as
a result they use a sequential scan that takes a very long time.

The table in question is used for replicating data between computers and
contains over 7 million records.  The table is on a few different linux
computers, some running postgresql version 7.3 and some 7.3.2; my problem is
the same on all.  The relevent fields are:
                Table "public.replicate"
    Column    |            Type             | Modifiers
--------------+-----------------------------+-----------
 computer     | integer                     | not null
 sequence     | integer                     | not null

The majority of records (about 6.8 million) have computer = 8 with sequence
starting at 2200000 and incrementing by 1.
There are about 497000 records with computer = 3 with the sequence starting at
1 and also incrementing by 1.
There are only a few records with other computer numbers.
Records are inserted (they are never deleted but sometimes updated) in
numerical order by the sequence field for each computer and together these
fields (computer, sequence) are unique.

I have a few queries that attempt to find recently inserted records for a
particular computer.  Most of my queries include other terms in the where
clause and sort the results (usually by sequence), however this simple query
can be used as an example of my problem:
select * from replicate where computer = 3 and sequence >= 490000;

I have created several different indexes (always doing a vacuum analyse
afterwards etc), but the explain always reports a sequential scan.  If I
force an index scan, it runs very quickly - as it should.  Also, it appears
that if a specify an upper limit for sequence (a value which I cannot always
easily predict), it also uses the index.

If my query is for those records with computer = 8, I believe it chooses the
correct index every time.

Here are some examples of the indexes I created and the explains:
=======
This is the original index which works fine until there are lots (several
thousand) of records for a particular computer number:
create unique index replicate_id on replicate (computer, sequence);

explain analyse select * from replicate where computer = 3 and sequence >=
490000;
                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
 Seq Scan on replicate  (cost=0.00..400259.20 rows=300970 width=386) (actual
time=80280.18..80974.41 rows=7459 loops=1)
   Filter: ((computer = 3) AND ("sequence" >= 490000))
 Total runtime: 80978.67 msec
(3 rows)

But if we put in an upper limit for the sequence we get:
explain analyse select * from replicate where computer = 3 and sequence >=
490000 and sequence < 600000;
                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using replicate_id on replicate  (cost=0.00..625.99 rows=182
width=388) (actual time=45.00..446.31 rows=7789 loops=1)
   Index Cond: ((computer = 3) AND ("sequence" >= 490000) AND ("sequence" <
600000))
 Total runtime: 451.18 msec
(3 rows)


set enable_seqscan=off;
explain analyse select * from replicate where computer = 3 and sequence >=
490000;
                                                              QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using replicate_id on replicate  (cost=0.00..991401.16 rows=289949
width=388) (actual time=0.06..47.84 rows=7788 loops=1)
   Index Cond: ((computer = 3) AND ("sequence" >= 490000))
 Total runtime: 52.48 msec
(3 rows)

======
I tried adding this index, and it seemed to work until (I think) there were
about 400000 records for computer = 3.
create index replicate_computer_3 on replicate (sequence) WHERE computer = 3;

explain analyse select * from replicate where computer = 3 and sequence >=
490000;
                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
 Seq Scan on replicate  (cost=0.00..400262.91 rows=287148 width=386) (actual
time=74371.70..74664.84 rows=7637 loops=1)
   Filter: ((computer = 3) AND ("sequence" >= 490000))
 Total runtime: 74669.22 msec
(3 rows)

But if we put an upper limit for the sequence we get:
explain analyse select * from replicate where computer = 3 and sequence >=
490000 and sequence < 600000;
                                                               QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using replicate_computer_3 on replicate  (cost=0.00..417.29
rows=180 width=386) (actual time=0.06..54.21 rows=7657 loops=1)
   Index Cond: (("sequence" >= 490000) AND ("sequence" < 600000))
   Filter: (computer = 3)
 Total runtime: 58.86 msec
(4 rows)


And, forcing the index:
set enable_seqscan=off;
explain analyse select * from replicate where computer = 3 and sequence >=
490000;
                                                                  QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using replicate_computer_3 on replicate  (cost=0.00..660538.28
rows=287148 width=386) (actual time=0.06..53.05 rows=7657 loops=1)
   Index Cond: ("sequence" >= 490000)
   Filter: (computer = 3)
 Total runtime: 57.66 msec
(4 rows)


I have tried quiet a few different indexes and they all seem to do the same
sort of thing.  Is there anything that I should be doing to make these
queries use the index?  I am unable to easily predict what the upper limit
for the sequence should be in all cases, so I would rather a solution that
didn't require specifying it.  Am I missing something?

Is there some other information that I should have provided?

Thanks
Mark Halliwell



Re: Query not using the index

From
Andrew Sullivan
Date:
On Tue, Feb 25, 2003 at 07:03:40PM +1100, Mark Halliwell wrote:

> The majority of records (about 6.8 million) have computer = 8 with sequence
> starting at 2200000 and incrementing by 1.
> There are about 497000 records with computer = 3 with the sequence starting at
> 1 and also incrementing by 1.
> There are only a few records with other computer numbers.

> select * from replicate where computer = 3 and sequence >= 490000;
>
> I have created several different indexes (always doing a vacuum analyse
> afterwards etc), but the explain always reports a sequential scan.  If I

Try setting the statistics on computer to a much wider value -- say

    ALTER TABLE computer ALTER COLUMN computer SET STATISTICS 1000

and see if it helps.  You can poke around in the pg_stats view to see
why this might help, and perhaps to get a more realistic idea of what
you need to set the statistics to.  The problem is likely the
overwhelming commonality of computer=8.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Query not using the index

From
Bruno Wolff III
Date:
On Tue, Feb 25, 2003 at 19:03:40 +1100,
  Mark Halliwell <mark@transportservices.com.au> wrote:
>
> The majority of records (about 6.8 million) have computer = 8 with sequence
> starting at 2200000 and incrementing by 1.
> There are about 497000 records with computer = 3 with the sequence starting at
> 1 and also incrementing by 1.
> There are only a few records with other computer numbers.

You might get some benefit using a partial index that just covers the
rows where computer = 3.

Re: Query not using the index

From
Tom Lane
Date:
Mark Halliwell <mark@transportservices.com.au> writes:
> The majority of records (about 6.8 million) have computer = 8 with sequence
> starting at 2200000 and incrementing by 1.
> There are about 497000 records with computer = 3 with the sequence starting at
> 1 and also incrementing by 1.
> There are only a few records with other computer numbers.

You aren't going to find any non-kluge solution, because Postgres keeps
no cross-column statistics and thus is quite unaware that there's any
correlation between the computer and sequence fields.  So in a query
like

> select * from replicate where computer = 3 and sequence >= 490000;

the sequence constraint looks extremely unselective to the planner, and
you get a seqscan, even though *in the domain of computer = 3* it's a
reasonably selective constraint.

> that if a specify an upper limit for sequence (a value which I cannot always
> easily predict), it also uses the index.

I would think that it'd be sufficient to say

  select * from replicate where computer = 3 and sequence >= 490000
  and sequence < 2200000;

If it's not, try increasing the statistics target for the sequence
column so that ANALYZE gathers a finer-grain histogram for that column.

            regards, tom lane