Thread: Index not used past a certain threshold

Index not used past a certain threshold

From
Nishad Prakash
Date:
I seem to have run into an indexing problem in postgres 7.4 on Solaris 8.
An index on a certain table is not being used if a column referenced in
the query has more than a certain number of rows for a given value.
Here's an example:

I have a table like so:

      Table "public.a1"
 Column |  Type   | Modifiers
--------+---------+-----------
 aid    | integer | not null
Indexes:
    "a1b" hash (aid)


With an index like so:

Index "public.a1b"
 Column |  Type
--------+---------
 aid    | integer
hash, for table "public.a1"


If we group this table by "number of rows for a given aid", using this
query:
   select aid, count (aid) from a1 group by aid order by count (aid) using >;
then the largest rows are:

 aid  | count
------+-------
 4085 | 51039
 5065 | 45750
 5026 | 39224
 9010 | 31418
  527 | 30691
 5014 | 29421
 5010 | 24958
    7 | 20723
   57 | 19167
  722 | 17180
[...snip...]

Now, the index is *not* being used if I query for rows
where count >= 19167.

it=> explain select aid from a1 where aid = 57;
                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on a1  (cost=0.00..17986.81 rows=15738 width=4)
   Filter: (aid = 57)

And the rows with higher counts as well.  But beneath that threshold, the
index is indeed being used:

it=> explain select aid from a1 where aid = 722;
                             QUERY PLAN
--------------------------------------------------------------------
 Index Scan using a1b on a1  (cost=0.00..4796.23 rows=1337 width=4)
   Index Cond: (aid = 722)


The actual table in which I found this bug has 12 columns, and the
index involves three columns, but the problem is reproducible in a
table with just one column, if that helps to simplify things for you.
I have tried using hash indices and b-tree indices, but in either
case, once you hit the aid values with the larger number of rows, you
get a sequential scan (although the threshold varies slightly depending
on which indexing method you use).

Can this be fixed or is it just a feature of postgres indexing?

Thanks,

Nishad
ps> I can provide the data for the table if it's any help.
--
"Underneath the concrete, the dream is still alive" -- Talking Heads

Re: Index not used past a certain threshold

From
Tom Lane
Date:
Nishad Prakash <prakashn@uci.edu> writes:
> An index on a certain table is not being used if a column referenced in
> the query has more than a certain number of rows for a given value.

This is not a bug; it's intended and correct behavior.

You may have an issue that the crossover point isn't very well tuned for
your environment.  If so, the answer is to fool with the optimizer cost
parameters.  See the plgsql-performance archives...

            regards, tom lane