Index not used past a certain threshold - Mailing list pgsql-bugs

From Nishad Prakash
Subject Index not used past a certain threshold
Date
Msg-id Pine.GSO.4.58.0404291839200.24603@e4e.oac.uci.edu
Whole thread Raw
Responses Re: Index not used past a certain threshold  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
I seem to have run into an indexing problem in postgres 7.4 on Solaris 8.
An index on a certain table is not being used if a column referenced in
the query has more than a certain number of rows for a given value.
Here's an example:

I have a table like so:

      Table "public.a1"
 Column |  Type   | Modifiers
--------+---------+-----------
 aid    | integer | not null
Indexes:
    "a1b" hash (aid)


With an index like so:

Index "public.a1b"
 Column |  Type
--------+---------
 aid    | integer
hash, for table "public.a1"


If we group this table by "number of rows for a given aid", using this
query:
   select aid, count (aid) from a1 group by aid order by count (aid) using >;
then the largest rows are:

 aid  | count
------+-------
 4085 | 51039
 5065 | 45750
 5026 | 39224
 9010 | 31418
  527 | 30691
 5014 | 29421
 5010 | 24958
    7 | 20723
   57 | 19167
  722 | 17180
[...snip...]

Now, the index is *not* being used if I query for rows
where count >= 19167.

it=> explain select aid from a1 where aid = 57;
                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on a1  (cost=0.00..17986.81 rows=15738 width=4)
   Filter: (aid = 57)

And the rows with higher counts as well.  But beneath that threshold, the
index is indeed being used:

it=> explain select aid from a1 where aid = 722;
                             QUERY PLAN
--------------------------------------------------------------------
 Index Scan using a1b on a1  (cost=0.00..4796.23 rows=1337 width=4)
   Index Cond: (aid = 722)


The actual table in which I found this bug has 12 columns, and the
index involves three columns, but the problem is reproducible in a
table with just one column, if that helps to simplify things for you.
I have tried using hash indices and b-tree indices, but in either
case, once you hit the aid values with the larger number of rows, you
get a sequential scan (although the threshold varies slightly depending
on which indexing method you use).

Can this be fixed or is it just a feature of postgres indexing?

Thanks,

Nishad
ps> I can provide the data for the table if it's any help.
--
"Underneath the concrete, the dream is still alive" -- Talking Heads

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: [ADMIN] INITDB-error - end-of-copy marker error
Next
From: Tom Lane
Date:
Subject: Re: Index not used past a certain threshold