Index selection on a large table - Mailing list pgsql-bugs

From Michael Richards
Subject Index selection on a large table
Date
Msg-id 397E1CCA.000017.47769@frodo.searchcanada.ca
Whole thread Raw
Responses Re: [SQL] Index selection on a large table  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Hi.

I believe this to be a bug, but I am submitting it to the SQL list as 
well in the case I overlooked something.

I'm running 
Postgres 7.0.2 on FreeBSD 4.0-STABLE/Intel 
compiled with [gcc version 2.95.2 19991024 (release)] 

I've got this interesting problem where the query plan is not what I 
expect. Every month we log details of users logging into their 
webmail accounts. Every week before the database is vacuumed the 
oldest week's entries are removed. The table can be quite large. Here 
is the relevant parts of its structure:
          Table "logins"Attribute |    Type     | Modifier 
-----------+-------------+----------loginid   | varchar(16) | ip        | varchar(15) | 
[...]
Indices: logins_ip_idx,        logins_loginid_idx

The indexes are as follows: Index "logins_ip_idx"Attribute |    Type     
-----------+-------------ip        | varchar(15)
btree

Index "logins_loginid_idx"Attribute |    Type     
-----------+-------------loginid   | varchar(16)
btree

Size of the table:
fastmail=> select count(*) from logins; count  
---------1082564
(1 row)

Now here is a query plan from a selection using loginid:
explain select * from logins where loginid='michael';
NOTICE:  QUERY PLAN:
Index Scan using logins_loginid_idx on logins  (cost=0.00..500.57 
rows=130 width=48)

As expected it uses the logins_loginid_idx to select the rows that 
match loginid='michael';

Now I should note that I store the IP's as type varchar(15).
The following query yeilds the questionable query plan:

explain select * from logins where ip='38.30.141.44';
NOTICE:  QUERY PLAN:
Seq Scan on logins  (cost=0.00..25248.51 rows=11602 width=48)

This one decides to ignore the fact that IP is indexed (and the table 
was vacuumed) and it does a slow-assed sequential scan through a 
million or so rows to pick out (in this case 3 matching rows).

Just to be sure, I re-vacuumed the table and tried the questionable 
query again with the same results. Here is the vacuum output in case 
it helps:

NOTICE:  --Relation logins--
NOTICE:  Pages 11717: Changed 1, reaped 0, Empty 0, New 0; Tup 
1082580: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 64, MaxLen 
92; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 
2.28s/15.38u sec.
NOTICE:  Index logins_ip_idx: Pages 4550; Tuples 1082580. CPU 
0.78s/2.65u sec.
NOTICE:  Index logins_loginid_idx: Pages 3881; Tuples 1082580. CPU 
0.62s/2.67u sec.
VACUUM

-Michael

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: Converting a timestamp to a date when it contains nulls.
Next
From: Tom Lane
Date:
Subject: Re: [SQL] Index selection on a large table