Indexes not always used after inserts/updates/vacuum analyze - Mailing list pgsql-bugs

From Michael G. Martin
Subject Indexes not always used after inserts/updates/vacuum analyze
Date
Msg-id 3C7D9BB9.4060006@vpmonline.com
Whole thread Raw
Responses Re: Indexes not always used after inserts/updates/vacuum analyze
List pgsql-bugs
I recently upgraded to 7.2 from 7.1.
Prior to 7,2, I was shutting down the database, droping indexes,
vacuuming analayze, re-building all the indexes on a nightly basis ( all
automated of course ;) ).  Things ran fine.
After upgrading to 7.2, I replaced all that with a nightly  on-line
/usr/local/pgsql/bin/vacuumdb -v -z  on the database.

The problems I am seeing is this.  As new data is added and modified,
indexes are not being used with some queries, but work fine on others.
 Even a vacuum full analyze did not fix the problem.  I had to drop and
re-build the index for the query to use the index. Now, I see the
problem starting again.  The table has about 20-30 million rows (
4-5Gigs in size), so seq scan kills any access.

This email is lengthy, but I wanted to detail this well.

Here is the table definition:

                Table "symbol_data"

       Column       |         Type          | Modifiers

--------------------+-----------------------+-----------

 symbol_name        | character varying(10) | not null
 date               | date                  | not null
 open               | numeric(15,3)         |
 high               | numeric(15,3)         |
 low                | numeric(15,3)         |
 d_close            | numeric(15,3)         |
 volume             | numeric(15,0)         |
 earnings           | numeric(15,3)         |
 dividend           | numeric(15,3)         |
 source             | character varying(50) |
 daily_mp_12a_long  | character(3)          |
 weekly_mp_12a_long | character(3)          |
 daily_mp_32a_long  | character(3)          |
 weekly_mp_32a_long | character(3)          |
Indexes: symbol_data_date_indx
Unique keys: symbol_data_pkey
------------------------------------------------------
Index "symbol_data_date_indx"
 Column | Type
--------+------
 date   | date
btree
------------------------------------------------------
    Index "symbol_data_pkey"
   Column    |         Type
-------------+-----------------------
 symbol_name | character varying(10)
 date        | date
unique btree
----------------------------------------------------------

Here is what I would expect which usually happens:

explain select * from symbol_data where symbol_name='IBM';
Index Scan using symbol_data_pkey on symbol_data  (cost=0.00..512.99 rows=128 width=129)

Here is one that fails:

explain select * from symbol_data where symbol_name='ELTE';
Seq Scan on symbol_data  (cost=0.00..707415.32 rows=438015 width=129)

Now I thought maybe it had something to do with the concatenated primary key, but:
 explain select * from symbol_data where symbol_name='IBM' and date between '1990-01-01' and '2002-01-01';
 Index Scan using symbol_data_pkey on symbol_data  (cost=0.00..402.55 rows=100 width=129)

explain select * from symbol_data where symbol_name='ELTE' and date between '1990-01-01' and '2002-01-01';
Seq Scan on symbol_data  (cost=0.00..810075.06 rows=342903 width=129)

Now, changing the date range will eventually use the index:

explain select * from symbol_data where symbol_name='ELTE' and date between '2002-01-01' and '2002-02-01';
NOTICE:  QUERY PLAN:

Index Scan using symbol_data_pkey on symbol_data  (cost=0.00..10815.42 rows=2706 width=129)




...now I do a vacuum analyze:
 VACUUM verbose ANALYZE symbol_data;
NOTICE:  --Relation symbol_data--
NOTICE:  Index symbol_data_date_indx: Pages 49709; Tuples 20536054: Deleted 4221.
    CPU 4.35s/16.30u sec elapsed 45.33 sec.
NOTICE:  Index symbol_data_pkey: Pages 74029; Tuples 20536054: Deleted 4221.
    CPU 6.44s/15.15u sec elapsed 31.00 sec.
NOTICE:  Removed 4221 tuples in 70 pages.
    CPU 0.00s/0.04u sec elapsed 0.08 sec.
NOTICE:  Pages 450911: Changed 0, Empty 0; Tup 20536054: Vac 4221, Keep 0, UnUsed 1858963.
    Total CPU 49.20s/36.31u sec elapsed 149.00 sec.
NOTICE:  Analyzing symbol_data
VACUUM

explain select * from symbol_data where symbol_name='ELTE' and date between '1990-01-01' and '2002-01-01';
Seq Scan on symbol_data  (cost=0.00..810291.94 rows=292916 width=129)

No change on the name.

A new index:

 create index test on symbol_data (symbol_name);

explain select * from symbol_data where symbol_name='ELTE';
NOTICE:  QUERY PLAN:

Seq Scan on symbol_data  (cost=0.00..707611.68 rows=383340 width=129)

EXPLAIN
vpm=> VACUUM verbose ANALYZE  symbol_data;
NOTICE:  --Relation symbol_data--
NOTICE:  Pages 450911: Changed 0, Empty 0; Tup 20536054: Vac 0, Keep 0, UnUsed 1863184.
    Total CPU 38.99s/4.50u sec elapsed 67.95 sec.
NOTICE:  Analyzing symbol_data
VACUUM
vpm=>  explain select * from symbol_data where symbol_name='ELTE';
NOTICE:  QUERY PLAN:

Seq Scan on symbol_data  (cost=0.00..707611.68 rows=355958 width=129)

Doesnt works.  I think the only way to make this ever work is to drop the indexes, vacuum full, and rebuild.

Any thoughts?  This is a very dynamic table, but I was hoping the online vacuum in 7.2 would fix the problems.

I'm in the process of splitting this table up into smaller pieces which will make life easier anyway,
but I think there is something going on here.

Thanks,
Michael

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: missing foreign key fails silently using COPY
Next
From: Tom Lane
Date:
Subject: Re: Indexes not always used after inserts/updates/vacuum analyze