Thread: VACUUM kills Index Scans ?!

VACUUM kills Index Scans ?!

From
Gerald Gutierrez
Date:
I'm confused over two question involving PostgreSQL index scans. I'm using 
Linux Mandrake 7.2 on Intel, and PostgreSQL 7.0.3. If someone can explain 
what's going on, I'd greatly appreciate it.

---------------------------------------------

1) When I create a empty table, and then immediate create an index on a 
column, I can get /index scans/ when searching on that column. But when I 
then run VACUUM, the same search becomes a /sequential scan/. Shouldn't it 
still be an index scan? What's going on here?


test1=> create table t1 (a varchar(64), b int);
CREATE
test1=> create index t1_a_ndx on t1 (a);
CREATE
test1=> explain select * from t1 where a='asd';
NOTICE:  QUERY PLAN:

Index Scan using t1_a_ndx on t1  (cost=0.00..8.14 rows=10 width=16)

EXPLAIN
test1=> vacuum;
NOTICE:  Skipping "pg_type" --- only table owner can VACUUM it (a bunch of 
these)
VACUUM
test1=> explain select * from t1 where a='asd';
NOTICE:  QUERY PLAN:

Seq Scan on t1  (cost=0.00..0.00 rows=1 width=16)

EXPLAIN
test1=>

---------------------------------------------

2) If I already have some data in a table and I create an index on a 
column, why doesn't subsequent searches then change from sequential scans 
to index scans?


test1=> create table t2 (a varchar(64), b int);
CREATE
test1=> insert into t2 values ('a', 1);
INSERT 41255 1
test1=> insert into t2 values ('b', 2);
INSERT 41256 1
test1=> insert into t2 values ('c', 3);
INSERT 41257 1
test1=> explain select * from t2 where a='a';
NOTICE:  QUERY PLAN:

Seq Scan on t2  (cost=0.00..22.50 rows=10 width=16)

EXPLAIN
test1=> create index t2_a_ndx on t2 (a);
CREATE
test1=> explain select * from t2 where a='a';
NOTICE:  QUERY PLAN:

Seq Scan on t2  (cost=0.00..1.04 rows=1 width=16)

EXPLAIN
test1=>



Re: VACUUM kills Index Scans ?!

From
Gerald Gutierrez
Date:
As additional information that I neglected to include in the first message, 
after both tests, the indices clearly still exist and can be seed in the 
following commands:

\d t1
\d t2
\di
\d t1_a_ndx
\d t2_a_ndx

The output shows what's expected, e.g:

test1=> \di
List of relations
Name | Type | Owner
----------+-------+-------
t1_a_ndx | index | gutz
t2_a_ndx | index | gutz
(1 row)

test1=> \d t2_a_ndx
Index "t2_a_ndx"
Attribute | Type
-----------+-----------
a | varchar()
btree


At 11:42 AM 3/15/2001 -0800, Gerald Gutierrez wrote:

>I'm confused over two question involving PostgreSQL index scans. I'm using 
>Linux Mandrake 7.2 on Intel, and PostgreSQL 7.0.3. If someone can explain 
>what's going on, I'd greatly appreciate it.
>
>---------------------------------------------
>
>1) When I create a empty table, and then immediate create an index on a 
>column, I can get /index scans/ when searching on that column. But when I 
>then run VACUUM, the same search becomes a /sequential scan/. Shouldn't it 
>still be an index scan? What's going on here?
>
>
>test1=> create table t1 (a varchar(64), b int);
>CREATE
>test1=> create index t1_a_ndx on t1 (a);
>CREATE
>test1=> explain select * from t1 where a='asd';
>NOTICE:  QUERY PLAN:
>
>Index Scan using t1_a_ndx on t1  (cost=0.00..8.14 rows=10 width=16)
>
>EXPLAIN
>test1=> vacuum;
>NOTICE:  Skipping "pg_type" --- only table owner can VACUUM it (a bunch of 
>these)
>VACUUM
>test1=> explain select * from t1 where a='asd';
>NOTICE:  QUERY PLAN:
>
>Seq Scan on t1  (cost=0.00..0.00 rows=1 width=16)
>
>EXPLAIN
>test1=>
>
>---------------------------------------------
>
>2) If I already have some data in a table and I create an index on a 
>column, why doesn't subsequent searches then change from sequential scans 
>to index scans?
>
>
>test1=> create table t2 (a varchar(64), b int);
>CREATE
>test1=> insert into t2 values ('a', 1);
>INSERT 41255 1
>test1=> insert into t2 values ('b', 2);
>INSERT 41256 1
>test1=> insert into t2 values ('c', 3);
>INSERT 41257 1
>test1=> explain select * from t2 where a='a';
>NOTICE:  QUERY PLAN:
>
>Seq Scan on t2  (cost=0.00..22.50 rows=10 width=16)
>
>EXPLAIN
>test1=> create index t2_a_ndx on t2 (a);
>CREATE
>test1=> explain select * from t2 where a='a';
>NOTICE:  QUERY PLAN:
>
>Seq Scan on t2  (cost=0.00..1.04 rows=1 width=16)
>
>EXPLAIN
>test1=>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Don't 'kill -9' the postmaster




Re: VACUUM kills Index Scans ?!

From
Tom Lane
Date:
Gerald Gutierrez <gutz@kalador.com> writes:
> 1) When I create a empty table, and then immediate create an index on a 
> column, I can get /index scans/ when searching on that column. But when I 
> then run VACUUM, the same search becomes a /sequential scan/.

VACUUM updates the planner's statistics so that it knows the table is
empty (note the change in cost estimates).  The default numbers for a
never-yet-vacuumed table (10 disk blocks and 1000 rows, IIRC) just
happen to be large enough to cause an indexscan.  Put in a reasonable
amount of data and then repeat the VACUUM, and it'll go back to index
scan.

> 2) If I already have some data in a table and I create an index on a 
> column, why doesn't subsequent searches then change from sequential scans 
> to index scans?

Again, you haven't got enough data to justify an indexscan.  You need at
least several disk blocks worth of data before an indexscan can possibly
save more table I/O than it costs to read the index.

There is an undocumented little factoid here: CREATE INDEX will update
(some of) the planner stats, but only if it finds some data in the
table.  CREATE INDEX on an empty table leaves the initial default
numbers alone.  This may be contributing to your confusion, but it was
deemed necessary so that the common sequence
CREATE TABLECREATE INDEXload data

wouldn't leave the planner believing the table to be completely empty
(and hence generating abysmally bad plans if you had actually loaded
quite a bit of data).  On the other hand, the preferred bulk-load
method is
CREATE TABLEload dataCREATE INDEX

and this leaves the planner's stats set correctly.
        regards, tom lane


Re: VACUUM kills Index Scans ?!

From
Stephan Szabo
Date:
On Thu, 15 Mar 2001, Gerald Gutierrez wrote:

> 1) When I create a empty table, and then immediate create an index on a 
> column, I can get /index scans/ when searching on that column. But when I 
> then run VACUUM, the same search becomes a /sequential scan/. Shouldn't it 
> still be an index scan? What's going on here?

> ---------------------------------------------
> 
> 2) If I already have some data in a table and I create an index on a 
> column, why doesn't subsequent searches then change from sequential scans 
> to index scans?

With a small number of rows, a sequence scan will require less
reads/seeks from the filesystem.  It's not always correct for the
optimizer to choose to use an index even if it's there.

If you put in lots of rows with distinct values and vacuum analyze (you
want to do that rather than just vacuum) and do a comparison it should
use the index, with only a few rows, the seq scan is probably better.




Re: VACUUM kills Index Scans ?!

From
Gerald Gutierrez
Date:
>There is an undocumented little factoid here: CREATE INDEX will update
>(some of) the planner stats, but only if it finds some data in the
>table.  CREATE INDEX on an empty table leaves the initial default
>numbers alone.  This may be contributing to your confusion, but it was
>deemed necessary ...

I understand now; it makes sense. I'll be using a number of tables that are 
initially very small, perhaps 5 or 10 records. But I expect that the tables 
will grow very quickly to several tens (or hundreds) of thousands of 
records. It seems reasonable to me that the table should then be set up to 
use index scan right from the beginning so that as the table grows the 
index scan will become more useful. Thus, the correct sequence for me is 
probably:

>         CREATE TABLE
>         CREATE INDEX
>         load data

I also understand that VACUUM and VACUUM ANALYZE takes a significant amount 
of time and locks the tables that it works on. Does it do locking and 
unlocking per table as it goes through them (i.e. only lock while it's 
reading the table) or does it do something else? If the locks are for large 
amounts of time I'll have to shut down my application to avoid connections 
from timing out and JDBC exceptions from being thrown.

Thanks for your help :)





Re: VACUUM kills Index Scans ?!

From
Joseph Shraibman
Date:
Gerald Gutierrez wrote:
> 
> >There is an undocumented little factoid here: CREATE INDEX will update
> >(some of) the planner stats, but only if it finds some data in the
> >table.  CREATE INDEX on an empty table leaves the initial default
> >numbers alone.  This may be contributing to your confusion, but it was
> >deemed necessary ...
> 
> I understand now; it makes sense. I'll be using a number of tables that are
> initially very small, perhaps 5 or 10 records. But I expect that the tables
> will grow very quickly to several tens (or hundreds) of thousands of
> records. It seems reasonable to me that the table should then be set up to
> use index scan right from the beginning so that as the table grows the
> index scan will become more useful. Thus, the correct sequence for me is
> probably:
> 
> >         CREATE TABLE
> >         CREATE INDEX
> >         load data
> 

no, the correct sequence is to create the index last, which will create
statistics that will tell postgres if it really wants to use an index or
not.  Don't try and second guess postgres.

Even better do a VACUUM ANALYZE.


-- 
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com