Thread: VACUUM kills Index Scans ?!
I'm confused over two question involving PostgreSQL index scans. I'm using Linux Mandrake 7.2 on Intel, and PostgreSQL 7.0.3. If someone can explain what's going on, I'd greatly appreciate it. --------------------------------------------- 1) When I create a empty table, and then immediate create an index on a column, I can get /index scans/ when searching on that column. But when I then run VACUUM, the same search becomes a /sequential scan/. Shouldn't it still be an index scan? What's going on here? test1=> create table t1 (a varchar(64), b int); CREATE test1=> create index t1_a_ndx on t1 (a); CREATE test1=> explain select * from t1 where a='asd'; NOTICE: QUERY PLAN: Index Scan using t1_a_ndx on t1 (cost=0.00..8.14 rows=10 width=16) EXPLAIN test1=> vacuum; NOTICE: Skipping "pg_type" --- only table owner can VACUUM it (a bunch of these) VACUUM test1=> explain select * from t1 where a='asd'; NOTICE: QUERY PLAN: Seq Scan on t1 (cost=0.00..0.00 rows=1 width=16) EXPLAIN test1=> --------------------------------------------- 2) If I already have some data in a table and I create an index on a column, why doesn't subsequent searches then change from sequential scans to index scans? test1=> create table t2 (a varchar(64), b int); CREATE test1=> insert into t2 values ('a', 1); INSERT 41255 1 test1=> insert into t2 values ('b', 2); INSERT 41256 1 test1=> insert into t2 values ('c', 3); INSERT 41257 1 test1=> explain select * from t2 where a='a'; NOTICE: QUERY PLAN: Seq Scan on t2 (cost=0.00..22.50 rows=10 width=16) EXPLAIN test1=> create index t2_a_ndx on t2 (a); CREATE test1=> explain select * from t2 where a='a'; NOTICE: QUERY PLAN: Seq Scan on t2 (cost=0.00..1.04 rows=1 width=16) EXPLAIN test1=>
As additional information that I neglected to include in the first message, after both tests, the indices clearly still exist and can be seed in the following commands: \d t1 \d t2 \di \d t1_a_ndx \d t2_a_ndx The output shows what's expected, e.g: test1=> \di List of relations Name | Type | Owner ----------+-------+------- t1_a_ndx | index | gutz t2_a_ndx | index | gutz (1 row) test1=> \d t2_a_ndx Index "t2_a_ndx" Attribute | Type -----------+----------- a | varchar() btree At 11:42 AM 3/15/2001 -0800, Gerald Gutierrez wrote: >I'm confused over two question involving PostgreSQL index scans. I'm using >Linux Mandrake 7.2 on Intel, and PostgreSQL 7.0.3. If someone can explain >what's going on, I'd greatly appreciate it. > >--------------------------------------------- > >1) When I create a empty table, and then immediate create an index on a >column, I can get /index scans/ when searching on that column. But when I >then run VACUUM, the same search becomes a /sequential scan/. Shouldn't it >still be an index scan? What's going on here? > > >test1=> create table t1 (a varchar(64), b int); >CREATE >test1=> create index t1_a_ndx on t1 (a); >CREATE >test1=> explain select * from t1 where a='asd'; >NOTICE: QUERY PLAN: > >Index Scan using t1_a_ndx on t1 (cost=0.00..8.14 rows=10 width=16) > >EXPLAIN >test1=> vacuum; >NOTICE: Skipping "pg_type" --- only table owner can VACUUM it (a bunch of >these) >VACUUM >test1=> explain select * from t1 where a='asd'; >NOTICE: QUERY PLAN: > >Seq Scan on t1 (cost=0.00..0.00 rows=1 width=16) > >EXPLAIN >test1=> > >--------------------------------------------- > >2) If I already have some data in a table and I create an index on a >column, why doesn't subsequent searches then change from sequential scans >to index scans? > > >test1=> create table t2 (a varchar(64), b int); >CREATE >test1=> insert into t2 values ('a', 1); >INSERT 41255 1 >test1=> insert into t2 values ('b', 2); >INSERT 41256 1 >test1=> insert into t2 values ('c', 3); >INSERT 41257 1 >test1=> explain select * from t2 where a='a'; >NOTICE: QUERY PLAN: > >Seq Scan on t2 (cost=0.00..22.50 rows=10 width=16) > >EXPLAIN >test1=> create index t2_a_ndx on t2 (a); >CREATE >test1=> explain select * from t2 where a='a'; >NOTICE: QUERY PLAN: > >Seq Scan on t2 (cost=0.00..1.04 rows=1 width=16) > >EXPLAIN >test1=> > > >---------------------------(end of broadcast)--------------------------- >TIP 4: Don't 'kill -9' the postmaster
Gerald Gutierrez <gutz@kalador.com> writes: > 1) When I create a empty table, and then immediate create an index on a > column, I can get /index scans/ when searching on that column. But when I > then run VACUUM, the same search becomes a /sequential scan/. VACUUM updates the planner's statistics so that it knows the table is empty (note the change in cost estimates). The default numbers for a never-yet-vacuumed table (10 disk blocks and 1000 rows, IIRC) just happen to be large enough to cause an indexscan. Put in a reasonable amount of data and then repeat the VACUUM, and it'll go back to index scan. > 2) If I already have some data in a table and I create an index on a > column, why doesn't subsequent searches then change from sequential scans > to index scans? Again, you haven't got enough data to justify an indexscan. You need at least several disk blocks worth of data before an indexscan can possibly save more table I/O than it costs to read the index. There is an undocumented little factoid here: CREATE INDEX will update (some of) the planner stats, but only if it finds some data in the table. CREATE INDEX on an empty table leaves the initial default numbers alone. This may be contributing to your confusion, but it was deemed necessary so that the common sequence CREATE TABLECREATE INDEXload data wouldn't leave the planner believing the table to be completely empty (and hence generating abysmally bad plans if you had actually loaded quite a bit of data). On the other hand, the preferred bulk-load method is CREATE TABLEload dataCREATE INDEX and this leaves the planner's stats set correctly. regards, tom lane
On Thu, 15 Mar 2001, Gerald Gutierrez wrote: > 1) When I create a empty table, and then immediate create an index on a > column, I can get /index scans/ when searching on that column. But when I > then run VACUUM, the same search becomes a /sequential scan/. Shouldn't it > still be an index scan? What's going on here? > --------------------------------------------- > > 2) If I already have some data in a table and I create an index on a > column, why doesn't subsequent searches then change from sequential scans > to index scans? With a small number of rows, a sequence scan will require less reads/seeks from the filesystem. It's not always correct for the optimizer to choose to use an index even if it's there. If you put in lots of rows with distinct values and vacuum analyze (you want to do that rather than just vacuum) and do a comparison it should use the index, with only a few rows, the seq scan is probably better.
>There is an undocumented little factoid here: CREATE INDEX will update >(some of) the planner stats, but only if it finds some data in the >table. CREATE INDEX on an empty table leaves the initial default >numbers alone. This may be contributing to your confusion, but it was >deemed necessary ... I understand now; it makes sense. I'll be using a number of tables that are initially very small, perhaps 5 or 10 records. But I expect that the tables will grow very quickly to several tens (or hundreds) of thousands of records. It seems reasonable to me that the table should then be set up to use index scan right from the beginning so that as the table grows the index scan will become more useful. Thus, the correct sequence for me is probably: > CREATE TABLE > CREATE INDEX > load data I also understand that VACUUM and VACUUM ANALYZE takes a significant amount of time and locks the tables that it works on. Does it do locking and unlocking per table as it goes through them (i.e. only lock while it's reading the table) or does it do something else? If the locks are for large amounts of time I'll have to shut down my application to avoid connections from timing out and JDBC exceptions from being thrown. Thanks for your help :)
Gerald Gutierrez wrote: > > >There is an undocumented little factoid here: CREATE INDEX will update > >(some of) the planner stats, but only if it finds some data in the > >table. CREATE INDEX on an empty table leaves the initial default > >numbers alone. This may be contributing to your confusion, but it was > >deemed necessary ... > > I understand now; it makes sense. I'll be using a number of tables that are > initially very small, perhaps 5 or 10 records. But I expect that the tables > will grow very quickly to several tens (or hundreds) of thousands of > records. It seems reasonable to me that the table should then be set up to > use index scan right from the beginning so that as the table grows the > index scan will become more useful. Thus, the correct sequence for me is > probably: > > > CREATE TABLE > > CREATE INDEX > > load data > no, the correct sequence is to create the index last, which will create statistics that will tell postgres if it really wants to use an index or not. Don't try and second guess postgres. Even better do a VACUUM ANALYZE. -- Joseph Shraibman jks@selectacast.net Increase signal to noise ratio. http://www.targabot.com