Thread: Disappearing indexes, what's that all about?
Hey people,

I have a very strange problem now... When my database has just been created with the script I'm using, EXPLAIN SELECT * FROM acc WHERE username='britta' yields an index scan. Very nice. But after the database has been up for some time, the same query yields a SEQUENTIAL scan. What's happening, folks? =) Nothing but UPDATEs, SELECTs, and VACUUMs has been run.

Thanks! *hug*

Daniel Åkerud
> Hey people, I have a very strange problem now... When my database has
> just been created with the script I'm using, EXPLAIN SELECT * FROM acc
> WHERE username='britta' yields an index scan. Very nice. But after some
> time being up, the same query yields a SEQUENTIAL scan. What's
> happening, folks? =) Nothing but UPDATEs, SELECTs, and VACUUMs has been
> run.

Are you doing VACUUM or VACUUM ANALYZE? I believe only VACUUM ANALYZE updates the index statistics.

len
> Are you doing VACUUM or VACUUM ANALYZE? I believe only VACUUM ANALYZE
> updates the index statistics.
>
> len

Please look at this. I just ran an interesting little test here, watch this:

<DROP DB><CREATE DB><RUN DB SCRIPT>

explain select * from acc where username='britta';
Index Scan using acc_username_key on acc...

VACUUM;

explain select * from acc where username='britta';
Seq Scan on acc

<DROP DB><CREATE DB><RUN DB SCRIPT>

explain select * from acc where username='britta';
Index Scan

VACUUM ANALYZE;

explain select * from acc where username='britta';
Seq Scan

What the wakk is going on here, people? =)

Version: PostgreSQL 7.0.0 on i686-pc-linux-gnu, compiled by gcc 2.95.2

Thanks,

Daniel Åkerud
What are the costs associated with the EXPLAIN output? Perhaps a sequential scan is *faster* than an index scan.

Mike Mascari
mascarm@mascari.com

-----Original Message-----
From: Daniel Åkerud [SMTP:zilch@home.se]
Sent: Sunday, April 01, 2001 12:31 PM
To: pgsql-general@postgresql.org
Subject: Re: Re: [GENERAL] Disappearing indexes, what's that all about?

> Please look at this. I just ran an interesting little test here: on a
> freshly created database, EXPLAIN SELECT * FROM acc WHERE
> username='britta' yields an index scan, but after VACUUM (or VACUUM
> ANALYZE) the same query yields a seq scan.
>
> Version: PostgreSQL 7.0.0 on i686-pc-linux-gnu, compiled by gcc 2.95.2

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
After a fresh database, EXPLAIN yields:
  Index Scan using xXxX (cost=0.00..8.14 rows=10 width=147)
After VACUUM + VACUUM ANALYZE, EXPLAIN yields:
  Seq Scan on acc xXxX (cost=0.00..1.23 rows=1 width=147)

Hmm, seems you are right here... but why is it choosing an index scan in the first place, then?

> What are the costs associated with the EXPLAIN output? Perhaps a
> sequential scan is *faster* than an index scan.
>
> Mike Mascari
> mascarm@mascari.com
If I recall correctly, when the optimizer was changed (greatly enhanced), there was a debate about what the default behavior should be. The problem was that a large number of users would populate their database after index creation and see sluggishness because the statistics had not yet been updated, vs. the much smaller number of users who would suffer at the hands of an index scan against a table that would be better served by a sequential scan. I *think* the conclusion was that assuming 0 rows in a newly created table, until the next vacuum, would yield a significant increase in mailing-list traffic to the tune of:

"Why isn't PostgreSQL using my index?"

followed by the usual

"Did you run VACUUM ANALYZE?"

So an assumption of 1000 rows was made, with 10 rows matching your WHERE clause.

Mike Mascari
mascarm@mascari.com

-----Original Message-----
From: Daniel Åkerud [SMTP:zilch@home.se]
Sent: Sunday, April 01, 2001 12:43 PM
To: pgsql-general@postgresql.org
Subject: Re: RE: Re: [GENERAL] Disappearing indexes, what's that all about?

> After a fresh database, EXPLAIN yields:
>   Index Scan using xXxX (cost=0.00..8.14 rows=10 width=147)
> After VACUUM + VACUUM ANALYZE, EXPLAIN yields:
>   Seq Scan on acc xXxX (cost=0.00..1.23 rows=1 width=147)
>
> Hmm, seems you are right here... but why is it choosing an index scan
> in the first place, then?
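[Editor's note: the flip described above can be sketched numerically. This is a toy model with invented constants chosen only to echo the EXPLAIN numbers seen in this thread; it is NOT PostgreSQL's actual cost machinery.]

```python
# Toy planner cost model. Constants are illustrative, not PostgreSQL's.

def seq_scan_cost(pages, rows):
    # Read every page once sequentially, plus a small CPU cost per row.
    return pages * 1.0 + rows * 0.01

def index_scan_cost(matching_rows):
    # B-tree descent (~2 page reads) plus one heap fetch per matching row.
    return 2.0 + matching_rows * 0.6

# Default statistics before any VACUUM: 1000 rows in 10 pages, 10 matches.
default_seq = seq_scan_cost(pages=10, rows=1000)   # 20.0
default_idx = index_scan_cost(matching_rows=10)    # 8.0 -> index scan wins

# After VACUUM on a tiny table: 1 page, ~20 rows, 1 matching row.
tiny_seq = seq_scan_cost(pages=1, rows=20)         # 1.2 -> seq scan wins
tiny_idx = index_scan_cost(matching_rows=1)        # 2.6

print(default_idx < default_seq)  # True: default stats provoke an index scan
print(tiny_seq < tiny_idx)        # True: accurate stats flip it to seq scan
```

The point is only that the planner compares estimated costs: the built-in 1000-row assumption makes the index look cheap, while accurate statistics on a one-page table make the sequential scan an outright win.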
I appreciate all the help I've gotten here. Anyway, creating a table:

CREATE TABLE index_with (
  id SERIAL,
  name text
);
CREATE INDEX name_index ON index_with(name);

and filling it with 10000 rows generated by

$ pwgen 8 10000 > data

and then running VACUUM and VACUUM ANALYZE still yields a sequential scan for

select * from index_with where name > 'm';

namely

Seq Scan on index_with (cost=0.00..189 rows=5170 width=16)

Sorry to bother you guys this much.

Daniel Åkerud

> So an assumption of 1000 rows was made, with 10 rows matching your
> WHERE clause.
>
> Mike Mascari
> mascarm@mascari.com

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Mike Mascari <mascarm@mascari.com> writes:
> I *think* the result of assuming 0 rows in a newly created table,
> until the next vacuum, would yield a significant increase in
> mailing-list traffic complaints to the tune of:
> "Why isn't PostgreSQL using my index?"
> followed by the usual
> "Did you run VACUUM ANALYZE?"
> So an assumption of 1000 rows was made, with 10 rows matching your
> WHERE clause.

Yup, exactly. The initial default statistics are set (with malice aforethought) to provoke an indexscan. After you VACUUM, the optimizer knows how large the table really is (ie, tiny), and so it decides that looking at the index is a waste of time; it might as well just scan the table. Load up some more data, VACUUM again, and you'll probably see an indexscan used.

> after a fresh database the explain yields:
>   Index Scan using xXxX (cost=0.00..8.14 rows=10 width=147)
> after a vacuum + vacuum analyze the explain yields:
>   Seq Scan on acc xXxX (cost=0.00..1.23 rows=1 width=147)

BTW, comparing those two cost numbers is pretty pointless, since they are based on different information about the size of the table.

			regards, tom lane
Cool!! Can you guess where the limit is? Ten thousand is not enough, obviously, and putting 10000 more in there takes 10 minutes, even on a clean database. That is another problem, however, because on a slower machine it takes 13 seconds.

> Yup, exactly. The initial default statistics are set (with malice
> aforethought) to provoke an indexscan. After you VACUUM, the optimizer
> knows how large the table really is (ie, tiny), and so it decides that
> looking at the index is a waste of time; it might as well just scan
> the table. Load up some more data, VACUUM again, and you'll probably
> see an indexscan used.
>
> 			regards, tom lane
Daniel Åkerud <zilch@home.se> writes:
> and filling it with 10000 rows generated by
> $ pwgen 8 10000 > data
> and then running VACUUM and VACUUM ANALYZE still yields a sequential
> scan for
> select * from index_with where name > 'm';
> namely
> Seq Scan on index_with (cost=0.00..189 rows=5170 width=16)

So? You're asking it to retrieve over half of the table (or at least the planner estimates so, and I don't see any evidence here that its estimate is wildly off). An indexscan would still be a loser in this scenario.

If you want to see an indexscan with an inequality query, try giving it a reasonably tight range. Probably

select * from index_with where name > 'm' and name < 'n';

would use the index in this example.

			regards, tom lane
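[Editor's note: the planner's rows=5170 estimate is easy to sanity-check. For random lowercase strings, every string whose first letter is n..z sorts above 'm', and so does every string that starts with 'm' itself (a longer string with the same prefix sorts after it), so roughly 14/26 ≈ 54% of rows match. A quick simulation, standing in for the pwgen-generated data:]

```python
# Selectivity check for WHERE name > 'm' over 10000 random 8-letter
# lowercase strings (a stand-in for the pwgen data in the thread).
import random
import string

random.seed(1)
names = [''.join(random.choice(string.ascii_lowercase) for _ in range(8))
         for _ in range(10000)]

matching = sum(1 for n in names if n > 'm')
print(matching / len(names))  # roughly 14/26, i.e. just over half
```

That lines up with the planner's estimate of 5170 out of 10000 rows, which is exactly the regime where an index scan loses.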
Wohooo, deluxe :-) THANKS EVERYBODY!!

I can't see the logic behind that, though. The jump in the b-tree must save about 5000 checks... half the table??

Thanks!

Daniel Åkerud

> So? You're asking it to retrieve over half of the table (or at least
> the planner estimates so, and I don't see any evidence here that its
> estimate is wildly off). An indexscan would still be a loser in this
> scenario.
>
> 			regards, tom lane
Daniel Åkerud <zilch@home.se> writes:
> Can't see the logic behind that, though. The jump in the b-tree must
> save about 5000 checks... half the table??

CPUs are fast. Disks are slow. If you think about CPU time rather than disk accesses, you will usually draw the wrong conclusions.

Even more to the point, disks do not like random access. A seqscan can typically fetch four or more sequential blocks from disk in the time it takes an indexscan to fetch one block on a random-access basis. When you do the math, it turns out seqscan wins unless you are fetching just a small percentage of the rows.

			regards, tom lane
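[Editor's note: doing that math with the rule of thumb above gives a concrete break-even point. The 4x ratio and the rows-per-page figure are this thread's rough numbers, not measured values.]

```python
# Break-even selectivity for index scan vs. seq scan, using the rule of
# thumb that a random block read costs ~4x a sequential block read.

RANDOM_TO_SEQ_RATIO = 4.0  # rough ratio from the discussion above

def breakeven_fraction(rows_per_page):
    # Seq scan: rows / rows_per_page sequential block reads.
    # Index scan: ~1 random block read per fetched row.
    # Costs are equal when
    #   fraction * rows * RANDOM_TO_SEQ_RATIO == rows / rows_per_page,
    # so the index scan only wins below this fraction of the table:
    return 1.0 / (RANDOM_TO_SEQ_RATIO * rows_per_page)

# With, say, ~50 small rows per page, the index scan only pays off when
# fetching well under 1% of the table -- nowhere near the ~50% that
# `name > 'm'` selects.
print(f"{breakeven_fraction(50):.4%}")
```

The per-row CPU saving from the b-tree "jump" is real, but it is dwarfed by the cost of turning one cheap sequential pass into thousands of random block fetches.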