Thread: Re: Indexes not used
"D. Duccini" <duccini@backpack.com> writes:
> # select count(*) from radusage;
>  count
> --------
>  573042
> (1 row)

In that case 5757 is definitely a default estimate (0.01 is the default
selectivity IIRC).

> what is the analyze? i've run vacuum several times

VACUUM ANALYZE

			regards, tom lane
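The arithmetic behind that remark can be sketched in a few lines of Python. The 0.01 constant matches the 7.0-era default equality selectivity mentioned above; the helper function and its name are illustrative, not PostgreSQL internals:

```python
# Rough sketch of how the planner falls back to a default row estimate
# when it has no statistics for a column.  The constant models the
# 7.0-era default selectivity for "col = const"; it is an assumption
# for illustration, not the planner's actual code.

DEFAULT_EQ_SELECTIVITY = 0.01

def estimated_rows(reltuples, selectivity=DEFAULT_EQ_SELECTIVITY):
    """Planner's guess at matching rows: total rows times selectivity."""
    return round(reltuples * selectivity)

# With 573042 rows and no statistics, the guess lands near 5730 --
# the same ballpark as the estimate the planner reported, no matter
# how many rows actually match 'someuser'.
print(estimated_rows(573042))  # -> 5730
```

The point is that without ANALYZE statistics the estimate is a fixed fraction of the table size, which is why it barely moves between queries.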
> > what is the analyze? i've run vacuum several times
>
> VACUUM ANALYZE

i've done this as well, but it gives me no output other than "VACUUM" --
or does it work silently in the background?

i've even dropped the index and recreated it, but that had no impact on
its use.

-----------------------------------------------------------------------------
david@backpack.com            BackPack Software, Inc.        www.backpack.com
+1 651.645.7550 voice        "Life is an Adventure.
+1 651.645.9798 fax             Don't forget your BackPack!"
-----------------------------------------------------------------------------
"D. Duccini" <duccini@backpack.com> writes:
>> VACUUM ANALYZE

> i've done this as well, but it gives me no output other than "VACUUM"

That's what it's supposed to do.

> or does it work silently in the background?

If you want noise, try VACUUM VERBOSE ANALYZE.

Anyway: don't the EXPLAIN numbers change once you've done VACUUM ANALYZE?
How many rows are there matching 'someuser', anyhow?

It might be useful to see the planner's statistics, too -- try

	select attname, attdisbursion, s.*
	from pg_statistic s, pg_attribute a, pg_class c
	where starelid = c.oid and attrelid = c.oid
	  and staattnum = attnum and relname = 'FOO';

(substitute the name of the interesting table for FOO)

			regards, tom lane
Just as an example, here's the query plan of the *SAME* query before and
after a VACUUM ANALYZE.

Notice the way the two plans are *COMPLETELY* different.  l_portal_statuses
and b_portal_statuses only have *3 rows* right now, but there was no way
for the planner to know that.  Anyway, this should be evidence that a
periodic VACUUM ANALYZE is a Good Thing(tm).

(BTW, the site that this database drives is now significantly more
responsive.)

Before:
----------------------------------------
Merge Join  (cost=97.62..170.37 rows=1000 width=110)
  ->  Index Scan using l_portal_statuses_pkey on l_portal_statuses lps  (cost=0.00..59.00 rows=1000 width=16)
  ->  Sort  (cost=97.62..97.62 rows=100 width=94)
        ->  Merge Join  (cost=22.67..94.30 rows=100 width=94)
              ->  Index Scan using b_portal_statuses_pkey on b_portal_statuses bps  (cost=0.00..59.00 rows=1000 width=16)
              ->  Sort  (cost=22.67..22.67 rows=10 width=78)
                    ->  Seq Scan on contracts c  (cost=0.00..22.50 rows=10 width=78)

After:
----------------------------------------
Nested Loop  (cost=0.00..3.47 rows=1 width=110)
  ->  Nested Loop  (cost=0.00..2.40 rows=1 width=94)
        ->  Seq Scan on contracts c  (cost=0.00..1.34 rows=1 width=78)
        ->  Seq Scan on b_portal_statuses bps  (cost=0.00..1.03 rows=3 width=16)
  ->  Seq Scan on l_portal_statuses lps  (cost=0.00..1.03 rows=3 width=16)

-- Dave
Maybe I'm not getting something here... but how is a sequential scan EVER
faster than a B-tree index lookup on a table with over 500,000 records?

Certainly I could split out the data and do some "roll-up" ops on the
information in there; it just seems odd that 6.5.x used the indices and
was blazing fast.  Now in 7.0.3 it's like they are not even considered --
at least on this particular table.  On other tables they seem to be
working.

On Thu, 15 Mar 2001, David Olbersen wrote:

> Just as an example, here's the query plan of the *SAME* query before and
> after a VACUUM ANALYZE.
>
> Notice the way the two plans are *COMPLETELY* different.  l_portal_statuses
> and b_portal_statuses only have *3 rows* right now, but there was no way
> for the planner to know that.  Anyway, this should be evidence that a
> periodic VACUUM ANALYZE is a Good Thing(tm).
"D. Duccini" <duccini@backpack.com> writes:
> Maybe I'm not getting something here... but how is a sequential scan EVER
> faster than a B-tree index lookup on a table with over 500,000 records?

If the system needs to fetch more than a small percentage of the records,
then a seqscan *will* be faster.  An indexscan touches the heap in random
order, paying a random page read for each matching row, while a seqscan
reads the whole table once, sequentially.  The issue you are dealing with
seems to be misestimation of the retrieval percentage for this particular
query, causing the planner to guess wrong about which kind of plan to use.

			regards, tom lane
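The "small percentage" point can be made concrete with a toy cost model. The constants and formulas below are simplified assumptions for illustration, not PostgreSQL's actual cost equations:

```python
# Toy cost model: a seqscan reads every page once sequentially; an
# indexscan (worst case, no clustering) pays a random page read per
# matching row.  All constants are illustrative assumptions.

SEQ_PAGE_COST = 1.0      # cost to read one page sequentially
RANDOM_PAGE_COST = 4.0   # cost to read one page at random

def seqscan_cost(pages):
    return pages * SEQ_PAGE_COST

def indexscan_cost(matching_rows):
    # one random heap fetch per matching row; index page reads ignored
    return matching_rows * RANDOM_PAGE_COST

# 500,000 rows at 100 rows per page = 5,000 pages.
pages, rows = 5_000, 500_000
for pct in (0.1, 1, 5, 10):
    matches = int(rows * pct / 100)
    winner = ("indexscan" if indexscan_cost(matches) < seqscan_cost(pages)
              else "seqscan")
    print(f"{pct:>4}% of rows -> {winner}")
```

Under these assumptions the indexscan only wins below roughly a quarter of a percent of the table; even at 1% the seqscan is already several times cheaper, which is why a misestimated retrieval percentage flips the plan.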
> If the system needs to fetch more than a small percentage of the
> records, then seqscan *will* be faster.  The issue you are dealing
> with seems to be misestimation of the retrieval percentage for this
> particular query, causing the planner to guess wrong about which
> kind of plan to use.

no worries... i'll try building a subset of the data and see if there is
some "threshold" value.

or... maybe it's time i actually contributed some code to the project :)

i built an OO database engine a few years ago (in Objective-C) that used
a modified N-tree approach to indices that massively accelerated the
retrieval of a lot of "highly similar" data items.

-duck
perhaps the db gawds can explain this....

# \d radusage
          Table "radusage"
 Attribute |    Type     | Modifier
-----------+-------------+----------
 datetime  | timestamp   | not null
 account   | varchar(64) |
 usage     | integer     |
 sent      | integer     |
 recv      | integer     |
 ip        | bigint      |
 host      | bigint      |
 port      | smallint    |
Indices: idxradaccount,
         idxraddate,
         idxradoid

# \d idxradaccount
Index "idxradaccount"
 Attribute |    Type
-----------+-------------
 account   | varchar(64)
btree

# \d idxraddate
Index "idxraddate"
 Attribute |   Type
-----------+-----------
 datetime  | timestamp
btree

# explain select * from radusage where account = 'someuser';
NOTICE:  QUERY PLAN:

Seq Scan on radusage  (cost=0.00..13870.80 rows=5674 width=50)

and if i add in datetime (without effectively changing the semantic
meaning of the search):

# explain select * from radusage where account = 'someuser' and datetime > '1900-01-01';
NOTICE:  QUERY PLAN:

Index Scan using idxradaccount on radusage  (cost=0.00..15295.37 rows=5668 width=50)

the first case doesn't use the index, the second does use what would seem
to be the correct index.

isn't that wacky????

-duck
"D. Duccini" <duccini@backpack.com> writes:
> # explain select * from radusage where account = 'someuser';
> NOTICE:  QUERY PLAN:
> Seq Scan on radusage  (cost=0.00..13870.80 rows=5674 width=50)

> and if i add in datetime (without effectively changing the semantic
> meaning of the search)

> # explain select * from radusage where account = 'someuser' and datetime > '1900-01-01';
> NOTICE:  QUERY PLAN:
> Index Scan using idxradaccount on radusage  (cost=0.00..15295.37 rows=5668 width=50)

You could get more information by looking at the estimated cost of the
other alternative in each case (do SET ENABLE_SEQSCAN = OFF or
SET ENABLE_INDEXSCAN = OFF, respectively, to force the planner to choose
the other alternative).  I bet you'll find that the estimated costs are
pretty close together.

What's probably happening here is that the small extra cost estimated for
evaluating the "datetime > '1900-01-01'" condition at each row is pushing
the cost of the seqscan up to more than the cost of the indexscan.  That
extra cost gets charged for every row in the table in the seqscan case,
but only for those rows pulled from the index in the indexscan case, so
adding extra WHERE conditions favors the indexscan case.  Not by a lot,
but evidently by enough in this example.

			regards, tom lane
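That explanation can be sketched numerically using the costs from the EXPLAIN output above. The per-operator cost constant below is the long-standing cpu_operator_cost default, and treating the reported indexscan cost as already including the qual evaluation is an assumption for illustration:

```python
# Sketch of the arithmetic: an extra qual is charged once per row a node
# examines, so the seqscan pays it for all 573042 rows while the
# indexscan pays it only for the rows the index returns.  The 0.0025
# constant and the assumption that the reported indexscan cost already
# includes the qual are illustrative, not a reproduction of the planner.

CPU_OPERATOR_COST = 0.0025

table_rows = 573_042
seqscan_no_qual = 13_870.80        # EXPLAIN cost of the plain seqscan
indexscan_with_qual = 15_295.37    # EXPLAIN cost of the chosen indexscan

seqscan_with_qual = seqscan_no_qual + CPU_OPERATOR_COST * table_rows

print(f"seqscan with qual:   {seqscan_with_qual:.2f}")   # about 15303
print(f"indexscan with qual: {indexscan_with_qual:.2f}")
# The seqscan now looks roughly 8 cost units more expensive, so the
# planner flips to the indexscan -- not by a lot, but by enough.
```

Under these assumptions the two estimates end up within about 8 cost units of each other, consistent with the "pretty close together" prediction.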