Thread: Why those queries do not utilize indexes?
Hi everybody! Here is my queries: 1. explain SELECT * FROM messageinfo WHERE user_id = CAST( 20000 AS BIGINT ) and msgstatus = CAST( 0 AS smallint ); 2. explain SELECT * FROM messageinfo WHERE messageinfo.user_id = 20000::int8 and msgstatus = 0::smallint; In both cases Explain command shows: 1. Sequential search and very high cost if set enable_seqscan to on; Seq scan on messageinfo ( cost=0.00..24371.30, rows =36802 ) 2. Index scan but even bigger cost if set enable_seqscan to off; Index �messagesStatus� on messageinfo ( Cost=0.00..27220.72, rows=36802 ) messageinfo table has 200 records which meet this criteria and 662420 in total: CREATE TABLE messageinfo ( user_id int8 NOT NULL, msgstatus int2 NOT NULL DEFAULT (0)::smallint, receivedtime timestamp NOT NULL DEFAULT now(), � msgread bool DEFAULT false, � CONSTRAINT "$1" FOREIGN KEY (user_id) REFERENCES users (id) ON UPDATE CASCADE ON DELETE CASCADE, ) WITH OIDS; CREATE INDEX msgstatus ON messageinfo USING btree (user_id, msgstatus); CREATE INDEX "messagesStatus" ON messageinfo USING btree (msgstatus); CREATE INDEX msgread ON messageinfo USING btree (user_id, msgread); CREATE INDEX "receivedTime" ON messageinfo USING btree (receivedtime); MY QUESTIONS ARE: 1. Should I afraid of high cost indexes? Or query will still be very efficient? 2. Postgres does not use the index I need. For my data sets it�s always msgstatus index is narrowest compare with �messagesStatus� one. Is any way to �enforce� to use a particular index? What�s the logic when Postgres chooses one index compare with the other. 3. I can change db structure to utilize Postgres specifics if you can tell them to me. 4. Also, originally I had �messagesStatus� index having 2 components ( �msgstatus�, �user_id� ). But query SELECT * FROM messageinfo WHERE msgstatus = 0 did not utilize indexes in this case. It only worked if both index components are in WHERE part. So I have to remove 2-nd component �user_id� from messagesStatus index even I wanted it. Is any way that where clause has only 1-st component but index is utilized? Igor Artimenko __________________________________ Do you Yahoo!? Yahoo! Mail Address AutoComplete - You start. We finish. http://promotions.yahoo.com/new_mail
On Fri, 27 Aug 2004, Artimenko Igor wrote: > 1. Sequential search and very high cost if set enable_seqscan to on; > Seq scan on messageinfo ( cost=0.00..24371.30, rows =36802 ) > > 2. Index scan but even bigger cost if set enable_seqscan to off; > Index messagesStatus on messageinfo ( Cost=0.00..27220.72, rows=36802 ) So pg thinks that a sequential scan will be a little bit faster (The cost is a little bit smaller). If you compare the actual runtimes maybe you will see that pg was right. In this case the cost is almost the same so the runtime is probably almost the same. When you have more data pg will start to use the index since then it will be faster to use an index compared to a seq. scan. -- /Dennis Björklund
I could force Postgres to use the best index by removing condition "msgstatus = CAST( 0 AS smallint );" from WHERE clause & set enable_seqscan to off; Total runtime in this case dropped from 1883 ms ( sequential reads ) to 1.598 ms ( best index ). But unfortunatelly It does not resolve my problem. I can not remove above condition. I need to find a way to use whole condition "WHERE user_id = CAST( 20000 AS BIGINT ) and msgstatus = CAST( 0 AS smallint );" and still utilyze index. Yes you are right. Using "messagesStatus" index is even worse for my data set then sequential scan. Igor Artimenko --- Dennis Bjorklund <db@zigo.dhs.org> wrote: > On Fri, 27 Aug 2004, Artimenko Igor wrote: > > > 1. Sequential search and very high cost if set enable_seqscan to on; > > Seq scan on messageinfo ( cost=0.00..24371.30, rows =36802 ) > > > > 2. Index scan but even bigger cost if set enable_seqscan to off; > > Index �messagesStatus� on messageinfo ( Cost=0.00..27220.72, rows=36802 ) > > So pg thinks that a sequential scan will be a little bit faster (The cost > is a little bit smaller). If you compare the actual runtimes maybe you > will see that pg was right. In this case the cost is almost the same so > the runtime is probably almost the same. > > When you have more data pg will start to use the index since then it will > be faster to use an index compared to a seq. scan. > > -- > /Dennis Bj�rklund > > _______________________________ Do you Yahoo!? Win 1 of 4,000 free domain names from Yahoo! Enter now. http://promotions.yahoo.com/goldrush
First things first: try vacuum full analyze on all the tables involved. > 1. Should I afraid of high cost indexes? Or query will still be very efficient? Not necessarily. However, EXPLAIN output is pretty much useless for us for helping you. You need to post EXPLAIN ANALYZE output. Then, you need to use explain analyze to check the speed difference between the index and seq scan versions. Is the seq scan actually slower? > 2. Postgres does not use the index I need. For my data sets it’s always msgstatus index is > narrowest compare with ‘messagesStatus’ one. Is any way to “enforce” to use a particular index? > What’s the logic when Postgres chooses one index compare with the other. It's complicated, but it's based on teh statistics in pg_statistic that the vacuum analyze command gathers. > 3. I can change db structure to utilize Postgres specifics if you can tell them to me. I avoid using int8 and int2 in the first place :) In PostgreSQL 8.0, they will be less troublesome, but I've never seen a need for them! > 4. Also, originally I had “messagesStatus” index having 2 components ( “msgstatus”, “user_id” ). > But query SELECT * FROM messageinfo WHERE msgstatus = 0 did not utilize indexes in this case. It > only worked if both index components are in WHERE part. So I have to remove 2-nd component > “user_id” from messagesStatus index even I wanted it. Is any way that where clause has only 1-st > component but index is utilized? So long as your where clause matches a subset of the columns in the index in left to right order, the index can be used. For example, if your index is over (a, b, c) then select * where a=1 and b=2; can use the index. Chris