Thread: Index doesn't appear to be working.
I have a table with a column titled 'passfail' that only contains either a P for pass or an F for fail. The table name is 'one'. I created the index on the table with: CREATE INDEX one_passfail_idx ON one USING btree (passfail); I then do: VACUUM ANALYZE one; Then I do an explain on this query: SELECT * FROM one where passfail = 'P'; and it tells me: Seq Scan on one (cost=0.00..263.02 rows=5613 width=56) Shouldn't it tell me Index Scan using one_passfail_idx on one? Why isn't it using the index? This query returns about 5,600 of 10,000 records. It is faster to just do a seq scan because it returns such a high percentage or records? Thanks, I appreciate the help! John Oakes
On Sat, 2 Mar 2002, John Oakes wrote: > I have a table with a column titled 'passfail' that only contains either a P > for pass or an F for fail. The table name is 'one'. I created the index on > the table with: > > CREATE INDEX one_passfail_idx ON one USING btree (passfail); > > I then do: > > VACUUM ANALYZE one; > > Then I do an explain on this query: > > SELECT * FROM one where passfail = 'P'; > > and it tells me: > > Seq Scan on one (cost=0.00..263.02 rows=5613 width=56) > > Shouldn't it tell me > > Index Scan using one_passfail_idx on one? > > Why isn't it using the index? This query returns about 5,600 of 10,000 > records. It is faster to just do a seq scan because it returns such a high > percentage or records? Thanks, I appreciate the help! Usually the seq scan will be faster in this kind of situation. Because the system needs to load the matching rows anyway, it's likely to result in reading all the blocks of the table and paying a penalty for seeking around the file.
> > Why isn't it using the index? This query returns about 5,600 of 10,000 > > records. It is faster to just do a seq scan because it returns such a high > > percentage or records? Thanks, I appreciate the help! > > Usually the seq scan will be faster in this kind of situation. Because > the system needs to load the matching rows anyway, it's likely to result > in reading all the blocks of the table and paying a penalty for seeking > around the file. We are getting this "no index use" question over and over again, and I was wondering why the FAQ item was not answering this question for people. It turns out the wording was not very clear and I have updated this FAQ item to more clearly answer the question. Seems I kept adding to that item without restructuring the information: <H4><A name="4.8">4.8</A>) My queries are slow or don't make use of the indexes. Why?</H4> Indexes are not automatically used by every query. Indexes are only used if the table is larger than a minimum size,and the index selects only a small percentage of the rows in the table. This is because the random disk access causedby an index scan is sometimes slower than a straight read through the table, or sequential scan. <P>To determineif an index should be used, PostgreSQL must have statistics about the table. These statistics are collected using <SMALL>VACUUM ANALYZE</SMALL>, or simply <SMALL>ANALYZE</SMALL>. Using statistics, the optimizer knows how manyrows are in the table, and can better determine if indexes should be used. Statistics are also valuable in determiningoptimal join order and join methods. Statistics collection should be performed periodically as the contentsof the table change. <P>Indexes are normally not used for <SMALL>ORDER BY</SMALL> or to perform joins. A sequential scan followed by an explicitsort is usually faster than an index scan of a large table.</P> However, <SMALL>LIMIT</SMALL> combined with <SMALL>ORDERBY</SMALL> often will use an index because only a small portion of the table is returned. <P>When using wild-card operators such as <SMALL>LIKE</SMALL> or <I>~</I>, indexes can only be used if the beginningof the search is anchored to the start of the string. Therefore, to use indexes, <SMALL>LIKE</SMALL> patternsmust not start with <I>%</I>, and <I>~</I>(regular expression) patterns must start with <I>^</I>.</P> -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Is there any plan of making Postgresql collecting statistics a little more automatically? -----Original Message----- From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Bruce Momjian Sent: Sunday, March 03, 2002 11:03 AM To: Stephan Szabo Cc: John Oakes; pgsql-sql@postgresql.org Subject: Re: [SQL] Index doesn't appear to be working. > > Why isn't it using the index? This query returns about 5,600 of 10,000 > > records. It is faster to just do a seq scan because it returns such a high > > percentage or records? Thanks, I appreciate the help! > > Usually the seq scan will be faster in this kind of situation. Because > the system needs to load the matching rows anyway, it's likely to result > in reading all the blocks of the table and paying a penalty for seeking > around the file. We are getting this "no index use" question over and over again, and I was wondering why the FAQ item was not answering this question for people. It turns out the wording was not very clear and I have updated this FAQ item to more clearly answer the question. Seems I kept adding to that item without restructuring the information: <H4><A name="4.8">4.8</A>) My queries are slow or don't make use of the indexes. Why?</H4> Indexes are not automatically used by every query. Indexes are only used if the table is larger than a minimum size,and the index selects only a small percentage of the rows in the table. This is because the random disk access causedby an index scan is sometimes slower than a straight read through the table, or sequential scan. <P>To determine if an index should be used, PostgreSQL must have statistics about the table. These statistics are collectedusing <SMALL>VACUUM ANALYZE</SMALL>, or simply <SMALL>ANALYZE</SMALL>. Using statistics, the optimizer knowshow many rows are in the table, and can better determine if indexes should be used. Statistics are also valuablein determining optimal join order and join methods. Statistics collection should be performed periodically asthe contents of the table change. <P>Indexes are normally not used for <SMALL>ORDER BY</SMALL> or to perform joins. A sequential scan followed by an explicitsort is usually faster than an index scan of a large table.</P> However, <SMALL>LIMIT</SMALL> combined with <SMALL>ORDERBY</SMALL> often will use an index because only a small portion of the table is returned. <P>When using wild-card operators such as <SMALL>LIKE</SMALL> or <I>~</I>, indexes can only be used if the beginningof the search is anchored to the start of the string. Therefore, to use indexes, <SMALL>LIKE</SMALL> patternsmust not start with <I>%</I>, and <I>~</I>(regular expression) patterns must start with <I>^</I>.</P> -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026 ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> Is there any plan of making Postgresql collecting statistics a little > more automatically? Well, most of the commerical OS's require some manual intervention. I don't know how we could do any better than we are doing now, but we are open to suggestions. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
At 11:02 AM 03/03/02 -0500, Bruce Momjian wrote: > Indexes are not automatically used by every query. Indexes are only > used if the table is larger than a minimum size, and the index > selects only a small percentage of the rows in the table. Shouldn't it say: "and the *query* selects only a small percentage..."
Frank Bax wrote: > At 11:02 AM 03/03/02 -0500, Bruce Momjian wrote: > > Indexes are not automatically used by every query. Indexes are only > > used if the table is larger than a minimum size, and the index > > selects only a small percentage of the rows in the table. > > > Shouldn't it say: > "and the *query* selects only a small percentage..." Yes, I meant "index returns only a small percentage", but query selects is better. Thanks. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026