Thread: Index usage
Hi. I'm having some trouble with the use of indexes. The queries below are
exactly the same but refer to different months. One case uses an index, the
other doesn't. Is there anything I can do? Increasing index mem size?

Query 1 has 9105 entries matching the given conditions.
Query 2 has 9248 entries matching the given conditions.

QUERY 1:

explain select ra_datacolh::date, count(distinct ra_utente)
  from requisicaoanalise
 where (ra_datacolh::date >= '2003-4-01'::date
    and ra_datacolh::date <  '2003-5-1'::date)
   and isactive=0
 group by ra_datacolh;

                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=0.00..12300.68 rows=313 width=8)
   ->  Group  (cost=0.00..12292.86 rows=3130 width=8)
         ->  Index Scan using requisicaoanalise_datacolh on requisicaoanalise  (cost=0.00..12285.03 rows=3130 width=8)
               Index Cond: ((ra_datacolh >= '2003-04-01'::date) AND (ra_datacolh < '2003-05-01'::date))
               Filter: (isactive = 0)

QUERY 2:

netlab=> explain select ra_datacolh::date, count(distinct ra_utente)
  from requisicaoanalise
 where (ra_datacolh::date >= '2003-6-01'::date
    and ra_datacolh::date <  '2003-7-1'::date)
   and isactive=0
 group by ra_datacolh;

                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=32452.98..32632.06 rows=2388 width=8)
   ->  Group  (cost=32452.98..32572.36 rows=23877 width=8)
         ->  Sort  (cost=32452.98..32512.67 rows=23877 width=8)
               Sort Key: ra_datacolh
               ->  Seq Scan on requisicaoanalise  (cost=0.00..30716.71 rows=23877 width=8)
                     Filter: ((ra_datacolh >= '2003-06-01'::date) AND (ra_datacolh < '2003-07-01'::date) AND (isactive = 0))

--
Pedro Miguel G. Alves              pmalves@think.pt
THINK - Tecnologias de Informação  www.think.pt
Tel: +351 21 413 46 00             Av. José Gomes Ferreira
Fax: +351 21 413 46 09             nº 13  1495-139 ALGÉS
On Mon, 25 Aug 2003, Pedro Alves wrote:

> The queries below are exactly the same but refer to different months.
> One case uses an index, the other doesn't.
>
> Is there anything I can do? Increasing index mem size?

Run "vacuum analyze". The planner seems to think that one of the queries
returns 313 rows while the other returns 2388 rows. To me that looks like
the statistics need to be updated using vacuum analyze.

Also, explain analyze gives a little more information and is better to
run than just explain.

--
/Dennis
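[A sketch of the two suggestions above as a psql session, using the table and query from the thread; VACUUM ANALYZE refreshes the statistics the planner bases its row estimates on, and EXPLAIN ANALYZE actually runs the query so it can report estimated vs. actual row counts:]

```sql
-- Refresh the planner's statistics for the table in question.
VACUUM ANALYZE requisicaoanalise;

-- Unlike plain EXPLAIN, EXPLAIN ANALYZE executes the query and shows
-- "rows=" twice per node: the estimate and the actual count.
EXPLAIN ANALYZE
SELECT ra_datacolh::date, count(DISTINCT ra_utente)
  FROM requisicaoanalise
 WHERE ra_datacolh::date >= '2003-06-01'::date
   AND ra_datacolh::date <  '2003-07-01'::date
   AND isactive = 0
 GROUP BY ra_datacolh;
```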
First of all, thanks for the answer, Dennis.

The vacuum analyze is run on a daily basis, so that cannot be the point.
Is there any way to force the planner to use the index?

Below is the explain analyze of the queries. Indeed, the second range has
more rows (9105 / 21503), but is this SO big that the planner cannot
handle it?

This is running on a dedicated machine with 512MB RAM. Is there any
configuration parameter so that I can increase the "index to seq turn
point"? :)

Thanks

OK __________________________________

explain ANALYZE select count(1) from requisicaoanalise
 where (ra_datacolh::date >= '2003-4-01'::date
    and ra_datacolh::date < '2003-5-1'::date)
   and isactive=0;

 Aggregate  (cost=10660.84..10660.84 rows=1 width=0) (actual time=172.41..172.41 rows=1 loops=1)
   ->  Index Scan using requisicaoanalise_datacolh on requisicaoanalise  (cost=0.00..10654.06 rows=2711 width=0) (actual time=0.13..145.50 rows=9105 loops=1)
         Index Cond: ((ra_datacolh >= '2003-04-01'::date) AND (ra_datacolh < '2003-05-01'::date))
         Filter: (isactive = 0)
 Total runtime: 172.62 msec
(5 rows)

NOK __________________________________

explain ANALYZE select count(1) from requisicaoanalise
 where (ra_datacolh::date >= '2003-6-01'::date
    and ra_datacolh::date < '2003-7-1'::date)
   and isactive=0;

 Aggregate  (cost=31019.00..31019.00 rows=1 width=0) (actual time=43252.40..43252.40 rows=1 loops=1)
   ->  Seq Scan on requisicaoanalise  (cost=0.00..30965.24 rows=21503 width=0) (actual time=8.43..43224.01 rows=9248 loops=1)
         Filter: ((ra_datacolh >= '2003-06-01'::date) AND (ra_datacolh < '2003-07-01'::date) AND (isactive = 0))
 Total runtime: 43252.57 msec
(4 rows)

On Tue, Aug 26, 2003 at 08:37:08AM +0200, Dennis Björklund wrote:
> On Mon, 25 Aug 2003, Pedro Alves wrote:
>
> > The queries below are exactly the same but refer to different months.
> > One case uses an index, the other doesn't.
> >
> > Is there anything I can do? Increasing index mem size?
>
> Run "vacuum analyze". The planner seems to think that one of the queries
> returns 313 rows while the other returns 2388 rows.
>
> To me that looks like the statistics need to be updated using vacuum
> analyze.
>
> Also, explain analyze gives a little more information and is better to
> run than just explain.
One more thing I just noticed. Right after running a vacuum analyze on the
table, I saw the following:

Seq scan:
  - Cost:   10484
  - Timing: 624ms

Index scan (with enable_seqscan = false):
  - Cost:   10628
  - Timing: 41ms

In production the query takes up to a minute (I ran this in a test
database), and it takes less than a second using indexes. What can be
causing this?

Is it safe to turn enable_seqscan = false in a production environment?

Thanks

On Tue, Aug 26, 2003 at 09:59:35AM +0100, Pedro Alves wrote:
> First of all, thanks for the answer, Dennis.
>
> The vacuum analyze is run on a daily basis, so that cannot be the point.
> Is there any way to force the planner to use the index?
>
> Below is the explain analyze of the queries. Indeed, the second range has
> more rows (9105 / 21503), but is this SO big that the planner cannot
> handle it?
>
> This is running on a dedicated machine with 512MB RAM. Is there any
> configuration parameter so that I can increase the "index to seq turn
> point"? :)
>
> Thanks
>
> OK __________________________________
>
> explain ANALYZE select count(1) from requisicaoanalise
>  where (ra_datacolh::date >= '2003-4-01'::date
>     and ra_datacolh::date < '2003-5-1'::date)
>    and isactive=0;
>
>  Aggregate  (cost=10660.84..10660.84 rows=1 width=0) (actual time=172.41..172.41 rows=1 loops=1)
>    ->  Index Scan using requisicaoanalise_datacolh on requisicaoanalise  (cost=0.00..10654.06 rows=2711 width=0) (actual time=0.13..145.50 rows=9105 loops=1)
>          Index Cond: ((ra_datacolh >= '2003-04-01'::date) AND (ra_datacolh < '2003-05-01'::date))
>          Filter: (isactive = 0)
>  Total runtime: 172.62 msec
> (5 rows)
>
> NOK __________________________________
>
> explain ANALYZE select count(1) from requisicaoanalise
>  where (ra_datacolh::date >= '2003-6-01'::date
>     and ra_datacolh::date < '2003-7-1'::date)
>    and isactive=0;
>
>  Aggregate  (cost=31019.00..31019.00 rows=1 width=0) (actual time=43252.40..43252.40 rows=1 loops=1)
>    ->  Seq Scan on requisicaoanalise  (cost=0.00..30965.24 rows=21503 width=0) (actual time=8.43..43224.01 rows=9248 loops=1)
>          Filter: ((ra_datacolh >= '2003-06-01'::date) AND (ra_datacolh < '2003-07-01'::date) AND (isactive = 0))
>  Total runtime: 43252.57 msec
> (4 rows)
>
> On Tue, Aug 26, 2003 at 08:37:08AM +0200, Dennis Björklund wrote:
> > Run "vacuum analyze". The planner seems to think that one of the queries
> > returns 313 rows while the other returns 2388 rows.
> >
> > To me that looks like the statistics need to be updated using vacuum
> > analyze.
> >
> > Also, explain analyze gives a little more information and is better to
> > run than just explain.
On Tue, 26 Aug 2003, Pedro Alves wrote:

> The vacuum analyze is run on a daily basis, so that cannot be the point.

On all tables, I assume.

> Is there any way to force the planner to use the index?

For experiments you can set enable_seqscan to false, but the real solution
is not to force it to always use index scans, since sometimes a seq scan
is faster.

> Below is the explain analyze of the queries. Indeed, the second range has
> more rows (9105 / 21503), but is this SO big that the planner cannot
> handle it?

"Cannot handle" is saying too much. If you are going to use a lot of the
rows in a table, a sequential scan is going to be faster than using the
index. The problem here seems to be that the planner thinks it needs a lot
of rows, so it chooses to do a seq scan. The best way forward is to
understand why the estimate is off. Usually it's because one has not run
vacuum analyze on that table.

> This is running on a dedicated machine with 512MB RAM. Is there any
> configuration parameter so that I can increase the "index to seq turn
> point"? :)

If you have a fast disk you might, for example, set random_page_cost to 2
or something. Here is a small text about tuning pg:

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

> Aggregate  (cost=10660.84..10660.84 rows=1 width=0) (actual time=172.41..172.41 rows=1 loops=1)
>   ->  Index Scan using requisicaoanalise_datacolh on requisicaoanalise  (cost=0.00..10654.06 rows=2711 width=0) (actual time=0.13..145.50 rows=9105 loops=1)
>         Index Cond: ((ra_datacolh >= '2003-04-01'::date) AND (ra_datacolh < '2003-05-01'::date))
>         Filter: (isactive = 0)
> Total runtime: 172.62 msec

Here it estimated 2711 rows and chose an index scan. The actual number of
rows was 9105.

> Aggregate  (cost=31019.00..31019.00 rows=1 width=0) (actual time=43252.40..43252.40 rows=1 loops=1)
>   ->  Seq Scan on requisicaoanalise  (cost=0.00..30965.24 rows=21503 width=0) (actual time=8.43..43224.01 rows=9248 loops=1)
>         Filter: ((ra_datacolh >= '2003-06-01'::date) AND (ra_datacolh < '2003-07-01'::date) AND (isactive = 0))
> Total runtime: 43252.57 msec

Here the estimate is 21503 rows, but the actual number was 9248.

Both estimates are way off. Sure, you can lower the point where it uses an
index scan instead of a seq scan, but maybe you will get even bigger
mismatches in the future. What I would do is run vacuum analyze manually
on the tables used in the query and try again, to be really sure. Then, if
that does not work, maybe one should try increasing the statistics
gathering for the date column (using alter table). I would also try to
understand what is in pg_statistic.

I don't know much about these last parts and have never looked inside
pg_statistic, but that is what I would look into. Maybe someone else has
other ideas or can help you figure out why the estimates are wrong.

--
/Dennis
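[A hedged sketch of the "look inside pg_statistic" step: the pg_stats view presents pg_statistic in readable form, and the columns below show what the planner knows about the date column's distribution. Table and column names are taken from the thread.]

```sql
-- Inspect the planner's view of the ra_datacolh column: fraction of
-- NULLs, distinct-value estimate, most common values, and the
-- histogram buckets used for range-selectivity estimates.
SELECT null_frac, n_distinct, most_common_vals, histogram_bounds
  FROM pg_stats
 WHERE tablename = 'requisicaoanalise'
   AND attname   = 'ra_datacolh';
```

If the histogram_bounds are sparse around the months being queried, that would explain why the two otherwise-identical queries get such different row estimates.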
On Tue, 26 Aug 2003, Pedro Alves wrote:

> I can't see much of a difference. Here are the details

I forgot one setting that is important for you to change. The effective
cache size can have a big influence on the planner. It is set very low by
default, and since your machine is used only for the database, this
setting needs to be increased a lot. The default is, I think, 8MB, but you
said you have 512MB, so I would guess something like 300MB could be a good
setting here. That is, the setting should be something like 37000. But it
depends on what else you run on the computer and so on. Maybe you can even
go a little higher. In any case, all the default settings are set low.

--
/Dennis
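[The 37000 figure comes from converting the suggested cache size into 8 kB disk pages, the unit effective_cache_size is expressed in. A quick check of the arithmetic (the 300 MB figure is Dennis's suggestion above):]

```python
# effective_cache_size is measured in disk pages of BLCKSZ bytes (8 kB
# in a default build).
PAGE_SIZE = 8 * 1024                # 8192 bytes per page

cache_bytes = 300 * 1024 * 1024     # ~300 MB earmarked for OS/db caching
pages = cache_bytes // PAGE_SIZE

print(pages)                        # 38400 pages, close to the 37000 suggested
```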
On Tue, 26 Aug 2003, Pedro Alves wrote:

> First of all, thanks for the answer, Dennis.
>
> The vacuum analyze is run on a daily basis, so that cannot be the point.

Have you tried raising the statistics target on ra_datacolh?

Is there a significant correlation between ra_datacolh and isactive
values? I think that is problematic right now, since cross-column
statistics aren't generated.

Is isactive effectively a boolean (just 0 and 1 values)? If so, once some
of the other issues are dealt with, you might want a partial index where
isactive=0, depending on how often you run queries where that matters.

As for settings, lowering random_page_cost a bit may help for this case,
although balancing it seems problematic, so it'll probably just move the
switch-over point. effective_cache_size was mentioned already, I believe.
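[The partial-index idea above could look something like this; the index name is made up for illustration, and whether it pays off depends on what fraction of rows have isactive = 0:]

```sql
-- A partial index stores only the rows matching its WHERE predicate,
-- so it is smaller and cheaper to scan when every interesting query
-- already filters on isactive = 0.
CREATE INDEX requisicaoanalise_datacolh_active
    ON requisicaoanalise (ra_datacolh)
 WHERE isactive = 0;
```

The planner will consider this index only for queries whose WHERE clause implies `isactive = 0`, so the existing queries in this thread would qualify as written.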
On Tue, 26 Aug 2003, Pedro Alves wrote:

> I can't see much of a difference. Here are the details

But there is a difference.

> alter table requisicaoanalise alter column ra_datacolh set statistics -1;
>
> Aggregate  (cost=10496.29..10496.29 rows=1 width=0) (actual time=633.52..633.52 rows=1 loops=1)
>   ->  Seq Scan on requisicaoanalise  (cost=0.00..10484.89 rows=4561 width=0) (actual time=127.77..627.76 rows=9032 loops=1)
>         Filter: ((isactive = 0) AND (ra_datacolh >= '2003-02-01'::date) AND (ra_datacolh < '2003-03-01'::date))

Here the estimated row count is 4561, which is only half of the real
count.

> alter table requisicaoanalise alter column ra_datacolh set statistics 1000;
>
> Aggregate  (cost=10507.43..10507.43 rows=1 width=0) (actual time=638.73..638.73 rows=1 loops=1)
>   ->  Seq Scan on requisicaoanalise  (cost=0.00..10484.89 rows=9015 width=0) (actual time=119.19..633.36 rows=9032 loops=1)
>         Filter: ((isactive = 0) AND (ra_datacolh >= '2003-02-01'::date) AND (ra_datacolh < '2003-03-01'::date))

Here the estimate is 9015, which is very near the real count of 9032. I
would say that this estimate is much better. And the other case, which
before gave you an estimate of 20000-something, I bet is also near the
real value of 9000 now.

What you need to do now is try lowering RANDOM_PAGE_COST; you can also
play with the other settings, like CPU_TUPLE_COST.

--
/Dennis
On Tue, Aug 26, 2003 at 10:25:11AM -0400, Tom Lane wrote:
> Pedro Alves <pmalves@think.pt> writes:
> > It's the same even with statistics=1000.
>
> Uh, you did actually ANALYZE the table after each change, right?
> Doesn't the EXPLAIN output change at all?
>
> 			regards, tom lane

I can't see much of a difference. Here are the details:

alter table requisicaoanalise alter column ra_datacolh set statistics -1;
VACUUM ANALYZE requisicaoanalise;

explain analyse select count(1) from requisicaoanalise
 where isactive=0
   and ra_datacolh >= '2003-2-01' and ra_datacolh < '2003-3-1';

                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10496.29..10496.29 rows=1 width=0) (actual time=633.52..633.52 rows=1 loops=1)
   ->  Seq Scan on requisicaoanalise  (cost=0.00..10484.89 rows=4561 width=0) (actual time=127.77..627.76 rows=9032 loops=1)
         Filter: ((isactive = 0) AND (ra_datacolh >= '2003-02-01'::date) AND (ra_datacolh < '2003-03-01'::date))
 Total runtime: 633.61 msec
(4 rows)

alter table requisicaoanalise alter column ra_datacolh set statistics 1000;
VACUUM ANALYZE requisicaoanalise;

explain analyse select count(1) from requisicaoanalise
 where isactive=0
   and ra_datacolh >= '2003-2-01' and ra_datacolh < '2003-3-1';

                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10507.43..10507.43 rows=1 width=0) (actual time=638.73..638.73 rows=1 loops=1)
   ->  Seq Scan on requisicaoanalise  (cost=0.00..10484.89 rows=9015 width=0) (actual time=119.19..633.36 rows=9032 loops=1)
         Filter: ((isactive = 0) AND (ra_datacolh >= '2003-02-01'::date) AND (ra_datacolh < '2003-03-01'::date))
 Total runtime: 639.33 msec
(4 rows)
Pedro Alves <pmalves@think.pt> writes:
> It's the same even with statistics=1000.

Uh, you did actually ANALYZE the table after each change, right?
Doesn't the EXPLAIN output change at all?

			regards, tom lane
On Tue, Aug 26, 2003 at 04:21:52PM +0200, Dennis Björklund wrote:
> On Tue, 26 Aug 2003, Pedro Alves wrote:
>
> > It's the same even with statistics=1000.
>
> And just to be sure, you ran vacuum analyze on the tables by hand after
> the alter table?

Yes.

> > Is there any way to set enable_seqscan=off just for this query?
>
> Just set it before you run the query.

Ok.
On Tue, 26 Aug 2003, Pedro Alves wrote:

> It's the same even with statistics=1000.

And just to be sure, you ran vacuum analyze on the tables by hand after
the alter table?

> Is there any way to set enable_seqscan=off just for this query?

Just set it before you run the query.

--
/Dennis
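[A sketch of the per-query toggle described above: enable_seqscan is a session-level setting, so it can be switched off just around the one query and restored afterwards, leaving the rest of the application unaffected. The query shown is adapted from earlier in the thread.]

```sql
SET enable_seqscan = off;   -- discourage seq scans in this session only

SELECT count(1)
  FROM requisicaoanalise
 WHERE isactive = 0
   AND ra_datacolh >= '2003-06-01'
   AND ra_datacolh <  '2003-07-01';

SET enable_seqscan = on;    -- restore the default afterwards
```

Note that `off` does not forbid seq scans outright; it just makes them look very expensive to the planner, so they are still used when no index is applicable.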
It's the same even with statistics=1000.

Is there any way to set enable_seqscan=off just for this query?

On Tue, Aug 26, 2003 at 10:07:02AM -0400, Tom Lane wrote:
> Pedro Alves <pmalves@think.pt> writes:
> > The vacuum analyze is run on a daily basis, so that cannot be the point.
>
> The next thing I'd try to improve the planner's guess is to increase the
> statistics target for the ra_datacolh column (see ALTER TABLE SET
> STATISTICS). It looks like the default of 10 is too small for that
> column --- try 100 and see if the rowcount estimates get any closer.
>
> 			regards, tom lane
Pedro Alves <pmalves@think.pt> writes:
> The vacuum analyze is run on a daily basis, so that cannot be the point.

The next thing I'd try to improve the planner's guess is to increase the
statistics target for the ra_datacolh column (see ALTER TABLE SET
STATISTICS). It looks like the default of 10 is too small for that
column --- try 100 and see if the rowcount estimates get any closer.

			regards, tom lane
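[The suggestion above translates into two statements; the new target does not take effect until the table is re-analyzed, which is why the ANALYZE step matters:]

```sql
-- Raise the per-column statistics target from the default of 10,
-- giving the planner more histogram buckets for range estimates.
ALTER TABLE requisicaoanalise ALTER COLUMN ra_datacolh SET STATISTICS 100;

-- Recollect statistics so the new target actually takes effect.
ANALYZE requisicaoanalise;
```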
Look through the docs. By altering the values of random_page_cost,
effective_cache_size and cpu_tuple_cost you can make the estimates
approximate real life better.

On Tue, Aug 26, 2003 at 12:32:23PM +0100, Pedro Alves wrote:
> One more thing I just noticed. Right after running a vacuum analyze on
> the table, I saw the following:
>
> Seq scan:
>   - Cost:   10484
>   - Timing: 624ms
>
> Index scan (with enable_seqscan = false):
>   - Cost:   10628
>   - Timing: 41ms
>
> In production the query takes up to a minute (I ran this in a test
> database), and it takes less than a second using indexes. What can be
> causing this?
>
> Is it safe to turn enable_seqscan = false in a production environment?

--
Martijn van Oosterhout  <kleptog@svana.org>  http://svana.org/kleptog/
> "All that is needed for the forces of evil to triumph is for enough good
> men to do nothing." - Edmond Burke
> "The penalty good people pay for not being interested in politics is to be
> governed by people worse than themselves." - Plato
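[The three parameters named in the reply above can be tried per-session before committing anything to postgresql.conf. The values below are illustrative starting points drawn from the thread's suggestions, not recommendations:]

```sql
SET random_page_cost = 2;          -- default is 4; lower values favour
                                   -- index scans on fast or cached disks
SET effective_cache_size = 37000;  -- in 8 kB pages; ~300 MB, per the
                                   -- earlier suggestion for a 512 MB box
SET cpu_tuple_cost = 0.01;         -- the default; shown here only as the
                                   -- third knob mentioned above
```

Re-running EXPLAIN ANALYZE after each change shows whether the estimated costs now rank the index-scan plan ahead of the seq-scan plan.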