Re: Index usage - Mailing list pgsql-general

From Dennis Björklund
Subject Re: Index usage
Date
Msg-id Pine.LNX.4.44.0308261321210.31606-100000@zigo.dhs.org
In response to Re: Index usage  (Pedro Alves <pmalves@think.pt>)
List pgsql-general
On Tue, 26 Aug 2003, Pedro Alves wrote:

> The vacuum analyze is run on a daily basis, so that cannot be the point.

On all tables I assume.

> Is there any way to force the planner to use the index?

For experiments you can set enable_seqscan to false, but the real
solution is not to force it to always use index scans, since sometimes
a seq. scan is faster.
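As a sketch of that experiment (the table and column names are taken from the query plans quoted below; adjust to your own query):

```sql
-- For testing only: disable seq. scans for this session and re-run the query.
SET enable_seqscan = false;

EXPLAIN ANALYZE
SELECT count(*) FROM requisicaoanalise
WHERE ra_datacolh >= '2003-06-01'
  AND ra_datacolh <  '2003-07-01'
  AND isactive = 0;

-- Restore the default afterwards; do not leave this off in production.
SET enable_seqscan = true;
```

If the index-scan plan is much faster, that confirms the planner's row estimate is the problem, not the index itself.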

> Below is the explain analyze of the queries. Indeed, the second range has
> more rows (9105 / 21503), but is this SO big that the planner cannot
> handle it?

"Cannot handle" is saying too much. If you are going to use a large part of
the rows in a table, a sequential scan is going to be faster than using the
index. The problem here seems to be that it thinks it needs a lot of rows,
so it chooses to do a seq. scan. The best way forward is to understand why
the estimate is off. Usually it's because vacuum analyze has not been run
on that table.

> This is running in a dedicated machine with 512Mb ram. Is there any
> configuration parameter so that I can increase the "index to seq turn point"? :)

If you have fast disks you might, for example, set random_page_cost to 2.
Here is a short text about tuning pg:

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

>  Aggregate  (cost=10660.84..10660.84 rows=1 width=0) (actual time=172.41..172.41 rows=1 loops=1)
>    ->  Index Scan using requisicaoanalise_datacolh on requisicaoanalise  (cost=0.00..10654.06 rows=2711 width=0) (actual time=0.13..145.50 rows=9105 loops=1)
>          Index Cond: ((ra_datacolh >= '2003-04-01'::date) AND (ra_datacolh < '2003-05-01'::date))
>          Filter: (isactive = 0)
>  Total runtime: 172.62 msec

Here it estimated 2711 rows and chose an index scan. The actual number of
rows was 9105.

> Aggregate  (cost=31019.00..31019.00 rows=1 width=0) (actual time=43252.40..43252.40 rows=1 loops=1)
>    ->  Seq Scan on requisicaoanalise  (cost=0.00..30965.24 rows=21503 width=0) (actual time=8.43..43224.01 rows=9248 loops=1)
>          Filter: ((ra_datacolh >= '2003-06-01'::date) AND (ra_datacolh < '2003-07-01'::date) AND (isactive = 0))
>  Total runtime: 43252.57 msec

Here the estimate was 21503 rows, but the actual number was 9248.

Both estimates are way off. Sure, you can lower the point at which it
switches from an index scan to a seq. scan, but then you may get even
bigger mismatches in the future.

What I would do is run vacuum analyze manually on the tables used in the
query and try again, just to be really sure. If that does not help, the
next step would be to increase the statistics-gathering for the date
column (using alter table), and to try to understand what is in
pg_statistic. I don't know much about these last parts and have never
looked inside pg_statistic myself, but that is where I would look. Maybe
someone else has other ideas or can help you figure out why the estimates
are wrong.
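The steps above might look like this (a sketch; table and column names are taken from your query plans, and 100 is just an example target -- the default per-column statistics target is 10):

```sql
-- Refresh the statistics for the table in the query:
VACUUM ANALYZE requisicaoanalise;

-- Collect more detailed statistics for the date column, then re-analyze:
ALTER TABLE requisicaoanalise ALTER COLUMN ra_datacolh SET STATISTICS 100;
ANALYZE requisicaoanalise;

-- Inspect what the planner knows about the column, via the readable
-- pg_stats view over pg_statistic:
SELECT * FROM pg_stats
WHERE tablename = 'requisicaoanalise' AND attname = 'ra_datacolh';
```

With a higher statistics target, ANALYZE keeps a finer-grained histogram for the column, which should bring the row estimates for date ranges closer to the actual counts.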

--
/Dennis

