Martin Hampl <Martin.Hampl@gmx.de> writes:
> Am 30.04.2004 um 01:32 schrieb Tom Lane:
>> Did you ANALYZE these tables?
> I did.
Hm. I'm wondering why the row estimates for 's' are off by several
orders of magnitude:
> -> Index Scan using s_pkey on s (cost=0.00..24698.44 rows=3367
> width=16) (actual time=75.933..589743.642 rows=1111220 loops=1)
It could be that this table has a lot of empty pages near the front,
which is a condition that's known to lead to underestimated row count
from ANALYZE. (Manfred is working on a better ANALYZE sampling method
that should avoid such errors in future.) Try doing a straight VACUUM
and see if the row count estimate gets better. If so, it might be worth
the trouble to do a VACUUM FULL to get rid of the empty space. (And you
should also think about doing routine vacuums more often, and perhaps
increasing the FSM settings, to ensure you don't get back into this state.)
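A quick check along these lines should show whether empty pages are the
issue (the FSM numbers below are only placeholders, not recommendations;
size them to your own data):

    VACUUM VERBOSE s;    -- reports pages scanned and live rows found
    SELECT relpages, reltuples FROM pg_class WHERE relname = 's';
    -- if reltuples now reflects the real ~1.1M rows, the stale estimate
    -- was the problem; VACUUM FULL s; would then reclaim the dead space

and in postgresql.conf something like

    max_fsm_pages = 100000    -- placeholder; must cover pages with free space
    max_fsm_relations = 1000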
The other thing I'm wondering about is why the devil it's choosing an
indexscan at all, when it has no indexscan conditions to use. Are you
perhaps forcing that choice via "enable_seqscan = false"? If so, don't.
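You can verify and undo that from psql, for instance:

    SHOW enable_seqscan;
    SET enable_seqscan = on;   -- or remove the override from postgresql.conf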
> Maybe that *is* what I wanted it to do? However, searching just for
> 'FACTSHEET' is very quick (I rebooted before this query to clear any
> cache---is there a better way to do this?):
> and I would have thought that the results of this query could have been
> used to search for the respective records in s (using one of the
> indexes)?
Undoubtedly it did consider that plan, but rejected it because it looked
more expensive than the alternatives. This is not too surprising given
the overestimate of the number of rows matching 'FACTSHEET' (7892 vs
reality of 5). You might need to increase the statistics target for
token.word (see ALTER TABLE ... SET STATISTICS) to give the planner more
detailed information about the distribution of words.
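For instance (the target of 200 is just an example; anything well above
the default of 10 gives ANALYZE more to work with):

    ALTER TABLE token ALTER COLUMN word SET STATISTICS 200;
    ANALYZE token;

After that, the planner keeps a longer most-common-values list for
token.word and should be able to estimate the 'FACTSHEET' selectivity
much more accurately.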
regards, tom lane