"Jeremy Dunn" <jdunn@autorevenue.com> writes:
> The question: why does the planner consider a sequential scan to be
> better for these top 10 values?
At some point a seqscan *will* be better. In the limit, if the key
being sought is common enough to occur on every page of the table,
it's certain that a seqscan will require less I/O than an indexscan
(because reading the index isn't actually saving you any heap fetches).
In practice the breakeven point is less than that because Unix kernels
are better at handling sequential than random access.
Your gripe appears to be basically that the planner's idea of the
breakeven point is off a bit. It looks to me like it's within about
a factor of 2 of being right, though, which is not all that bad when
it's using generic cost parameters.
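
If you want to see how far off it is on your data, compare the two plans
directly; something like this (substituting your actual query for the
SELECT shown here) will do it:

    -- let the planner pick its preferred plan (the seqscan)
    EXPLAIN ANALYZE SELECT * FROM xxx WHERE cid = 7191032;

    -- now force the indexscan and compare the actual runtimes
    SET enable_seqscan = off;
    EXPLAIN ANALYZE SELECT * FROM xxx WHERE cid = 7191032;
    RESET enable_seqscan;   -- don't leave this off for normal use
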
> A) alter table xxx alter column cid set statistics 500;
> analyze xxx;
> This does not affect the results.
It probably improved the accuracy of the row count estimates, no?
The estimate you show for cid=7191032 is off by more than 25% (37765 vs
50792), which seems like a lot of error for one of the most common
values in the table. (I hope that was with default stats target and
not 500.) That leads directly to a 25% overestimate of the cost of
an indexscan, while having IIRC no impact on the cost of a seqscan.
Since the two plans' estimated costs differed by more than 25%, fixing this
error alone didn't change the selected plan, but you want to get the
estimate as accurate as you can before you move on to tweaking cost
parameters.
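
To see what ANALYZE actually stored for that column, and how close it
comes to reality, try something like

    -- the planner's view: most-common-value list and frequencies for cid
    SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
    FROM pg_stats
    WHERE tablename = 'xxx' AND attname = 'cid';

    -- the ground truth for the value in question
    SELECT count(*) FROM xxx WHERE cid = 7191032;
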
> C) decreasing cpu_index_tuple_cost by a factor of up to 1000, with no
> success
Wrong thing. You should be tweaking random_page_cost. Looks to me like
a value near 2 might be appropriate for your setup. Also it is likely
appropriate to increase effective_cache_size, which is awfully small in
the default configuration. I'd set that to something related to your
available RAM before trying to home in on a suitable random_page_cost.
AFAIK hardly anyone bothers with changing the cpu_xxx costs ...
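
As a sketch (the numbers here are placeholders, not recommendations), you
can experiment per-session before writing anything into postgresql.conf:

    -- effective_cache_size is measured in 8K pages; 100000 is roughly
    -- 800MB, so scale it to the RAM your kernel actually has free for caching
    SET effective_cache_size = 100000;
    SET random_page_cost = 2;        -- default is 4
    EXPLAIN ANALYZE SELECT * FROM xxx WHERE cid = 7191032;

Once you find settings that pick the right plan for both the common and the
uncommon values, put them in postgresql.conf so every session gets them.
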
regards, tom lane