"Gregory Wood" <gregw@com-stock.com> writes:
> Of course if PostgreSQL were estimating the number of rows correctly, that
> would be less of a problem. Seems that our data is throwing off the
> statistics... we have some values that appear tens of thousands of times and
> others that appear only a few times, with a few values (such as the example
> I sent) in between. Perhaps it's time to look at ALTER TABLE ... SET STATISTICS...
I believe that pushing the SET STATISTICS target up to 50 or so would
solve the problem nicely, at the cost of making ANALYZE run longer.
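For the archives, the incantation would be along these lines (table and
column names invented, obviously):

    ALTER TABLE mytable ALTER COLUMN mycol SET STATISTICS 50;
    ANALYZE mytable;    -- re-gather stats with the larger target

The per-column target defaults to 10, so 50 gives ANALYZE five times as
many sample values to base its estimates on.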
However, it also bothered me that your actual runtime ratio was nearly
500:1 when the row estimate was off by "only" 36:1. That leaves an
estimation error of more than a factor of ten (500/36 is about 14), and
that can't be explained away by arguing about the appropriate value of
random_page_cost. (A random_page_cost of less than one is certainly
nonsensical, in any case.)
I'm wondering whether the indexscan case was benefiting from pages
having been read into memory by the preceding seqscan. If you run the
seqscan plan twice in a row, does the runtime stay about the same?
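A quick way to check would be something like the following, substituting
your real query; the SET is only needed if the planner isn't already
choosing the seqscan plan for it:

    SET enable_indexscan = off;
    EXPLAIN ANALYZE SELECT ... ;    -- first run, cache possibly cold
    EXPLAIN ANALYZE SELECT ... ;    -- second run, pages now cached
    SET enable_indexscan = on;

If the second run comes back much faster, then caching effects are
skewing the comparison between the two plans.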
regards, tom lane