Re: BUG #7619: Query cost estimate appears to not use n_distinct setting - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #7619: Query cost estimate appears to not use n_distinct setting
Date
Msg-id 20815.1351011720@sss.pgh.pa.us
Whole thread Raw
In response to BUG #7619: Query cost estimate appears to not use n_distinct setting  (niko.kiirala@mapvision.fi)
List pgsql-bugs
niko.kiirala@mapvision.fi writes:
> I am working on a potentially large database table, let's call it
> "observation", that has a foreign key to table "measurement". Each
> measurement is associated with either none or around five observations. In
> this kind of situation, it is well known that the statistics on the foreign
> key column in observation table can get arbitrarily bad as the row count
> increases. Especially, the estimate of the number of distinct values in the
> foreign key column can be completely off.

> To combat this issue I have set n_distinct=-0.2 on the foreign key column.
> With this, the query planner gets good estimates of row counts, but it would
> appear that this setting does not affect the cost estimate of an index
> scan.

[ traces through example with gdb ... ]

The reason the cost estimates for these two cases are nearly the same is
that in both cases, the planner predicts that only one index page and
one heap page will be visited.  There are about 360 tuples per page in
your index, so whether 83 of them or 5 of them are to be visited, the
index access cost is just about the same --- they'll most likely all be
on the same index page, and the per-tuple CPU costs are negligible
compared to the page access cost.  The table is a little less dense,
185 tuples/page, but that's still a lot more than 83.  It might seem
like the planner should be estimating 83 vs 5 heap pages fetched, which
would change the costs a lot, but it doesn't --- it thinks only one page
is fetched in both cases.  And it's right, plus or minus a page, because
this artificial test case has loaded up the table with perfectly
sequential measurement_ids.  ANALYZE has recorded the index order
correlation as exactly 1.0, so the planner predicts that all the rows
with measurement_id = 200001 will be found on the same page, even when
there are 83 of them.

In short then, the cost estimate for this query really ought to be 2 *
random_page_cost plus some not-very-large CPU-per-tuple charges, and so
it's not surprising at all that the more accurate rowcount estimate
isn't moving the result very much.

So if the cost ought to be somewhere in the vicinity of 9 units, why is
it actually coming out over 100?  The difference is coming from this
fudge-factor in genericcostestimate:

    /*
     * A difficulty with the leaf-pages-only cost approach is that for small
     * selectivities (eg, single index tuple fetched) all indexes will look
     * equally attractive because we will estimate exactly 1 leaf page to be
     * fetched.  All else being equal, we should prefer physically smaller
     * indexes over larger ones.  (An index might be smaller because it is
     * partial or because it contains fewer columns; presumably the other
     * columns in the larger index aren't useful to the query, or the larger
     * index would have better selectivity.)
     *
     * We can deal with this by adding a very small "fudge factor" that
     * depends on the index size.  The fudge factor used here is one
     * spc_random_page_cost per 10000 index pages, which should be small
     * enough to not alter index-vs-seqscan decisions, but will prevent
     * indexes of different sizes from looking exactly equally attractive.
     */
    *indexTotalCost += index->pages * spc_random_page_cost / 10000.0;

This was intended to be a very small correction, but your index is so
large that it's coming out as the dominant part of the cost.  (Of
course, this also explains your observation that the cost seems to be
linear in the index size.)

So we probably ought to rethink this calculation, but I'm not sure
offhand what to do instead.  It still seems like a good thing to
penalize larger indexes compared to smaller ones even when we are
estimating a single index page being touched.  Maybe this should be
a log rather than linear calculation, though.

            regards, tom lane

pgsql-bugs by date:

Previous
From: Sree Krishna Priya Kuppa
Date:
Subject: Posrgresql for Suse linux 64-bit version on OS/390
Next
From: John R Pierce
Date:
Subject: Re: Posrgresql for Suse linux 64-bit version on OS/390