Re: Problems with adding a is not null to a query. - Mailing list pgsql-bugs

From Tom Lane
Subject Re: Problems with adding a is not null to a query.
Date
Msg-id 24112.1295308549@sss.pgh.pa.us
Whole thread Raw
In response to Re: Problems with adding a is not null to a query.  (Tim Uckun <timuckun@gmail.com>)
Responses Re: Problems with adding a is not null to a query.  (Tim Uckun <timuckun@gmail.com>)
List pgsql-bugs
Tim Uckun <timuckun@gmail.com> writes:
>> Am I right in guessing that pg_stats.n_distinct is much too low for
>> the domain_id column?

> the domain_id is in the topical urls. A select count of domains shows
> that there are 700 domains, the pg_stats shows 170 which seems kind of
> low but maybe is not out of bounds because most domains are probably
> inactive.

It sounds like you've got one of those long-tail distributions where
there are lots and lots of the first couple hundred domains, and not
many at all of the last few.  The problem with the stats as you've got
them is that the planner isn't aware of the long tail, so for this
specific domain id that's not even there at all, you're getting an
estimate of a couple of thousand matches --- which is why it goes for
the indexscan-according-to-ORDER-BY plan.  It's figuring it will hit one
of those matches and be able to end the scan after reading much less
than all of the table.

As I've stated repeatedly, your next move needs to be to increase the
stats target, at least for that column if not globally.  You probably
don't need to have it know about every last domain id, but you need to
have it know about enough that it realizes that domains not included in
the MCV list are going to appear less than a couple of thousand times.

            regards, tom lane

pgsql-bugs by date:

Previous
From: Tim Uckun
Date:
Subject: Re: Problems with adding a is not null to a query.
Next
From: Tim Uckun
Date:
Subject: Re: Problems with adding a is not null to a query.