Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics
Date
Msg-id 24109.1213129997@sss.pgh.pa.us
In response to Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics  ("Nathan Boley" <npboley@gmail.com>)
Responses Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics  ("Nathan Boley" <npboley@gmail.com>)
Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
"Nathan Boley" <npboley@gmail.com> writes:
>>> If we query on values that aren't in the table, the planner will
>>> always overestimate the expected number of returned rows because it
>>> (implicitly) assumes that every query will return at least 1 record.
>> 
>> That's intentional and should not be changed.

> Why?  What if (somehow) we knew that there was a 90% chance that the
> query would return an empty result set on a big table with 20 non-MCV
> distinct values? Currently the planner would always choose a seq scan,
> where an index scan might be better.

(1) On what grounds do you assert the above?

(2) What makes you think that an estimate of zero rather than one row
would change the plan?

(In fact, I don't think the plan would change, in this case.  The reason
for the clamp to 1 row is to avoid foolish results for join situations.)
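
To illustrate the join concern, here is a minimal sketch of how a zero-row
estimate propagates through a join. It is a simplified model, not the
planner's actual code: the function names, the table sizes, and the
"outer * inner / ndistinct" join formula are all assumptions chosen for
illustration.

```python
# Simplified model of row-count estimation (illustrative assumptions only).

def clamp_row_est(nrows: float) -> float:
    """Round a row estimate and force it to be at least 1 row."""
    return 1.0 if nrows <= 1.0 else float(round(nrows))


def join_rows(outer_rows: float, inner_rows: float, ndistinct: float) -> float:
    """Textbook equijoin estimate: outer * inner / ndistinct of the join key."""
    return outer_rows * inner_rows / ndistinct


# Hypothetical numbers: the filter on the outer table is estimated to match
# nothing, and it is joined to a 1,000,000-row table on a key with 1000
# distinct values.
raw_outer_estimate = 0.0
inner_rows = 1_000_000.0
ndistinct = 1000.0

# Without the clamp, the zero multiplies through: the whole join is estimated
# at 0 rows, so every join order and join method above it looks free.
unclamped = join_rows(raw_outer_estimate, inner_rows, ndistinct)

# With the clamp, the estimate stays sensitive to the size of the other
# input, so a wrong guess on the outer side costs far less.
clamped = join_rows(clamp_row_est(raw_outer_estimate), inner_rows, ndistinct)
```

In this model the unclamped join estimate is zero no matter how large the
inner table is, which is the kind of foolish result the clamp guards against
if the "no rows" guess turns out to be wrong.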
        regards, tom lane

