Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics
Date
Msg-id 26169.1213139023@sss.pgh.pa.us
Whole thread Raw
In response to Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics  ("Nathan Boley" <npboley@gmail.com>)
Responses Re: Runtime checking of MCV (Was: ... histogram bucket numdistinct statistics)  (Csaba Nagy <nagy@ecircle-ag.com>)
Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics  ("Nathan Boley" <npboley@gmail.com>)
List pgsql-hackers
"Nathan Boley" <npboley@gmail.com> writes:
>> (1) On what grounds do you assert the above?

> For a table with 1000000 non-mcv rows, the planner estimates a result
> set of cardinality 1000000/20 = 50000, not 1.

The real problem in that situation is that you need another twenty slots
in the MCV list.  The MCV list should *always* exhaust the set of values
for which it'd be bad to do an indexscan.  Assuming that the threshold
for switching to an indexscan is somewhere around selectivity 0.005
(I am not certain offhand, but it's in that general area), this cannot
possibly require more than 200 MCV slots, and for most data
distributions it'd be a whole lot less.

Given such an MCV list, the planner will always make the right choice
of whether to do index or seqscan ... as long as it knows the value
being searched for, that is.  Parameterized plans have a hard time here,
but that's not really the fault of the statistics.

> I see where the confusion is coming from. When I said
> What if ( somehow ) we knew that there was a 90%
> chance that query would return an empty result set
> I meant that the planner doesn't know that information. And how could it?

Hmm.  IIRC the estimates are set up on the assumption that you are
searching for a value that occurs in the table.  I suppose there are
applications where that's often false, but as you say, it's hard to know
that in advance.
        regards, tom lane


pgsql-hackers by date:

Previous
From: "Nathan Boley"
Date:
Subject: Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics
Next
From: Gregory Stark
Date:
Subject: Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics