Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics - Mailing list pgsql-hackers

From Nathan Boley
Subject Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics
Date
Msg-id 6fa3b6e20806101431u1e4395d7ob5d925d59eb45b20@mail.gmail.com
Whole thread Raw
In response to Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
>> Why?  What if ( somehow ) we knew that there was a 90% chance that
>> query would return an empty result set on a big table with 20 non-mcv
>> distinct values. Currently the planner would always choose a seq scan,
>> where an index scan might be better.
>
> (1) On what grounds do you assert the above?

For a table with 1000000 non-mcv rows, the planner estimates a result
set of cardinality 1000000/20 = 50000, not 1.

> (2) What makes you think that an estimate of zero rather than one row
> would change the plan?

I see where the confusion is coming from. When I said

>> What if ( somehow ) we knew that there was a 90%
>> chance that query would return an empty result set

I meant that the planner doesn't know that information. And how could it?

The estimate for ndistinct is an estimate for the number of distinct
values in the table, not an estimate for the number of distinct values
that will be queried for.  My original point was that we sometimes
care about the distribution of what's being queried for and not just
what's in the table.

But this is all silly anyways: if this was really a concern you would
write a function

if values exist  return values
else return none

> (In fact, I don't think the plan would change, in this case.  The reason
> for the clamp to 1 row is to avoid foolish results for join situations.)

Which makes sense. My point certainly wasn't, in any way, a criticism
of clamping selectivity to 1.

Cheers,
Nathan


pgsql-hackers by date:

Previous
From: Teodor Sigaev
Date:
Subject: Re: Proposal: GiST constraints
Next
From: Tom Lane
Date:
Subject: Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics