Re: GIN indexscans versus equality selectivity estimation - Mailing list pgsql-hackers

From Robert Haas
Subject Re: GIN indexscans versus equality selectivity estimation
Date
Msg-id AANLkTimeZkKm=_Do-bv5okESiF+Vcjz-HWCAHGe2PaQa@mail.gmail.com
In response to Re: GIN indexscans versus equality selectivity estimation  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: GIN indexscans versus equality selectivity estimation  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, Jan 10, 2011 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Jan 9, 2011 at 6:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> or we could hack eqsel() to bound the no-stats estimate to a bit less
>>> than 1.
>
>> This seems like a pretty sensible thing to do.  I can't immediately
>> imagine a situation in which 1.0 is a sensible selectivity estimate in
>> the no-stats case and 0.90 (say) is a major regression.
>
> After sleeping on it, that seems like my least favorite option.  It's
> basically a kluge, as is obvious because there's no principled way to
> choose what the bound is (or the minimum result from
> get_variable_numdistinct, if we were to hack it there).

Well, the general problem is that we have no reasonable way of
handling planning uncertainty.  We have no way of throwing our hands
up in the air and saying "I really have no clue how many rows are
going to come out of that node"; as far as the rest of the planning
process is concerned, a selectivity estimate of 0.005 based on
<column> = <some MCV with a frequency of 0.005> is indistinguishable
from one that results from a completely inscrutable equality condition.
So while I agree with you that there's no particular principled way to
choose the exact value, that doesn't strike me as a compelling
argument against fixing some value.  ISTM that selectivity estimates
of exactly 0 and exactly 1 ought to be viewed with a healthy dose of
suspicion.
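
To make the point above concrete, here is a minimal sketch in Python. It
is not PostgreSQL code: every function name is hypothetical, the
ndistinct guess of 200 is an arbitrary illustrative value, and the 0.90
bound is just the "say" figure quoted earlier in this thread. It only
shows that a well-founded estimate and a blind guess reach the rest of
the planner as the same bare number, and what bounding the no-stats
estimate below 1 would look like.

```python
def mcv_eq_selectivity(mcv_freqs, value):
    """Selectivity of `col = value` when value is a known most-common value.

    mcv_freqs is a hypothetical dict mapping value -> observed frequency.
    """
    return mcv_freqs[value]


def no_stats_eq_selectivity(ndistinct_guess, upper_bound=0.90):
    """Fallback selectivity of `col = <something>` with no statistics.

    Assumes a uniform distribution over a guessed number of distinct
    values, then bounds the result a bit below 1. The bound (0.90) is an
    assumption taken from the discussion, not a principled constant.
    """
    return min(1.0 / ndistinct_guess, upper_bound)


def estimated_rows(table_rows, selectivity):
    """All that downstream planning sees: a row count, with no confidence."""
    return table_rows * selectivity


# An estimate backed by statistics ...
known = mcv_eq_selectivity({"foo": 0.005}, "foo")
# ... and one that is a pure guess (200 distinct values assumed).
guess = no_stats_eq_selectivity(200)

# Both produce the same number, so later planning steps cannot tell
# them apart.
same_estimate = estimated_rows(1_000_000, known) == estimated_rows(1_000_000, guess)

# With a guessed ndistinct of 1, the unbounded estimate would be exactly
# 1.0; the bound keeps it at 0.90 instead.
bounded = no_stats_eq_selectivity(1)
```

The sketch carries no notion of confidence at all, which is the gap
being described: the only lever available is the value of the estimate
itself.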

> I'm currently
> leaning to the idea of tweaking the logic in indxpath.c; in particular,
> why wouldn't it be a good idea to force consideration of the bitmap path
> if the index type hasn't got amgettuple?  If we don't, then we've
> completely wasted the effort spent up to that point inside
> find_usable_indexes.

I guess the obvious question is: why wouldn't it be a good idea to
force consideration of the bitmap path even if the index type DOES
have amgettuple?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

