Re: GIN indexscans versus equality selectivity estimation - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: GIN indexscans versus equality selectivity estimation
Date	January 11, 2011 00:18:55
Msg-id	AANLkTimeZkKm=_Do-bv5okESiF+Vcjz-HWCAHGe2PaQa@mail.gmail.com
In response to	Re: GIN indexscans versus equality selectivity estimation (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: GIN indexscans versus equality selectivity estimation (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

On Mon, Jan 10, 2011 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Jan 9, 2011 at 6:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> or we could hack eqsel() to bound the no-stats estimate to a bit less
>>> than 1.
>
>> This seems like a pretty sensible thing to do.  I can't immediately
>> imagine a situation in which 1.0 is a sensible selectivity estimate in
>> the no-stats case and 0.90 (say) is a major regression.
>
> After sleeping on it, that seems like my least favorite option.  It's
> basically a kluge, as is obvious because there's no principled way to
> choose what the bound is (or the minimum result from
> get_variable_numdistinct, if we were to hack it there).

Well, the general problem is that we have no reasonable way of
handling planning uncertainty.  We have no way of throwing our hands
up in the air and saying "I really have no clue how many rows are
going to come out of that node"; as far as the rest of the planning
process is concerned, a selectivity estimate of 0.005 based on
<column> = <some MCV with a frequency of 0.005> is exactly identical
to one that results from a completely inscrutable equality condition.
So while I agree with you that there's no particular principled way to
choose the exact value, that doesn't strike me as a compelling
argument against fixing some value.  ISTM that selectivity estimates
of exactly 0 and exactly 1 ought to be viewed with a healthy dose of
suspicion.
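
To make the bounding idea concrete, here is a minimal sketch in C of what
capping the no-stats estimate could look like. It is not a patch against
eqsel(): the function name, the have_stats flag, and the NOSTATS_MAX_SEL
constant are all hypothetical, and 0.90 is only the "say" figure from
upthread, not a value anyone has argued for on principle.

```c
#include <stdbool.h>

/* Hypothetical cap; 0.90 is the illustrative figure from the thread. */
#define NOSTATS_MAX_SEL 0.90

/*
 * Hypothetical helper, not an existing PostgreSQL function.
 *
 * Cap an equality selectivity estimate only when it was derived without
 * any statistics.  Estimates backed by statistics pass through untouched,
 * so a genuine 1.0 (e.g. a single-valued column) is still possible.
 */
static double
bound_nostats_eqsel(double sel, bool have_stats)
{
    if (!have_stats && sel > NOSTATS_MAX_SEL)
        sel = NOSTATS_MAX_SEL;
    return sel;
}
```

Under these assumptions, a no-stats estimate of 1.0 comes back as 0.90,
while a stats-based 1.0 or a no-stats 0.005 is returned unchanged.  Note
that this does nothing about the underlying problem: the rest of the
planner still cannot tell a capped guess from a well-founded estimate.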

> I'm currently
> leaning to the idea of tweaking the logic in indxpath.c; in particular,
> why wouldn't it be a good idea to force consideration of the bitmap path
> if the index type hasn't got amgettuple?  If we don't, then we've
> completely wasted the effort spent up to that point inside
> find_usable_indexes.

I guess the obvious question is: why wouldn't it be a good idea to
force consideration of the bitmap path even if the index type DOES
have amgettuple?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
