Re: Patch: improve selectivity estimation for IN/NOT IN - Mailing list pgsql-hackers

From Daniele Varrazzo
Subject Re: Patch: improve selectivity estimation for IN/NOT IN
Date
Msg-id CA+mi_8aYBfHEY6ObSwSsuBHX=0FJXLLHdB=PfGBB4jj0iZxDyg@mail.gmail.com
Whole thread Raw
In response to Re: Patch: improve selectivity estimation for IN/NOT IN  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Patch: improve selectivity estimation for IN/NOT IN  (Euler Taveira de Oliveira <euler@timbira.com>)
List pgsql-hackers
On Sun, Mar 4, 2012 at 2:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Daniele Varrazzo <daniele.varrazzo@gmail.com> writes:
>> the attached patch improves the array selectivity estimation for = ANY
>> and <> ALL, hence for the IN/NOT IN operators, to avoid the
>> shortcoming described in
>> <http://archives.postgresql.org/pgsql-performance/2012-03/msg00006.php>.
>
> In connection with Alexander Korotkov's array-estimation patch,
> I just committed some code into scalararraysel() that checks whether the
> operator is equality or inequality of the array element type.

It looks like you have grand plans for array estimation. My patch has
a much more modest scope, and I'm hoping it could be applied to
currently maintained PG versions, as I consider the currently produced
estimations a bug.

> It does
> that by consulting the default btree or hash opclass for the element
> type.  I did that with the thought that it could be used to attack this
> issue too, but I see that you've done it another way, ie check to see
> whether the operator uses eqsel() or neqsel() as selectivity estimator.
>
> I'm not sure offhand which way is better.  It could be argued that yours
> is more appropriate because if the operator isn't btree equality, but acts
> enough like it to use eqsel() as estimator, then it's still appropriate
> for scalararraysel() to treat it as equality.  On the other side of the
> coin, an operator might be equality but have reason to use some
> operator-specific estimator rather than eqsel().  We have real examples
> of the former (such as the approximate-equality geometric operators)
> but I think the latter case is just hypothetical.  Another thing that
> has to be thought about is that there are numerous cross-type operators
> that use eqsel, such as date-vs-timestamp, and it's far from clear
> whether it's appropriate for scalararraysel() to use the modified stats
> calculation when dealing with one of these.  The btree-based logic is
> impervious to that since it won't match any cross-type operator.
>
> Thoughts?
>
> (BTW, in any case I don't trust function pointer comparison to be
> portable.  It'd be a lot safer to look at the function OID.)

My original idea was to compare the comparison function with the
catalog: I just don't know how to inspect the catalog/perform the
check. Studying how to do it I've noticed the fn_addr referring to the
"well known" eqsel/neqsel functions and thought about using it as
indicators of the right operator semantics.

If you are still interested in the patch, for sake of bugfixing, and
somebody provides an example in the source about how to compare the
functions oid, I can try to improve it.

-- Daniele


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Patch: improve selectivity estimation for IN/NOT IN
Next
From: Tom Lane
Date:
Subject: Re: Parameterized-path cost comparisons need some work