Re: Patch: improve selectivity estimation for IN/NOT IN - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Patch: improve selectivity estimation for IN/NOT IN
Date
Msg-id 27162.1330828737@sss.pgh.pa.us
Whole thread Raw
In response to Patch: improve selectivity estimation for IN/NOT IN  (Daniele Varrazzo <daniele.varrazzo@gmail.com>)
Responses Re: Patch: improve selectivity estimation for IN/NOT IN  (Daniele Varrazzo <daniele.varrazzo@gmail.com>)
Re: Patch: improve selectivity estimation for IN/NOT IN  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Daniele Varrazzo <daniele.varrazzo@gmail.com> writes:
> the attached patch improves the array selectivity estimation for = ANY
> and <> ALL, hence for the IN/NOT IN operators, to avoid the
> shortcoming described in
> <http://archives.postgresql.org/pgsql-performance/2012-03/msg00006.php>.

In connection with Alexander Korotkov's array-estimation patch,
I just committed some code into scalararraysel() that checks whether the
operator is equality or inequality of the array element type.  It does
that by consulting the default btree or hash opclass for the element
type.  I did that with the thought that it could be used to attack this
issue too, but I see that you've done it another way, ie check to see
whether the operator uses eqsel() or neqsel() as selectivity estimator.

I'm not sure offhand which way is better.  It could be argued that yours
is more appropriate because if the operator isn't btree equality, but acts
enough like it to use eqsel() as estimator, then it's still appropriate
for scalararraysel() to treat it as equality.  On the other side of the
coin, an operator might be equality but have reason to use some
operator-specific estimator rather than eqsel().  We have real examples
of the former (such as the approximate-equality geometric operators)
but I think the latter case is just hypothetical.  Another thing that
has to be thought about is that there are numerous cross-type operators
that use eqsel, such as date-vs-timestamp, and it's far from clear
whether it's appropriate for scalararraysel() to use the modified stats
calculation when dealing with one of these.  The btree-based logic is
impervious to that since it won't match any cross-type operator.

Thoughts?

(BTW, in any case I don't trust function pointer comparison to be
portable.  It'd be a lot safer to look at the function OID.)
        regards, tom lane


pgsql-hackers by date:

Previous
From: Marti Raudsepp
Date:
Subject: Re: RFC: Making TRUNCATE more "MVCC-safe"
Next
From: Daniele Varrazzo
Date:
Subject: Re: Patch: improve selectivity estimation for IN/NOT IN