Re: BETWEEN optimizer problems with single-value - Mailing list pgsql-performance

From Simon Riggs
Subject Re: BETWEEN optimizer problems with single-value
Date
Msg-id 1142510032.3859.320.camel@localhost.localdomain
Whole thread Raw
In response to Re: BETWEEN optimizer problems with single-value  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BETWEEN optimizer problems with single-value  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Wed, 2006-03-15 at 21:05 -0500, Tom Lane wrote:
> So we need a more accurate estimate for the boundary case.

Agreed.

> > Using 1.0e-10 isn't very useful... the selectivity for a range should
> > never be less than the selectivity for an equality, so we should simply
> > put in a test against one of the pseudo constants and use that as the
> > minimal value.
>
> That's easier said than done, because you'd first have to find the
> appropriate equality operator to use (ie, one having semantics that
> agree with the inequality operators).
...

Kevin: this is also the reason we can't simply transform the WHERE
clause into a more appropriate form...

> Possibly we could drop this code's reliance on seeing
> SCALARLTSEL/SCALARGTSEL as the estimators, and instead try to locate a
> common btree opclass for the operators --- which would then let us
> identify the right equality operator to use, and also let us distinguish
> > from >= etc.  If we're trying to get the boundary cases right I
> suspect we have to account for that.  I could see such an approach being
> tremendously slow though :-(, because we'd go looking for btree
> opclasses even for operators that have nothing to do with < or >.

Trying to get the information in the wrong place would be very
expensive, I agree. But preparing that information when we have access
to it and passing it through the plan would be much cheaper. Relating
op->opclass will be very useful in other places in planning, even if any
one case seems not to justify the work to record it. (This case feels
like deja vu, all over again.)

The operator and the opclass are only connected via an index access
method, but for a particular index each column has only one opclass. So
the opclass will have a 1-1 correspondence with the operator for *that*
plan only, realising that other plans might have different
correspondences. find_usable_indexes() or thereabouts could annotate a
restriction OpExpr with the opclass it will use.

Once we have the link, clauselist_selectivity() can trivially compare
opclasses for both OpExprs, then retrieve other information for that
opclass for various purposes.

Seems lots of work for such a corner case, but would be worth it if this
solves other problems as well.

Best Regards, Simon Riggs


pgsql-performance by date:

Previous
From: Richard Huxton
Date:
Subject: Re: PostgreSQL and Xeon MP
Next
From: "Guillaume Smet"
Date:
Subject: Re: PostgreSQL and Xeon MP