Thread: Generalizing range-constraint detection in clauselist_selectivity

Generalizing range-constraint detection in clauselist_selectivity

From
Tom Lane
Date:
Over in pgsql-performance, Shaun Thomas was just complaining about the
planner not picking a bitmap indexscan for a query involving a
constraint like 
b.created_dt between a.created_dtand a.created_dt + interval '1 month';

At first I wrote this off as being due to inability to get a good
selectivity estimate, but on second look it seemed like even with the
default estimate for a range constraint, the planner should've made the
choice he wanted.  After a bit of digging I realized that it wasn't
recognizing this as a range constraint on b.created_dt at all, because
the code in clauselist_selectivity that tries to pair up inequality
constraints punts altogether for anything involving a join --- it only
wants to look at "var >= constant" types of clauses:
        * See if it looks like a restriction clause with a pseudoconstant on        * one side.  (Anything more
complicatedthan that might not behave in        * the simple way we are expecting.)
 

I'm thinking that this is overly restrictive, and we could usefully
suppose that "var >= anything" and "var <= anything" should be treated
as a range constraint pair if the vars match and there are no volatile
functions in the expressions.  We are only trying to get a selectivity
estimate here, so rigorous correctness is not required.  However, I'm
a little worried that I might be overlooking cases where this would be
unduly optimistic.  Does anyone see a situation where such a pair of
clauses *shouldn't* be thought to be a range constraint on the var?
For instance, should we still restrict the "var" side to be an
expression in columns of only one relation?
        regards, tom lane



Re: Generalizing range-constraint detection in clauselist_selectivity

From
Josh Berkus
Date:
> I'm thinking that this is overly restrictive, and we could usefully
> suppose that "var >= anything" and "var <= anything" should be treated
> as a range constraint pair if the vars match and there are no volatile
> functions in the expressions.  We are only trying to get a selectivity
> estimate here, so rigorous correctness is not required.  However, I'm
> a little worried that I might be overlooking cases where this would be
> unduly optimistic.  Does anyone see a situation where such a pair of
> clauses *shouldn't* be thought to be a range constraint on the var?
> For instance, should we still restrict the "var" side to be an
> expression in columns of only one relation?

Hmmm.  I don't see why we have to restrict them, at least in theory.
If more than one relation is involved in an expression for "var", then
doesn't the join between the other relations have to be evaluated prior
to evaluating the join conditions on the range relation?  i.e. it seems
to me that for relations a,b,c:

where( a.1 + b.1 ) <= c.1 and ( a.2 + b.2 ) >= c.1

... that we're already forced to join a and b before we can meaningfully
evaluate the join condition on c, no?  If not, then we do have to
restrict, but it seems to me that we are.

Other than that, I can't come up with a real problem for this
optimization which wouldn't already be disqualified (like types which
evaluate >= in a non-scalar manner).

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Generalizing range-constraint detection in clauselist_selectivity

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
>> I'm thinking that this is overly restrictive, and we could usefully
>> suppose that "var >= anything" and "var <= anything" should be treated
>> as a range constraint pair if the vars match and there are no volatile
>> functions in the expressions.  We are only trying to get a selectivity
>> estimate here, so rigorous correctness is not required.  However, I'm
>> a little worried that I might be overlooking cases where this would be
>> unduly optimistic.  Does anyone see a situation where such a pair of
>> clauses *shouldn't* be thought to be a range constraint on the var?
>> For instance, should we still restrict the "var" side to be an
>> expression in columns of only one relation?

> Hmmm.  I don't see why we have to restrict them, at least in theory.
> If more than one relation is involved in an expression for "var", then
> doesn't the join between the other relations have to be evaluated prior
> to evaluating the join conditions on the range relation?  i.e. it seems
> to me that for relations a,b,c:

> where
>     ( a.1 + b.1 ) <= c.1 and ( a.2 + b.2 ) >= c.1

> ... that we're already forced to join a and b before we can meaningfully
> evaluate the join condition on c, no?  If not, then we do have to
> restrict, but it seems to me that we are.

Well, one point that I'm not too sure about the implications of is that
in practice, clauselist_selectivity is not called on random collections
of clauses, but only clauses that are all going to be evaluated at the
"same place", ie a particular scan or join.  So that's already going to
limit the combinations of clauses that it can be pointed at.  An example
of why this might be an issue is
a.x >= b.y AND a.x <= constant

If we change things as I'm thinking, these two clauses would be seen as
a range pair, but only when they appear in the same clause list.  And
most of the time they wouldn't --- a.x <= constant would drop down to
the restriction clause list for "a", but the first clause would be kept
in the a+b join clause list.  This means the size of the a+b join
relation would be estimated without recognizing the range relationship.
But then, if we considered a parameterized indexscan on a.x, it would
have both clauses in its indexqual list, so we'd use the range
interpretation in costing that indexscan, which would likely give that
particular plan an "unfair" advantage.  Maybe that's just fine, or maybe
it isn't.  I'm not sure.

We could probably eliminate that inconsistency by insisting that two
clauses can only be matched for this purpose when they reference the
same set of rels overall, but that doesn't feel right --- it certainly
seems like the example above ought to be thought of as a range
restriction if possible.
        regards, tom lane



Re: Generalizing range-constraint detection in clauselist_selectivity

From
Josh Berkus
Date:
On 9/28/12 5:13 PM, Tom Lane wrote:
> We could probably eliminate that inconsistency by insisting that two
> clauses can only be matched for this purpose when they reference the
> same set of rels overall, but that doesn't feel right --- it certainly
> seems like the example above ought to be thought of as a range
> restriction if possible.

Yes, it does.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com