Thread: Generalizing range-constraint detection in clauselist_selectivity
Over in pgsql-performance, Shaun Thomas was just complaining about the planner not picking a bitmap indexscan for a query involving a constraint like b.created_dt between a.created_dtand a.created_dt + interval '1 month'; At first I wrote this off as being due to inability to get a good selectivity estimate, but on second look it seemed like even with the default estimate for a range constraint, the planner should've made the choice he wanted. After a bit of digging I realized that it wasn't recognizing this as a range constraint on b.created_dt at all, because the code in clauselist_selectivity that tries to pair up inequality constraints punts altogether for anything involving a join --- it only wants to look at "var >= constant" types of clauses: * See if it looks like a restriction clause with a pseudoconstant on * one side. (Anything more complicatedthan that might not behave in * the simple way we are expecting.) I'm thinking that this is overly restrictive, and we could usefully suppose that "var >= anything" and "var <= anything" should be treated as a range constraint pair if the vars match and there are no volatile functions in the expressions. We are only trying to get a selectivity estimate here, so rigorous correctness is not required. However, I'm a little worried that I might be overlooking cases where this would be unduly optimistic. Does anyone see a situation where such a pair of clauses *shouldn't* be thought to be a range constraint on the var? For instance, should we still restrict the "var" side to be an expression in columns of only one relation? regards, tom lane
> I'm thinking that this is overly restrictive, and we could usefully > suppose that "var >= anything" and "var <= anything" should be treated > as a range constraint pair if the vars match and there are no volatile > functions in the expressions. We are only trying to get a selectivity > estimate here, so rigorous correctness is not required. However, I'm > a little worried that I might be overlooking cases where this would be > unduly optimistic. Does anyone see a situation where such a pair of > clauses *shouldn't* be thought to be a range constraint on the var? > For instance, should we still restrict the "var" side to be an > expression in columns of only one relation? Hmmm. I don't see why we have to restrict them, at least in theory. If more than one relation is involved in an expression for "var", then doesn't the join between the other relations have to be evaluated prior to evaluating the join conditions on the range relation? i.e. it seems to me that for relations a,b,c: where( a.1 + b.1 ) <= c.1 and ( a.2 + b.2 ) >= c.1 ... that we're already forced to join a and b before we can meaningfully evaluate the join condition on c, no? If not, then we do have to restrict, but it seems to me that we are. Other than that, I can't come up with a real problem for this optimization which wouldn't already be disqualified (like types which evaluate >= in a non-scalar manner). -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: >> I'm thinking that this is overly restrictive, and we could usefully >> suppose that "var >= anything" and "var <= anything" should be treated >> as a range constraint pair if the vars match and there are no volatile >> functions in the expressions. We are only trying to get a selectivity >> estimate here, so rigorous correctness is not required. However, I'm >> a little worried that I might be overlooking cases where this would be >> unduly optimistic. Does anyone see a situation where such a pair of >> clauses *shouldn't* be thought to be a range constraint on the var? >> For instance, should we still restrict the "var" side to be an >> expression in columns of only one relation? > Hmmm. I don't see why we have to restrict them, at least in theory. > If more than one relation is involved in an expression for "var", then > doesn't the join between the other relations have to be evaluated prior > to evaluating the join conditions on the range relation? i.e. it seems > to me that for relations a,b,c: > where > ( a.1 + b.1 ) <= c.1 and ( a.2 + b.2 ) >= c.1 > ... that we're already forced to join a and b before we can meaningfully > evaluate the join condition on c, no? If not, then we do have to > restrict, but it seems to me that we are. Well, one point that I'm not too sure about the implications of is that in practice, clauselist_selectivity is not called on random collections of clauses, but only clauses that are all going to be evaluated at the "same place", ie a particular scan or join. So that's already going to limit the combinations of clauses that it can be pointed at. An example of why this might be an issue is a.x >= b.y AND a.x <= constant If we change things as I'm thinking, these two clauses would be seen as a range pair, but only when they appear in the same clause list. And most of the time they wouldn't --- a.x <= constant would drop down to the restriction clause list for "a", but the first clause would be kept in the a+b join clause list. This means the size of the a+b join relation would be estimated without recognizing the range relationship. But then, if we considered a parameterized indexscan on a.x, it would have both clauses in its indexqual list, so we'd use the range interpretation in costing that indexscan, which would likely give that particular plan an "unfair" advantage. Maybe that's just fine, or maybe it isn't. I'm not sure. We could probably eliminate that inconsistency by insisting that two clauses can only be matched for this purpose when they reference the same set of rels overall, but that doesn't feel right --- it certainly seems like the example above ought to be thought of as a range restriction if possible. regards, tom lane
On 9/28/12 5:13 PM, Tom Lane wrote: > We could probably eliminate that inconsistency by insisting that two > clauses can only be matched for this purpose when they reference the > same set of rels overall, but that doesn't feel right --- it certainly > seems like the example above ought to be thought of as a range > restriction if possible. Yes, it does. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com