Re: Generalizing range-constraint detection in clauselist_selectivity - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Generalizing range-constraint detection in clauselist_selectivity |
Date | |
Msg-id | 18424.1348877624@sss.pgh.pa.us |
In response to | Re: Generalizing range-constraint detection in clauselist_selectivity (Josh Berkus <josh@agliodbs.com>) |
Responses | Re: Generalizing range-constraint detection in clauselist_selectivity |
List | pgsql-hackers |
Josh Berkus <josh@agliodbs.com> writes:
>> I'm thinking that this is overly restrictive, and we could usefully
>> suppose that "var >= anything" and "var <= anything" should be treated
>> as a range constraint pair if the vars match and there are no volatile
>> functions in the expressions.  We are only trying to get a selectivity
>> estimate here, so rigorous correctness is not required.  However, I'm
>> a little worried that I might be overlooking cases where this would be
>> unduly optimistic.  Does anyone see a situation where such a pair of
>> clauses *shouldn't* be thought to be a range constraint on the var?
>> For instance, should we still restrict the "var" side to be an
>> expression in columns of only one relation?

> Hmmm.  I don't see why we have to restrict them, at least in theory.
> If more than one relation is involved in an expression for "var", then
> doesn't the join between the other relations have to be evaluated prior
> to evaluating the join conditions on the range relation?  i.e. it seems
> to me that for relations a,b,c:

> where
> ( a.1 + b.1 ) <= c.1 and ( a.2 + b.2 ) >= c.1

> ... that we're already forced to join a and b before we can meaningfully
> evaluate the join condition on c, no?  If not, then we do have to
> restrict, but it seems to me that we are.

Well, one point that I'm not too sure about the implications of is that
in practice, clauselist_selectivity is not called on random collections
of clauses, but only clauses that are all going to be evaluated at the
"same place", ie a particular scan or join.  So that's already going to
limit the combinations of clauses that it can be pointed at.

An example of why this might be an issue is

	a.x >= b.y AND a.x <= constant

If we change things as I'm thinking, these two clauses would be seen as
a range pair, but only when they appear in the same clause list.  And
most of the time they wouldn't --- a.x <= constant would drop down to
the restriction clause list for "a", but the first clause would be kept
in the a+b join clause list.  This means the size of the a+b join
relation would be estimated without recognizing the range relationship.
But then, if we considered a parameterized indexscan on a.x, it would
have both clauses in its indexqual list, so we'd use the range
interpretation in costing that indexscan, which would likely give that
particular plan an "unfair" advantage.  Maybe that's just fine, or maybe
it isn't.  I'm not sure.

We could probably eliminate that inconsistency by insisting that two
clauses can only be matched for this purpose when they reference the
same set of rels overall, but that doesn't feel right --- it certainly
seems like the example above ought to be thought of as a range
restriction if possible.

			regards, tom lane
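
The inconsistency described above can be sketched in a few lines of
Python.  This is only a toy model, not PostgreSQL code: the Clause
class, find_range_pair, clauselist_sel, the per-clause selectivities
(0.6 and 0.5), and the 0.005 fallback are all assumptions made up for
illustration.  A matched pair is combined as lo + hi - 1, which is what
remains of the table after removing the fraction below the lower bound
and the fraction above the upper bound; unmatched clauses are simply
multiplied as if independent.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Clause:
    var: str               # bounded expression, e.g. "a.x"
    op: str                # ">=" or "<="
    other: str             # the "anything" side of the comparison
    rels: FrozenSet[str]   # every relation the clause references
    sel: float             # assumed standalone selectivity


def find_range_pair(clauses: List[Clause]) -> Optional[Tuple[Clause, Clause]]:
    """Return the first (lower, upper) pair of bounds on the same var."""
    for lo in clauses:
        if lo.op != ">=":
            continue
        for hi in clauses:
            if hi.op == "<=" and hi.var == lo.var:
                return lo, hi
    return None


def clauselist_sel(clauses: List[Clause]) -> float:
    """Toy selectivity of one clause list, with range-pair detection."""
    sel = 1.0
    rest = list(clauses)
    pair = find_range_pair(clauses)
    if pair is not None:
        lo, hi = pair
        rest = [c for c in clauses if c is not lo and c is not hi]
        s = lo.sel + hi.sel - 1.0
        if s <= 0.0:
            s = 0.005      # placeholder fallback, an assumption
        sel *= s
    for c in rest:
        sel *= c.sel       # independence assumption for everything else
    return sel


join_clause = Clause("a.x", ">=", "b.y", frozenset({"a", "b"}), 0.6)
restriction = Clause("a.x", "<=", "constant", frozenset({"a"}), 0.5)

# Join-size estimate: the clauses sit in different lists (restriction
# list of "a" versus the a+b join list), so the pair is never seen.
join_estimate = clauselist_sel([restriction]) * clauselist_sel([join_clause])

# Parameterized indexscan on a.x: both clauses are in one indexqual
# list, so they are matched and combined as a range.
indexscan_estimate = clauselist_sel([join_clause, restriction])

print(join_estimate, indexscan_estimate)
```

With these made-up numbers the indexscan sees a noticeably smaller
selectivity than the join-size estimate does for the same two clauses,
which is the "unfair" advantage in question.  Requiring both clauses to
have equal rels before matching them would make the two estimates
agree, at the cost of never treating this example as a range.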