EquivalenceClasses vs volatile functions - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | EquivalenceClasses vs volatile functions |
Date | |
Msg-id | 27159.1194378143@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: EquivalenceClasses vs volatile functions
|
List | pgsql-hackers |
Awhile back there was some discussion about how the system assumes that syntactically equal sort expressions can be considered interchangeable, which falls down if volatile functions are involved: http://archives.postgresql.org/pgsql-general/2006-11/msg01523.php I seem to recall the issue coming up again more recently, but can't find the thread in the archives right now. Basically the question is what to do with examples like these: select random() from foo order by random();select random() from foo order by 1;select random() as a, random() as b from fooorder by a, b;select random() as a, random() as b from foo order by b, a; In CVS HEAD, because of syntactic matching of ORDER BY expressions, all four of these effectively order by the first output column. This is arguably the wrong thing for the first example, in which you might expect that random() is called separately to generate the sorting key. And it definitely seems the wrong thing for the last one. What's more, while we have always treated the first two examples alike, a quick check shows that 7.3 through 8.2 all treat the last example as ordering primarily by the second output column. Compare 8.2 regression=# select random() as a, random() as b from int4_tbl order by b, a; a | b -------------------+-------------------0.699889197014272 | 0.1487481608055530.302036631386727 | 0.5015847636386750.927149619441479| 0.536219729576260.919696403201669 | 0.6953000994399190.249512315262109 | 0.709584747441113 (5 rows) vs HEAD regression=# select random() as a, random() as b from int4_tbl order by b, a; a | b -------------------+--------------------0.224498846102506 | 0.789146402385086 0.47139244293794 | 0.08640268212184310.554711272474378| 0.1834706445224580.599514745175838 | 0.7111700745299460.895382364280522 | 0.0681726015172899 (5 rows) So that seems like a regression that needs to be fixed. The difficulty is that, while we understand that EquivalenceClasses involving volatile functions aren't identical even if the function calls are syntactically equal, we don't have any way to figure out which targetlist entry is the matching one for a particular PathKey. We currently match on equal expression trees (cf make_sort_from_pathkeys) and thus we get the above behavior where the first matching column is what gets sorted on. What I'm thinking of doing is adding a field to EquivalenceClass that carries the ressortgroupref of the originating ORDER BY key's targetlist entry, in the case where the EquivalenceClass came from building a PathKey for ORDER BY. Then, if the EquivalenceClass is ec_volatile, we insist on matching that rather than matching the expression tree. This wouldn't affect the behavior for ordinary non-volatile sort keys, for which expression equivalence is valid. If we simply do that then cases 1 and 2 will continue to be treated the same, because the parser converts case 1 to case 2 based on syntactic equality of the two random() calls. I'm unsure whether changing that is a good idea. It's always worked that way, and so we'd risk breaking some applications. I'm also a bit uncomfortable with having the parser make semantic choices based on potentially-changeable attributes like function volatility. If we wanted cases 1 and 2 to behave differently, it'd be better to have the parser always generate resjunk columns for ORDER BY expressions, and get the planner to collapse out columns that are redundant. But that seems too big a change to contemplate for 8.3. Comments? regards, tom lane
pgsql-hackers by date: