EquivalenceClasses vs volatile functions - Mailing list pgsql-hackers

From Tom Lane
Subject EquivalenceClasses vs volatile functions
Date
Msg-id 27159.1194378143@sss.pgh.pa.us
Whole thread Raw
Responses Re: EquivalenceClasses vs volatile functions  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
Awhile back there was some discussion about how the system assumes that
syntactically equal sort expressions can be considered interchangeable,
which falls down if volatile functions are involved:
http://archives.postgresql.org/pgsql-general/2006-11/msg01523.php
I seem to recall the issue coming up again more recently, but can't
find the thread in the archives right now.  Basically the question
is what to do with examples like these:
select random() from foo order by random();select random() from foo order by 1;select random() as a, random() as b from
fooorder by a, b;select random() as a, random() as b from foo order by b, a;
 

In CVS HEAD, because of syntactic matching of ORDER BY expressions,
all four of these effectively order by the first output column.
This is arguably the wrong thing for the first example, in which you
might expect that random() is called separately to generate the sorting
key.  And it definitely seems the wrong thing for the last one.

What's more, while we have always treated the first two examples alike,
a quick check shows that 7.3 through 8.2 all treat the last example as
ordering primarily by the second output column.  Compare 8.2

regression=# select random() as a, random() as b from int4_tbl order by b, a;        a         |         b         
-------------------+-------------------0.699889197014272 | 0.1487481608055530.302036631386727 |
0.5015847636386750.927149619441479|  0.536219729576260.919696403201669 | 0.6953000994399190.249512315262109 |
0.709584747441113
(5 rows)

vs HEAD

regression=# select random() as a, random() as b from int4_tbl order by b, a;        a         |         b          
-------------------+--------------------0.224498846102506 |  0.789146402385086 0.47139244293794 |
0.08640268212184310.554711272474378|  0.1834706445224580.599514745175838 |  0.7111700745299460.895382364280522 |
0.0681726015172899
(5 rows)

So that seems like a regression that needs to be fixed.  The difficulty
is that, while we understand that EquivalenceClasses involving volatile
functions aren't identical even if the function calls are syntactically
equal, we don't have any way to figure out which targetlist entry is the
matching one for a particular PathKey.  We currently match on equal
expression trees (cf make_sort_from_pathkeys) and thus we get the above
behavior where the first matching column is what gets sorted on.

What I'm thinking of doing is adding a field to EquivalenceClass that
carries the ressortgroupref of the originating ORDER BY key's targetlist
entry, in the case where the EquivalenceClass came from building a
PathKey for ORDER BY.  Then, if the EquivalenceClass is ec_volatile,
we insist on matching that rather than matching the expression tree.
This wouldn't affect the behavior for ordinary non-volatile sort keys,
for which expression equivalence is valid.

If we simply do that then cases 1 and 2 will continue to be treated
the same, because the parser converts case 1 to case 2 based on
syntactic equality of the two random() calls.  I'm unsure whether
changing that is a good idea.  It's always worked that way, and so
we'd risk breaking some applications.  I'm also a bit uncomfortable
with having the parser make semantic choices based on
potentially-changeable attributes like function volatility.  If we
wanted cases 1 and 2 to behave differently, it'd be better to have
the parser always generate resjunk columns for ORDER BY expressions,
and get the planner to collapse out columns that are redundant.
But that seems too big a change to contemplate for 8.3.

Comments?
        regards, tom lane


pgsql-hackers by date:

Previous
From: ohp@pyrenet.fr
Date:
Subject: Re: should I worry?
Next
From: Gregory Stark
Date:
Subject: Re: Visibility map thoughts