Daniel Grace <dgrace@wingsnw.com> writes:
> One possible concern might be typecasts that aren't a 1:1
> representation. While no two VARCHARs are going to produce the same
> TEXT, this is not true in other cases (1.1::float::integer and
> 1.2::float::integer both produce 1, for instance).
> Off the top of my head, I can't think of a good example where this
> would cause a problem -- it'd be easy enough to manufacture a possible
> test case, but it'd be so contrived that I don't know if it's something
> that would be seen in production code. But if we SELECT
> SOME_INTEGER_AGGREGATE(DISTINCT floatcol ORDER BY floatcol), should
> the DISTINCT operate on floatcol (i.e. 1.1 and 1.2 are distinct, even
> if it means the function is called with '1' twice) or
> floatcol::integer (1.1 and 1.2 are not distinct)?
Yes, the latter. The current implementation has the advantage that any
unique-ifying step is guaranteed to produce outputs that are distinct
from the point of view of the aggregate function, whereas if we try to
keep the two operations at arm's length, then either we lose that
property or we sort-and-unique twice :-(.
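
To make that concrete, here's a contrived sketch with the cast written
out explicitly (float-to-integer isn't an implicit cast anyway, and the
table and values are invented purely for illustration), contrasting the
two readings:

    -- two float values that cast to the same integer
    CREATE TABLE t (floatcol float8);
    INSERT INTO t VALUES (1.1), (1.2);

    -- unique-ify on what the aggregate actually receives (the current
    -- behavior): both rows cast to 1, the duplicate collapses, and
    -- sum() is called with a single 1
    SELECT sum(DISTINCT floatcol::integer ORDER BY floatcol::integer) FROM t;
    -- => 1

    -- unique-ify on the raw float first: 1.1 and 1.2 stay distinct,
    -- so the aggregate would be handed 1 twice
    SELECT sum(x::integer) FROM (SELECT DISTINCT floatcol AS x FROM t) s;
    -- => 2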
If memory serves, this type of consideration is also why DISTINCT and
GROUP BY are made to follow ORDER BY's choice of semantics in an
ordinary SELECT query --- you might find that surprising, but if they
weren't on the same page it could be even more surprising.
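
One visible symptom of that same-page rule, reusing the made-up table
above: you aren't allowed to order DISTINCT output by a value that the
DISTINCT step has already merged away.

    -- rejected: the sort key isn't part of what DISTINCT produces
    SELECT DISTINCT floatcol::integer FROM t ORDER BY floatcol;
    -- ERROR:  for SELECT DISTINCT, ORDER BY expressions must appear in select list

    -- fine: sorting on the same value that was unique-ified
    SELECT DISTINCT floatcol::integer FROM t ORDER BY floatcol::integer;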
So on reflection I think that the current fix is the best one and
we don't want to reconsider it later.
regards, tom lane