If this is a proposal to add these to core, I'd like to suggest a close cousin of first()/last(): only(). [1]
It would behave like first() but would throw an error if it encountered more than one distinct value in the window.
This would be helpful in dependent grouping situations like this:
select a.keyval, a.name_of_the_thing, sum(b.metric_value) as metric_value
from a
join b on b.a_keyval = a.keyval
group by a.keyval, a.name_of_the_thing
Now, everyone's made this optimization to reduce group-by overhead:
select a.keyval, min(a.name_of_the_thing) as name_of_the_thing, sum(b.metric_value) as metric_value
from a
join b on b.a_keyval = a.keyval
group by a.keyval
Which works fine, but it's self-anti-documenting:
- it implies that name_of_the_thing *could* be different across rows with the same keyval
- it implies we have some business preference for names that are first in alphabetical order
- it implies that the string has more in common with the summed metrics (imagine this query has dozens of them) than the key values to the left.
Using first(a.name_of_the_thing) has less overhead than min()/max(), but has the same issues listed above.
By using only(a.name_of_the_thing) we'd have a bit more clarity that the author expected all of those values to be the same across the aggregate window, and discovering otherwise was reason enough to fail the query.
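To make the intended semantics concrete, here is a rough sketch that registers such an aggregate in an in-memory SQLite database via Python's sqlite3 module. It is only an illustration of the behavior, not a proposed core implementation. Everything beyond the query itself is an assumption: the name only_value (chosen just to avoid any keyword clash), the choice to ignore NULLs, the error text, and the helper run().

```python
import sqlite3


class Only:
    """Aggregate that returns the single distinct non-NULL value it sees,
    and raises if a second distinct value shows up in the same group."""

    _UNSET = object()  # sentinel: no value seen yet

    def __init__(self):
        self.value = Only._UNSET

    def step(self, v):
        if v is None:
            return  # assumption: NULLs are ignored, as first()/last() often do
        if self.value is Only._UNSET:
            self.value = v
        elif v != self.value:
            raise ValueError("only_value(): more than one distinct value in group")

    def finalize(self):
        return None if self.value is Only._UNSET else self.value


QUERY = """
select a.keyval, only_value(a.name_of_the_thing) as name_of_the_thing,
       sum(b.metric_value) as metric_value
from a
join b on b.a_keyval = a.keyval
group by a.keyval
order by a.keyval
"""


def run(rows_a, rows_b):
    """Hypothetical helper: load tables a and b, then run the grouped query."""
    con = sqlite3.connect(":memory:")
    try:
        con.create_aggregate("only_value", 1, Only)
        con.execute("create table a (keyval integer, name_of_the_thing text)")
        con.execute("create table b (a_keyval integer, metric_value integer)")
        con.executemany("insert into a values (?, ?)", rows_a)
        con.executemany("insert into b values (?, ?)", rows_b)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    # Consistent names per keyval: the query behaves like the min() version.
    print(run([(1, "widget"), (2, "gadget")], [(1, 10), (1, 5), (2, 7)]))

    # Two different names for keyval 1: the whole query fails.
    try:
        run([(1, "widget"), (1, "wodget")], [(1, 10)])
    except sqlite3.Error as exc:
        print("query failed:", exc)
```

With consistent data the result is the same as the min() version; as soon as one keyval carries two different names, the query errors out instead of silently picking one.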
*IF* we're considering adding these to core, I think that only() would be just a slight modification of the last() implementation, and could be done at the same time.
[1] I don't care what it gets named. I just want the functionality.