"Chris Browne" <cbbrowne@acm.org> writes:
> stark@enterprisedb.com (Gregory Stark) writes:
>
>> You can even do this with GROUP BY as long as the leading columns of
>> the ORDER BY inside the subquery exactly matches the GROUP BY
>> columns.
...
> Is there not some risk that the query planner might choose to do
> hash-based accumulation could discard the subquery's ordering?
This is an interesting meme. Every time this topic comes up people always
think it's hash aggregates that risk destroying the ordering. I suppose
because usually the fear is that you can't iterate through hash keys in the
same order you created them.
However the ordering we're concerned with here isn't the order of the hash
keys. It's the order in which the elements making up the aggregate are applied
to each key. That order is always going to be the order in which the values
are seen.
In fact hash aggregates aren't the question at all; they're pretty much always
going to work. There's no reason for hash aggregates to change the order in
which individual data for a given hash key are processed. That's the whole
advantage of hash aggregates, they don't need to be pre-sorted. So they'll
always see the data in the order the subquery provides them.
The dangerous case is *non* hash aggregates. Regular sorted aggregates need to
have their inputs sorted so Postgres has to go out of its way to check for a
pre-existing matching ordering and avoid re-sorting the data. If it re-sorted
the inputs according to just the GROUP BY key it would destroy the
pre-existing order and the aggregate would see the data for an individual
group by key in an arbitrary order.
(In fact, it's worse, it would work sometimes and not other times depending on
which sort algorithm was used because in-memory we use qsort which is not
stable but on-disk we use mergesort which is.)
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com