Thread: Do AGGREGATES consistently use sort order?

Do AGGREGATES consistently use sort order?

From
"Webb Sprague"
Date:
I have the following query:

select array_accum(name) from (select name from placenames where
desig='crater' order by name desc) a;

with array_accum defined as:

CREATE AGGREGATE array_accum (
   BASETYPE = anyelement,
   SFUNC = array_append,
   STYPE = anyarray,
   INITCOND = '{}'
);

Can I count on this aggregate to take each new item in sorted order
when it adds it to the state vector?  So that if I have the following:

oregon_2007_08_20=# select * from (select name from placenames where
desig='crater' order by name desc) a;

 name
--------------------
 Yapoah Crater
 West Crater
 Twin Craters
 Timber Crater
 Red Crater
 Newberry Crater
 Nash Crater
 Mount Mazama
 Millican Crater
 Little Nash Crater
 Le Conte Crater
 Jordan Craters
 Diamond Craters
 Coffeepot Crater
 Cayuse Crater
 Black Crater
 Big Hole
 Belknap Crater
(18 rows)

I can always count on (note the order name):

\a
oregon_2007_08_20=# select array_accum(name) from (select name from
placenames where desig='crater' order by name desc) a;
array_accum
{"Yapoah Crater","West Crater","Twin Craters","Timber Crater","Red
Crater","Newberry Crater","Nash Crater","Mount Mazama","Millican
Crater","Little Nash Crater","Le Conte Crater","Jordan
Craters","Diamond Craters","Coffeepot Crater","Cayuse Crater","Black
Crater","Big Hole","Belknap Crater"}
(1 row)

I am interested in stitching a line out of points in postgis, but the
order/aggregate thing is a general question.

Thx
W

Re: Do AGGREGATES consistently use sort order?

From
Gregory Stark
Date:
"Webb Sprague" <webb.sprague@gmail.com> writes:

> I can always count on (note the order name):
>
> \a
> oregon_2007_08_20=# select array_accum(name) from (select name from
> placenames where desig='crater' order by name desc) a;
> array_accum
> {"Yapoah Crater","West Crater","Twin Craters","Timber Crater","Red
> Crater","Newberry Crater","Nash Crater","Mount Mazama","Millican
> Crater","Little Nash Crater","Le Conte Crater","Jordan
> Craters","Diamond Craters","Coffeepot Crater","Cayuse Crater","Black
> Crater","Big Hole","Belknap Crater"}
> (1 row)
>
> I am interested in stitching a line out of points in postgis, but the
> order/aggregate thing is a general question.

Yes.

You can even do this with GROUP BY as long as the leading columns of the ORDER
BY inside the subquery exactly matches the GROUP BY columns.

In theory we can't promise anything about future versions of Postgres but
there are lots of people doing this already so if ever this was lost there
would probably be some new explicit way to achieve the same thing.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: Do AGGREGATES consistently use sort order?

From
"Webb Sprague"
Date:
order/aggregate thing is a general question.
>
> Yes.
>
> You can even do this with GROUP BY as long as the leading columns of the ORDER
> BY inside the subquery exactly matches the GROUP BY columns.
>
> In theory we can't promise anything about future versions of Postgres but
> there are lots of people doing this already so if ever this was lost there
> would probably be some new explicit way to achieve the same thing.

Does anyone have any spec links, oracle behavior, or whatever?  For
now I will trust Postgres to continue behaving sanely, but I am
curious.

Thx to Gregory for the quick reply

>
> --
>   Gregory Stark
>   EnterpriseDB          http://www.enterprisedb.com
>

Re: Do AGGREGATES consistently use sort order?

From
Chris Browne
Date:
stark@enterprisedb.com (Gregory Stark) writes:
> "Webb Sprague" <webb.sprague@gmail.com> writes:
>
>> I can always count on (note the order name):
>>
>> \a
>> oregon_2007_08_20=# select array_accum(name) from (select name from
>> placenames where desig='crater' order by name desc) a;
>> array_accum
>> {"Yapoah Crater","West Crater","Twin Craters","Timber Crater","Red
>> Crater","Newberry Crater","Nash Crater","Mount Mazama","Millican
>> Crater","Little Nash Crater","Le Conte Crater","Jordan
>> Craters","Diamond Craters","Coffeepot Crater","Cayuse Crater","Black
>> Crater","Big Hole","Belknap Crater"}
>> (1 row)
>>
>> I am interested in stitching a line out of points in postgis, but
>> the order/aggregate thing is a general question.
>
> Yes.
>
> You can even do this with GROUP BY as long as the leading columns of
> the ORDER BY inside the subquery exactly matches the GROUP BY
> columns.
>
> In theory we can't promise anything about future versions of
> Postgres but there are lots of people doing this already so if ever
> this was lost there would probably be some new explicit way to
> achieve the same thing.

Is there not some risk that the query planner might choose to do
hash-based accumulation could discard the subquery's ordering?

Under the visible circumstances, it's unlikely, but isn't it possible
for the aggregation to pick hashing and make a hash of this?
--
output = reverse("gro.mca" "@" "enworbbc")
http://linuxfinances.info/info/spiritual.html
If anyone ever  markets  a really  well-documented Unix that   doesn't
require  babysitting by a phalanx of  provincial Unix clones, there'll
be a  lot of unemployable,  twinky-braindamaged misfits out deservedly
pounding the pavements.

Re: Do AGGREGATES consistently use sort order?

From
Tom Lane
Date:
Chris Browne <cbbrowne@acm.org> writes:
> stark@enterprisedb.com (Gregory Stark) writes:
>> In theory we can't promise anything about future versions of
>> Postgres but there are lots of people doing this already so if ever
>> this was lost there would probably be some new explicit way to
>> achieve the same thing.

> Is there not some risk that the query planner might choose to do
> hash-based accumulation could discard the subquery's ordering?

Hash aggregation doesn't change the order in which input rows are fed to
any one aggregate; it only results in nominally-independent aggregate
functions being computed in parallel.  If you tried to write an
aggregate function that had hidden private state and wasn't re-entrant,
it would likely fail under hash aggregation; but aggregate functions
that "play by the rules" by keeping all their state in the declared
state object will work fine.

The long and the short of it is that we are very well aware that people
depend on this behavior, and we are not likely to break it.

            regards, tom lane

Re: Do AGGREGATES consistently use sort order?

From
Gregory Stark
Date:
"Chris Browne" <cbbrowne@acm.org> writes:

> stark@enterprisedb.com (Gregory Stark) writes:
>
>> You can even do this with GROUP BY as long as the leading columns of
>> the ORDER BY inside the subquery exactly matches the GROUP BY
>> columns.
...
> Is there not some risk that the query planner might choose to do
> hash-based accumulation could discard the subquery's ordering?

This is an interesting meme. Every time this topic comes up people always
think it's hash aggregates that risk destroying the ordering. I suppose
because usually the fear is that you can't iterate through hash keys in the
same order you created them.

However the ordering we're concerned with here isn't the order of the hash
keys. It's the order in which the elements making up the aggregate are applied
to each key. That order is always going to be the order in which the values
are seen.

In fact hash aggregates aren't the question at all; they're pretty much always
going to work. There's no reason for hash aggregates to change the order in
which individual data for a given hash key are processed. That's the whole
advantage of hash aggregates, they don't need to be pre-sorted. So they'll
always see the data in the order the subquery provides them.

The dangerous case is *non* hash aggregates. Regular sorted aggregates need to
have their inputs sorted so Postgres has to go out of its way to check for a
pre-existing matching ordering and avoid re-sorting the data. If it re-sorted
the inputs according to just the GROUP BY key it would destroy the
pre-existing order and the aggregate would see the data for an individual
group by key in an arbitrary order.

(In fact, it's worse, it would work sometimes and not other times depending on
which sort algorithm was used because in-memory we use qsort which is not
stable but on-disk we use mergesort which is.)

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com