Thread: Do AGGREGATES consistently use sort order?
I have the following query: select array_accum(name) from (select name from placenames where desig='crater' order by name desc) a; with array_accum defined as: CREATE AGGREGATE array_accum ( BASETYPE = anyelement, SFUNC = array_append, STYPE = anyarray, INITCOND = '{}' ); Can I count on this aggregate to take each new item in sorted order when it adds it to the state vector? So that if I have the following: oregon_2007_08_20=# select * from (select name from placenames where desig='crater' order by name desc) a; name -------------------- Yapoah Crater West Crater Twin Craters Timber Crater Red Crater Newberry Crater Nash Crater Mount Mazama Millican Crater Little Nash Crater Le Conte Crater Jordan Craters Diamond Craters Coffeepot Crater Cayuse Crater Black Crater Big Hole Belknap Crater (18 rows) I can always count on (note the order name): \a oregon_2007_08_20=# select array_accum(name) from (select name from placenames where desig='crater' order by name desc) a; array_accum {"Yapoah Crater","West Crater","Twin Craters","Timber Crater","Red Crater","Newberry Crater","Nash Crater","Mount Mazama","Millican Crater","Little Nash Crater","Le Conte Crater","Jordan Craters","Diamond Craters","Coffeepot Crater","Cayuse Crater","Black Crater","Big Hole","Belknap Crater"} (1 row) I am interested in stitching a line out of points in postgis, but the order/aggregate thing is a general question. Thx W
"Webb Sprague" <webb.sprague@gmail.com> writes: > I can always count on (note the order name): > > \a > oregon_2007_08_20=# select array_accum(name) from (select name from > placenames where desig='crater' order by name desc) a; > array_accum > {"Yapoah Crater","West Crater","Twin Craters","Timber Crater","Red > Crater","Newberry Crater","Nash Crater","Mount Mazama","Millican > Crater","Little Nash Crater","Le Conte Crater","Jordan > Craters","Diamond Craters","Coffeepot Crater","Cayuse Crater","Black > Crater","Big Hole","Belknap Crater"} > (1 row) > > I am interested in stitching a line out of points in postgis, but the > order/aggregate thing is a general question. Yes. You can even do this with GROUP BY as long as the leading columns of the ORDER BY inside the subquery exactly matches the GROUP BY columns. In theory we can't promise anything about future versions of Postgres but there are lots of people doing this already so if ever this was lost there would probably be some new explicit way to achieve the same thing. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
order/aggregate thing is a general question. > > Yes. > > You can even do this with GROUP BY as long as the leading columns of the ORDER > BY inside the subquery exactly matches the GROUP BY columns. > > In theory we can't promise anything about future versions of Postgres but > there are lots of people doing this already so if ever this was lost there > would probably be some new explicit way to achieve the same thing. Does anyone have any spec links, oracle behavior, or whatever? For now I will trust Postgres to continue behaving sanely, but I am curious. Thx to Gregory for the quick reply > > -- > Gregory Stark > EnterpriseDB http://www.enterprisedb.com >
stark@enterprisedb.com (Gregory Stark) writes: > "Webb Sprague" <webb.sprague@gmail.com> writes: > >> I can always count on (note the order name): >> >> \a >> oregon_2007_08_20=# select array_accum(name) from (select name from >> placenames where desig='crater' order by name desc) a; >> array_accum >> {"Yapoah Crater","West Crater","Twin Craters","Timber Crater","Red >> Crater","Newberry Crater","Nash Crater","Mount Mazama","Millican >> Crater","Little Nash Crater","Le Conte Crater","Jordan >> Craters","Diamond Craters","Coffeepot Crater","Cayuse Crater","Black >> Crater","Big Hole","Belknap Crater"} >> (1 row) >> >> I am interested in stitching a line out of points in postgis, but >> the order/aggregate thing is a general question. > > Yes. > > You can even do this with GROUP BY as long as the leading columns of > the ORDER BY inside the subquery exactly matches the GROUP BY > columns. > > In theory we can't promise anything about future versions of > Postgres but there are lots of people doing this already so if ever > this was lost there would probably be some new explicit way to > achieve the same thing. Is there not some risk that the query planner might choose to do hash-based accumulation could discard the subquery's ordering? Under the visible circumstances, it's unlikely, but isn't it possible for the aggregation to pick hashing and make a hash of this? -- output = reverse("gro.mca" "@" "enworbbc") http://linuxfinances.info/info/spiritual.html If anyone ever markets a really well-documented Unix that doesn't require babysitting by a phalanx of provincial Unix clones, there'll be a lot of unemployable, twinky-braindamaged misfits out deservedly pounding the pavements.
Chris Browne <cbbrowne@acm.org> writes: > stark@enterprisedb.com (Gregory Stark) writes: >> In theory we can't promise anything about future versions of >> Postgres but there are lots of people doing this already so if ever >> this was lost there would probably be some new explicit way to >> achieve the same thing. > Is there not some risk that the query planner might choose to do > hash-based accumulation could discard the subquery's ordering? Hash aggregation doesn't change the order in which input rows are fed to any one aggregate; it only results in nominally-independent aggregate functions being computed in parallel. If you tried to write an aggregate function that had hidden private state and wasn't re-entrant, it would likely fail under hash aggregation; but aggregate functions that "play by the rules" by keeping all their state in the declared state object will work fine. The long and the short of it is that we are very well aware that people depend on this behavior, and we are not likely to break it. regards, tom lane
"Chris Browne" <cbbrowne@acm.org> writes: > stark@enterprisedb.com (Gregory Stark) writes: > >> You can even do this with GROUP BY as long as the leading columns of >> the ORDER BY inside the subquery exactly matches the GROUP BY >> columns. ... > Is there not some risk that the query planner might choose to do > hash-based accumulation could discard the subquery's ordering? This is an interesting meme. Every time this topic comes up people always think it's hash aggregates that risk destroying the ordering. I suppose because usually the fear is that you can't iterate through hash keys in the same order you created them. However the ordering we're concerned with here isn't the order of the hash keys. It's the order in which the elements making up the aggregate are applied to each key. That order is always going to be the order in which the values are seen. In fact hash aggregates aren't the question at all; they're pretty much always going to work. There's no reason for hash aggregates to change the order in which individual data for a given hash key are processed. That's the whole advantage of hash aggregates, they don't need to be pre-sorted. So they'll always see the data in the order the subquery provides them. The dangerous case is *non* hash aggregates. Regular sorted aggregates need to have their inputs sorted so Postgres has to go out of its way to check for a pre-existing matching ordering and avoid re-sorting the data. If it re-sorted the inputs according to just the GROUP BY key it would destroy the pre-existing order and the aggregate would see the data for an individual group by key in an arbitrary order. (In fact, it's worse, it would work sometimes and not other times depending on which sort algorithm was used because in-memory we use qsort which is not stable but on-disk we use mergesort which is.) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com