Thread: 7.4, 'group by' default ordering?
I notice that 7.4 doesn't do default ordering on a 'group by', so you have to throw in an 'order by' clause to get the output in ascending group order. Is this something that most RDB's have historically done (including PG prior to 7.4) but isn't really part of the SQL standard? On a mostly unrelated topic, does the SQL standard indicate whether NULL should sort to the front or the back? Is there a way to force it to one or the other independent of whether the order by clause uses ascending or descending order? -- Mike Nolan
On Thu, Jan 08, 2004 at 13:42:33 -0600, Mike Nolan <nolan@gw.tssi.com> wrote: > I notice that 7.4 doesn't do default ordering on a 'group by', so you have > to throw in an 'order by' clause to get the output in ascending group order. > > Is this something that most RDB's have historically done (including PG prior > to 7.4) but isn't really part of the SQL standard? That is because group by is often done with a sort, so rows would naturally be in that order. If there isn't an order by clause, the set of return rows can be in any order. > On a mostly unrelated topic, does the SQL standard indicate whether NULL > should sort to the front or the back? Is there a way to force it to > one or the other independent of whether the order by clause uses ascending or > descending order? In SQL for Smarties, Joe Ceclko says that either NULLs should all be first or all be last (independent of whether the sort is ascending or descending). There was also some discussion on how the order is constrained if the sort is on multiple columns where the value of the first column is NULL, but the values of other columns are not. I don't have the book here with me now, but I think the result of the discussion was that within rows with a NULL value for the first column, they should be sorted by the values in the later columns.
Bruno Wolff III <bruno@wolff.to> writes: > On Thu, Jan 08, 2004 at 13:42:33 -0600, >> Is this something that most RDB's have historically done (including PG prior >> to 7.4) but isn't really part of the SQL standard? > That is because group by is often done with a sort, so rows would naturally > be in that order. If there isn't an order by clause, the set of return > rows can be in any order. PG has historically implemented GROUP BY with sort + uniq (and still may if the planner thinks it better than a hash method), but I am not sure that this is particularly widespread among other DBMSes. In any case, the spec certainly says that you cannot expect any particular result ordering if you didn't say ORDER BY. >> On a mostly unrelated topic, does the SQL standard indicate whether NULL >> should sort to the front or the back? Is there a way to force it to >> one or the other independent of whether the order by clause uses >> ascending or descending order? > In SQL for Smarties, Joe Ceclko says that either NULLs should all be first > or all be last (independent of whether the sort is ascending or descending). If Celko really says that, I think he's wrong. SQL92 13.1 general rule 3 says: Whether a sort key value that is null is considered greater or less than a non-null value is implementation-defined, but all sort key values that are null shall either be considered greater than all non-null values or be considered less than all non-null values. Since they use the phraseology "greater than" and "less than", I'd expect that switching between ASC and DESC order would reverse the output ordering, just as it would for two ordinary values one of which is greater than the other. We actually went to some trouble to make this happen, a release or three back. IIRC, at one time PG did sort NULLs to the end regardless of ASC/DESC, but we were persuaded that this was contrary to spec. regards, tom lane
Tom Lane wrote: >Bruno Wolff III <bruno@wolff.to> writes: > > >>On Thu, Jan 08, 2004 at 13:42:33 -0600, >> <snip> >> >> >>>On a mostly unrelated topic, does the SQL standard indicate whether NULL >>>should sort to the front or the back? Is there a way to force it to >>>one or the other independent of whether the order by clause uses >>>ascending or descending order? >>> >>> > > > >>In SQL for Smarties, Joe Ceclko says that either NULLs should all be first >>or all be last (independent of whether the sort is ascending or descending). >> >> > >If Celko really says that, I think he's wrong. SQL92 13.1 general rule >3 says: > > Whether a sort key value that is null is considered greater > or less than a non-null value is implementation-defined, but > all sort key values that are null shall either be considered > greater than all non-null values or be considered less than > all non-null values. > >Since they use the phraseology "greater than" and "less than", I'd >expect that switching between ASC and DESC order would reverse the >output ordering, just as it would for two ordinary values one of which >is greater than the other. > >We actually went to some trouble to make this happen, a release or three >back. IIRC, at one time PG did sort NULLs to the end regardless of >ASC/DESC, but we were persuaded that this was contrary to spec. > > regards, tom lane > > Celko uses the SQL92 13.1 rule wording: "Whether a sort key value that is NULL is considered greater or less than a non-NULL value is implementation defined, but all sort key values that are NULL will either be considered greater than all non-NULL values or be considered less than all non-NULL values. There are SQL products that do it either way." 2nd Ed SQL For Smarties. And of more interest, he also points out that in SQL-89, the last General Rule of <comparison predicate> should still be applied: "Although 'x=y' is unkown if both x and y are NULL values, in the context of GROUP BY, ORDER BY, and DISTINCT, a NULL value is identical to or is a duplicate of another NULL value." So NULL=NULL for purposes of GROUP BY, ORDER BY and DISTINCT. PostgresSQL seems to treat them this way and puts them after non-NULL values. Here's my test case: Welcome to psql 7.4beta5, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help on internal slash commands \g or terminate with semicolon to execute query \q to quit celko=# CREATE TABLE sortable (a INT DEFAULT NULL, b INT); celko=# INSERT INTO sortable (b) VALUES (8); INSERT 60836961 1 celko=# INSERT INTO sortable (b) VALUES (4); INSERT 60836962 1 celko=# SELECT a,b FROM sortable ORDER BY a,b; a | b ---+--- | 4 | 8 (2 rows) celko=# INSERT INTO sortable (a,b) VALUES (5,5); INSERT 60836963 1 celko=# SELECT a,b FROM sortable ORDER BY a,b; a | b ---+--- 5 | 5 | 4 | 8 (3 rows) celko=# INSERT INTO sortable (b) VALUES (5); INSERT 60836964 1 celko=# SELECT a,b FROM sortable ORDER BY a,b; a | b ---+--- 5 | 5 | 4 | 5 | 8 (4 rows) celko=# INSERT INTO sortable (a,b) VALUES (2,2); INSERT 60836965 1 celko=# SELECT a,b FROM sortable ORDER BY a,b; a | b ---+--- 2 | 2 5 | 5 | 4 | 5 | 8 (5 rows) celko=# SELECT a,b FROM sortable ORDER BY b,a; a | b ---+--- 2 | 2 | 4 5 | 5 | 5 | 8 (5 rows) FYI Ron
After takin a swig o' Arrakan spice grog, nolan@gw.tssi.com (Mike Nolan) belched out: > I notice that 7.4 doesn't do default ordering on a 'group by', so you have > to throw in an 'order by' clause to get the output in ascending group order. > > Is this something that most RDB's have historically done (including PG prior > to 7.4) but isn't really part of the SQL standard? No, in the absence of an "ORDER BY" clause to impose order, there is no "standard" reason to expect _any_ SQL database to return results in _any_ particular sort of order. It would be perfectly legitimate for a database to store all data in hash tables, and to return rows in the resulting random order, sorting the result set only if the query specified an order. > On a mostly unrelated topic, does the SQL standard indicate whether > NULL should sort to the front or the back? Is there a way to force > it to one or the other independent of whether the order by clause > uses ascending or descending order? NULL isn't equal to any other value, so that, heading back to that "any order is reasonable" notion, it might, in theory, be "standards-conformant" to randomly intersperse the NULL values amongst the other entries that _ARE_ returned in order. I understand that Oracle declines to include NULL entries in indices, which doubtless has interesting implications... Clustering NULLs together at either the start or end of a query seems sensible; different SQL databases handle this differently... -- If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me http://www.ntlug.org/~cbbrowne/ It's a little known fact that the Dark Ages were caused by the Y1K problem.
On Thu, Jan 08, 2004 at 15:24:41 -0800, Ron St-Pierre <rstpierre@syscor.com> wrote: > Celko uses the SQL92 13.1 rule wording: "Whether a sort key value that > is NULL is considered greater or less than a non-NULL value is > implementation defined, but all sort key values that are NULL will > either be considered greater than all non-NULL values or be considered > less than all non-NULL values. There are SQL products that do it either > way." 2nd Ed SQL For Smarties. The part just after that prompted my statement. The text is: And there are those that have it all wrong; the Sybase family simply treats the NULLs as if they were really values -- that is, they sort low for ascending and high for descending. This seems to me to be saying contradictory things now that I have reread it. If NULLs are treated as real values, I would expect them to be output first for one of descending and ascending and last for the other. But the second part of the statement contradicts this and seems to be saying that Sybase always emits records with NULL values for the key first.