Thread: 7.4, 'group by' default ordering?

7.4, 'group by' default ordering?

From
Mike Nolan
Date:
I notice that 7.4 doesn't do default ordering on a 'group by', so you have
to throw in an 'order by' clause to get the output in ascending group order.

Is this something that most RDB's have historically done (including PG prior
to 7.4) but isn't really part of the SQL standard?

On a mostly unrelated topic, does the SQL standard indicate whether NULL
should sort to the front or the back?  Is there a way to force it to
one or the other independent of whether the order by clause uses ascending or
descending order?
--
Mike Nolan

Re: 7.4, 'group by' default ordering?

From
Bruno Wolff III
Date:
On Thu, Jan 08, 2004 at 13:42:33 -0600,
  Mike Nolan <nolan@gw.tssi.com> wrote:
> I notice that 7.4 doesn't do default ordering on a 'group by', so you have
> to throw in an 'order by' clause to get the output in ascending group order.
>
> Is this something that most RDB's have historically done (including PG prior
> to 7.4) but isn't really part of the SQL standard?

That is because group by is often done with a sort, so rows would naturally
be in that order. If there isn't an order by clause, the set of return
rows can be in any order.

> On a mostly unrelated topic, does the SQL standard indicate whether NULL
> should sort to the front or the back?  Is there a way to force it to
> one or the other independent of whether the order by clause uses ascending or
> descending order?

In SQL for Smarties, Joe Ceclko says that either NULLs should all be first
or all be last (independent of whether the sort is ascending or descending).
There was also some discussion on how the order is constrained if the sort
is on multiple columns where the value of the first column is NULL, but the
values of other columns are not. I don't have the book here with me now,
but I think the result of the discussion was that within rows with a NULL
value for the first column, they should be sorted by the values in the
later columns.

Re: 7.4, 'group by' default ordering?

From
Tom Lane
Date:
Bruno Wolff III <bruno@wolff.to> writes:
> On Thu, Jan 08, 2004 at 13:42:33 -0600,
>> Is this something that most RDB's have historically done (including PG prior
>> to 7.4) but isn't really part of the SQL standard?

> That is because group by is often done with a sort, so rows would naturally
> be in that order. If there isn't an order by clause, the set of return
> rows can be in any order.

PG has historically implemented GROUP BY with sort + uniq (and still may
if the planner thinks it better than a hash method), but I am not sure
that this is particularly widespread among other DBMSes.  In any case,
the spec certainly says that you cannot expect any particular result
ordering if you didn't say ORDER BY.

>> On a mostly unrelated topic, does the SQL standard indicate whether NULL
>> should sort to the front or the back?  Is there a way to force it to
>> one or the other independent of whether the order by clause uses
>> ascending or descending order?

> In SQL for Smarties, Joe Ceclko says that either NULLs should all be first
> or all be last (independent of whether the sort is ascending or descending).

If Celko really says that, I think he's wrong.  SQL92 13.1 general rule
3 says:

              Whether a sort key value that is null is considered greater
              or less than a non-null value is implementation-defined, but
              all sort key values that are null shall either be considered
              greater than all non-null values or be considered less than
              all non-null values.

Since they use the phraseology "greater than" and "less than", I'd
expect that switching between ASC and DESC order would reverse the
output ordering, just as it would for two ordinary values one of which
is greater than the other.

We actually went to some trouble to make this happen, a release or three
back.  IIRC, at one time PG did sort NULLs to the end regardless of
ASC/DESC, but we were persuaded that this was contrary to spec.

            regards, tom lane

Re: 7.4, 'group by' default ordering?

From
Ron St-Pierre
Date:
Tom Lane wrote:

>Bruno Wolff III <bruno@wolff.to> writes:
>
>
>>On Thu, Jan 08, 2004 at 13:42:33 -0600,
>>
<snip>

>>
>>
>>>On a mostly unrelated topic, does the SQL standard indicate whether NULL
>>>should sort to the front or the back?  Is there a way to force it to
>>>one or the other independent of whether the order by clause uses
>>>ascending or descending order?
>>>
>>>
>
>
>
>>In SQL for Smarties, Joe Ceclko says that either NULLs should all be first
>>or all be last (independent of whether the sort is ascending or descending).
>>
>>
>
>If Celko really says that, I think he's wrong.  SQL92 13.1 general rule
>3 says:
>
>              Whether a sort key value that is null is considered greater
>              or less than a non-null value is implementation-defined, but
>              all sort key values that are null shall either be considered
>              greater than all non-null values or be considered less than
>              all non-null values.
>
>Since they use the phraseology "greater than" and "less than", I'd
>expect that switching between ASC and DESC order would reverse the
>output ordering, just as it would for two ordinary values one of which
>is greater than the other.
>
>We actually went to some trouble to make this happen, a release or three
>back.  IIRC, at one time PG did sort NULLs to the end regardless of
>ASC/DESC, but we were persuaded that this was contrary to spec.
>
>            regards, tom lane
>
>
Celko uses the SQL92 13.1 rule wording: "Whether a sort key value that
is NULL is considered greater or less than a non-NULL value is
implementation defined, but all sort key values that are NULL will
either be considered greater than all non-NULL values or be considered
less than all non-NULL values. There are SQL products that do it either
way."  2nd Ed SQL For Smarties.

And of more interest, he also points out that in SQL-89, the last
General Rule of <comparison predicate> should still be applied:
"Although 'x=y' is unkown if both x and y are NULL values, in the
context of GROUP BY, ORDER BY, and DISTINCT, a NULL value is identical
to or is a duplicate of another NULL value." So NULL=NULL for purposes
of GROUP BY, ORDER BY and DISTINCT. PostgresSQL seems to treat them this
way and puts them after non-NULL values.

Here's my test case:

Welcome to psql 7.4beta5, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

celko=# CREATE TABLE sortable (a INT DEFAULT NULL, b INT);
celko=# INSERT INTO sortable (b) VALUES (8);
INSERT 60836961 1
celko=# INSERT INTO sortable (b) VALUES (4);
INSERT 60836962 1
celko=# SELECT a,b FROM sortable ORDER BY a,b;
 a | b
---+---
   | 4
   | 8
(2 rows)

celko=# INSERT INTO sortable (a,b) VALUES (5,5);
INSERT 60836963 1
celko=# SELECT a,b FROM sortable ORDER BY a,b;
 a | b
---+---
 5 | 5
   | 4
   | 8
(3 rows)

celko=# INSERT INTO sortable (b) VALUES (5);
INSERT 60836964 1
celko=# SELECT a,b FROM sortable ORDER BY a,b;
 a | b
---+---
 5 | 5
   | 4
   | 5
   | 8
(4 rows)

celko=# INSERT INTO sortable (a,b) VALUES (2,2);
INSERT 60836965 1
celko=# SELECT a,b FROM sortable ORDER BY a,b;
 a | b
---+---
 2 | 2
 5 | 5
   | 4
   | 5
   | 8
(5 rows)

celko=# SELECT a,b FROM sortable ORDER BY b,a;
 a | b
---+---
 2 | 2
   | 4
 5 | 5
   | 5
   | 8
(5 rows)

FYI
Ron


Re: 7.4, 'group by' default ordering?

From
Christopher Browne
Date:
After takin a swig o' Arrakan spice grog, nolan@gw.tssi.com (Mike Nolan) belched out:
> I notice that 7.4 doesn't do default ordering on a 'group by', so you have
> to throw in an 'order by' clause to get the output in ascending group order.
>
> Is this something that most RDB's have historically done (including PG prior
> to 7.4) but isn't really part of the SQL standard?

No, in the absence of an "ORDER BY" clause to impose order, there is
no "standard" reason to expect _any_ SQL database to return results in
_any_ particular sort of order.

It would be perfectly legitimate for a database to store all data in
hash tables, and to return rows in the resulting random order, sorting
the result set only if the query specified an order.

> On a mostly unrelated topic, does the SQL standard indicate whether
> NULL should sort to the front or the back?  Is there a way to force
> it to one or the other independent of whether the order by clause
> uses ascending or descending order?

NULL isn't equal to any other value, so that, heading back to that
"any order is reasonable" notion, it might, in theory, be
"standards-conformant" to randomly intersperse the NULL values amongst
the other entries that _ARE_ returned in order.

I understand that Oracle declines to include NULL entries in indices,
which doubtless has interesting implications...

Clustering NULLs together at either the start or end of a query seems
sensible; different SQL databases handle this differently...
--
If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me
http://www.ntlug.org/~cbbrowne/
It's a  little known fact  that the Dark  Ages were caused by  the Y1K
problem.

Re: 7.4, 'group by' default ordering?

From
Bruno Wolff III
Date:
On Thu, Jan 08, 2004 at 15:24:41 -0800,
  Ron St-Pierre <rstpierre@syscor.com> wrote:
> Celko uses the SQL92 13.1 rule wording: "Whether a sort key value that
> is NULL is considered greater or less than a non-NULL value is
> implementation defined, but all sort key values that are NULL will
> either be considered greater than all non-NULL values or be considered
> less than all non-NULL values. There are SQL products that do it either
> way."  2nd Ed SQL For Smarties.

The part just after that prompted my statement. The text is:
And there are those that have it all wrong; the Sybase family simply
treats the NULLs as if they were really values -- that is, they sort
low for ascending and high for descending.

This seems to me to be saying contradictory things now that I have reread it.
If NULLs are treated as real values, I would expect them to be output
first for one of descending and ascending and last for the other. But
the second part of the statement contradicts this and seems to be saying
that Sybase always emits records with NULL values for the key first.