Re: Unclear guarantees about sort order on https://www.postgresql.org/docs/current/queries-order.html - Mailing list pgsql-docs

From Erik Wienhold
Subject Re: Unclear guarantees about sort order on https://www.postgresql.org/docs/current/queries-order.html
Date
Msg-id nwfjnvslf7o5oxlbmqcak4cnvvrwrhfovh3rzj7qazgttiez24@65ytoxqapecf
Whole thread Raw
In response to Unclear guarantees about sort order on https://www.postgresql.org/docs/current/queries-order.html  (PG Doc comments form <noreply@postgresql.org>)
Responses Re: Unclear guarantees about sort order on https://www.postgresql.org/docs/current/queries-order.html
List pgsql-docs
On 2023-10-04 16:24 +0200, PG Doc comments form write:
> The following documentation comment has been logged on the website:
> 
> Page: https://www.postgresql.org/docs/16/queries-order.html
> Description:
> 
> The document only says this about unsorted queries:
> 
> > After a query has produced an output table (after the select list has been
> > processed) it can optionally be sorted. If sorting is not chosen, the rows
> > will be returned in an unspecified order. The actual order in that case will
> > depend on the scan and join plan types and the order on disk, but it must
> > not be relied on. A particular output ordering can only be guaranteed if the
> > sort step is explicitly chosen.
> 
> It mentions "If sorting is not chosen". This sort of implies that if you
> pick a sort the output order is predictable. However I believe that the only
> actual guarantee is if the sort columns selected produce a unique value.
> 
> For example if you do `ORDER BY name` and have two rows with the same name I
> don't think the order of those rows is predictable.

Right, without an explicit sorting the order of rows is unpredictable.
When only sorting over some columns/expressions then the ordering is
only predictable in those columns/expressions.  The order of rows with
ties within the same sorted group is also unpredictable.  This is also
implied on the same page after the first example: "When more than one
expression is specified, the later values are used to sort rows that are
equal according to the earlier values."  This implies that without later
values, those rows remain in unpredictable order.

The SQL standards linked by [1] provide some definitions:

SQL:2003, Part 2, Annex C, 26b:

"The relative ordering of two rows that are not distinct with respect to
 the <sort specification> is implementation-dependent."

SQL:2011, Part 2, Annex C, 24e:

"If a <query expression> immediately contains an <order by clause> and
 a group of two or more rows in the table specified by that <query
 expression> contain values that are not distinct in all sort keys
 specified in the <order by clause>, then the ordering of these rows in
 that group is implementation-dependent."

I find the first one quite succinct.

> I think the docs should be updated to either:
> 
> 1. Clearly state that the order **is** consent as long as any sort clause is
> present, and specify what that order is.

What do you mean with "any sort clause"?  Any sort clause at all?  Or a
sort clause that covers all columns?  If the order should be predictable
it must be specified by the client to some degree.  And if the client
does not care about rows with ties than he should not be required to
specify a more specific sorting.

> 2. Update the quoted sentence to refer to "If sorting is not chosen or the
> sort columns do not form a unique key" instead of just "If sorting is not
> chosen".

I think "unique key" is misleading in this case because sorting still
leaves duplicates.  I'd go with something that mentions "sorted groups"
and "tie breaks" or some form of the quoted SQL:2003 definition.

[1] https://wiki.postgresql.org/wiki/Developer_FAQ#Where_can_I_get_a_copy_of_the_SQL_standards.3F

-- 
Erik



pgsql-docs by date:

Previous
From: Daniel Fredouille
Date:
Subject: Re: unnest multirange, returned order
Next
From: Laurenz Albe
Date:
Subject: Re: unnest multirange, returned order