Re: Unclear guarantees about sort order on https://www.postgresql.org/docs/current/queries-order.html - Mailing list pgsql-docs
From | Erik Wienhold |
---|---|
Subject | Re: Unclear guarantees about sort order on https://www.postgresql.org/docs/current/queries-order.html |
Date | |
Msg-id | nwfjnvslf7o5oxlbmqcak4cnvvrwrhfovh3rzj7qazgttiez24@65ytoxqapecf Whole thread Raw |
In response to | Unclear guarantees about sort order on https://www.postgresql.org/docs/current/queries-order.html (PG Doc comments form <noreply@postgresql.org>) |
Responses |
Re: Unclear guarantees about sort order on https://www.postgresql.org/docs/current/queries-order.html
|
List | pgsql-docs |
On 2023-10-04 16:24 +0200, PG Doc comments form write: > The following documentation comment has been logged on the website: > > Page: https://www.postgresql.org/docs/16/queries-order.html > Description: > > The document only says this about unsorted queries: > > > After a query has produced an output table (after the select list has been > > processed) it can optionally be sorted. If sorting is not chosen, the rows > > will be returned in an unspecified order. The actual order in that case will > > depend on the scan and join plan types and the order on disk, but it must > > not be relied on. A particular output ordering can only be guaranteed if the > > sort step is explicitly chosen. > > It mentions "If sorting is not chosen". This sort of implies that if you > pick a sort the output order is predictable. However I believe that the only > actual guarantee is if the sort columns selected produce a unique value. > > For example if you do `ORDER BY name` and have two rows with the same name I > don't think the order of those rows is predictable. Right, without an explicit sorting the order of rows is unpredictable. When only sorting over some columns/expressions then the ordering is only predictable in those columns/expressions. The order of rows with ties within the same sorted group is also unpredictable. This is also implied on the same page after the first example: "When more than one expression is specified, the later values are used to sort rows that are equal according to the earlier values." This implies that without later values, those rows remain in unpredictable order. The SQL standards linked by [1] provide some definitions: SQL:2003, Part 2, Annex C, 26b: "The relative ordering of two rows that are not distinct with respect to the <sort specification> is implementation-dependent." SQL:2011, Part 2, Annex C, 24e: "If a <query expression> immediately contains an <order by clause> and a group of two or more rows in the table specified by that <query expression> contain values that are not distinct in all sort keys specified in the <order by clause>, then the ordering of these rows in that group is implementation-dependent." I find the first one quite succinct. > I think the docs should be updated to either: > > 1. Clearly state that the order **is** consent as long as any sort clause is > present, and specify what that order is. What do you mean with "any sort clause"? Any sort clause at all? Or a sort clause that covers all columns? If the order should be predictable it must be specified by the client to some degree. And if the client does not care about rows with ties than he should not be required to specify a more specific sorting. > 2. Update the quoted sentence to refer to "If sorting is not chosen or the > sort columns do not form a unique key" instead of just "If sorting is not > chosen". I think "unique key" is misleading in this case because sorting still leaves duplicates. I'd go with something that mentions "sorted groups" and "tie breaks" or some form of the quoted SQL:2003 definition. [1] https://wiki.postgresql.org/wiki/Developer_FAQ#Where_can_I_get_a_copy_of_the_SQL_standards.3F -- Erik
pgsql-docs by date: