Thread: Nondeterministic collations and the value returned by GROUP BY x

Nondeterministic collations and the value returned by GROUP BY x

From
Jim Finnerty
Date:
PostgreSQL 12 and onward supports nondeterministic collations.  For "GROUP
BY x",  which value of 'x' will PostgreSQL return in this case?  The first
value of x?

The SQL standard (section 8.2) states that the specific value returned is
implementation-defined, but requires that the value must be one of the
specific values in the set of values that compare equally:

d) Depending on the collation, two strings may compare as equal even if they
are of different lengths or contain different sequences of characters. When
any of the operations MAX, MIN, and DISTINCT reference a grouping column,
and the UNION, EXCEPT, and INTERSECT operators refer to character strings,
*the specific value selected by these operations from a set of such equal
values is implementation- dependent*.




-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: Nondeterministic collations and the value returned by GROUP BY x

From
Tom Lane
Date:
Jim Finnerty <jfinnert@amazon.com> writes:
> PostgreSQL 12 and onward supports nondeterministic collations.  For "GROUP
> BY x",  which value of 'x' will PostgreSQL return in this case?  The first
> value of x?

> The SQL standard (section 8.2) states that the specific value returned is
> implementation-defined, but requires that the value must be one of the
> specific values in the set of values that compare equally:

> d) Depending on the collation, two strings may compare as equal even if they
> are of different lengths or contain different sequences of characters. When
> any of the operations MAX, MIN, and DISTINCT reference a grouping column,
> and the UNION, EXCEPT, and INTERSECT operators refer to character strings,
> *the specific value selected by these operations from a set of such equal
> values is implementation- dependent*.

As I recall, "implementation-dependent" means specifically that we *don't*
have to make any promise about which particular value will be selected.
If it said "implementation-defined" then we would.

I expect that in practice it'd be the first of the group that arrives at
the grouping plan node --- but that doesn't really get you any closer
to being able to say which one it is exactly.  The input is either not
ordered at all, or ordered by something like a Sort node, which itself
is not going to make any promises about which one of a group of peers
is delivered first.

            regards, tom lane