Thread: Final result (display) collation?

Final result (display) collation?

From
Jeff Davis
Date:
Say an application developer wants the results in their application to
appear in an order that matches the application user's locale. They
don't want the results themselves to change; they just want the order
to change for display purposes to the end user. Assume the developer
already has ORDER BY clauses in the relevant queries, but no COLLATE
clauses.

That leaves three options:

1. Sort in the application.

2. Change the database default collation.

3. Add COLLATE clauses to the ORDER BY.

#1 is a fine solution in many cases, but not all: if it were universal,
we wouldn't even need top-level ORDER BY. #2 isn't suitable unless you
can settle on a single locale and make it permanent for the entire
database. So I'd like to see if we can improve #3 for the cases where
where #1 and #2 aren't suitable.

Trying to add COLLATE clauses today creates either boilerplate in the
application queries, or creates challenges with query generation. It
can also cause problems porting the application. On top of that, we
don't guarantee that particular locales will work: my machine has "it-
IT-x-icu" but not "it_IT" -- so the application developer needs to be
careful.

The SQL standard specifies "SET COLLATION" to change the default
collation just for the session, which sounds interesting. But it
appears to affect the query semantics, not just the result order, and
that might cause all kinds of problems. For instance, if we allow it to
affect the results of plpgsql functions, then it's another source of
problems similar to search_path. I assume that's why I haven't been
able to find proposals for SET COLLATION?


Crazy idea: what if we treated the top-level ORDER BY as special? That
is, we create a new node ResultOrderBy and make it visible in EXPLAIN.
Then we can have a GUC to control the default collation only for that
node.

  * It would only change the order of the results sent to the client,
not the results themselves. (I realize I'm twisting the definition of
"results" a bit here...)

  * It wouldn't have surprising downstream consequences for collation
determination because returning results is the last step.

  * Setting the result_order_collation to 'default' would be the same
as the current behavior.

  * You could control whether pushdowns of this node are enabled (to be
more like adding COLLATE clauses) or disabled (to be more like sorting
in the application).

  * You could have separate knobs to control whether it applies to
cursors, prepared statements, etc., which could avoid some kinds of
problems.

Thoughts?

Regards,
    Jeff Davis




Re: Final result (display) collation?

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> Crazy idea: what if we treated the top-level ORDER BY as special? That
> is, we create a new node ResultOrderBy and make it visible in EXPLAIN.
> Then we can have a GUC to control the default collation only for that
> node.

Eeek.

This seems like an amazing kluge, and it's not even clear that
there's any field demand for it.

            regards, tom lane



Re: Final result (display) collation?

From
Laurenz Albe
Date:
On Tue, 2024-12-17 at 20:36 -0500, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
> > Crazy idea: what if we treated the top-level ORDER BY as special? That
> > is, we create a new node ResultOrderBy and make it visible in EXPLAIN.
> > Then we can have a GUC to control the default collation only for that
> > node.
>
> This seems like an amazing kluge, and it's not even clear that
> there's any field demand for it.

Yes, that approach seems strange.  Also, wouldn't that prevent
plans that calculate the final sort using a nested loop or merge
join further down?

But I like the idea of a parameter that determines the collation.
I am aware that it is anathema here to have a GUC that influences
query semantics, but it wouldn't be any worse than creating a
database with a different collation, so I think it would be fine.

FWIW, Oracle has a parameter NLS_SORT that does just that.

Yours,
Laurenz Albe



Re: Final result (display) collation?

From
Jeff Davis
Date:
On Wed, 2024-12-18 at 08:26 +0100, Laurenz Albe wrote:
> Yes, that approach seems strange.

My thought was it would be more like a client option than anything
else: a way for the application to request the same result in a more
convenient order. A bit strange, but depending on how you look at it,
is not very radical. More like setting lc_numeric.

>   Also, wouldn't that prevent
> plans that calculate the final sort using a nested loop or merge
> join further down?

If the purpose is for a final result to be displayed to a human (rather
than fed into some other data processing system), then it's going to be
a small result and the final sort should be very cheap.

> But I like the idea of a parameter that determines the collation.
> I am aware that it is anathema here to have a GUC that influences
> query semantics, but it wouldn't be any worse than creating a
> database with a different collation, so I think it would be fine.

That appears to be how the SET COLLATION statement is specified in SQL.
Perhaps we should explore that in more detail?

I think we'd have to be careful that it only applies to the top-level
query and not SPI queries or CHECK constraints. There are probably some
technical challenges in that, but it might not be too bad.

> FWIW, Oracle has a parameter NLS_SORT that does just that.

Ah, interesting.

Regards,
    Jeff Davis