Final result (display) collation? - Mailing list pgsql-hackers

From Jeff Davis
Subject Final result (display) collation?
Date
Msg-id 0fcc2555a7e9c16858c5bc72d9587a2f4b8fbcfa.camel@j-davis.com
Whole thread Raw
Responses Re: Final result (display) collation?
List pgsql-hackers
Say an application developer wants the results in their application to
appear in an order that matches the application user's locale. They
don't want the results themselves to change; they just want the order
to change for display purposes to the end user. Assume the developer
already has ORDER BY clauses in the relevant queries, but no COLLATE
clauses.

That leaves three options:

1. Sort in the application.

2. Change the database default collation.

3. Add COLLATE clauses to the ORDER BY.

#1 is a fine solution in many cases, but not all: if it were universal,
we wouldn't even need top-level ORDER BY. #2 isn't suitable unless you
can settle on a single locale and make it permanent for the entire
database. So I'd like to see if we can improve #3 for the cases where
where #1 and #2 aren't suitable.

Trying to add COLLATE clauses today creates either boilerplate in the
application queries, or creates challenges with query generation. It
can also cause problems porting the application. On top of that, we
don't guarantee that particular locales will work: my machine has "it-
IT-x-icu" but not "it_IT" -- so the application developer needs to be
careful.

The SQL standard specifies "SET COLLATION" to change the default
collation just for the session, which sounds interesting. But it
appears to affect the query semantics, not just the result order, and
that might cause all kinds of problems. For instance, if we allow it to
affect the results of plpgsql functions, then it's another source of
problems similar to search_path. I assume that's why I haven't been
able to find proposals for SET COLLATION?


Crazy idea: what if we treated the top-level ORDER BY as special? That
is, we create a new node ResultOrderBy and make it visible in EXPLAIN.
Then we can have a GUC to control the default collation only for that
node.

  * It would only change the order of the results sent to the client,
not the results themselves. (I realize I'm twisting the definition of
"results" a bit here...)

  * It wouldn't have surprising downstream consequences for collation
determination because returning results is the last step.

  * Setting the result_order_collation to 'default' would be the same
as the current behavior.

  * You could control whether pushdowns of this node are enabled (to be
more like adding COLLATE clauses) or disabled (to be more like sorting
in the application).

  * You could have separate knobs to control whether it applies to
cursors, prepared statements, etc., which could avoid some kinds of
problems.

Thoughts?

Regards,
    Jeff Davis




pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Exceptional md.c paths for recovery and zero_damaged_pages
Next
From: Robert Pang
Date:
Subject: Back-patch of: avoid multiple hard links to same WAL file after a crash