Thread: Final result (display) collation?
Say an application developer wants the results in their application to appear in an order that matches the application user's locale. They don't want the results themselves to change; they just want the order to change for display purposes to the end user. Assume the developer already has ORDER BY clauses in the relevant queries, but no COLLATE clauses. That leaves three options: 1. Sort in the application. 2. Change the database default collation. 3. Add COLLATE clauses to the ORDER BY. #1 is a fine solution in many cases, but not all: if it were universal, we wouldn't even need top-level ORDER BY. #2 isn't suitable unless you can settle on a single locale and make it permanent for the entire database. So I'd like to see if we can improve #3 for the cases where where #1 and #2 aren't suitable. Trying to add COLLATE clauses today creates either boilerplate in the application queries, or creates challenges with query generation. It can also cause problems porting the application. On top of that, we don't guarantee that particular locales will work: my machine has "it- IT-x-icu" but not "it_IT" -- so the application developer needs to be careful. The SQL standard specifies "SET COLLATION" to change the default collation just for the session, which sounds interesting. But it appears to affect the query semantics, not just the result order, and that might cause all kinds of problems. For instance, if we allow it to affect the results of plpgsql functions, then it's another source of problems similar to search_path. I assume that's why I haven't been able to find proposals for SET COLLATION? Crazy idea: what if we treated the top-level ORDER BY as special? That is, we create a new node ResultOrderBy and make it visible in EXPLAIN. Then we can have a GUC to control the default collation only for that node. * It would only change the order of the results sent to the client, not the results themselves. (I realize I'm twisting the definition of "results" a bit here...) * It wouldn't have surprising downstream consequences for collation determination because returning results is the last step. * Setting the result_order_collation to 'default' would be the same as the current behavior. * You could control whether pushdowns of this node are enabled (to be more like adding COLLATE clauses) or disabled (to be more like sorting in the application). * You could have separate knobs to control whether it applies to cursors, prepared statements, etc., which could avoid some kinds of problems. Thoughts? Regards, Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > Crazy idea: what if we treated the top-level ORDER BY as special? That > is, we create a new node ResultOrderBy and make it visible in EXPLAIN. > Then we can have a GUC to control the default collation only for that > node. Eeek. This seems like an amazing kluge, and it's not even clear that there's any field demand for it. regards, tom lane
On Tue, 2024-12-17 at 20:36 -0500, Tom Lane wrote: > Jeff Davis <pgsql@j-davis.com> writes: > > Crazy idea: what if we treated the top-level ORDER BY as special? That > > is, we create a new node ResultOrderBy and make it visible in EXPLAIN. > > Then we can have a GUC to control the default collation only for that > > node. > > This seems like an amazing kluge, and it's not even clear that > there's any field demand for it. Yes, that approach seems strange. Also, wouldn't that prevent plans that calculate the final sort using a nested loop or merge join further down? But I like the idea of a parameter that determines the collation. I am aware that it is anathema here to have a GUC that influences query semantics, but it wouldn't be any worse than creating a database with a different collation, so I think it would be fine. FWIW, Oracle has a parameter NLS_SORT that does just that. Yours, Laurenz Albe
On Wed, 2024-12-18 at 08:26 +0100, Laurenz Albe wrote: > Yes, that approach seems strange. My thought was it would be more like a client option than anything else: a way for the application to request the same result in a more convenient order. A bit strange, but depending on how you look at it, is not very radical. More like setting lc_numeric. > Also, wouldn't that prevent > plans that calculate the final sort using a nested loop or merge > join further down? If the purpose is for a final result to be displayed to a human (rather than fed into some other data processing system), then it's going to be a small result and the final sort should be very cheap. > But I like the idea of a parameter that determines the collation. > I am aware that it is anathema here to have a GUC that influences > query semantics, but it wouldn't be any worse than creating a > database with a different collation, so I think it would be fine. That appears to be how the SET COLLATION statement is specified in SQL. Perhaps we should explore that in more detail? I think we'd have to be careful that it only applies to the top-level query and not SPI queries or CHECK constraints. There are probably some technical challenges in that, but it might not be too bad. > FWIW, Oracle has a parameter NLS_SORT that does just that. Ah, interesting. Regards, Jeff Davis