Re: Built-in CTYPE provider - Mailing list pgsql-hackers
From | Jeff Davis |
---|---|
Subject | Re: Built-in CTYPE provider |
Date | |
Msg-id | 3c1f4043bb4f76de78160f8afc8678eaa10b0e46.camel@j-davis.com |
In response to | Re: Built-in CTYPE provider (Peter Eisentraut <peter@eisentraut.org>) |
Responses | Re: Built-in CTYPE provider |
List | pgsql-hackers |
On Thu, 2024-01-18 at 13:53 +0100, Peter Eisentraut wrote:
> I think that would be a terrible direction to take, because it would
> regress the default sort order from "correct" to "useless".

I don't agree that the current default is "correct". There are a lot of
ways it can be wrong:

  * the environment variables at initdb time don't reflect what the
    users of the database actually want
  * there are so many different users using so many different
    applications connected to the database that no one "correct" sort
    order exists
  * libc has some implementation quirks
  * the version of Unicode that libc is based on is not what you expect
  * the version of libc is not what you expect

> Aside from the overall message this sends about how PostgreSQL cares
> about locales and Unicode and such.

Unicode is primarily about the semantics of characters and their
relationships. The patches I propose here do a great job of that.

Collation (relationships between *strings*) is a part of Unicode, but
not the whole thing or even the main thing.

> Maybe you don't intend for this to be the default provider?

I am not proposing that this provider be the initdb-time default.

> But then who would really use it? I mean, sure, some people would,
> but how would you even explain, in practice, the particular niche of
> users or use cases?

It's for users who want to respect Unicode and support text from
international sources in their database, but who are not experts on the
subject and don't know precisely what they want or understand the
consequences. If and when such users do notice a problem with the sort
order, they'd handle it at that time (perhaps with a COLLATE clause, or
by sorting in the application).

> Maybe if this new provider would be called "minimal", it might
> describe the purpose better.

"Builtin" communicates that it's available everywhere (not a
dependency), that specific behaviors can be documented and tested, and
that behavior doesn't change within a major version. I want to
communicate all of those things.

> I could see a use for this builtin provider if it also included the
> default UCA collation (what COLLATE UNICODE does now).

I won't rule that out, but I'm not proposing that right now, and my
proposal already offers useful functionality.

> There would still be a risk with that approach, since it would
> permanently marginalize ICU functionality

Yeah, ICU already does a good job offering the root collation. I don't
think the builtin provider needs to do so.

> I would be curious what your overall vision is here?

Vision:

  * The builtin provider will offer Unicode character semantics, basic
    collation, platform-independence, and high performance. It can be
    used on its own or in combination with ICU via the COLLATE clause.
  * ICU offers COLLATE UNICODE, locale tailoring, case-insensitive
    matching, and customization with rules. It's the solution for
    everything from "slightly more advanced" to "very advanced".
  * libc would be for databases serving applications on the same
    machine where a matching sort order is helpful, risks to indexes
    are acceptable, and performance is not important.

> Is switching the default to ICU still your goal? Or do you want the
> builtin provider to be the default?

It's hard to answer this question while initdb chooses the database
default collation based on the environment. Neither ICU nor the builtin
provider can reasonably handle whatever those environment variables
might be set to.

Stepping back from the focus on what initdb does, we should be
providing the right encouragement in documentation and packaging to
guide users toward the right provider based on their needs and the
vision outlined above.

Regards,
	Jeff Davis
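
[Illustration, not part of the original message.] To make the "basic collation" versus "a COLLATE clause, or sorting in the application" distinction concrete, here is a minimal Python sketch. It assumes that "basic collation" means plain code point order, which the message does not spell out. The function names are hypothetical. The case-folded key is only a crude stand-in for application-side sorting; it is not what ICU or the Unicode Collation Algorithm does.

```python
# Sketch only: assumes "basic collation" means code point order.
# All names here are hypothetical and exist purely for illustration.

WORDS = ["apple", "Zebra", "élan", "banana"]


def code_point_sort(items):
    """Sort by Unicode code point; Python compares str values this way."""
    return sorted(items)


def utf8_byte_sort(items):
    """Sort by the raw UTF-8 bytes, as a memcmp-style comparison would."""
    return sorted(items, key=lambda s: s.encode("utf-8"))


def casefold_sort(items):
    """Crude application-side alternative: compare case-folded text first,
    then fall back to the original string to keep the order deterministic."""
    return sorted(items, key=lambda s: (s.casefold(), s))


if __name__ == "__main__":
    print(code_point_sort(WORDS))
    print(utf8_byte_sort(WORDS))
    print(casefold_sort(WORDS))
```

Code point order and UTF-8 byte order agree with each other and do not depend on the platform, but uppercase ASCII sorts before lowercase and accented letters sort after "z". A user who notices that and objects would reach for something like the third function, or for ICU through a COLLATE clause.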