Re: Built-in CTYPE provider - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: Built-in CTYPE provider
Msg-id 2bcd882a-cf20-40fc-84eb-5c5c6365ff56@eisentraut.org
In response to Re: Built-in CTYPE provider  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Built-in CTYPE provider
List pgsql-hackers

On 18.01.24 23:03, Jeff Davis wrote:
> On Thu, 2024-01-18 at 13:53 +0100, Peter Eisentraut wrote:
>> I think that would be a terrible direction to take, because it would
>> regress the default sort order from "correct" to "useless".
> 
> I don't agree that the current default is "correct". There are a lot of
> ways it can be wrong:
> 
>    * the environment variables at initdb time don't reflect what the
> users of the database actually want
>    * there are so many different users using so many different
> applications connected to the database that no one "correct" sort order
> exists
>    * libc has some implementation quirks
>    * the version of Unicode that libc is based on is not what you expect
>    * the version of libc is not what you expect

These are arguments why the current defaults are not universally 
perfect, but I'd argue that they are still most often the right thing as 
the default.

>>    Aside from
>> the overall message this sends about how PostgreSQL cares about
>> locales
>> and Unicode and such.
> 
> Unicode is primarily about the semantics of characters and their
> relationships. The patches I propose here do a great job of that.
> 
> Collation (relationships between *strings*) is a part of Unicode, but
> not the whole thing or even the main thing.

I don't get this argument.  Of course, people care about sorting and 
sort order.  Whether you consider this part of Unicode or adjacent to 
it, people still want it.

>> Maybe you don't intend for this to be the default provider?
> 
> I am not proposing that this provider be the initdb-time default.

ok

>>    But then
>> who would really use it? I mean, sure, some people would, but how
>> would
>> you even explain, in practice, the particular niche of users or use
>> cases?
> 
> It's for users who want to respect Unicode and support text from
> international sources in their database; but are not experts on the
> subject and don't know precisely what they want or understand the
> consequences. If and when such users do notice a problem with the sort
> order, they'd handle it at that time (perhaps with a COLLATE clause, or
> sorting in the application).

> Vision:

> * ICU offers COLLATE UNICODE, locale tailoring, case-insensitive
> matching, and customization with rules. It's the solution for
> everything from "slightly more advanced" to "very advanced".

I am astonished by this.  In your world, do users not want their text 
data sorted?  Do they not care what the sort order is?  You consider UCA 
sort order an "advanced" feature?
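
To make the contrast concrete, here is a minimal sketch of what is at stake. It assumes the "useless" ordering refers to plain code-point order; the word list and the helper rough_linguistic_key are hypothetical, and the helper is only a crude stand-in for a real collation such as UCA, not an implementation of it.

```python
import unicodedata

# Hypothetical sample data; "\u00c4" is the precomposed letter A-umlaut.
words = ["apple", "Zebra", "\u00c4pfel", "banana"]


def rough_linguistic_key(s: str) -> str:
    """Crude approximation of a language-aware sort key (NOT UCA).

    Decomposes the string, drops combining marks, and folds case,
    so that accented and uppercase letters sort next to their base letters.
    """
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Plain code-point order: uppercase ASCII first, accented letters last.
code_point_order = sorted(words)

# Approximate "what a reader expects" order.
rough_order = sorted(words, key=rough_linguistic_key)

print(code_point_order)  # ['Zebra', 'apple', 'banana', 'Äpfel']
print(rough_order)       # ['Äpfel', 'apple', 'banana', 'Zebra']
```

In code-point order, "Zebra" lands before "apple" and the umlauted word lands after everything else, which is the kind of result users tend to notice. A real deployment would rely on a proper collation (for example a COLLATE clause, as mentioned in the quoted text) rather than a hand-rolled key like this one.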



