Re: Built-in CTYPE provider - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: Built-in CTYPE provider
Msg-id 67df0672-5bc0-4b2b-b9e0-00e12bdca601@eisentraut.org
In response to Re: Built-in CTYPE provider  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On 12.01.24 03:02, Jeff Davis wrote:
> New version attached. Changes:
> 
>   * Named collation object PG_C_UTF8, which seems like a good idea to
> prevent name conflicts with existing collations. The locale name is
> still C.UTF-8, which still makes sense to me because it matches the
> behavior of the libc locale of the same name so closely.

I am catching up on this thread.  The discussions have been very 
complicated, so maybe I didn't get it all.

The patches look pretty sound, but I'm questioning how useful this 
feature is and where you plan to take it.

Earlier in the thread, the aim was summarized as

 > If the Postgres default was bytewise sorting+locale-agnostic
 > ctype functions directly derived from Unicode data files,
 > as opposed to libc/$LANG at initdb time, the main
 > annoyance would be that "ORDER BY textcol" would no
 > longer be the human-favored sort.

I think that would be a terrible direction to take, because it would 
regress the default sort order from "correct" to "useless".  That is aside 
from the overall message this would send about how much PostgreSQL cares 
about locales, Unicode, and the like.
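
To make the concern concrete, here is a minimal Python sketch of what bytewise sorting does to mixed-case and accented text. The word list and both function names are hypothetical, chosen only for illustration. The second function is a crude stand-in for a human-favored order (case folding plus accent stripping); it is not the UCA and not what libc or ICU actually do.

```python
import unicodedata

# Hypothetical sample data, for illustration only.
words = ["zebra", "Apple", "éclair", "banana", "Zurich", "apple"]


def bytewise_sort(values):
    """Sort by UTF-8 encoded bytes, analogous to a memcmp()-style comparison."""
    return sorted(values, key=lambda s: s.encode("utf-8"))


def rough_human_sort(values):
    """Crude approximation of a human-favored order (NOT the UCA).

    Decomposes each string, drops combining marks, and case-folds it.
    Ties keep their input order because sorted() is stable.
    """
    def key(s):
        decomposed = unicodedata.normalize("NFD", s)
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        return stripped.casefold()

    return sorted(values, key=key)


print(bytewise_sort(words))
# ['Apple', 'Zurich', 'apple', 'banana', 'zebra', 'éclair']

print(rough_human_sort(words))
# ['Apple', 'apple', 'banana', 'éclair', 'zebra', 'Zurich']
```

In the bytewise result, all uppercase words sort before all lowercase ones and the accented word lands at the very end, which is the behavior an "ORDER BY textcol" user would see.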

Maybe you don't intend for this to be the default provider?  But then 
who would really use it?  I mean, sure, some people would, but how would 
you even explain, in practice, the particular niche of users or use cases?

Maybe if this new provider were called "minimal", the name might describe 
the purpose better.

I could see a use for this builtin provider if it also included the 
default UCA collation (what COLLATE UNICODE does now).  Then it would 
provide a "common" default behavior out of the box, and if you want more 
fine-tuning, you can go to ICU.  There would still be some questions 
about making sure the builtin behavior and the ICU behavior are 
consistent (different Unicode versions, stock UCA vs CLDR, etc.).  But 
for practical purposes, it might work.

There would still be a risk with that approach, since it would 
permanently marginalize ICU functionality, in the sense that only some 
locales would need ICU, and so we might not pay the same amount of 
attention to the ICU functionality.

I would be curious what your overall vision is here.  Is switching the 
default to ICU still your goal?  Or do you want the builtin provider to 
be the default?  Or something else?



