Re: Collation version tracking for macOS - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: Collation version tracking for macOS
Date
Msg-id 0e1bcd64-b32b-a9ef-4d65-fe420d10e5b3@enterprisedb.com
In response to Re: Collation version tracking for macOS  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On 05.12.22 22:33, Thomas Munro wrote:
> On Tue, Dec 6, 2022 at 6:45 AM Joe Conway <mail@joeconway.com> wrote:
>> On 12/5/22 12:41, Jeff Davis wrote:
>>> On Mon, 2022-12-05 at 16:12 +1300, Thomas Munro wrote:
>>>> 1.  I think we should seriously consider provider = ICU63.  I still
>>>> think search-by-collversion is a little too magical, even though it
>>>> clearly can be made to work.  Of the non-magical systems, I think
>>>> encoding the choice of library into the provider name would avoid the
>>>> need to add a second confusing "X_version" concept alongside our
>>>> existing "X_version" columns in catalogues and DDL syntax, while still
>>>> making it super clear what is going on.
>>>
>>> As I understand it, this is #2 in your previous list?
>>>
>>> Can we put the naming of the provider into the hands of the user, e.g.:
>>>
>>>     CREATE COLLATION PROVIDER icu63 TYPE icu
>>>       AS '/path/to/libicui18n.so.63', '/path/to/libicuuc.so.63';
>>>
>>> In this model, icu would be a "provider kind" and icu63 would be the
>>> specific provider, which is named by the user.
>>>
>>> That seems like the least magical approach, to me. We need an ICU
>>> library; the administrator gives us one that looks like ICU; and we're
>>> happy.
>>
>> +1
>>
>> I like this. The provider kind defines which path we take in our code,
>> and the specific library unambiguously defines a specific collation
>> behavior (I think, ignoring bugs?)
> 
> OK, I'm going to see what happens if I try to wrangle that stuff into
> a new catalogue table.

I'm reviewing the commit fest entry
https://commitfest.postgresql.org/41/3956/, which points to this thread.
It appears that the above patch did not materialize in time.  The patch
of record is now Jeff's refactoring patch, which is also tracked in
another commit fest entry (https://commitfest.postgresql.org/41/4058/).
So, as a matter of procedure, we should probably close this commit fest
entry for now.  (Maybe we should also use a different thread subject in
the future.)

I have a few quick comments on the above syntax example:

There is currently a bunch of locale-using code that selects different 
code paths by "collation provider", i.e., a libc-based code path and an 
ICU-based code path (and sometimes also a default provider path).  The 
above proposal would shift the terminology and would probably require 
some churn at those sites, in that they would now have to select by 
"collation provider type".  We could probably avoid that by shifting the 
terms a bit, so instead of the suggested

provider type -> provider

we could use

provider -> version of that provider

(or some other actual term), which would leave the meaning of "provider" 
unchanged as far as locale-using code is concerned.  At least that's my 
expectation, since no code for this has been seen yet.  We should keep 
this in mind in any case.
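To make the terminology point concrete, here is a minimal C sketch under
stated assumptions: all struct, enum, and function names are hypothetical
and not existing PostgreSQL code; only the `COLLPROVIDER_*` letter codes
mirror the values stored in `pg_collation.collprovider`.  The collation
keeps a `provider` field with its current meaning, and the specific
library (the user-named "icu63" above) becomes a separate, optional
reference, so a call site that dispatches on the provider stays as is:

```c
#include <assert.h>
#include <stddef.h>

/* Letter codes as stored in pg_collation.collprovider. */
#define COLLPROVIDER_DEFAULT 'd'
#define COLLPROVIDER_ICU     'i'
#define COLLPROVIDER_LIBC    'c'

/* Hypothetical: one specific library for a provider, such as the
 * user-named "icu63" from the CREATE COLLATION PROVIDER example. */
typedef struct ProviderVersion
{
    const char *name;       /* user-chosen name, e.g. "icu63" */
    char        provider;   /* the provider this is a version of */
} ProviderVersion;

/* Hypothetical collation descriptor. */
typedef struct CollationInfo
{
    char                   provider; /* same meaning as today: libc, ICU, default */
    const ProviderVersion *version;  /* new, optional: NULL = default library */
} CollationInfo;

typedef enum CodePath { PATH_DEFAULT, PATH_LIBC, PATH_ICU } CodePath;

/* A locale-using call site: it still dispatches on "provider" alone,
 * so introducing provider versions does not touch it. */
static CodePath
select_code_path(const CollationInfo *coll)
{
    assert(coll != NULL);
    /* A version, if given, must belong to the same provider. */
    assert(coll->version == NULL || coll->version->provider == coll->provider);

    switch (coll->provider)
    {
        case COLLPROVIDER_ICU:
            return PATH_ICU;
        case COLLPROVIDER_LIBC:
            return PATH_LIBC;
        case COLLPROVIDER_DEFAULT:
        default:
            return PATH_DEFAULT;
    }
}
```

Under the other naming (provider type -> provider), the same switch would
have to test a new "provider type" field instead, which is the churn at
existing call sites described above.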

Also, the above example exposes a lot of operating-system-level details.
This creates issues with dump/restore, which some of the earlier
patches avoided by using a path-based approach, and it would also
require some thought about permissions.  We probably want
non-superusers to be able to interact with this system somehow, for
upgrading (for some meaning of that action) indexes etc. without
superuser access.  The more we expose from the OS, the more we have
to be able to lock down again in a usable manner.

(The search-by-collversion approach can probably avoid those issues better.)
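For comparison, the search-by-collversion idea can be sketched roughly as
follows.  This assumes the server has already loaded several candidate
libraries (how they are located is left open) and can ask each one for
the version it reports for a given collation; `LoadedLibrary` and
`find_library_by_collversion` are hypothetical names.  The only thing
matched against is the version string already recorded in the catalog
(`pg_collation.collversion`), so in this sketch no file paths appear in
DDL or in a dump:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: a loaded collation library together with the version
 * string it reports for the collation in question.  A real
 * implementation would obtain that string from the library itself. */
typedef struct LoadedLibrary
{
    const char *name;          /* illustrative label only */
    const char *collversion;   /* version reported for this collation */
} LoadedLibrary;

/* Return the first library whose reported collation version equals the
 * version recorded in the catalog, or NULL if none matches. */
static const LoadedLibrary *
find_library_by_collversion(const LoadedLibrary *libs, size_t nlibs,
                            const char *recorded)
{
    assert(recorded != NULL);
    for (size_t i = 0; i < nlibs; i++)
    {
        if (libs[i].collversion != NULL &&
            strcmp(libs[i].collversion, recorded) == 0)
            return &libs[i];
    }
    return NULL;
}
```

What to do when nothing matches (raise an error, or fall back to the
default library) is a policy question; the sketch just returns NULL.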
