Re: Collation versioning - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Collation versioning
Date
Msg-id CAEepm=33gHF92UD8bkoZzzbs+GkVhHFzgV4CPf8ZxFRqZ_UCHQ@mail.gmail.com
Whole thread Raw
In response to Re: Collation versioning  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses Re: Collation versioning  (Douglas Doole <dougdoole@gmail.com>)
Re: Collation versioning  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List pgsql-hackers
On Thu, Sep 13, 2018 at 7:03 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 12/09/2018 13:25, Christoph Berg wrote:
> > Re: Peter Eisentraut 2018-09-12 <0447ec7b-cdb6-7252-7943-88a4664e7bb7@2ndquadrant.com>
> >>> Naive idea: make that catalog shared? Collations are system-wide after
> >>> all.
> >>
> >> By the same argument, extensions should be shared, but they are not.
> >
> > But extensions put a lot of visible stuff into a database, whereas a
> > collation is just a line in some table that doesn't get into the way.
>
> How about C functions?  They are just a system catalog representation of
> something that exists on the OS.
>
> Anyway, we also want to support application-specific collation
> definitions, so that users can CREATE COLLATION
> "my_specific_requirements" and use that that in their application, so
> global collations wouldn't be appropriate for that.
>
> Moreover, the fix for a collation version mismatch is, in the simplest
> case, to go around and REINDEX everything.  Making the collation or
> collation version global doesn't fix that.  It would actually make it
> harder because you couldn't run ALTER COLLATION REFRESH VERSION until
> after you have rebuilt all affected objects *in all databases*.

Here's one idea I came up with.  It involves a new kind of magic.  The
goals are:

1.  Support versioning for the libc provider, including for the
default collation.
2.  Support ICU for the default collation.
3.  Fix the tracking of when reindexes need to be rebuilt, so that you
can't get it wrong (as you're alluding to above).

Changes:

1.  Drop the datcollate and datctype columns from pg_database.
2.  In CheckMyDatabase() or elsewhere in backend initialisation, get
that information instead by loading the pg_collation row with OID =
DEFAULT_COLLATION_OID.
3.  Don't put COLLPROVIDER_DEFAULT into the default collation
collprovider column, instead give it a concrete provider value, ie
COLLPROVIDER_LIBC.
4.  After creating a new database, update that row as appropriate in
the new database (!).  Or find some other way to write a new table out
and switch it around, or something like that.  That is, if you say
CREATE DATABASE foo LC_COLLATE = 'xx_XX', COLLATION_PROVIDER = libc
then those values somehow get written into the default pg_collation
row in the *new* database (so at that point it's not a simple copy of
the template database).
5.  Drop the collversion column from pg_collation.  Get rid of the
REFRESH VERSION command.  Instead, add a new column indcollversion to
pg_index.  It needs to be an array of text (not sure if that is a
problem in a catalog), with elements that correspond to the elements
of indcollation.
6.  Do the check and log warnings when we first open each index.
7.  Update indcollversion at index creation and whenever we REINDEX.

I haven't actually tried any of this so I'm not sure if I'm missing
something other than the inherent difficulty of updating a row in a
table in a database you're not connected to...

-- 
Thomas Munro
http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Christoph Berg
Date:
Subject: Re: [patch] Support LLVM 7
Next
From: Andrey Lepikhov
Date:
Subject: Clarification of nodeToString() use cases