Thread: [18] separate collation and ctype versions, and cleanup of pg_database locale fields
[18] separate collation and ctype versions, and cleanup of pg_database locale fields
From
Jeff Davis
Date:
Definitions:

  - collation is text ordering and comparison
  - ctype affects case mapping (e.g. LOWER()) and pattern matching/regexes

Currently, there is only one version field, and it represents the version of the collation. So, if your provider is libc and datcollate is "C" and datctype is "en_US.utf8", then the datcollversion will always be NULL. Other providers use datlocale, which is only one field, so it doesn't matter.

Given the discussion here:

https://www.postgresql.org/message-id/1078884.1721762815@sss.pgh.pa.us

it seems like it may be a good idea to version collation and ctype separately. The ctype version is, more or less, the Unicode version, and we know what that is for the builtin provider as well as ICU.

(Aside: ICU could theoretically report the same Unicode version and still make some change that would affect us, but I have not observed that to be the case. I use exhaustive code point coverage to test that our Unicode functions return the same results as the corresponding ICU functions when the Unicode version matches.)

Adding more collation fields is getting to be messy, though, because they all have to be present in pg_database, as well. It's hard to move those fields into pg_collation, because that's not a shared catalog, so that could cause problems with CREATE/ALTER DATABASE.

Is it worth thinking about how we can clean this up, or should we just put up with the idea that almost half the fields in pg_database will be locale-related?

Regards,
Jeff Davis
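For readers unfamiliar with the "exhaustive code point coverage" approach mentioned in the aside, here is a minimal Python sketch of the idea. It is not the test used in PostgreSQL, and it does not call ICU. The helper names (`code_points`, `find_mismatches`, `python_lower`, `ascii_only_lower`) are hypothetical, and the ASCII-only mapping is a deliberately limited stand-in for "a second implementation" so that the harness has something to disagree with.

```python
import unicodedata


def code_points():
    """Yield every Unicode scalar value (all code points except surrogates)."""
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue
        yield cp


def find_mismatches(impl_a, impl_b):
    """Return the code points on which two single-character mappings disagree."""
    return [cp for cp in code_points() if impl_a(chr(cp)) != impl_b(chr(cp))]


def python_lower(ch):
    # Python's built-in lowercase mapping, backed by its bundled Unicode data.
    return ch.lower()


def ascii_only_lower(ch):
    # Hypothetical second implementation: only maps A-Z, leaves the rest alone.
    return ch.lower() if "A" <= ch <= "Z" else ch


# The Unicode version plays the role of the "ctype version" in this sketch:
# a comparison like this is only meaningful when both sides use the same one.
print("Unicode version of this Python build:", unicodedata.unidata_version)

mismatches = find_mismatches(python_lower, ascii_only_lower)
print("code points where the two mappings disagree:", len(mismatches))
```

Comparing an implementation against itself should report no mismatches, which is a quick sanity check on the harness before pointing it at two genuinely independent implementations.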
Re: [18] separate collation and ctype versions, and cleanup of pg_database locale fields
From
Jeff Davis
Date:
On Thu, 2024-07-25 at 13:29 -0700, Jeff Davis wrote:
> it may be a good idea to version collation and ctype
> separately. The ctype version is, more or less, the Unicode version,
> and we know what that is for the builtin provider as well as ICU.

Attached a rough patch for the purposes of discussion. It tracks the ctype version separately, but doesn't do anything with it yet.

The main problem is that it's one more slightly confusing thing to understand, especially in pg_database because it's the ctype version of the database default collation, not necessarily datctype. Maybe we can do something with the naming or catalog representation to make this more clear?

Regards,
Jeff Davis
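As an illustration only, the following Python sketch shows what tracking the two versions separately could make possible. It is not taken from the patch, which by the author's description does nothing with the ctype version yet. The `LocaleVersions` type, the `changed_behaviors` function, and the version strings are all hypothetical; `None` stands in for a NULL catalog field.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LocaleVersions:
    """Hypothetical pair of version stamps; None models a NULL catalog field."""
    collation: Optional[str]
    ctype: Optional[str]


def changed_behaviors(recorded: LocaleVersions, actual: LocaleVersions) -> List[str]:
    """Report which kind of behavior changed between recorded and current versions."""
    changed = []
    if recorded.collation != actual.collation:
        changed.append("collation")
    if recorded.ctype != actual.ctype:
        changed.append("ctype")
    return changed


# A collation with no version (as with "C" under libc) whose ctype behavior
# moved to a newer Unicode version. The version strings are made up.
recorded = LocaleVersions(collation=None, ctype="15.1")
actual = LocaleVersions(collation=None, ctype="16.0")
print(changed_behaviors(recorded, actual))
```

With a single version field that only describes the collation, both sides of this comparison would be NULL and the ctype change would go unnoticed. Two fields let the two kinds of change be reported independently.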
Re: [18] separate collation and ctype versions, and cleanup of pg_database locale fields
From
Jeff Davis
Date:
On Sat, 2024-07-27 at 08:34 -0700, Jeff Davis wrote:
> Attached a rough patch for the purposes of discussion. It tracks the
> ctype version separately, but doesn't do anything with it yet.

I'm withdrawing this patch due to a lack of discussion. I may make a related proposal for v19.

Regards,
Jeff Davis