Re: Collation versioning - Mailing list pgsql-hackers

From Juan José Santamaría Flecha
Subject Re: Collation versioning
Date
Msg-id CAC+AXB2xvqr3w6QPB_THNZ0-ZkG22OXnWALz5-Y1Mn3LYYsgCg@mail.gmail.com
In response to Re: Collation versioning  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Collation versioning  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers

On Tue, Nov 3, 2020 at 10:49 PM Thomas Munro <thomas.munro@gmail.com> wrote:

> So we have:
>
> 1.  Windows locale names, like "English_United States.1252".  Windows
> still returns these from setlocale(), so they finish up in datcollate,
> and yet some relevant APIs don't accept them, at least on some
> machines.
>
> 2.  BCP 47/RFC 5646 language tags, like "en-US".  Windows uses these
> in relevant new APIs, including the case in point.
>
> 3.  Unix-style (XPG? ISO/IEC 15897?) locale names, like "en_US"
> ("language[_territory[.codeset]][@modifier]").  These are used for
> message catalogues.
>
> We have a VS2015+ way of converting from form 1 to form 2 (and thence
> 3 by s/-/_/), and an older way.  Unfortunately, the new way looks a
> little too fuzzy: if I'm reading it right, search_locale_enum() might
> stop on either "en" or "en-AU", given "English_Australia", depending
> on the search order, no?

No, that is not the case. A bare "English" could match any English locale if the enumeration order were to change in the future. Right now the order is fixed (Language, Location), and "English_Australia" can only match "en-AU".
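
To make the matching behaviour concrete, here is a small Python sketch. It does not call the Windows API: the hypothetical SYSTEM_LOCALES list stands in for the enumerated system locales, and find_tag() only mimics the (Language, Location) comparison described above, so it is an illustration under those assumptions, not the actual search_locale_enum() code. tag_to_posix() is the s/-/_/ step from form 2 to form 3.

```python
# Hypothetical stand-in for the system locale enumeration, in enumeration
# order: (BCP 47 tag, English language name, English country name).
SYSTEM_LOCALES = [
    ("en", "English", ""),
    ("en-AU", "English", "Australia"),
    ("en-US", "English", "United States"),
    ("es", "Spanish", ""),
    ("es-ES", "Spanish", "Spain"),
]


def find_tag(winname, locales=SYSTEM_LOCALES):
    """Return the first tag matching a Windows-style name such as
    "English_Australia" (codeset already stripped), or None.

    Neutral tags ("en") and names without a territory compare only the
    language name, so the result for a bare "English" depends on the
    enumeration order; "Language_Country" names must match in full.
    """
    target = winname.casefold()
    for tag, language, country in locales:
        if "-" not in tag or "_" not in winname:
            candidate = language
        else:
            candidate = f"{language}_{country}"
        if candidate.casefold() == target:
            return tag
    return None


def tag_to_posix(tag):
    """Form 2 -> form 3, the s/-/_/ step: "en-AU" -> "en_AU"."""
    return tag.replace("-", "_")


print(find_tag("English_Australia"))                # en-AU
print(tag_to_posix(find_tag("English_Australia")))  # en_AU
print(find_tag("English"))                          # en (first language match)
print(find_tag("English", SYSTEM_LOCALES[1:]))      # en-AU once "en" is not first
```

With "en" enumerated first, a bare "English" resolves to "en"; remove that entry from the front and it resolves to "en-AU" instead, while "English_Australia" gives "en-AU" either way.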

> This may be fine for the purpose of looking
> up error messages with gettext() (where there is only one English
> language message catalogue, we haven't got around to translating our
> errors into 'strayan yet), but it doesn't seem like a good way to look
> up the collation version; for all I know, "en" variants might change
> independently (I doubt it in practice, but in theory it's wrong).  We
> want the same algorithm that Windows uses internally to resolve the
> old style name to a collation; in other words we probably want
> something more like the code path that they took away in VS2015 :-(.

We could create a static table with the conversion, based on what was discussed for commit a169155; please find attached a spreadsheet with the comparison. This would require maintenance as new LCIDs are released [1].
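
A minimal sketch of the static-table idea, assuming a hand-maintained mapping keyed on the Windows locale name without its codeset. The WINDOWS_TO_BCP47 entries and the helper names are hypothetical placeholders, not the contents of the attached spreadsheet.

```python
# Hypothetical hand-maintained table from Windows locale names (form 1,
# codeset removed, case-folded) to BCP 47 tags (form 2). Illustrative
# entries only; a real table would need updating as new LCIDs appear.
WINDOWS_TO_BCP47 = {
    "english_united states": "en-US",
    "english_australia": "en-AU",
    "spanish_spain": "es-ES",
}


def split_codeset(winname):
    """Split "English_United States.1252" into ("English_United States", "1252")."""
    name, sep, codeset = winname.partition(".")
    return name, (codeset if sep else None)


def lookup_bcp47(winname):
    """Return the BCP 47 tag for a Windows locale name, or None if unmapped.

    An exact table lookup gives the same answer regardless of any
    enumeration order.
    """
    name, _codeset = split_codeset(winname)
    return WINDOWS_TO_BCP47.get(name.casefold())


print(lookup_bcp47("English_United States.1252"))  # en-US
print(lookup_bcp47("English_Australia"))           # en-AU
print(lookup_bcp47("Welsh_United Kingdom.1252"))   # None: not in the table
```

The lookup is deterministic, at the cost of the maintenance burden mentioned above: any name missing from the table simply yields None.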


Regards,

Juan José Santamaría
Attachment
