Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47language tags. Should it? - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47language tags. Should it?
Date
Msg-id CAH2-Wz=ets89wn434EAG4NJB0SJb=irmXQAVm-BPhrYphcHXqA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47language tags. Should it?  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Tue, Sep 19, 2017 at 7:01 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> I didn't post the patch that generates the errors in my opening e-mail
> because I'm not confident it's correct just yet. And, I think that I
> see a bigger problem: we pass a string that is almost certainly a BCP
> 47 string to ucol_open() from within pg_newlocale_from_collation(). We
> do so despite the fact that ucol_open() apparently doesn't accept BCP
> 47 syntax locale strings until ICU 54 [1]. Seems entirely possible
> that this accounts for the problems you saw on ICU 4.2, back when we
> were still creating keyword variants (I guess that the keyword
> variants seem very "BCP 47-ish" to me).

ISTM that the proper fix here is to use uloc_forLanguageTag() [1] (not
to be confused with uloc_toLanguageTag()) to get a valid locale
identifier on versions of ICU where BCP 47 format tags are not
directly accepted as locale identifiers (versions prior to ICU 54).
This would happen as an extra step within
pg_newlocale_from_collation(), since BCP 47 format would be what is
stored in pg_collation.

Since uloc_forLanguageTag() become stable in ICU 4.2, the earliest
version that we support, I believe that that would leave us in good
shape.

[1] https://ssl.icu-project.org/apiref/icu4c/uloc_8h.html#aa45d6457f72867880f079e27a63c6fcb
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] POC: Cache data in GetSnapshotData()
Next
From: Darafei Praliaskouski
Date:
Subject: Re: [HACKERS] compress method for spgist - 2