Thread: Collation DDL inconsistencies

From: Jeff Davis
Date:

When I looked at the bug:

https://postgr.es/m/CALDQics_oBEYfOnu_zH6yw9WR1waPCmcrqxQ8+39hK3Op=z2UQ@mail.gmail.com

I noticed that the DDL around collations is inconsistent. For instance,
CREATE COLLATION[1] uses the LOCALE, LC_COLLATE, and LC_CTYPE parameters
to specify either libc locales or an ICU locale, whereas CREATE
DATABASE[2] always uses LOCALE, LC_COLLATE, and LC_CTYPE for libc, and
uses ICU_LOCALE if the default collation is ICU.

The catalog representation is strange in a different way:
datcollate/collcollate are always for libc, and
daticulocale/colliculocale are for ICU. That means any code that reads
those fields needs to pick the right one based on the provider.
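
To make the point concrete, here is a minimal sketch of the kind of
branch every reader of these fields ends up writing today. The struct
and the function name are hypothetical and only mirror the catalog
column names; this is not code from the PostgreSQL tree.

```c
#include <assert.h>
#include <stddef.h>

/* Provider codes as stored in datlocprovider/collprovider. */
#define PROVIDER_LIBC 'c'
#define PROVIDER_ICU  'i'

/* Hypothetical, simplified view of the locale fields of a pg_database row. */
typedef struct DbLocaleFields
{
    char        datlocprovider; /* PROVIDER_LIBC or PROVIDER_ICU */
    const char *datcollate;     /* libc LC_COLLATE name, always set */
    const char *datctype;       /* libc LC_CTYPE name, always set */
    const char *daticulocale;   /* ICU locale, NULL unless provider is ICU */
} DbLocaleFields;

/*
 * Return the locale name that governs collation for this row.
 * The caller cannot simply read datcollate: which field is
 * meaningful depends on the provider.
 */
static const char *
collation_locale_name(const DbLocaleFields *row)
{
    assert(row != NULL);
    if (row->datlocprovider == PROVIDER_ICU)
        return row->daticulocale;
    return row->datcollate;
}
```

For an ICU database the function returns daticulocale even though
datcollate is also populated, which is exactly the per-caller special
case described above.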

If this were a clean slate, it would make more sense if it were
something like:

   datcollate/collcollate: to instantiate pg_locale_t
   datctype/collctype: to instantiate pg_locale_t
   datlibccollate: used by libc elsewhere
   datlibcctype: used by libc elsewhere
   daticulocale/colliculocale: remove these fields

That way, if you are instantiating a pg_locale_t, you always just pass
datcollate/datctype/collcollate/collctype, regardless of the provider
(pg_newlocale_from_collation() would figure it out). And if you are
going to call libc directly, you always use
datlibccollate/datlibcctype.
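
A rough sketch of what a caller might look like under that layout.
Everything here is hypothetical: the struct follows the proposed field
names, and LocaleSpec is only a stand-in for whatever inputs
pg_newlocale_from_collation() would need, not its real interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical row layout following the proposal; not the current catalog. */
typedef struct ProposedDbLocaleFields
{
    char        datlocprovider; /* 'c' = libc, 'i' = ICU */
    const char *datcollate;     /* collation locale for the chosen provider */
    const char *datctype;       /* ctype locale for the chosen provider */
    const char *datlibccollate; /* LC_COLLATE for direct libc use */
    const char *datlibcctype;   /* LC_CTYPE for direct libc use */
} ProposedDbLocaleFields;

/* Hypothetical stand-in for the inputs needed to build a pg_locale_t. */
typedef struct LocaleSpec
{
    char        provider;
    const char *collate;
    const char *ctype;
} LocaleSpec;

/*
 * Collect the inputs for locale instantiation. The same two fields are
 * read for every provider, so the caller never branches on the provider;
 * interpreting the names is left to the code that builds the locale.
 */
static LocaleSpec
locale_spec_from_row(const ProposedDbLocaleFields *row)
{
    LocaleSpec  spec;

    assert(row != NULL);
    spec.provider = row->datlocprovider;
    spec.collate = row->datcollate;
    spec.ctype = row->datctype;
    return spec;
}
```

Code that talks to libc directly would read datlibccollate/datlibcctype
instead and would never look at the provider either.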

Aside: why don't we support different collate/ctype with ICU? It
appears that u_strToTitle/u_strToUpper/u_strToLower[3] just accept a
"locale" string argument, and it would be easy enough to pass whatever
is in datctype/collctype, right? We should validate that it's a valid
locale, but other than that, I don't see the problem.

Thoughts? Implementation-wise, I suppose this could create some
annoyances in pg_dump.

[1] https://www.postgresql.org/docs/devel/sql-createcollation.html
[2] https://www.postgresql.org/docs/devel/sql-createdatabase.html
[3] https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ustring_8h.html


--
Jeff Davis
PostgreSQL Contributor Team - AWS