On Thu, 2024-03-14 at 09:54 +0100, Peter Eisentraut wrote:
> * doc/src/sgml/charset.sgml
>
> I don't understand the purpose of this sentence:
>
> "When using this locale, the behavior may depend on the database
> encoding."
The "C" locale (in either the builtin or libc provider) can sort
differently in different encodings, because it's based on memcmp. For
instance:
select U&'\20AC' > U&'\201A' collate "C";
Returns true in UTF-8 and false in WIN1252. That's why UCS_BASIC is
only available in UTF-8, because (at least for some encodings) we'd
have to decode before comparison to get the code-point-order semantics
right.
In other words, the "C" collation is not a well-defined order, but
UCS_BASIC and C.UTF-8 are well-defined.
Suggestions for better wording are welcome.
> * doc/src/sgml/ref/create_database.sgml
>
> The new parameter builtin_locale is not documented.
Thank you, fixed in 0001 (review fixup).
> * src/backend/commands/collationcmds.c
>
> I think DefineCollation() should set collencoding = -1 for the
> COLLPROVIDER_BUILTIN case. -1 stands for any encoding. Or at least
> explain why not?
In the attached v25-0001 (review fixup) I have made it the
responsibility of a function, and then extended that for the C.UTF-8
(0002) and PG_UNICODE_FAST locales (0007).
> * src/backend/utils/adt/pg_locale.c
>
> This part is a bit confusing:
>
> + cache_entry->collate_is_c = true;
> + cache_entry->ctype_is_c = (strcmp(colllocale, "C") == 0);
>
> Is collate always C but ctype only sometimes? Does this anticipate
> future patches in this series? Maybe in this patch it should always
> be true?
Made it a constant in v25-0001, and changed it in 0002
>
> * src/bin/initdb/initdb.c
>
> + printf(_(" --builtin-locale=LOCALE set builtin locale name
> for new databases\n"));
>
> Put in a line break so that the right "column" lines up.
Fixed in 0001
> This output should line up better:
>
> The database cluster will be initialized with this locale
> configuration:
> default collation provider: icu
> default collation locale: en
> LC_COLLATE: C
> LC_CTYPE: C
> ...
>
> Also, why are there two spaces after "provider: "?
>
> Also we call these locale provider on input, why are they collation
> providers on output? What is a "collation locale"?
I tried to fix these things in 0001.
> * src/bin/pg_upgrade/t/002_pg_upgrade.pl
>
> +if ($oldnode->pg_version >= '17devel')
>
> This is weird. >= is a numeric comparison, so providing a string
> with
> non-digits is misleading at best.
It's actually not a numeric comparison, it's an overloaded comparison
op for the Version class.
See 32dd2c1eff and:
https://www.postgresql.org/message-id/1738174.1710274577%40sss.pgh.pa.us
> * src/test/icu/t/010_database.pl
>
> -# Test that LOCALE works for ICU locales if LC_COLLATE and LC_CTYPE
> -# are specified
>
> Why remove this test?
It must have been lost during a rebase, fixed in 0001.
> Change the name of the new database to be different from the name of
> the template database.
Fixed in 0001.
New series attached.
Regards,
Jeff Davis