Thread: missing warning in pg_import_system_collations
Hello hackers,

In pg_import_system_collations() there is this fragment of code:

    enc = pg_get_encoding_from_locale(localebuf, false);
    if (enc < 0)
    {
        /* error message printed by pg_get_encoding_from_locale() */
        continue;
    }

However, false passed to pg_get_encoding_from_locale() means
write_message argument is false, so no error message is ever printed.
I propose an obvious patch (see attachment).

Introduced in aa17c06fb in January 2017 when debug was replaced by
false, so I guess back-patching through 10 would be appropriate.

--
Anton Voloshin
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru
On Thu, Sep 9, 2021 at 03:45, Anton Voloshin <a.voloshin@postgrespro.ru> wrote:
> Hello hackers,
>
> In pg_import_system_collations() there is this fragment of code:
>
>     enc = pg_get_encoding_from_locale(localebuf, false);
>     if (enc < 0)
>     {
>         /* error message printed by pg_get_encoding_from_locale() */
>         continue;
>     }
>
> However, false passed to pg_get_encoding_from_locale() means
> write_message argument is false, so no error message is ever printed.
> I propose an obvious patch (see attachment).
Yeah, seems correct to me.
The comment clearly expresses the intention.
> Introduced in aa17c06fb in January 2017 when debug was replaced by
> false, so I guess back-patching through 10 would be appropriate.
This is an oversight.
+1 from me.
Ranier Vilela
Anton Voloshin <a.voloshin@postgrespro.ru> writes:
> In pg_import_system_collations() there is this fragment of code:
>     enc = pg_get_encoding_from_locale(localebuf, false);
>     if (enc < 0)
>     {
>         /* error message printed by pg_get_encoding_from_locale() */
>         continue;
>     }
> However, false passed to pg_get_encoding_from_locale() means
> write_message argument is false, so no error message is ever printed.
> I propose an obvious patch (see attachment).
> Introduced in aa17c06fb in January 2017 when debug was replaced by
> false, so I guess back-patching through 10 would be appropriate.

I don't think this is obvious at all. In the original coding (before
aa17c06fb, when this code was in initdb), we printed a warning if
"debug" was true and otherwise printed nothing. The other "if (debug)"
cases in the code that got moved over were translated to "elog(DEBUG1)",
but there isn't any API to make pg_get_encoding_from_locale() log at
that level.

What you propose to do here would promote this case from
ought-to-be-DEBUG1 to WARNING, which seems to me to be way too much in the
user's face. Or, if there actually is a case for complaining, then all
those messages ought to be WARNING not DEBUG1. But I'm inclined to
think that having pg_import_system_collations silently ignore unusable
locales is the right thing most of the time.

Assuming we don't want to change pg_get_encoding_from_locale()'s API,
the simplest fix is to duplicate its error message, so more or less

         if (enc < 0)
         {
-            /* error message printed by pg_get_encoding_from_locale() */
+            elog(DEBUG1, "could not determine encoding for locale \"%s\"",
+                 localebuf);
             continue;
         }

regards, tom lane
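The API limitation described in this message can be illustrated with a minimal, self-contained C sketch. It is not PostgreSQL code: lookup_encoding() is a hypothetical stand-in for pg_get_encoding_from_locale(), the encoding id and the two counters are invented for the demonstration, and fprintf() stands in for elog(). It only shows the shape of the suggested fix: call the lookup with write_message = false so it stays quiet, then let the caller emit its own debug-level message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Counters exist only so the behaviour can be checked; they are hypothetical. */
static int warnings_emitted = 0;
static int debug_emitted = 0;

/*
 * Hypothetical stand-in for pg_get_encoding_from_locale().  It knows one
 * locale, returns -1 for everything else, and complains at WARNING level
 * only when write_message is true.  There is no way to ask it for DEBUG.
 */
static int
lookup_encoding(const char *locale, bool write_message)
{
    assert(locale != NULL);

    if (strcmp(locale, "en_US.utf8") == 0)
        return 6;               /* arbitrary made-up encoding id */

    if (write_message)
    {
        fprintf(stderr,
                "WARNING: could not determine encoding for locale \"%s\"\n",
                locale);
        warnings_emitted++;
    }
    return -1;
}

/*
 * Caller-side pattern: keep the lookup silent and duplicate its message
 * at debug level.  Returns true if the locale would be imported.
 */
static bool
import_locale(const char *locale, bool debug_enabled)
{
    int enc = lookup_encoding(locale, false);

    if (enc < 0)
    {
        if (debug_enabled)
        {
            fprintf(stderr,
                    "DEBUG: could not determine encoding for locale \"%s\"\n",
                    locale);
            debug_emitted++;
        }
        return false;
    }
    return true;
}
```

In this sketch, import_locale("ka_GE", true) produces one DEBUG line and no WARNING, whereas calling lookup_encoding("ka_GE", true) directly produces the WARNING that the originally proposed patch would have surfaced during initdb.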
On 09/09/2021 21:51, Tom Lane wrote:
> What you propose to do here would promote this case from
> ought-to-be-DEBUG1 to WARNING, which seems to me to be way too much in the
> user's face. Or, if there actually is a case for complaining, then all
> those messages ought to be WARNING not DEBUG1.
> ...
>
> Assuming we don't want to change pg_get_encoding_from_locale()'s API,
> the simplest fix is to duplicate its error message, so more or less
>
>          if (enc < 0)
>          {
> -            /* error message printed by pg_get_encoding_from_locale() */
> +            elog(DEBUG1, "could not determine encoding for locale \"%s\"",
> +                 localebuf);
>              continue;
>          }

Upon thinking a little more, I agree.

The warnings I happen to get from initdb on my current machine (with
many various locales installed, more than on a typical box) are:

performing post-bootstrap initialization ... 2021-09-09 22:04:01.678 +07 [482312] WARNING: could not determine encoding for locale "hy_AM.armscii8": codeset is "ARMSCII-8"
2021-09-09 22:04:01.679 +07 [482312] WARNING: could not determine encoding for locale "ka_GE": codeset is "GEORGIAN-PS"
2021-09-09 22:04:01.679 +07 [482312] WARNING: could not determine encoding for locale "kk_KZ": codeset is "PT154"
2021-09-09 22:04:01.679 +07 [482312] WARNING: could not determine encoding for locale "kk_KZ.rk1048": codeset is "RK1048"
2021-09-09 22:04:01.686 +07 [482312] WARNING: could not determine encoding for locale "tg_TJ": codeset is "KOI8-T"
2021-09-09 22:04:01.686 +07 [482312] WARNING: could not determine encoding for locale "th_TH": codeset is "TIS-620"
ok

While they are definitely interesting as DEBUG1, not so as a WARNING.

So, +1 from me for your proposed elog(DEBUG1, ...); patch. Thank you.

--
Anton Voloshin
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru
Anton Voloshin <a.voloshin@postgrespro.ru> writes:
> On 09/09/2021 21:51, Tom Lane wrote:
>> Assuming we don't want to change pg_get_encoding_from_locale()'s API,
>> the simplest fix is to duplicate its error message, so more or less
>>
>>          if (enc < 0)
>>          {
>> -            /* error message printed by pg_get_encoding_from_locale() */
>> +            elog(DEBUG1, "could not determine encoding for locale \"%s\"",
>> +                 localebuf);
>>              continue;
>>          }

> Upon thinking a little more, I agree.

Another approach we could take is to deem the comment incorrect and
just remove it, codifying the current behavior of silently ignoring
unrecognized encodings. The reason that seems like it might be
appropriate is that the logic immediately below this bit silently
ignores encodings that are known but are frontend-only:

    if (!PG_VALID_BE_ENCODING(enc))
        continue;       /* ignore locales for client-only encodings */

It's sure not very clear to me why one case deserves a message and the
other not. Perhaps they both do, which would lead to adding another
DEBUG1 message here.

regards, tom lane
On 10/09/2021 01:37, Tom Lane wrote:
> Another approach we could take is to deem the comment incorrect and
> just remove it, codifying the current behavior of silently ignoring
> unrecognized encodings. The reason that seems like it might be
> appropriate is that the logic immediately below this bit silently
> ignores encodings that are known but are frontend-only:
>
>     if (!PG_VALID_BE_ENCODING(enc))
>         continue;       /* ignore locales for client-only encodings */
>
> It's sure not very clear to me why one case deserves a message and the
> other not. Perhaps they both do, which would lead to adding another
> DEBUG1 message here.

I'm not an expert in locales, but I think it makes some sense to be
silent about encodings we have consciously decided to ignore, as we have
them in our tables but have marked them as frontend-only
(!PG_VALID_BE_ENCODING(enc)).

Just like it makes sense to give a debug-level warning about encodings
seen in locale -a output but not recognized by us at all
(pg_get_encoding_from_locale(localebuf, false) < 0).

Therefore I think your patch with the duplicated error message is better
than what we have currently. I don't see how adding debug-level messages
about skipping frontend-only encodings would be of any significant use
here.

Unless someone more experienced in locales' subtleties would like to
chime in.

--
Anton Voloshin
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru
Anton Voloshin <a.voloshin@postgrespro.ru> writes:
> On 10/09/2021 01:37, Tom Lane wrote:
>> It's sure not very clear to me why one case deserves a message and the
>> other not. Perhaps they both do, which would lead to adding another
>> DEBUG1 message here.

> I'm not an expert in locales, but I think it makes some sense to be
> silent about encodings we have consciously decided to ignore as we have
> them in our tables, but marked them as frontend-only
> (!PG_VALID_BE_ENCODING(enc)).

I'm not really buying that. It seems to me that the only reason anyone
would examine this debug output at all is that they want to know "why
didn't this locale (which I can see in 'locale -a' output) get
imported?". So the only cases I'm inclined to not log about are when
we skip a locale because there's already a matching pg_collation entry.

I experimented with the attached draft patch. The debug output on my
RHEL8 box (with a more-or-less-default set of locales) looks like

2021-09-11 12:13:09.908 EDT [41731] DEBUG: could not identify encoding for locale "hy_AM.armscii8"
2021-09-11 12:13:09.909 EDT [41731] DEBUG: could not identify encoding for locale "ka_GE"
2021-09-11 12:13:09.909 EDT [41731] DEBUG: could not identify encoding for locale "ka_GE.georgianps"
2021-09-11 12:13:09.909 EDT [41731] DEBUG: could not identify encoding for locale "kk_KZ"
2021-09-11 12:13:09.909 EDT [41731] DEBUG: could not identify encoding for locale "kk_KZ.pt154"
2021-09-11 12:13:09.926 EDT [41731] DEBUG: could not identify encoding for locale "tg_TJ"
2021-09-11 12:13:09.926 EDT [41731] DEBUG: could not identify encoding for locale "tg_TJ.koi8t"
2021-09-11 12:13:09.926 EDT [41731] DEBUG: could not identify encoding for locale "th_TH"
2021-09-11 12:13:09.926 EDT [41731] DEBUG: could not identify encoding for locale "th_TH.tis620"
2021-09-11 12:13:09.926 EDT [41731] DEBUG: could not identify encoding for locale "thai"
2021-09-11 12:13:09.929 EDT [41731] DEBUG: skipping client-only locale "zh_CN.gb18030"
2021-09-11 12:13:09.929 EDT [41731] DEBUG: skipping client-only locale "zh_CN.gbk"
2021-09-11 12:13:09.930 EDT [41731] DEBUG: skipping client-only locale "zh_HK"
2021-09-11 12:13:09.930 EDT [41731] DEBUG: skipping client-only locale "zh_HK.big5hkscs"
2021-09-11 12:13:09.930 EDT [41731] DEBUG: skipping client-only locale "zh_SG.gbk"
2021-09-11 12:13:09.930 EDT [41731] DEBUG: skipping client-only locale "zh_TW"
2021-09-11 12:13:09.930 EDT [41731] DEBUG: skipping client-only locale "zh_TW.big5"

I don't see a good reason to think that someone would be less confused
about why we reject zh_HK than why we reject th_TH. So I think if we're
going to worry about this then we should add both messages.

regards, tom lane

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 4075f991a0..8fe8227751 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -597,11 +597,15 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
             enc = pg_get_encoding_from_locale(localebuf, false);
             if (enc < 0)
             {
-                /* error message printed by pg_get_encoding_from_locale() */
+                elog(DEBUG1, "could not identify encoding for locale \"%s\"",
+                     localebuf);
                 continue;
             }
             if (!PG_VALID_BE_ENCODING(enc))
-                continue;       /* ignore locales for client-only encodings */
+            {
+                elog(DEBUG1, "skipping client-only locale \"%s\"", localebuf);
+                continue;
+            }
             if (enc == PG_SQL_ASCII)
                 continue;       /* C/POSIX are already in the catalog */
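The logging policy argued for in this message (report every skipped locale except those already present in the catalog) can be sketched as a small standalone C example. Everything here is hypothetical and for illustration only: the DemoEncoding values, demo_valid_backend_encoding() (a stand-in for PG_VALID_BE_ENCODING) and the Decision enum do not exist in PostgreSQL. Only the order of the checks and the two message texts follow the draft patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Made-up encoding ids; real PostgreSQL uses the pg_enc enum instead. */
typedef enum
{
    ENC_UNKNOWN = -1,           /* lookup failed */
    ENC_SQL_ASCII = 0,          /* C/POSIX locales */
    ENC_UTF8 = 1,               /* usable as a server encoding */
    ENC_BIG5 = 2                /* client-only in this demo */
} DemoEncoding;

typedef enum
{
    DECIDE_IMPORT,
    SKIP_UNKNOWN_ENCODING,
    SKIP_CLIENT_ONLY,
    SKIP_ALREADY_IN_CATALOG
} Decision;

/* Hypothetical stand-in for PG_VALID_BE_ENCODING(). */
static bool
demo_valid_backend_encoding(DemoEncoding enc)
{
    return enc == ENC_SQL_ASCII || enc == ENC_UTF8;
}

/* Same order of checks as in the draft patch. */
static Decision
classify(DemoEncoding enc)
{
    if (enc < 0)
        return SKIP_UNKNOWN_ENCODING;
    if (!demo_valid_backend_encoding(enc))
        return SKIP_CLIENT_ONLY;
    if (enc == ENC_SQL_ASCII)
        return SKIP_ALREADY_IN_CATALOG;
    return DECIDE_IMPORT;
}

/*
 * Returns the debug message format for a decision, or NULL when nothing
 * should be logged.  Only the "already in catalog" skip stays silent.
 */
static const char *
debug_format_for(Decision d)
{
    switch (d)
    {
        case SKIP_UNKNOWN_ENCODING:
            return "could not identify encoding for locale \"%s\"";
        case SKIP_CLIENT_ONLY:
            return "skipping client-only locale \"%s\"";
        case SKIP_ALREADY_IN_CATALOG:
        case DECIDE_IMPORT:
            return NULL;
    }
    return NULL;
}
```

Keeping the classification separate from the logging makes the policy visible in one place: a reader looking for "why was this locale not imported?" gets an answer for both the unknown-encoding and the client-only case, while the C/POSIX skip remains quiet.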