Thread: missing warning in pg_import_system_collations

missing warning in pg_import_system_collations

From
Anton Voloshin
Date:
Hello hackers,

In pg_import_system_collations() there is this fragment of code:

enc = pg_get_encoding_from_locale(localebuf, false);
if (enc < 0)
{
    /* error message printed by pg_get_encoding_from_locale() */
    continue;
}

However, false passed to pg_get_encoding_from_locale() means 
write_message argument is false, so no error message is ever printed.
I propose an obvious patch (see attachment).

Introduced in aa17c06fb in January 2017 when debug was replaced by 
false, so I guess back-patching through 10 would be appropriate.

-- 
Anton Voloshin
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru

Attachment

Re: missing warning in pg_import_system_collations

From
Ranier Vilela
Date:
Em qui., 9 de set. de 2021 às 03:45, Anton Voloshin <a.voloshin@postgrespro.ru> escreveu:
Hello hackers,

In pg_import_system_collations() there is this fragment of code:

enc = pg_get_encoding_from_locale(localebuf, false);
if (enc < 0)
{
        /* error message printed by pg_get_encoding_from_locale() */
        continue;
}

However, false passed to pg_get_encoding_from_locale() means
write_message argument is false, so no error message is ever printed.
I propose an obvious patch (see attachment).
Yeah, seems correct to me.
The comment clearly expresses the intention.


Introduced in aa17c06fb in January 2017 when debug was replaced by
false, so I guess back-patching through 10 would be appropriate.
This is an oversight.

+1 from me.

Ranier Vilela

Re: missing warning in pg_import_system_collations

From
Tom Lane
Date:
Anton Voloshin <a.voloshin@postgrespro.ru> writes:
> In pg_import_system_collations() there is this fragment of code:

> enc = pg_get_encoding_from_locale(localebuf, false);
> if (enc < 0)
> {
>     /* error message printed by pg_get_encoding_from_locale() */
>     continue;
> }

> However, false passed to pg_get_encoding_from_locale() means 
> write_message argument is false, so no error message is ever printed.
> I propose an obvious patch (see attachment).
> Introduced in aa17c06fb in January 2017 when debug was replaced by 
> false, so I guess back-patching through 10 would be appropriate.

I don't think this is obvious at all.

In the original coding (before aa17c06fb, when this code was in initdb),
we printed a warning if "debug" was true and otherwise printed nothing.
The other "if (debug)" cases in the code that got moved over were
translated to "elog(DEBUG1)", but there isn't any API to make
pg_get_encoding_from_locale() log at that level.

What you propose to do here would promote this case from
ought-to-be-DEBUG1 to WARNING, which seems to me to be way too much in the
user's face.  Or, if there actually is a case for complaining, then all
those messages ought to be WARNING not DEBUG1.  But I'm inclined to think
that having pg_import_system_collations silently ignore unusable locales
is the right thing most of the time.

Assuming we don't want to change pg_get_encoding_from_locale()'s API,
the simplest fix is to duplicate its error message, so more or less

     if (enc < 0)
     {
-        /* error message printed by pg_get_encoding_from_locale() */
+        elog(DEBUG1, "could not determine encoding for locale \"%s\"",
+             localebuf)));
         continue;
     }

            regards, tom lane



Re: missing warning in pg_import_system_collations

From
Anton Voloshin
Date:
On 09/09/2021 21:51, Tom Lane wrote:
> What you propose to do here would promote this case from
> ought-to-be-DEBUG1 to WARNING, which seems to me to be way too much in the
> user's face.  Or, if there actually is a case for complaining, then all
> those messages ought to be WARNING not DEBUG1. ...
> 
> Assuming we don't want to change pg_get_encoding_from_locale()'s API,
> the simplest fix is to duplicate its error message, so more or less
> 
>       if (enc < 0)
>       {
> -        /* error message printed by pg_get_encoding_from_locale() */
> +        elog(DEBUG1, "could not determine encoding for locale \"%s\"",
> +             localebuf)));
>           continue;
>       }

Upon thinking a little more, I agree.
The warnings I happen to get from initdb on my current machine (with 
many various locales installed, more than on a typical box) are:

performing post-bootstrap initialization ... 2021-09-09 22:04:01.678 +07 
[482312] WARNING:  could not determine encoding for locale 
"hy_AM.armscii8": codeset is "ARMSCII-8"
2021-09-09 22:04:01.679 +07 [482312] WARNING:  could not determine 
encoding for locale "ka_GE": codeset is "GEORGIAN-PS"
2021-09-09 22:04:01.679 +07 [482312] WARNING:  could not determine 
encoding for locale "kk_KZ": codeset is "PT154"
2021-09-09 22:04:01.679 +07 [482312] WARNING:  could not determine 
encoding for locale "kk_KZ.rk1048": codeset is "RK1048"
2021-09-09 22:04:01.686 +07 [482312] WARNING:  could not determine 
encoding for locale "tg_TJ": codeset is "KOI8-T"
2021-09-09 22:04:01.686 +07 [482312] WARNING:  could not determine 
encoding for locale "th_TH": codeset is "TIS-620"
ok

While they are definitely interesting as DEBUG1, not so as a WARNING.

So, +1 from me for your proposed elog(DEBUG1, ...); patch. Thank you.

-- 
Anton Voloshin
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru



Re: missing warning in pg_import_system_collations

From
Tom Lane
Date:
Anton Voloshin <a.voloshin@postgrespro.ru> writes:
> On 09/09/2021 21:51, Tom Lane wrote:
>> Assuming we don't want to change pg_get_encoding_from_locale()'s API,
>> the simplest fix is to duplicate its error message, so more or less
>>
>> if (enc < 0)
>> {
>> -        /* error message printed by pg_get_encoding_from_locale() */
>> +        elog(DEBUG1, "could not determine encoding for locale \"%s\"",
>> +             localebuf)));
>> continue;
>> }

> Upon thinking a little more, I agree.

Another approach we could take is to deem the comment incorrect and
just remove it, codifying the current behavior of silently ignoring
unrecognized encodings.  The reason that seems like it might be
appropriate is that the logic immediately below this bit silently
ignores encodings that are known but are frontend-only:

            if (!PG_VALID_BE_ENCODING(enc))
                continue;        /* ignore locales for client-only encodings */

It's sure not very clear to me why one case deserves a message and the
other not.  Perhaps they both do, which would lead to adding another
DEBUG1 message here.

            regards, tom lane



Re: missing warning in pg_import_system_collations

From
Anton Voloshin
Date:
On 10/09/2021 01:37, Tom Lane wrote:
> Another approach we could take is to deem the comment incorrect and
> just remove it, codifying the current behavior of silently ignoring
> unrecognized encodings.  The reason that seems like it might be
> appropriate is that the logic immediately below this bit silently
> ignores encodings that are known but are frontend-only:
> 
>              if (!PG_VALID_BE_ENCODING(enc))
>                  continue;        /* ignore locales for client-only encodings */
> 
> It's sure not very clear to me why one case deserves a message and the
> other not.  Perhaps they both do, which would lead to adding another
> DEBUG1 message here.

I'm not an expert in locales, but I think it makes some sense to be 
silent about encodings we have consciously decided to ignore as we have 
them in our tables, but marked them as frontend-only 
(!PG_VALID_BE_ENCODING(enc)).
Just like it makes sense to do give a debug-level warning about 
encodings seen in locale -a output but not recognized by us at all 
(pg_get_encoding_from_locale(localebuf, false) < 0).

Therefore I think your patch with duplicated error message is better 
than what we have currently. I don't see how adding debug-level messages 
about skipping frontend-only encodings would be of any significant use here.

Unless someone more experienced in locales' subtleties would like to 
chime in.

-- 
Anton Voloshin
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru



Re: missing warning in pg_import_system_collations

From
Tom Lane
Date:
Anton Voloshin <a.voloshin@postgrespro.ru> writes:
> On 10/09/2021 01:37, Tom Lane wrote:
>> It's sure not very clear to me why one case deserves a message and the
>> other not.  Perhaps they both do, which would lead to adding another
>> DEBUG1 message here.

> I'm not an expert in locales, but I think it makes some sense to be
> silent about encodings we have consciously decided to ignore as we have
> them in our tables, but marked them as frontend-only
> (!PG_VALID_BE_ENCODING(enc)).

I'm not really buying that.  It seems to me that the only reason anyone
would examine this debug output at all is that they want to know "why
didn't this locale (which I can see in 'locale -a' output) get imported?".
So the only cases I'm inclined to not log about are when we skip a locale
because there's already a matching pg_collation entry.

I experimented with the attached draft patch.  The debug output on my
RHEL8 box (with a more-or-less-default set of locales) looks like

2021-09-11 12:13:09.908 EDT [41731] DEBUG:  could not identify encoding for locale "hy_AM.armscii8"
2021-09-11 12:13:09.909 EDT [41731] DEBUG:  could not identify encoding for locale "ka_GE"
2021-09-11 12:13:09.909 EDT [41731] DEBUG:  could not identify encoding for locale "ka_GE.georgianps"
2021-09-11 12:13:09.909 EDT [41731] DEBUG:  could not identify encoding for locale "kk_KZ"
2021-09-11 12:13:09.909 EDT [41731] DEBUG:  could not identify encoding for locale "kk_KZ.pt154"
2021-09-11 12:13:09.926 EDT [41731] DEBUG:  could not identify encoding for locale "tg_TJ"
2021-09-11 12:13:09.926 EDT [41731] DEBUG:  could not identify encoding for locale "tg_TJ.koi8t"
2021-09-11 12:13:09.926 EDT [41731] DEBUG:  could not identify encoding for locale "th_TH"
2021-09-11 12:13:09.926 EDT [41731] DEBUG:  could not identify encoding for locale "th_TH.tis620"
2021-09-11 12:13:09.926 EDT [41731] DEBUG:  could not identify encoding for locale "thai"
2021-09-11 12:13:09.929 EDT [41731] DEBUG:  skipping client-only locale "zh_CN.gb18030"
2021-09-11 12:13:09.929 EDT [41731] DEBUG:  skipping client-only locale "zh_CN.gbk"
2021-09-11 12:13:09.930 EDT [41731] DEBUG:  skipping client-only locale "zh_HK"
2021-09-11 12:13:09.930 EDT [41731] DEBUG:  skipping client-only locale "zh_HK.big5hkscs"
2021-09-11 12:13:09.930 EDT [41731] DEBUG:  skipping client-only locale "zh_SG.gbk"
2021-09-11 12:13:09.930 EDT [41731] DEBUG:  skipping client-only locale "zh_TW"
2021-09-11 12:13:09.930 EDT [41731] DEBUG:  skipping client-only locale "zh_TW.big5"

I don't see a good reason to think that someone would be less confused
about why we reject zh_HK than why we reject th_TH.  So I think if we're
going to worry about this then we should add both messages.

            regards, tom lane

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 4075f991a0..8fe8227751 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -597,11 +597,15 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
             enc = pg_get_encoding_from_locale(localebuf, false);
             if (enc < 0)
             {
-                /* error message printed by pg_get_encoding_from_locale() */
+                elog(DEBUG1, "could not identify encoding for locale \"%s\"",
+                     localebuf);
                 continue;
             }
             if (!PG_VALID_BE_ENCODING(enc))
-                continue;        /* ignore locales for client-only encodings */
+            {
+                elog(DEBUG1, "skipping client-only locale \"%s\"", localebuf);
+                continue;
+            }
             if (enc == PG_SQL_ASCII)
                 continue;        /* C/POSIX are already in the catalog */