Thread: initdb --no-locale=C doesn't work as specified when the environment is not C

From
Kyotaro Horiguchi
Date:
Commit 3e51b278db leaves lc_* conf lines as commented-out when
their value is "C". This leads to the following behavior.

$ echo $LANG
ja_JP.UTF8
$ initdb --no-locale hoge
$ grep lc_ hoge/postgresql.conf
#lc_messages = 'C'                      # locale for system error message
#lc_monetary = 'C'                      # locale for monetary formatting
#lc_numeric = 'C'                       # locale for number formatting
#lc_time = 'C'                          # locale for time formatting

In this scenario, the postmaster ends up emitting log messages in
Japanese, which contradicts the documentation.
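
A commented-out line sets nothing, so the server falls back to its
built-in default for that parameter. A minimal sketch for listing which
lc_* settings are actually active in a configuration file follows. The
helper name active_lc_settings and the temporary file are made up for
illustration; no real cluster is involved.

```shell
#!/bin/sh
# Print only the lc_* assignments that are active (not commented out).
# active_lc_settings is a hypothetical helper, not part of PostgreSQL.
active_lc_settings() {
    grep -E '^[[:space:]]*lc_(messages|monetary|numeric|time)[[:space:]]*=' "$1"
}

# Stand-in for hoge/postgresql.conf as shown above.
conf=$(mktemp)
cat > "$conf" <<'EOF'
#lc_messages = 'C'                      # locale for system error message
#lc_monetary = 'C'                      # locale for monetary formatting
#lc_numeric = 'C'                       # locale for number formatting
#lc_time = 'C'                          # locale for time formatting
EOF

# grep exits non-zero when nothing matches, i.e. every line is commented out.
if active_lc_settings "$conf" >/dev/null; then
    echo "some lc_* settings are active"
else
    echo "no lc_* settings are active"
fi
rm -f "$conf"
```

With the four commented-out lines above, the check takes the "no lc_*
settings are active" branch.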

https://www.postgresql.org/docs/devel/app-initdb.html

> --locale=locale 
>   Sets the default locale for the database cluster. If this option is
>   not specified, the locale is inherited from the environment that
>   initdb runs in. Locale support is described in Section 24.1.
> 
..
> --lc-messages=locale
>   Like --locale, but only sets the locale in the specified category.

Here's a somewhat amusing case:

$ echo $LANG
ja_JP.UTF8
$ initdb --lc-messages=C hoge
$ grep lc_ hoge/postgresql.conf 
#lc_messages = 'C'                      # locale for system error message
lc_monetary = 'ja_JP.UTF8'              # locale for monetary formatting
lc_numeric = 'ja_JP.UTF8'               # locale for number formatting
lc_time = 'ja_JP.UTF8'                  # locale for time formatting

Hmm, it seems that initdb replaces the values of all categories
*except the specified one*. This behavior seems incorrect to
me. initdb should replace the value when explicitly specified in the
command line. If you use -c lc_messages=C, it does perform the
expected behavior to some extent, but I believe this is a separate
matter.
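
The behavior described here can be imitated with sed on a scratch file.
This is only a simulation of the observed outcome, not initdb's C code;
the helper name set_lc_skip_c and the two-line sample file are
assumptions for illustration.

```shell
#!/bin/sh
# Simulation only: mimics the behavior described above, not initdb itself.
# set_lc_skip_c is a hypothetical helper. It rewrites "#NAME = ..." to
# "NAME = 'VALUE'" unless VALUE is C, in which case the line is left alone.
# Values containing "|" or "&" would confuse sed; locale names do not.
set_lc_skip_c() {
    file=$1 name=$2 value=$3
    if [ "$value" = "C" ]; then
        return 0
    fi
    sed "s|^#\{0,1\}${name} = [^#]*|${name} = '${value}' |" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

conf=$(mktemp)
cat > "$conf" <<'EOF'
#lc_messages = 'C'  # locale for system error message
#lc_monetary = 'C'  # locale for monetary formatting
EOF

set_lc_skip_c "$conf" lc_messages C            # requested on the command line
set_lc_skip_c "$conf" lc_monetary ja_JP.UTF8   # inherited from the environment
cat "$conf"
rm -f "$conf"
```

The explicitly requested lc_messages line stays commented out, while the
inherited lc_monetary line becomes active, which is the reverse of what
the user asked for.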

I have doubts about not replacing these lines for purely cosmetic
reasons. In this mail, I've attached three possible solutions for the
original issue: the first one enforces replacement only when specified
on the command line, the second one simply always performs
replacement, and the last one addresses the concern about the absence
of quotes around "C" by allowing explicit specification. (FWIW, I
prefer the last one.)

What do you think about these?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> Commit 3e51b278db leaves lc_* conf lines as commented-out when
> their value is "C". This leads to the following behavior.

Hmm ... I see a contributing factor here: this bit in
postgresql.conf.sample is a lie:

#lc_messages = 'C'            # locale for system error message
                    # strings

A look in guc_tables.c shows that the actual default is '' (empty
string), which means "use the environment", and that matches how the
variable is documented in config.sgml.  Somebody --- quite possibly me
--- was misled by the contents of postgresql.conf.sample into thinking
that the lc_xxx GUCs all default to C, when that's only true for the
others.
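
As a rough illustration of what "use the environment" means for an empty
lc_messages, the sketch below applies the usual POSIX precedence for a
single locale category (LC_ALL, then LC_MESSAGES, then LANG). The helper
name effective_lc_messages is hypothetical, the final fallback to C is a
simplifying assumption, and the server's real lookup is system-dependent.

```shell
#!/bin/sh
# effective_lc_messages is a hypothetical helper for illustration only.
# Arguments: the configured lc_messages value, then LC_ALL, LC_MESSAGES, LANG.
# A non-empty setting wins; an empty one defers to the environment values
# in POSIX order. Falling back to "C" when all are empty is an assumption.
effective_lc_messages() {
    guc_value=$1 lc_all=$2 lc_messages=$3 lang=$4
    if [ -n "$guc_value" ]; then
        echo "$guc_value"
    else
        echo "${lc_all:-${lc_messages:-${lang:-C}}}"
    fi
}

# Empty setting: the result depends on the caller's environment.
effective_lc_messages "" "${LC_ALL:-}" "${LC_MESSAGES:-}" "${LANG:-}"

# Explicit setting: the environment no longer matters.
effective_lc_messages "C" "${LC_ALL:-}" "${LC_MESSAGES:-}" "${LANG:-}"
```

Under this model, a commented-out lc_messages line with LANG=ja_JP.UTF8
resolves to ja_JP.UTF8, while an active lc_messages = C line resolves to C.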

I think that a more correct fix for this would treat lc_messages
differently from the other lc_xxx GUCs.  Maybe just eliminate the
hack about not substituting "C" for that one?

In any case, we need to fix this mistake in postgresql.conf.sample.

            regards, tom lane



Re: initdb --no-locale=C doesn't work as specified when the environment is not C

From
Kyotaro Horiguchi
Date:
At Wed, 22 Nov 2023 11:04:01 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
> Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> > Commit 3e51b278db leaves lc_* conf lines as commented-out when
> > their value is "C". This leads to the following behavior.
> 
> Hmm ... I see a contributing factor here: this bit in
> postgresql.conf.sample is a lie:
> 
> #lc_messages = 'C'            # locale for system error message
>                     # strings
> 
> A look in guc_tables.c shows that the actual default is '' (empty
> string), which means "use the environment", and that matches how the
> variable is documented in config.sgml.  Somebody --- quite possibly me
> --- was misled by the contents of postgresql.conf.sample into thinking
> that the lc_xxx GUCs all default to C, when that's only true for the
> others.

It seems somewhat intentional that only lc_messages references the
environment at boot time. On the other hand, previously, in the
absence of a specified locale, initdb would embed the environmental
value in the configuration file, as it seems to be documented. Given
that initdb is always used for cluster creation, it's unlikely that
systems depend on this boot-time default for their operation.

> I think that a more correct fix for this would treat lc_messages
> differently from the other lc_xxx GUCs.  Maybe just eliminate the
> hack about not substituting "C" for that one?

For example, the --no-locale option for initdb is supposed to set all
categories to 'C'. That approach would lead to postgres
referencing the runtime environment for all categories except
lc_messages, which I believe contradicts the documentation. In my
view, if lc_messages is exempted from that hack, then all other
categories should be similarly excluded as in the second approach
among the attached in the previous mail.

> In any case, we need to fix this mistake in postgresql.conf.sample.

If you are not particularly concerned about the presence of quotation
marks, I think it would be fine to go with the second approach and
make the necessary modification to the configuration file accordingly.

With the attached patch, initdb --no-locale generates the following
lines in the configuration file.

> lc_messages = C                # locale for system error message
>                     # strings
> lc_monetary = C                # locale for monetary formatting
> lc_numeric = C                # locale for number formatting
> lc_time = C                # locale for time formatting
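
For comparison, an always-substitute rule can be imitated on a scratch
file as below. This is a simulation, not the attached patch: the helper
name set_lc_always is hypothetical, and the quoting rule (bare value when
it contains only letters, digits and underscores, quoted otherwise) is an
assumption inferred from the outputs quoted in this thread.

```shell
#!/bin/sh
# Simulation only, not initdb's implementation. set_lc_always is a
# hypothetical helper that always emits an active "NAME = VALUE" line.
# Assumed quoting rule: a value made only of letters, digits and
# underscores is written bare; anything else (or empty) is single-quoted.
set_lc_always() {
    file=$1 name=$2 value=$3
    case $value in
        *[!A-Za-z0-9_]*|"") value="'$value'" ;;
    esac
    sed "s|^#\{0,1\}${name} = [^#]*|${name} = ${value} |" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

conf=$(mktemp)
cat > "$conf" <<'EOF'
#lc_messages = 'C'  # locale for system error message
#lc_monetary = 'C'  # locale for monetary formatting
EOF

set_lc_always "$conf" lc_messages C
set_lc_always "$conf" lc_monetary ja_JP.UTF8
cat "$conf"
rm -f "$conf"
```

Both lines come out uncommented, so the server never has to consult the
environment for these categories.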

By the way, the lines around lc_* in the sample file seem to have
somewhat inconsistent indentations. Wouldn't it be preferable to fix
this? (The attached patch doesn't do that.)


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> It seems somewhat intentional that only lc_messages references the
> environment at boot time. On the other hand, previously, in the
> absence of a specified locale, initdb would embed the environmental
> value in the configuration file, as it seems to be documented. Given
> that initdb is always used for cluster creation, it's unlikely that
> systems depend on this boot-time default for their operation.

Yeah, after further reflection there doesn't seem to be a lot of value
in leaving these entries commented-out, even in the cases where that's
technically correct.  Let's just go back to the old behavior of always
uncommenting them; that stood for years without complaints.  So I
committed your latest patch as-is.

> By the way, the lines around lc_* in the sample file seem to have
> somewhat inconsistent indentations. Wouldn't it be preferable to fix
> this? (The attached patch doesn't do that.)

They look all right if you assume the tab width is 8, which seems to
be what is used elsewhere in the file.  I think there's been some
prior discussion about whether to ban use of tabs at all in these
sample files, so as to reduce confusion about how wide the tabs are.
But I'm not touching that question today.
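
The dependence on tab width is easy to check with expand(1). The sketch
below uses two lines with an assumed tab layout, modeled on the
lc_messages entry and its continuation line; the helper name
comment_column is hypothetical.

```shell
#!/bin/sh
# comment_column is a hypothetical helper: it expands tabs at the given
# width and prints the 1-based column where the trailing "# " comment starts.
comment_column() {
    printf '%s\n' "$1" | expand -t "$2" | awk '{ print index($0, "# ") }'
}

# Assumed tab layout: three tabs after the setting, five before the
# continuation comment. The real sample file may differ.
first=$(printf "#lc_messages = 'C'\t\t\t# locale for system error message")
second=$(printf "\t\t\t\t\t# strings")

for width in 8 4; do
    echo "tab width $width: $(comment_column "$first" "$width") vs $(comment_column "$second" "$width")"
done
```

With these assumed lines, the two comments start at the same column when
tabs are 8 wide, but at different columns when tabs are 4 wide.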

            regards, tom lane



At Wed, 10 Jan 2024 18:16:03 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
> Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> > It seems somewhat intentional that only lc_messages references the
> > environment at boot time. On the other hand, previously, in the
> > absence of a specified locale, initdb would embed the environmental
> > value in the configuration file, as it seems to be documented. Given
> > that initdb is always used for cluster creation, it's unlikely that
> > systems depend on this boot-time default for their operation.
> 
> Yeah, after further reflection there doesn't seem to be a lot of value
> in leaving these entries commented-out, even in the cases where that's
> technically correct.  Let's just go back to the old behavior of always
> uncommenting them; that stood for years without complaints.  So I
> committed your latest patch as-is.

I'm glad you understand. Thank you for committing.

> > By the way, the lines around lc_* in the sample file seem to have
> > somewhat inconsistent indentations. Wouldn't it be preferable to fix
> > this? (The attached patch doesn't do that.)
> 
> They look all right if you assume the tab width is 8, which seems to
> be what is used elsewhere in the file.  I think there's been some
> prior discussion about whether to ban use of tabs at all in these
> sample files, so as to reduce confusion about how wide the tabs are.
> But I'm not touching that question today.

Ah, I see. Understood.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center