Thread: initdb --no-locale=C doesn't work as specified when the environment is not C
initdb --no-locale=C doesn't work as specified when the environment is not C
From
Kyotaro Horiguchi
Date:
Commit 3e51b278db leaves lc_* conf lines as commented-out when their value is "C". This leads to the following behavior. $ echo LANG ja_JP.UTF8 $ initdb --no-locale hoge $ grep lc_ hoge/postgresql.conf #lc_messages = 'C' # locale for system error message #lc_monetary = 'C' # locale for monetary formatting #lc_numeric = 'C' # locale for number formatting #lc_time = 'C' # locale for time formatting In this scenario, the postmaster ends up emitting log massages in Japanese, which contradicts the documentation. https://www.postgresql.org/docs/devel/app-initdb.html > --locale=locale > Sets the default locale for the database cluster. If this option is > not specified, the locale is inherited from the environment that > initdb runs in. Locale support is described in Section 24.1. > .. > --lc-messages=locale > Like --locale, but only sets the locale in the specified category. Here's a somewhat amusing case: $ echo LANG ja_JP.UTF8 $ initdb --lc_messages=C $ grep lc_ hoge/postgresql.conf #lc_messages = 'C' # locale for system error message lc_monetary = 'ja_JP.UTF8' # locale for monetary formatting lc_numeric = 'ja_JP.UTF8' # locale for number formatting lc_time = 'ja_JP.UTF8' # locale for time formatting Hmm. it seems that initdb replaces the values of all categories *except the specified one*. This behavior seems incorrect to me. initdb should replace the value when explicitly specified in the command line. If you use -c lc_messages=C, it does perform the expected behavior to some extent, but I believe this is a separate matter. I have doubts about not replacing these lines for purely cosmetic reasons. In this mail, I've attached three possible solutions for the original issue: the first one enforces replacement only when specified on the command line, the second one simply always performs replacement, and the last one addresses the concern about the absence of quotes around "C" by allowing explicit specification. (FWIW, I prefer the last one.) What do you think about these? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
Re: initdb --no-locale=C doesn't work as specified when the environment is not C
From
Tom Lane
Date:
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: > Commit 3e51b278db leaves lc_* conf lines as commented-out when > their value is "C". This leads to the following behavior. Hmm ... I see a contributing factor here: this bit in postgresql.conf.sample is a lie: #lc_messages = 'C' # locale for system error message # strings A look in guc_tables.c shows that the actual default is '' (empty string), which means "use the environment", and that matches how the variable is documented in config.sgml. Somebody --- quite possibly me --- was misled by the contents of postgresql.conf.sample into thinking that the lc_xxx GUCs all default to C, when that's only true for the others. I think that a more correct fix for this would treat lc_messages differently from the other lc_xxx GUCs. Maybe just eliminate the hack about not substituting "C" for that one? In any case, we need to fix this mistake in postgresql.conf.sample. regards, tom lane
Re: initdb --no-locale=C doesn't work as specified when the environment is not C
From
Kyotaro Horiguchi
Date:
At Wed, 22 Nov 2023 11:04:01 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in > Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: > > Commit 3e51b278db leaves lc_* conf lines as commented-out when > > their value is "C". This leads to the following behavior. > > Hmm ... I see a contributing factor here: this bit in > postgresql.conf.sample is a lie: > > #lc_messages = 'C' # locale for system error message > # strings > > A look in guc_tables.c shows that the actual default is '' (empty > string), which means "use the environment", and that matches how the > variable is documented in config.sgml. Somebody --- quite possibly me > --- was misled by the contents of postgresql.conf.sample into thinking > that the lc_xxx GUCs all default to C, when that's only true for the > others. It seems somewhat intentional that only lc_messages references the environment at boot time. On the other hand, previously, in the absence of a specified locale, initdb would embed the environmental value in the configuration file, as it seems to be documented. Given that initdb is always used for cluster creation, it's unlikey that systems depend on this boot-time default for their operation. > I think that a more correct fix for this would treat lc_messages > differently from the other lc_xxx GUCs. Maybe just eliminate the > hack about not substituting "C" for that one? For example, the --no-locale option for initdb is supposed to set all categories to 'C'. That approach would lead to the postgres referencing the runtime environment for all categories except lc_messages, which I believe contradicts the documentation. In my biew, if lc_messages is exempted from that hack, then all other categories should be similarly excluded as in the second approach among the attached in the previous mail. > In any case, we need to fix this mistake in postgresql.conf.sample. If you are not particularly concerned about the presence of quotation marks, I think it would be fine to go with the second approach and make the necessary modification to the configuration file accordingly. With the attached patch, initdb --no-locale generates the following lines in the configuration file. > lc_messages = C # locale for system error message > # strings > lc_monetary = C # locale for monetary formatting > lc_numeric = C # locale for number formatting > lc_time = C # locale for time formatting By the way, the lines around lc_* in the sample file seem to have somewhat inconsistent indentations. Wouldnt' it be preferable to fix this? (The attached doesn't that.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
Re: initdb --no-locale=C doesn't work as specified when the environment is not C
From
Tom Lane
Date:
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: > It seems somewhat intentional that only lc_messages references the > environment at boot time. On the other hand, previously, in the > absence of a specified locale, initdb would embed the environmental > value in the configuration file, as it seems to be documented. Given > that initdb is always used for cluster creation, it's unlikey that > systems depend on this boot-time default for their operation. Yeah, after further reflection there doesn't seem to be a lot of value in leaving these entries commented-out, even in the cases where that's technically correct. Let's just go back to the old behavior of always uncommenting them; that stood for years without complaints. So I committed your latest patch as-is. > By the way, the lines around lc_* in the sample file seem to have > somewhat inconsistent indentations. Wouldnt' it be preferable to fix > this? (The attached doesn't that.) They look all right if you assume the tab width is 8, which seems to be what is used elsewhere in the file. I think there's been some prior discussion about whether to ban use of tabs at all in these sample files, so as to reduce confusion about how wide the tabs are. But I'm not touching that question today. regards, tom lane
Re: initdb --no-locale=C doesn't work as specified when the environment is not C
From
Kyotaro Horiguchi
Date:
At Wed, 10 Jan 2024 18:16:03 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in > Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: > > It seems somewhat intentional that only lc_messages references the > > environment at boot time. On the other hand, previously, in the > > absence of a specified locale, initdb would embed the environmental > > value in the configuration file, as it seems to be documented. Given > > that initdb is always used for cluster creation, it's unlikey that > > systems depend on this boot-time default for their operation. > > Yeah, after further reflection there doesn't seem to be a lot of value > in leaving these entries commented-out, even in the cases where that's > technically correct. Let's just go back to the old behavior of always > uncommenting them; that stood for years without complaints. So I > committed your latest patch as-is. I'm glad you understand. Thank you for commiting. > > By the way, the lines around lc_* in the sample file seem to have > > somewhat inconsistent indentations. Wouldnt' it be preferable to fix > > this? (The attached doesn't that.) > > They look all right if you assume the tab width is 8, which seems to > be what is used elsewhere in the file. I think there's been some > prior discussion about whether to ban use of tabs at all in these > sample files, so as to reduce confusion about how wide the tabs are. > But I'm not touching that question today. Ah, I see, I understood. regards. -- Kyotaro Horiguchi NTT Open Source Software Center