Thread: Day and month name localization uses wrong locale category
In 8.2, utils/adt/formatting.c uses our NLS mechanism to localize day and month names (I assume for use by to_char). But since this necessarily ties the outcome to the LC_MESSAGES setting, this comes out inconsistently with Unix locale behavior, e.g., pei@bell:~$ locale LANG= LC_CTYPE="POSIX" LC_NUMERIC="POSIX" LC_TIME="POSIX" LC_COLLATE="POSIX" LC_MONETARY="POSIX" LC_MESSAGES="POSIX" LC_PAPER="POSIX" LC_NAME="POSIX" LC_ADDRESS="POSIX" LC_TELEPHONE="POSIX" LC_MEASUREMENT="POSIX" LC_IDENTIFICATION="POSIX" LC_ALL= pei@bell:~$ date +%A Friday pei@bell:~$ LC_MESSAGES=de_DE@euro date +%A Friday pei@bell:~$ LC_TIME=de_DE@euro date +%A Freitag Is there no API to get the localized names from the C library so that LC_TIME takes effect? -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut wrote: > pei@bell:~$ date +%A > Friday > > pei@bell:~$ LC_MESSAGES=de_DE@euro date +%A > Friday > > pei@bell:~$ LC_TIME=de_DE@euro date +%A > Freitag > > Is there no API to get the localized names from the C library so that LC_TIME > takes effect? > What about using strftime()? So we couldn't worry about gettext translations; "all" is localized. Why didn't I think it before? :-) I'll try to code a patch today later if noone objects. PS> I was thinking about changing this for the same reasons Peter pointed out; I didn't have the time to do it before. -- Euler Taveira de Oliveira http://www.timbira.com/
Euler Taveira de Oliveira wrote: > What about using strftime()? So we couldn't worry about gettext > translations; "all" is localized. > Why didn't I think it before? :-) > > I'll try to code a patch today later if noone objects. How is this going? -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut wrote: > > What about using strftime()? So we couldn't worry about gettext > > translations; "all" is localized. > > Why didn't I think it before? :-) > > > > I'll try to code a patch today later if noone objects. > > How is this going? > Finished. Sorry for the delay I had some trouble understanding how backend treats the locale stuff (Neil pointed out the path). Now TM mode is returning strftime() output. It would be nice if in the future we change this to pg_strftime() but unfortunately the last one is not i18n. :( template1=# show lc_time; lc_time --------- pt_BR (1 registro) template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY'); to_char --------------------------- Segunda, 20 Novembro 2006 (1 registro) template1=# set lc_time to 'C'; SET template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY'); to_char -------------------------- Monday, 20 November 2006 (1 registro) template1=# set lc_time to 'de_DE'; SET template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY'); to_char -------------------------- Montag, 20 November 2006 (1 registro) template1=# Comments? -- Euler Taveira de Oliveira http://www.timbira.com/
Attachment
Euler Taveira de Oliveira <euler@timbira.com> writes: > + /* > + * Return the LC_TIME information > + */ > + char * > + pg_get_lc_time(void) > + { > + return locale_time; > + } locale_time is a global GUC variable, so there is surely no point in the above function. I have not looked at the rest of the patch. regards, tom lane
Tom Lane wrote: > > + /* > > + * Return the LC_TIME information > > + */ > > + char * > > + pg_get_lc_time(void) > > + { > > + return locale_time; > > + } > > locale_time is a global GUC variable, so there is surely no point in the > above function. I have not looked at the rest of the patch. > I know that. If I didn't use it how could i know what is the current LC_TIME setting? The LC_TIME in backend is always C so I need to change it to xx_XX briefly, do the job (strftime) and then get it back to C. Am I wrong? That's what I do. -- Euler Taveira de Oliveira http://www.timbira.com/
Euler Taveira de Oliveira <euler@timbira.com> writes: > Tom Lane wrote: >> locale_time is a global GUC variable, so there is surely no point in the >> above function. I have not looked at the rest of the patch. >> > I know that. If I didn't use it how could i know what is the current > LC_TIME setting? You just look at the variable directly. While there's sometimes value in an encapsulation function, I fail to see any here. regards, tom lane
Tom Lane wrote: > You just look at the variable directly. While there's sometimes value > in an encapsulation function, I fail to see any here. > Oh, my :-) That's the consequence to not sleep at least a little at night. The attached patch, corrects what was pointed out by Tom (thanks). PS> going to bed right now :-) -- Euler Taveira de Oliveira http://www.timbira.com/
Attachment
Is this for 8.2? --------------------------------------------------------------------------- Euler Taveira de Oliveira wrote: > Tom Lane wrote: > > > You just look at the variable directly. While there's sometimes value > > in an encapsulation function, I fail to see any here. > > > Oh, my :-) That's the consequence to not sleep at least a little at > night. > The attached patch, corrects what was pointed out by Tom (thanks). > > PS> going to bed right now :-) > > -- > Euler Taveira de Oliveira > http://www.timbira.com/ [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > Is this for 8.2? > This patch "fixes" (reimplements) a feature that was written for 8.2. So I think it's a must-fix. That patch is not so huge or invasive. Comments? -- Euler Taveira de Oliveira http://www.timbira.com/
Euler Taveira de Oliveira wrote: > Bruce Momjian wrote: > > > Is this for 8.2? > > > This patch "fixes" (reimplements) a feature that was written for 8.2. So > I think it's a must-fix. That patch is not so huge or invasive. > Comments? Agreed, patch applied: Fix to_char() locale handling to honor LC_TIME, not LC_MESSAGES. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Euler Taveira de Oliveira <euler@timbira.com> writes: > Bruce Momjian wrote: >> Is this for 8.2? >> > This patch "fixes" (reimplements) a feature that was written for 8.2. So > I think it's a must-fix. That patch is not so huge or invasive. > Comments? Exactly how bad could the consequences get if someone sets LC_TIME to a value not encoding-compatible with the database encoding? One of the reasons LC_MESSAGES is superuser-only is that you can PANIC the backend by choosing an incompatible value --- will that happen now for LC_TIME too? I think it might be OK, because the reason for the PANIC in the bogus message case is that the encoding-violation error happens recursively inside error processing, and that shouldn't need to happen here. But one thing we'll need to be damn sure of is that control can't get into elog.c while we've got LC_TIME set to a non-C value, else the same recursion scenario could occur due to log_line_prefix expansion. regards, tom lane
Am Dienstag, 21. November 2006 00:52 schrieb Euler Taveira de Oliveira: > Finished. Sorry for the delay I had some trouble understanding how > backend treats the locale stuff (Neil pointed out the path). > Now TM mode is returning strftime() output. It would be nice if in the > future we change this to pg_strftime() but unfortunately the last one is > not i18n. :( What's concerning me about the way this is written is that it calls setlocale() for each formatting instance, which will be very slow. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes: > What's concerning me about the way this is written is that it calls > setlocale() for each formatting instance, which will be very slow. Perhaps, the first time the info is needed, do setlocale(), ask strftime for the 12+7 strings we need and save them away, then revert to C locale and proceed from there. regards, tom lane
Peter Eisentraut wrote: > Am Dienstag, 21. November 2006 00:52 schrieb Euler Taveira de Oliveira: > > Finished. Sorry for the delay I had some trouble understanding how > > backend treats the locale stuff (Neil pointed out the path). > > Now TM mode is returning strftime() output. It would be nice if in the > > future we change this to pg_strftime() but unfortunately the last one is > > not i18n. :( > > What's concerning me about the way this is written is that it calls > setlocale() for each formatting instance, which will be very slow. Should we have it set from a guc hook on lc_time? -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
It is too close to the RC1 release to apply this patch. I have added documentation that "TM"'s locale is controlled by "lc_messages". This has been saved for the 8.3 release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold --------------------------------------------------------------------------- Euler Taveira de Oliveira wrote: > Peter Eisentraut wrote: > > > > What about using strftime()? So we couldn't worry about gettext > > > translations; "all" is localized. > > > Why didn't I think it before? :-) > > > > > > I'll try to code a patch today later if noone objects. > > > > How is this going? > > > Finished. Sorry for the delay I had some trouble understanding how > backend treats the locale stuff (Neil pointed out the path). > Now TM mode is returning strftime() output. It would be nice if in the > future we change this to pg_strftime() but unfortunately the last one is > not i18n. :( > > template1=# show lc_time; > lc_time > --------- > pt_BR > (1 registro) > > template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY'); > to_char > --------------------------- > Segunda, 20 Novembro 2006 > (1 registro) > > template1=# set lc_time to 'C'; > SET > template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY'); > to_char > -------------------------- > Monday, 20 November 2006 > (1 registro) > > template1=# set lc_time to 'de_DE'; > SET > template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY'); > to_char > -------------------------- > Montag, 20 November 2006 > (1 registro) > > template1=# > > > Comments? > > -- > Euler Taveira de Oliveira > http://www.timbira.com/ [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: doc/src/sgml/func.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/func.sgml,v retrieving revision 1.346 diff -c -c -r1.346 func.sgml *** doc/src/sgml/func.sgml 20 Nov 2006 20:20:18 -0000 1.346 --- doc/src/sgml/func.sgml 24 Nov 2006 23:18:59 -0000 *************** *** 4689,4695 **** </row> <row> <entry><literal>TM</literal> prefix</entry> ! <entry>translation mode (print localized day and month names)</entry> <entry><literal>TMMonth</literal></entry> </row> <row> --- 4689,4695 ---- </row> <row> <entry><literal>TM</literal> prefix</entry> ! <entry>translation mode (print localized day and month names based on <varname>lc_messages</>)</entry> <entry><literal>TMMonth</literal></entry> </row> <row> Index: doc/src/sgml/func.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/func.sgml,v retrieving revision 1.346 diff -c -c -r1.346 func.sgml *** doc/src/sgml/func.sgml 20 Nov 2006 20:20:18 -0000 1.346 --- doc/src/sgml/func.sgml 24 Nov 2006 23:18:59 -0000 *************** *** 4689,4695 **** </row> <row> <entry><literal>TM</literal> prefix</entry> ! <entry>translation mode (print localized day and month names)</entry> <entry><literal>TMMonth</literal></entry> </row> <row> --- 4689,4695 ---- </row> <row> <entry><literal>TM</literal> prefix</entry> ! <entry>translation mode (print localized day and month names based on <varname>lc_messages</>)</entry> <entry><literal>TMMonth</literal></entry> </row> <row>
I now remember a new problem with this feature, irregardless of whether we use 'lc_messages' or 'lc_time'. The problem is having a function's output affected by a GUC variable. If you create an expression index using the function, and later query the index with a different GUC value, or you do inserts with different GUC values, the index will not work. I know we have had this problem in the past, but I can't remember if or how we addressed it. --------------------------------------------------------------------------- Euler Taveira de Oliveira wrote: > Peter Eisentraut wrote: > > > > What about using strftime()? So we couldn't worry about gettext > > > translations; "all" is localized. > > > Why didn't I think it before? :-) > > > > > > I'll try to code a patch today later if noone objects. > > > > How is this going? > > > Finished. Sorry for the delay I had some trouble understanding how > backend treats the locale stuff (Neil pointed out the path). > Now TM mode is returning strftime() output. It would be nice if in the > future we change this to pg_strftime() but unfortunately the last one is > not i18n. :( > > template1=# show lc_time; > lc_time > --------- > pt_BR > (1 registro) > > template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY'); > to_char > --------------------------- > Segunda, 20 Novembro 2006 > (1 registro) > > template1=# set lc_time to 'C'; > SET > template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY'); > to_char > -------------------------- > Monday, 20 November 2006 > (1 registro) > > template1=# set lc_time to 'de_DE'; > SET > template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY'); > to_char > -------------------------- > Montag, 20 November 2006 > (1 registro) > > template1=# > > > Comments? > > -- > Euler Taveira de Oliveira > http://www.timbira.com/ [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > > I now remember a new problem with this feature, irregardless of whether > we use 'lc_messages' or 'lc_time'. > > The problem is having a function's output affected by a GUC variable. > If you create an expression index using the function, and later query > the index with a different GUC value, or you do inserts with different > GUC values, the index will not work. > > I know we have had this problem in the past, but I can't remember if or > how we addressed it. Mark the function as volatile, which precludes from using it in a functional index? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > Bruce Momjian wrote: >> The problem is having a function's output affected by a GUC variable. > Mark the function as volatile, which precludes from using it in a > functional index? Stable, not volatile. It looks like we have some of the variants marked stable already for their dependence on TimeZone, but the dependence on lc_messages is new. regards, tom lane
I wrote: > It looks like we have some of the variants marked stable already for > their dependence on TimeZone, but the dependence on lc_messages is new. Actually, now that I look at it, *most* of the variants of to_char, to_number, and friends have been broken on this score since day one. There's been a dependency on LC_NUMERIC for the numeric variants all along, but they're marked immutable :-( regards, tom lane
Tom Lane wrote: > I wrote: > > It looks like we have some of the variants marked stable already for > > their dependence on TimeZone, but the dependence on lc_messages is new. > > Actually, now that I look at it, *most* of the variants of to_char, > to_number, and friends have been broken on this score since day one. > There's been a dependency on LC_NUMERIC for the numeric variants all > along, but they're marked immutable :-( Where are we on this? Should I mark to_char as stable? Seems this would break pre-8.2 to_char expression indexes. Do we need something in the release notes for this? -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Tom Lane wrote: >> Actually, now that I look at it, *most* of the variants of to_char, >> to_number, and friends have been broken on this score since day one. >> There's been a dependency on LC_NUMERIC for the numeric variants all >> along, but they're marked immutable :-( > Where are we on this? In the past our practice has been to go ahead and back-patch such changes, eg http://archives.postgresql.org/pgsql-committers/2003-04/msg00095.php Obviously we can't force initdb in the back branches, but we can and should get the entries right for future installations. I think the only interesting question is whether we should force initdb in HEAD. I'd prefer not to since we're past RC1. This is a pretty minor bug really, and I don't think it justifies forcing a dump/reload cycle on people testing RC1. regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Tom Lane wrote: > >> Actually, now that I look at it, *most* of the variants of to_char, > >> to_number, and friends have been broken on this score since day one. > >> There's been a dependency on LC_NUMERIC for the numeric variants all > >> along, but they're marked immutable :-( > > > Where are we on this? > > In the past our practice has been to go ahead and back-patch such > changes, eg > http://archives.postgresql.org/pgsql-committers/2003-04/msg00095.php > Obviously we can't force initdb in the back branches, but we can and > should get the entries right for future installations. > > I think the only interesting question is whether we should force initdb > in HEAD. I'd prefer not to since we're past RC1. This is a pretty > minor bug really, and I don't think it justifies forcing a dump/reload > cycle on people testing RC1. Agreed. There is no reported failure in the field, and it is a preventative fix. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
I wrote: > I think the only interesting question is whether we should force initdb > in HEAD. I take that back; there's something possibly worth discussing in the markings themselves. What I find after experimentation is: * These functions pay attention to LC_NUMERIC, and have since at least 7.3, and therefore should be marked STABLE in all branches: to_char(numeric,text) | i | numeric_to_char WRONGto_char(integer,text) | i | int4_to_char WRONGto_char(bigint,text) | i | int8_to_char WRONGto_char(real,text) | i | float4_to_char WRONGto_char(double precision,text) | i | float8_to_char WRONGto_number(text,text) | i | numeric_to_number WRONG * These functions are already correctly marked STABLE, because they have depended on TimeZone all along: to_char(timestamp with time zone,text) | s | timestamptz_to_char OKto_timestamp(text,text) | s |to_timestamp OK * This function is clearly mis-marked as of HEAD, because of its new dependence on LC_MESSAGES (but shouldn't be changed in back branches): to_char(timestamp without time zone,text) | i | timestamp_to_char * These functions appear to still not depend on any GUC variable: to_date(text,text) | i | to_dateto_char(interval,text) | i | interval_to_char It's the last two that are bothering me. It seems likely that somebody will soon fix to_date() to support input as well as output using the localizable format items. Should we mark it stable now, rather than risk missing this again? How about to_char for intervals --- it seems we currently have INVALID_FOR_INTERVAL on all localizable format items, but is that going to be true forevermore? I'm much tempted to mark the last two STABLE as well, and just have a consistent rule that all the formatting functions are stable. Thoughts? regards, tom lane
Would someone update this patch with the optimization below. The patch is at? http://momjian.postgresql.org/cgi-bin/pgpatches_hold --------------------------------------------------------------------------- Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > What's concerning me about the way this is written is that it calls > > setlocale() for each formatting instance, which will be very slow. > > Perhaps, the first time the info is needed, do setlocale(), ask strftime > for the 12+7 strings we need and save them away, then revert to C locale > and proceed from there. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Added to TODO: o Use LC_TIME for localized weekday/month names, rather than LC_MESSAGES http://archives.postgresql.org/pgsql-hackers/2006-11/msg00390.php --------------------------------------------------------------------------- Euler Taveira de Oliveira wrote: > Peter Eisentraut wrote: > > > > What about using strftime()? So we couldn't worry about gettext > > > translations; "all" is localized. > > > Why didn't I think it before? :-) > > > > > > I'll try to code a patch today later if noone objects. > > > > How is this going? > > > Finished. Sorry for the delay I had some trouble understanding how > backend treats the locale stuff (Neil pointed out the path). > Now TM mode is returning strftime() output. It would be nice if in the > future we change this to pg_strftime() but unfortunately the last one is > not i18n. :( > > template1=# show lc_time; > lc_time > --------- > pt_BR > (1 registro) > > template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY'); > to_char > --------------------------- > Segunda, 20 Novembro 2006 > (1 registro) > > template1=# set lc_time to 'C'; > SET > template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY'); > to_char > -------------------------- > Monday, 20 November 2006 > (1 registro) > > template1=# set lc_time to 'de_DE'; > SET > template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY'); > to_char > -------------------------- > Montag, 20 November 2006 > (1 registro) > > template1=# > > > Comments? > > -- > Euler Taveira de Oliveira > http://www.timbira.com/ [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +