Thread: localization problem (and solution)
Here is a test case for a previously reported bug (see http://archives.postgresql.org/pgsql-general/2005-11/msg01235.php): initdb using es_MX.ISO-8859-1, start postgres using es_MX.UTF-8, and execute:

    create procedural language plperl;
    create or replace function foo() returns int as 'return 1' language 'plperl';
    create table persona (nombre text check (nombre ~ '^[[:upper:]][[:lower:]]*([-''. [:alpha:]]+)?$'::text));
    copy persona (nombre) from stdin;
    José
    \.

It will error out saying:

    ERROR: new row for relation "persona" violates check constraint "persona_nombre_check"
    CONTEXT: COPY persona, line 1: "José"

If the creation of the plperl function is commented out (or moved after the copy command), this script runs without errors. Applying this patch also solves the problem:

*** src/backend/access/transam/xlog.c~ 2005-11-22 12:23:05.000000000 -0600
--- src/backend/access/transam/xlog.c 2005-12-19 20:34:22.000000000 -0600
***************
*** 3626,3631 ****
--- 3626,3632 ----
                     " which is not recognized by setlocale().",
                     ControlFile->lc_collate),
           errhint("It looks like you need to initdb or install locale support.")));
+     setenv("LC_COLLATE", ControlFile->lc_collate, 1);
      if (setlocale(LC_CTYPE, ControlFile->lc_ctype) == NULL)
          ereport(FATAL,
              (errmsg("database files are incompatible with operating system"),
***************
*** 3633,3638 ****
--- 3634,3640 ----
                     " which is not recognized by setlocale().",
                     ControlFile->lc_ctype),
           errhint("It looks like you need to initdb or install locale support.")));
+     setenv("LC_CTYPE", ControlFile->lc_ctype, 1);

      /* Make the fixed locale settings visible as GUC variables, too */
      SetConfigOption("lc_collate", ControlFile->lc_collate,

Some fprintf calls placed around the regex code show that something is replacing the locale settings with those found in the environment, at least for the LC_CTYPE and LC_COLLATE categories, and plperl seems to be the culprit. Needless to say, this bug might lead to index corruption, among other problems. It also explains some very weird (and very difficult to reproduce) anomalies I have seen.

Regards,
Manuel.
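The kind of fprintf check mentioned in the message above could look roughly like the following. This is only a sketch: `report_locale` and its `where` label are hypothetical names, not code from the PostgreSQL tree. It relies on the standard behaviour that calling setlocale() with a NULL locale argument queries the current setting without changing it.

```c
#include <locale.h>
#include <stdio.h>

/*
 * Hypothetical debugging helper (not part of PostgreSQL): print the
 * LC_COLLATE and LC_CTYPE settings currently in effect for the process.
 * Each value is printed right after it is queried, because the string
 * returned by setlocale() may be overwritten by a later setlocale() call.
 * Returns the number of characters written, or -1 on output error.
 */
int
report_locale(FILE *out, const char *where)
{
	const char *cur;
	int			n1;
	int			n2;

	cur = setlocale(LC_COLLATE, NULL);
	n1 = fprintf(out, "%s: LC_COLLATE=%s\n", where, cur ? cur : "(unknown)");

	cur = setlocale(LC_CTYPE, NULL);
	n2 = fprintf(out, "%s: LC_CTYPE=%s\n", where, cur ? cur : "(unknown)");

	if (n1 < 0 || n2 < 0)
		return -1;
	return n1 + n2;
}
```

Calling such a helper before and after the first plperl function call would show whether the two categories changed in between.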
Manuel Sugawara <masm@fciencias.unam.mx> writes: > Some fprintf's around the regex code shows that someone is changing > the localization parameters by those found in the enviroment, at least > for the LC_CTYPE and LC_COLLATE categories, and plperl seems to be the > culprit. Indeed. Please file a bug with the Perl people asking what right libperl has to fool with the localization environment of its host application. (Your proposed fix seems entirely useless ... maybe we could fix it by resetting the LC_FOO variables after every call to libperl, but I bet that would break libperl instead.) regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> (Your proposed fix seems entirely useless ...

While there are reasons to argue that it's Perl's fault, IMO an environment that reflects the current state of the host program is a good compromise, and behaving consistently with the environment is also a good compromise for libperl. (I think some applications of libperl would get really upset if the library broke this compromise.)

Regards,
Manuel.
Manuel Sugawara <masm@fciencias.unam.mx> writes:
> While there are reasons to argue that it's Perl's fault, IMO an
> environment that reflects the current state of the host program is a
> good compromise, and behaving consistently with the environment is also
> a good compromise for libperl. (I think some applications of libperl
> would get really upset if the library broke this compromise.)

I looked into this a bit more, and it seems the issue is that libperl will do
	setlocale(LC_ALL, "");
the first time any locale-related Perl function is invoked. To defend ourselves against that, we'd have to set more environment variables than just LC_COLLATE and LC_CTYPE.

What I'm thinking about is:
* during startup, putenv("LC_ALL=C") and unsetenv any other LC_ variables that may be lurking, except LC_MESSAGES.
* copy LC_COLLATE and LC_CTYPE into the environment when we get them from pg_control, as Manuel suggested.
* in locale_messages_assign(), set the environment variable on all platforms, not just Windows.

You could still break the backend by doing setlocale explicitly in plperlu functions, but that's why it's an untrusted language ...

Comments?

			regards, tom lane
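The mechanism described in the message above can be reproduced outside PostgreSQL with a few lines of C. This is a minimal sketch under stated assumptions: `ctype_clobbered_by_library` is a hypothetical name, the setlocale(LC_ALL, "") call merely stands in for what libperl is reported to do in this thread, and setenv() is assumed to be available (POSIX). Note that the function overwrites LC_ALL in the process environment.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sketch: a host program pins LC_CTYPE, then an embedded library calls
 * setlocale(LC_ALL, ""), adopting whatever the environment says.
 * env_locale is placed in LC_ALL to make the environment deterministic.
 * Returns 1 if LC_CTYPE no longer matches what the host set,
 * 0 if it is unchanged, and -1 if any call failed.
 */
int
ctype_clobbered_by_library(const char *host_ctype, const char *env_locale)
{
	char		before[256];
	const char *cur;

	if (setenv("LC_ALL", env_locale, 1) != 0)
		return -1;

	/* the host application fixes LC_CTYPE explicitly */
	cur = setlocale(LC_CTYPE, host_ctype);
	if (cur == NULL)
		return -1;
	snprintf(before, sizeof(before), "%s", cur);

	/* stand-in for the library: reset every category from the environment */
	if (setlocale(LC_ALL, "") == NULL)
		return -1;

	cur = setlocale(LC_CTYPE, NULL);
	if (cur == NULL)
		return -1;

	return strcmp(before, cur) != 0;
}
```

On a machine where both locales from the test case are installed, calling this with "es_MX.ISO-8859-1" as the host setting and "es_MX.UTF-8" as the environment setting would be expected to report a change; when both arguments name the same locale it reports none.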
Tom Lane writes:
> I looked into this a bit more, and it seems the issue is that libperl
> will do
> 	setlocale(LC_ALL, "");
> the first time any locale-related Perl function is invoked. To defend
> ourselves against that, we'd have to set more environment variables than
> just LC_COLLATE and LC_CTYPE.
>
> What I'm thinking about is:
> * during startup, putenv("LC_ALL=C") and unsetenv any other LC_ variables
>   that may be lurking, except LC_MESSAGES.
> * copy LC_COLLATE and LC_CTYPE into the environment when we get them
>   from pg_control, as Manuel suggested.

I'm afraid having LC_ALL in the environment at this time would still do the wrong thing on setlocale(LC_ALL, ""); since an LC_ALL environment variable overrides the other categories. Maybe setting LANG instead would be a better choice?

regards,
Andreas
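The precedence Andreas refers to is the POSIX lookup order used by setlocale(category, ""): a non-empty LC_ALL wins, then the category's own variable, then LANG, and otherwise an implementation-defined default. The sketch below only models that lookup order; the enum and function names are hypothetical, and it does not call setlocale() at all.

```c
#include <stdlib.h>

/* Which environment variable decides a category (hypothetical names). */
enum locale_source
{
	FROM_LC_ALL,
	FROM_CATEGORY,
	FROM_LANG,
	FROM_DEFAULT
};

/* An environment variable counts only if it is set and non-empty. */
static int
env_is_set(const char *name)
{
	const char *val = getenv(name);

	return val != NULL && val[0] != '\0';
}

/*
 * Model of the POSIX lookup order for setlocale(category, "").
 * category_var is the variable name of the category, e.g. "LC_CTYPE".
 */
enum locale_source
locale_source_for(const char *category_var)
{
	if (env_is_set("LC_ALL"))
		return FROM_LC_ALL;
	if (env_is_set(category_var))
		return FROM_CATEGORY;
	if (env_is_set("LANG"))
		return FROM_LANG;
	return FROM_DEFAULT;
}
```

With LC_ALL=C in the environment, `locale_source_for("LC_CTYPE")` reports FROM_LC_ALL regardless of what LC_CTYPE holds, which is why the first version of the plan would not have protected the LC_CTYPE and LC_COLLATE values copied from pg_control.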
Andreas Seltenreich <andreas+pg@gate450.dyndns.org> writes:
> I'm afraid having LC_ALL in the environment at this time would still
> do the wrong thing on setlocale(LC_ALL, ""); since an LC_ALL
> environment variable overrides the other categories.

Doh, of course, I was misremembering the precedence. So we need
	LANG=C
	LC_ALL unset (probably LANGUAGE too, for glibc)
	others as stated

			regards, tom lane
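Put together, the revised plan could be sketched as below. This is an illustration, not the change that was later committed: `pin_locale_environment` is a hypothetical name, the list of "other" LC_ variables is an assumption limited to the standard categories, and setenv()/unsetenv() are POSIX calls that are not available in this form on Windows.

```c
#include <stdlib.h>

/*
 * Hypothetical sketch of the revised plan: make the environment agree
 * with the locale settings the host program has fixed, so that a later
 * setlocale(LC_ALL, "") by an embedded library picks up the same values.
 * lc_collate and lc_ctype stand for the values read from pg_control.
 * Returns 0 on success, -1 on failure.
 */
int
pin_locale_environment(const char *lc_collate, const char *lc_ctype)
{
	/* variables to clear; LC_MESSAGES is deliberately left alone */
	static const char *const to_unset[] = {
		"LC_ALL", "LANGUAGE", "LC_MONETARY", "LC_NUMERIC", "LC_TIME"
	};
	size_t		i;

	/* LANG is the lowest-precedence fallback, so "C" here is safe */
	if (setenv("LANG", "C", 1) != 0)
		return -1;

	for (i = 0; i < sizeof(to_unset) / sizeof(to_unset[0]); i++)
	{
		if (unsetenv(to_unset[i]) != 0)
			return -1;
	}

	/* the category variables override LANG once LC_ALL is gone */
	if (setenv("LC_COLLATE", lc_collate, 1) != 0)
		return -1;
	if (setenv("LC_CTYPE", lc_ctype, 1) != 0)
		return -1;

	return 0;
}
```

Leaving LC_ALL unset is the important part: with it out of the way, the per-category variables decide LC_COLLATE and LC_CTYPE, and LANG=C covers everything else.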
Tom Lane said:
> Andreas Seltenreich <andreas+pg@gate450.dyndns.org> writes:
>> I'm afraid having LC_ALL in the environment at this time would still
>> do the wrong thing on setlocale(LC_ALL, ""); since an LC_ALL
>> environment variable overrides the other categories.
>
> Doh, of course, I was misremembering the precedence. So we need
> 	LANG=C
> 	LC_ALL unset (probably LANGUAGE too, for glibc)
> 	others as stated

We need to test any solution carefully on Windows, which deals with locales very differently from *nix, and where we still have some known locale issues (see recent discussion).

I wonder if the complained-of behaviour is triggered by our recent changes to support utf8 in pl/perl?

cheers

andrew
"Andrew Dunstan" <andrew@dunslane.net> writes: > We need to test any solution carefully on Windows, which deals with locales > very differently from *nix, and where we still have some known locale issues Right, of course. I was thinking that this change might actually bring the Windows and Unix code closer together --- at least for LC_MESSAGES it seems it would do so. If I prepare a patch, do you want to test it on Windows before it goes in, or is it easier just to commit and then test CVS tip? regards, tom lane
Tom Lane said: > "Andrew Dunstan" <andrew@dunslane.net> writes: >> We need to test any solution carefully on Windows, which deals with >> locales very differently from *nix, and where we still have some known >> locale issues > > Right, of course. I was thinking that this change might actually bring > the Windows and Unix code closer together --- at least for LC_MESSAGES > it seems it would do so. > > If I prepare a patch, do you want to test it on Windows before it goes > in, or is it easier just to commit and then test CVS tip? > Can't do anything for cvs tip until the md5 mess is fixed. I don't have much time to spare for testing till at least next week - maybe someone else does. cheers andrew
"Andrew Dunstan" <andrew@dunslane.net> writes: > We need to test any solution carefully on Windows, which deals with locales > very differently from *nix, and where we still have some known locale issues > (see recent discussion). I've committed a proposed change in HEAD --- would you check out the Windows behavior at your convenience? If it seems to work, I'll back-patch, but let's test first. regards, tom lane
Tom Lane wrote:
> "Andrew Dunstan" <andrew@dunslane.net> writes:
>> We need to test any solution carefully on Windows, which deals with locales
>> very differently from *nix, and where we still have some known locale issues
>> (see recent discussion).
>
> I've committed a proposed change in HEAD --- would you check out the
> Windows behavior at your convenience? If it seems to work, I'll
> back-patch, but let's test first.

Will try. Not quite sure how, though. Any suggestions?

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Tom Lane wrote: >> I've committed a proposed change in HEAD --- would you check out the >> Windows behavior at your convenience? If it seems to work, I'll >> back-patch, but let's test first. > Will try. Not quite sure how, though. Any suggestions? Well, one thing to try is whether you can reproduce the plperl-induced breakage I posted this morning on Windows; and if so whether the patch fixes it. Also, what were those "known locale issues" you were referring to? regards, tom lane
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Tom Lane wrote:
>>> I've committed a proposed change in HEAD --- would you check out the
>>> Windows behavior at your convenience? If it seems to work, I'll
>>> back-patch, but let's test first.
>
>> Will try. Not quite sure how, though. Any suggestions?
>
> Well, one thing to try is whether you can reproduce the plperl-induced
> breakage I posted this morning on Windows; and if so whether the patch
> fixes it.

We have a build failure to fix first:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2005-12-29%2000:44:52

> Also, what were those "known locale issues" you were referring to?

The issue is that if I set my machine's locale to Turkish or French, say, it doesn't matter what locale I set during initdb or in postgresql.conf, the server's log messages always seem to come out in the machine's locale.

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes: > We have a build failure to fix first: > http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2005-12-29%2000:44:52 Weird. It seems to be choking on linking to check_function_bodies, but plpgsql does that exactly the same way, and there's no problem there. I wonder whether all those warnings in the perl header files mean anything ... > The issue is that if I set my machine's locale to Turkish or French, > say, it doesn't matter what locale I set during initdb or in > postgresql.conf, the server's log messages always seem to come out in > the machine's locale. Is this possibly related to the fact that we don't even try to do setlocale() for LC_MESSAGES? regards, tom lane
Tom Lane said:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> We have a build failure to fix first:
>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2005-12-29%2000:44:52
>
> Weird. It seems to be choking on linking to check_function_bodies, but
> plpgsql does that exactly the same way, and there's no problem there.
> I wonder whether all those warnings in the perl header files mean
> anything ...

We always get those - see http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=loris&dt=2005-12-23%2019%3A56%3A12&stg=make for example. One day when I get time I want to clean them up.

>> The issue is that if I set my machine's locale to Turkish or French,
>> say, it doesn't matter what locale I set during initdb or in
>> postgresql.conf, the server's log messages always seem to come out in
>> the machine's locale.
>
> Is this possibly related to the fact that we don't even try to do
> setlocale() for LC_MESSAGES?

We can't on Windows - it doesn't define LC_MESSAGES. But libintl does some stuff, I believe.

cheers

andrew
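One way to reconcile "Windows doesn't define LC_MESSAGES" with the earlier idea of setting the environment variable on all platforms is to guard the setlocale() call with the preprocessor. The following is a hedged sketch only: `assign_messages_locale` is a hypothetical name and not the body of locale_messages_assign(), and it assumes `_putenv_s()` on Windows and `setenv()` elsewhere.

```c
#include <locale.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: apply a message locale in a portable way.
 * setlocale() is attempted only where the platform defines LC_MESSAGES;
 * the LC_MESSAGES environment variable is set on every platform so that
 * code consulting the environment sees the same value.
 * Returns 1 on success, 0 on failure.
 */
int
assign_messages_locale(const char *value)
{
#ifdef LC_MESSAGES
	/* platforms with a real LC_MESSAGES category */
	if (setlocale(LC_MESSAGES, value) == NULL)
		return 0;
#endif

#ifdef _WIN32
	/* the Windows C runtime has no setenv() */
	if (_putenv_s("LC_MESSAGES", value) != 0)
		return 0;
#else
	if (setenv("LC_MESSAGES", value, 1) != 0)
		return 0;
#endif

	return 1;
}
```

Whether libintl on Windows actually honours the variable set this way is exactly the open question in this part of the thread, so the sketch shows the shape of the approach rather than a verified fix.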
> The issue is that if I set my machine's locale to Turkish or
> French, say, it doesn't matter what locale I set during
> initdb or in postgresql.conf, the server's log messages
> always seem to come out in the machine's locale.

Does this happen only for those locales? And how specifically do you set the locale?

I just installed to verify, and my server comes up in English with no problem, even though my locale is set to Swedish. The client tools (psql, for example) come up in Swedish, so it's definitely the Swedish locale. And by doing "set LANG=en" before I start psql, it comes up in English just fine.

//Magnus
> > The issue is that if I set my machine's locale to Turkish or French,
> > say, it doesn't matter what locale I set during initdb or in
> > postgresql.conf, the server's log messages always seem to come out in
> > the machine's locale.
>
> Does this happen only for those locales? And how specifically
> do you set the locale?
>
> I just installed to verify, and my server comes up in English
> with no problem, even though my locale is set to Swedish. The
> client tools (psql, for example) come up in Swedish, so it's
> definitely the Swedish locale. And by doing "set LANG=en" before
> I start psql, it comes up in English just fine.

I should probably say this is 8.1.1, not cvs head, but I don't recall any changes around this.

//Magnus
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> We have a build failure to fix first:
>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2005-12-29%2000:44:52
>
> Weird. It seems to be choking on linking to check_function_bodies,
> but plpgsql does that exactly the same way, and there's no problem
> there. I wonder whether all those warnings in the perl header files
> mean anything ...

I have committed a fix - the perl headers were mangling DLLIMPORT so I moved the declaration above the perl includes.

I would also like to add -Wno-comment to the CFLAGS for win32/gcc, to suppress at least some of those warnings.

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> I would also like to add -Wno-comment to the CFLAGS for win32/gcc, to
> suppress at least some of those warnings.

Why don't you complain to the Perl people, instead? The fact that no such warnings occur on Unix Perl installations makes these seem pretty suspicious.

			regards, tom lane
Andrew Dunstan <andrew@dunslane.net> writes: > I have committed a fix - the perl headers were mangling DLLIMPORT so I > moved the declaration above the perl includes. BTW, probably a cleaner answer is to put check_function_bodies into some header file instead of having an "extern" in the PLs' .c files. I was thinking about that yesterday, but couldn't decide where was a good place to put it. regards, tom lane
Tom Lane said:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> I would also like to add -Wno-comment to the CFLAGS for win32/gcc, to
>> suppress at least some of those warnings.
>
> Why don't you complain to the Perl people, instead? The fact that no
> such warnings occur on Unix Perl installations makes these seem pretty
> suspicious.

Well, it's probably not even the Perl people - perl's config_h.SH seems to do the right thing and put a space between the second / and *, so that the compiler won't complain, so it could be ActiveState's doing. Maybe I'll just make a tiny script to fix config.h in my perl distro.

There is a more serious problem, though, in these warnings. Perl is apparently trying to hijack the *printf functions, just as libintl tries to do. There's a #define we can set to inhibit that, and I think we should.

That would leave two lots of warnings to fix - one about uid_t/gid_t and one about isnan.

cheers

andrew
Tom Lane said: > Andrew Dunstan <andrew@dunslane.net> writes: >> I have committed a fix - the perl headers were mangling DLLIMPORT so I >> moved the declaration above the perl includes. > > BTW, probably a cleaner answer is to put check_function_bodies into > some header file instead of having an "extern" in the PLs' .c files. I > was thinking about that yesterday, but couldn't decide where was a good > place to put it. > miscadmin.h ? cheers andrew
"Andrew Dunstan" <andrew@dunslane.net> writes: > Tom Lane said: >> BTW, probably a cleaner answer is to put check_function_bodies into >> some header file instead of having an "extern" in the PLs' .c files. I >> was thinking about that yesterday, but couldn't decide where was a good >> place to put it. > miscadmin.h ? Ugh :-( I was thinking about pg_proc.h, because the variable itself is in pg_proc.c, but that seems pretty ugly too. Another possibility is to move the variable someplace else... regards, tom lane
Tom Lane said: > "Andrew Dunstan" <andrew@dunslane.net> writes: >> Tom Lane said: >>> BTW, probably a cleaner answer is to put check_function_bodies into >>> some header file instead of having an "extern" in the PLs' .c files. >>> I was thinking about that yesterday, but couldn't decide where was a >>> good place to put it. > >> miscadmin.h ? > > Ugh :-( I was thinking about pg_proc.h, because the variable itself is > in pg_proc.c, but that seems pretty ugly too. Another possibility is > to move the variable someplace else... I trust whatever choice you make. cheers andrew