Thread: localization problem (and solution)

localization problem (and solution)

From
Manuel Sugawara
Date:
Here is a test case for a previously reported bug (see
http://archives.postgresql.org/pgsql-general/2005-11/msg01235.php):

initdb using es_MX.ISO-8859-1, start postgres using es_MX.UTF-8 and
execute:

create procedural language plperl;
create or replace function foo() returns int as 'return 1' language 'plperl';
create table persona (nombre text check (nombre ~ '^[[:upper:]][[:lower:]]*([-''. [:alpha:]]+)?$'::text));
copy persona (nombre) from stdin;
José
\.

It will error out saying:

ERROR:  new row for relation "persona" violates check constraint "persona_nombre_check"
CONTEXT:  COPY persona, line 1: "José"

With the creation of the plperl function commented out (or moved after the
copy command), this script runs without errors. Applying this patch also
solves the problem:

*** src/backend/access/transam/xlog.c~    2005-11-22 12:23:05.000000000 -0600
--- src/backend/access/transam/xlog.c    2005-12-19 20:34:22.000000000 -0600
***************
*** 3626,3631 ****
--- 3626,3632 ----
                          " which is not recognized by setlocale().",
                          ControlFile->lc_collate),
               errhint("It looks like you need to initdb or install locale support.")));
+         setenv("LC_COLLATE", ControlFile->lc_collate, 1);
      if (setlocale(LC_CTYPE, ControlFile->lc_ctype) == NULL)
          ereport(FATAL,
               (errmsg("database files are incompatible with operating system"),
***************
*** 3633,3638 ****
--- 3634,3640 ----
                     " which is not recognized by setlocale().",
                     ControlFile->lc_ctype),
               errhint("It looks like you need to initdb or install locale support.")));
+         setenv("LC_CTYPE", ControlFile->lc_ctype, 1);
  
      /* Make the fixed locale settings visible as GUC variables, too */
      SetConfigOption("lc_collate", ControlFile->lc_collate,


Some fprintf's around the regex code show that someone is replacing
the localization parameters with those found in the environment, at least
for the LC_CTYPE and LC_COLLATE categories, and plperl seems to be the
culprit. Needless to say, this bug might lead to index corruption,
besides other problems. It also explains some very weird (and very
difficult to reproduce) anomalies I have seen.

Regards,
Manuel.



Re: localization problem (and solution)

From
Tom Lane
Date:
Manuel Sugawara <masm@fciencias.unam.mx> writes:
> Some fprintf's around the regex code show that someone is replacing
> the localization parameters with those found in the environment, at least
> for the LC_CTYPE and LC_COLLATE categories, and plperl seems to be the
> culprit.

Indeed.  Please file a bug with the Perl people asking what right
libperl has to fool with the localization environment of its host
application.

(Your proposed fix seems entirely useless ... maybe we could fix it
by resetting the LC_FOO variables after every call to libperl, but
I bet that would break libperl instead.)
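
For illustration only, the reset-after-every-call idea could look roughly
like the sketch below. The names call_with_locale_guard and library_call are
hypothetical and are not taken from the PostgreSQL or Perl sources; the
stand-in library call simply reloads the locale from the environment. Whether
libperl would tolerate having its locale put back behind its back is exactly
the open question raised above.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a library call that reloads the locale
 * from the environment, as libperl is suspected of doing. */
static void library_call(void)
{
    setlocale(LC_ALL, "");
}

/*
 * Run fn(), then put LC_COLLATE and LC_CTYPE back the way they were.
 * Returns 0 on success, -1 if saving or restoring a category failed.
 */
static int call_with_locale_guard(void (*fn)(void))
{
    const char *cur;
    char *saved_collate;
    char *saved_ctype;
    int rc = 0;

    /* setlocale(cat, NULL) may return storage that a later setlocale()
     * call overwrites, so keep private copies. */
    cur = setlocale(LC_COLLATE, NULL);
    saved_collate = (cur != NULL) ? strdup(cur) : NULL;
    cur = setlocale(LC_CTYPE, NULL);
    saved_ctype = (cur != NULL) ? strdup(cur) : NULL;

    if (saved_collate == NULL || saved_ctype == NULL)
    {
        free(saved_collate);
        free(saved_ctype);
        return -1;
    }

    fn();

    if (setlocale(LC_COLLATE, saved_collate) == NULL)
        rc = -1;
    if (setlocale(LC_CTYPE, saved_ctype) == NULL)
        rc = -1;

    free(saved_collate);
    free(saved_ctype);
    return rc;
}
```

A caller would wrap each entry into the library, e.g.
call_with_locale_guard(library_call), and treat a -1 result as fatal.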
        regards, tom lane


Re: localization problem (and solution)

From
Manuel Sugawara
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> (Your proposed fix seems entirely useless ... 

While there are reasons to argue that this is Perl's fault, IMO an
environment that reflects the current state of the host program is a
good compromise, and behaving consistently with the environment is also
a good compromise for libperl (I think some applications of libperl will
get really upset if this compromise is broken by the library.)

Regards,
Manuel.


Re: localization problem (and solution)

From
Tom Lane
Date:
Manuel Sugawara <masm@fciencias.unam.mx> writes:
> While there are reasons to argue that this is Perl's fault, IMO an
> environment that reflects the current state of the host program is a
> good compromise, and behaving consistently with the environment is also
> a good compromise for libperl (I think some applications of libperl will
> get really upset if this compromise is broken by the library.)

I looked into this a bit more, and it seems the issue is that libperl
will do
    setlocale(LC_ALL, "");
the first time any locale-related Perl function is invoked.  To defend
ourselves against that, we'd have to set more environment variables than
just LC_COLLATE and LC_CTYPE.

What I'm thinking about is:
* during startup, putenv("LC_ALL=C") and unsetenv any other LC_ variables that may be lurking, except LC_MESSAGES.
* copy LC_COLLATE and LC_CTYPE into the environment when we get them from pg_control, as Manuel suggested.
* in locale_messages_assign(), set the environment variable on all platforms not just Windows.

You could still break the backend by doing setlocale explicitly in
plperlu functions, but that's why it's an untrusted language ...

Comments?
        regards, tom lane


Re: localization problem (and solution)

From
Andreas Seltenreich
Date:
Tom Lane writes:

> I looked into this a bit more, and it seems the issue is that libperl
> will do
>     setlocale(LC_ALL, "");
> the first time any locale-related Perl function is invoked.  To defend
> ourselves against that, we'd have to set more environment variables than
> just LC_COLLATE and LC_CTYPE.
>
> What I'm thinking about is:
> * during startup, putenv("LC_ALL=C") and unsetenv any other LC_ variables
>   that may be lurking, except LC_MESSAGES.
> * copy LC_COLLATE and LC_CTYPE into the environment when we get them
>   from pg_control, as Manuel suggested.

I'm afraid having LC_ALL in the environment at this time would still
do the wrong thing on setlocale(LC_ALL, ""); since a LC_ALL
environment variable overrides the other categories. Maybe setting
LANG instead would be a better choice?
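
The precedence in question can be sketched as a small lookup. This is only
a model of the POSIX rule (LC_ALL first, then the category's own variable,
then LANG), not code from libc or PostgreSQL; effective_locale_name is a
hypothetical helper, the "C" fallback is an assumption (POSIX leaves the
default implementation-defined), and glibc's LANGUAGE handling for messages
is not modelled.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/*
 * Return the locale name that setlocale(<category>, "") would pick up
 * from the environment for the category whose variable is category_var
 * (for example "LC_CTYPE").  Empty values count as unset.
 */
static const char *effective_locale_name(const char *category_var)
{
    const char *candidates[3];
    size_t i;

    candidates[0] = "LC_ALL";       /* overrides everything else */
    candidates[1] = category_var;   /* the per-category variable */
    candidates[2] = "LANG";         /* lowest-priority default */

    for (i = 0; i < 3; i++)
    {
        const char *value = getenv(candidates[i]);

        if (value != NULL && value[0] != '\0')
            return value;
    }
    return "C";     /* assumed fallback when nothing is set */
}
```

With LC_ALL=C and LC_CTYPE=es_MX.UTF-8 both exported, the lookup for
"LC_CTYPE" yields "C", which is why exporting LC_ALL would defeat the
per-category settings copied from pg_control.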

regards,
Andreas


Re: localization problem (and solution)

From
Tom Lane
Date:
Andreas Seltenreich <andreas+pg@gate450.dyndns.org> writes:
> I'm afraid having LC_ALL in the environment at this time would still
> do the wrong thing on setlocale(LC_ALL, ""); since a LC_ALL
> environment variable overrides the other categories.

Doh, of course, I was misremembering the precedence.  So we need
    LANG=C
    LC_ALL unset (probably LANGUAGE too, for glibc)
    others as stated
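
As a rough sketch of that recipe (not the committed change):
pin_locale_environment is a hypothetical helper, and the list of "other"
LC_ variables to clear is an assumption about what might be lurking.
LC_MESSAGES is deliberately left alone, as proposed earlier in the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/*
 * Make the process environment agree with the locale the server has
 * fixed, so that a later setlocale(LC_ALL, "") issued by a library
 * lands on the same settings.  Returns 0 on success, -1 on failure.
 */
static int pin_locale_environment(const char *lc_collate, const char *lc_ctype)
{
    /* Variables cleared outright; the list is illustrative. */
    static const char *const to_clear[] = {
        "LC_ALL", "LANGUAGE", "LC_MONETARY", "LC_NUMERIC", "LC_TIME"
    };
    size_t i;

    /* LANG has the lowest precedence, so it is a safe default. */
    if (setenv("LANG", "C", 1) != 0)
        return -1;

    for (i = 0; i < sizeof(to_clear) / sizeof(to_clear[0]); i++)
    {
        if (unsetenv(to_clear[i]) != 0)
            return -1;
    }

    /* Export the two categories that are fixed at initdb time. */
    if (setenv("LC_COLLATE", lc_collate, 1) != 0)
        return -1;
    if (setenv("LC_CTYPE", lc_ctype, 1) != 0)
        return -1;

    return 0;
}
```

In the server this would presumably be called with the values read from
pg_control, right after the corresponding setlocale() calls succeed.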
 
        regards, tom lane


Re: localization problem (and solution)

From
"Andrew Dunstan"
Date:
Tom Lane said:
> Andreas Seltenreich <andreas+pg@gate450.dyndns.org> writes:
>> I'm afraid having LC_ALL in the environment at this time would still
>> do the wrong thing on setlocale(LC_ALL, ""); since a LC_ALL
>> environment variable overrides the other categories.
>
> Doh, of course, I was misremembering the precedence.  So we need
>     LANG=C
>     LC_ALL unset (probably LANGUAGE too, for glibc)
>     others as stated
>


We need to test any solution carefully on Windows, which deals with locales
very differently from *nix, and where we still have some known locale issues
(see recent discussion).

I wonder if the complained-of behaviour is triggered by our recent changes
to support utf8 in pl/perl?

cheers

andrew




Re: localization problem (and solution)

From
Tom Lane
Date:
"Andrew Dunstan" <andrew@dunslane.net> writes:
> We need to test any solution carefully on Windows, which deals with locales
> very differently from *nix, and where we still have some known locale issues

Right, of course.  I was thinking that this change might actually bring
the Windows and Unix code closer together --- at least for LC_MESSAGES
it seems it would do so.

If I prepare a patch, do you want to test it on Windows before it goes
in, or is it easier just to commit and then test CVS tip?
        regards, tom lane


Re: localization problem (and solution)

From
"Andrew Dunstan"
Date:
Tom Lane said:
> "Andrew Dunstan" <andrew@dunslane.net> writes:
>> We need to test any solution carefully on Windows, which deals with
>> locales very differently from *nix, and where we still have some known
>> locale issues
>
> Right, of course.  I was thinking that this change might actually bring
> the Windows and Unix code closer together --- at least for LC_MESSAGES
> it seems it would do so.
>
> If I prepare a patch, do you want to test it on Windows before it goes
> in, or is it easier just to commit and then test CVS tip?
>


Can't do anything for cvs tip until the md5 mess is fixed.

I don't have much time to spare for testing till at least next week - maybe
someone else does.

cheers

andrew





Re: localization problem (and solution)

From
Tom Lane
Date:
"Andrew Dunstan" <andrew@dunslane.net> writes:
> We need to test any solution carefully on Windows, which deals with locales
> very differently from *nix, and where we still have some known locale issues
> (see recent discussion).

I've committed a proposed change in HEAD --- would you check out the
Windows behavior at your convenience?  If it seems to work, I'll
back-patch, but let's test first.
        regards, tom lane


Re: localization problem (and solution)

From
Andrew Dunstan
Date:

Tom Lane wrote:

> "Andrew Dunstan" <andrew@dunslane.net> writes:
>> We need to test any solution carefully on Windows, which deals with locales
>> very differently from *nix, and where we still have some known locale issues
>> (see recent discussion).
>
> I've committed a proposed change in HEAD --- would you check out the
> Windows behavior at your convenience?  If it seems to work, I'll
> back-patch, but let's test first.

Will try. Not quite sure how, though. Any suggestions?

cheers

andrew


Re: localization problem (and solution)

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> I've committed a proposed change in HEAD --- would you check out the
>> Windows behavior at your convenience?  If it seems to work, I'll
>> back-patch, but let's test first.

> Will try. Not quite sure how, though. Any suggestions?

Well, one thing to try is whether you can reproduce the plperl-induced
breakage I posted this morning on Windows; and if so whether the patch
fixes it.

Also, what were those "known locale issues" you were referring to?
        regards, tom lane


Re: localization problem (and solution)

From
Andrew Dunstan
Date:

Tom Lane wrote:

> Andrew Dunstan <andrew@dunslane.net> writes:
>> Tom Lane wrote:
>>> I've committed a proposed change in HEAD --- would you check out the
>>> Windows behavior at your convenience?  If it seems to work, I'll
>>> back-patch, but let's test first.
>
>> Will try. Not quite sure how, though. Any suggestions?
>
> Well, one thing to try is whether you can reproduce the plperl-induced
> breakage I posted this morning on Windows; and if so whether the patch
> fixes it.

We have a build failure to fix first: 
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2005-12-29%2000:44:52

> Also, what were those "known locale issues" you were referring to?


The issue is that if I set my machine's locale to Turkish or French, 
say, it doesn't matter what locale I set during initdb or in 
postgresql.conf, the server's log messages always seem to come out in 
the machine's locale.

cheers

andrew


Re: localization problem (and solution)

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> We have a build failure to fix first: 
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2005-12-29%2000:44:52

Weird.  It seems to be choking on linking to check_function_bodies,
but plpgsql does that exactly the same way, and there's no problem
there.  I wonder whether all those warnings in the perl header files
mean anything ...

> The issue is that if I set my machine's locale to Turkish or French, 
> say, it doesn't matter what locale I set during initdb or in 
> postgresql.conf, the server's log messages always seem to come out in 
> the machine's locale.

Is this possibly related to the fact that we don't even try to do
setlocale() for LC_MESSAGES?
        regards, tom lane


Re: localization problem (and solution)

From
"Andrew Dunstan"
Date:
Tom Lane said:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> We have a build failure to fix first:
>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2005-12-29%2000:44:52
>
> Weird.  It seems to be choking on linking to check_function_bodies, but
> plpgsql does that exactly the same way, and there's no problem there.
> I wonder whether all those warnings in the perl header files mean
> anything ...

We always get those - see
http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=loris&dt=2005-12-23%2019%3A56%3A12&stg=make
for example. One day when I get time I want to clean them up.
 

>
>> The issue is that if I set my machine's locale to Turkish or French,
>> say, it doesn't matter what locale I set during initdb or in
>> postgresql.conf, the server's log messages always seem to come out in
>> the machine's locale.
>
> Is this possibly related to the fact that we don't even try to do
> setlocale() for LC_MESSAGES?


We can't on Windows - it doesn't define LC_MESSAGES. But libintl does some
stuff, I believe.
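
A sketch of how one assignment routine could cover both cases, under stated
assumptions: assign_messages_locale and export_var are hypothetical names
(not the backend's locale_messages_assign()), the Windows branch assumes the
C runtime provides _putenv_s, and the idea that libintl picks the value up
from the environment is taken on trust from the discussion here.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdlib.h>

/* Export name=value into the process environment; 0 on success. */
static int export_var(const char *name, const char *value)
{
#ifdef _WIN32
    return (_putenv_s(name, value) == 0) ? 0 : -1;
#else
    return (setenv(name, value, 1) == 0) ? 0 : -1;
#endif
}

/*
 * Apply a message locale.  Where the platform defines LC_MESSAGES,
 * call setlocale() first and give up if it rejects the name; in all
 * cases also export the value so that code consulting the environment
 * sees the same setting.  Returns 0 on success, -1 on failure.
 */
static int assign_messages_locale(const char *value)
{
#ifdef LC_MESSAGES
    if (setlocale(LC_MESSAGES, value) == NULL)
        return -1;
#endif
    return export_var("LC_MESSAGES", value);
}
```

On a platform without LC_MESSAGES the setlocale() step simply compiles away
and only the environment is updated.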

cheers

andrew




Re: localization problem (and solution)

From
"Magnus Hagander"
Date:
> The issue is that if I set my machine's locale to Turkish or
> French, say, it doesn't matter what locale I set during
> initdb or in postgresql.conf, the server's log messages
> always seem to come out in the machine's locale.

Does this happen only for those locales? And how specifically do you set
the locale?

I just installed to verify, and my server comes up in English with no
problem, even though my locale is set to Swedish. The client tools (psql,
for example) come up in Swedish, so it's definitely the Swedish locale. And
by doing "set LANG=en" before I start psql, it comes up in English just
fine.

//Magnus


Re: localization problem (and solution)

From
"Magnus Hagander"
Date:
> > The issue is that if I set my machine's locale to Turkish or French,
> > say, it doesn't matter what locale I set during initdb or in
> > postgresql.conf, the server's log messages always seem to come out in
> > the machine's locale.
>
> Does this happen only for those locales? And how specifically
> do you set the locale?
>
> I just installed to verify, and my server comes up in English
> with no problem, even though my locale is set to Swedish. The
> client tools (psql, for example) come up in Swedish, so it's
> definitely the Swedish locale. And by doing "set LANG=en" before
> I start psql, it comes up in English just fine.

I should probably say this is 8.1.1, not cvs head, but I don't recall
any changes around this.

//Magnus


Re: localization problem (and solution)

From
Andrew Dunstan
Date:

Tom Lane wrote:

> Andrew Dunstan <andrew@dunslane.net> writes:
>> We have a build failure to fix first:
>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2005-12-29%2000:44:52
>
> Weird.  It seems to be choking on linking to check_function_bodies,
> but plpgsql does that exactly the same way, and there's no problem
> there.  I wonder whether all those warnings in the perl header files
> mean anything ...

I have committed a fix - the perl headers were mangling DLLIMPORT so I 
moved the declaration above the perl includes.

I would also like to add -Wno-comment to the CFLAGS for win32/gcc, to 
suppress at least some of those warnings.

cheers

andrew


Re: localization problem (and solution)

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> I would also like to add -Wno-comment to the CFLAGS for win32/gcc, to 
> suppress at least some of those warnings.

Why don't you complain to the Perl people, instead?  The fact that no
such warnings occur on Unix Perl installations makes these seem pretty
suspicious.
        regards, tom lane


Re: localization problem (and solution)

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> I have committed a fix - the perl headers were mangling DLLIMPORT so I 
> moved the declaration above the perl includes.

BTW, probably a cleaner answer is to put check_function_bodies into some
header file instead of having an "extern" in the PLs' .c files.  I was
thinking about that yesterday, but couldn't decide where was a good
place to put it.
        regards, tom lane


Re: localization problem (and solution)

From
"Andrew Dunstan"
Date:
Tom Lane said:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> I would also like to add -Wno-comment to the CFLAGS for win32/gcc, to
>>  suppress at least some of those warnings.
>
> Why don't you complain to the Perl people, instead?  The fact that no
> such warnings occur on Unix Perl installations makes these seem pretty
> suspicious.
>


Well, it's probably not even the Perl people - perl's config_h.SH seems to
do the right thing and put a space between the second / and *, so that the
compiler won't complain, so it could be ActiveState's doing. Maybe I'll just
make a tiny script to fix config.h in my perl distro.


There is a more serious problem, though, in these warnings. Perl is
apparently trying to hijack the *printf functions, just as libintl tries to
do. There's a #define we can set to inhibit that, and I think we should.
That would leave 2 lots of warnings to fix - one about uid_t/gid_t and one
about isnan.

cheers

andrew





Re: localization problem (and solution)

From
"Andrew Dunstan"
Date:
Tom Lane said:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> I have committed a fix - the perl headers were mangling DLLIMPORT so I
>>  moved the declaration above the perl includes.
>
> BTW, probably a cleaner answer is to put check_function_bodies into
> some header file instead of having an "extern" in the PLs' .c files.  I
> was thinking about that yesterday, but couldn't decide where was a good
> place to put it.
>


miscadmin.h ?

cheers

andrew





Re: localization problem (and solution)

From
Tom Lane
Date:
"Andrew Dunstan" <andrew@dunslane.net> writes:
> Tom Lane said:
>> BTW, probably a cleaner answer is to put check_function_bodies into
>> some header file instead of having an "extern" in the PLs' .c files.  I
>> was thinking about that yesterday, but couldn't decide where was a good
>> place to put it.

> miscadmin.h ?

Ugh :-(  I was thinking about pg_proc.h, because the variable itself is
in pg_proc.c, but that seems pretty ugly too.  Another possibility is to
move the variable someplace else...
        regards, tom lane


Re: localization problem (and solution)

From
"Andrew Dunstan"
Date:
Tom Lane said:
> "Andrew Dunstan" <andrew@dunslane.net> writes:
>> Tom Lane said:
>>> BTW, probably a cleaner answer is to put check_function_bodies into
>>> some header file instead of having an "extern" in the PLs' .c files.
>>> I was thinking about that yesterday, but couldn't decide where was a
>>> good place to put it.
>
>> miscadmin.h ?
>
> Ugh :-(  I was thinking about pg_proc.h, because the variable itself is
> in pg_proc.c, but that seems pretty ugly too.  Another possibility is
> to move the variable someplace else...


I trust whatever choice you make.

cheers

andrew