Thread: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?

broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?

From
Oleg Bartunov
Date:
Hi,

while I'm doing more accurate test I just want to ask if
somebody test locale in 7.0.2 under FreeBSD ?
the point is that I usually compile postgres with 
--enable-locale --enable-multibyte and never had a problem 
with locale. Today I decided to use only --enable-locale
and found that LC_CTYPE support seems broken in 7.0.2 
under FreeBSD 4.01. release. 
I used folowing select: select c_name from city where c_name ~* 'ал';
interesting that there are no problem  under Linux !
I used the same compiler gcc version 2.95.2 19991024 (release)
on both systems. One hypothesis is that gcc 2.95.2 under Linux
treats 'char' as 'unsigned char' while on FreeBSD there is no such 
default. This could be demonstrated using test-ctype.c from
src/test/locale directory. In current version there is

.......
void
describe_char(int c)
{       char            cp = c,                               up = toupper(c),                               lo =
tolower(c);
...........

which works as expected on Linux and broken under FreeBSD (gcc 2.95.2)
It's clear that we must use 'unsigned char' instead of 'char'
and corrected version runs ok on both systems. That's why I suspect
that gcc 2.95.2 has different default under FreeBSD which could
cause problem with LC_CTYPE in 7.0.2 
I didn't test current CVS under FreeBSD but probably will check it.

Regards,
    Oleg
PS.
forget to mention that collation works fine 

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?

From
Tom Lane
Date:
Oleg Bartunov <oleg@sai.msu.su> writes:
> It's clear that we must use 'unsigned char' instead of 'char'
> and corrected version runs ok on both systems. That's why I suspect
> that gcc 2.95.2 has different default under FreeBSD which could
> cause problem with LC_CTYPE in 7.0.2 
> I didn't test current CVS under FreeBSD but probably will check it.

I think Peter recently went around and inserted explicit casts to
fix this problem.  Please do see if it's fixed in CVS.
        regards, tom lane


Re: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?

From
Oleg Bartunov
Date:
On Sat, 16 Sep 2000, Tom Lane wrote:

> Date: Sat, 16 Sep 2000 11:23:33 -0400
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: Oleg Bartunov <oleg@sai.msu.su>
> Cc: pgsql-hackers@postgresql.org, t-ishii@sra.co.jp
> Subject: Re: [HACKERS] broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ? 
> 
> Oleg Bartunov <oleg@sai.msu.su> writes:
> > It's clear that we must use 'unsigned char' instead of 'char'
> > and corrected version runs ok on both systems. That's why I suspect
> > that gcc 2.95.2 has different default under FreeBSD which could
> > cause problem with LC_CTYPE in 7.0.2 
> > I didn't test current CVS under FreeBSD but probably will check it.
> 
> I think Peter recently went around and inserted explicit casts to
> fix this problem.  Please do see if it's fixed in CVS.

ok. will check this. I've recompile 7.0.2 on freebsd with -funsigned-char
and the problem has gone away. This prove my suggestion. I also 
checked 6.5 and it has the same probelm on FreeBSD. Also,
this makes clear many complains about broken locale under FreeBSD
I got from people. 
Hmm, current cvs has the same problem :-(
Regards,    Oleg



> 
>             regards, tom lane
> 

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83




Re: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?

From
Tom Lane
Date:
Oleg Bartunov <oleg@sai.msu.su> writes:
>>>> It's clear that we must use 'unsigned char' instead of 'char'
>>>> and corrected version runs ok on both systems. That's why I suspect
>>>> that gcc 2.95.2 has different default under FreeBSD which could
>>>> cause problem with LC_CTYPE in 7.0.2 
>> 
>> I think Peter recently went around and inserted explicit casts to
>> fix this problem.  Please do see if it's fixed in CVS.

> Hmm, current cvs has the same problem :-(

Now that I look at it, what Peter was doing was just trying to eliminate
compiler warnings on some platform or other, and he made changes like
these (this example is interfaces/ecpg/preproc/pgc.l):

@@ -491,7 +491,7 @@       /* this should leave the last byte set to '\0' */       strncpy(lower_text, yytext,
NAMEDATALEN-1);      for(i = 0; lower_text[i]; i++)
 
-           if (isascii((unsigned char)lower_text[i]) && isupper(lower_text[i]))
+           if (isascii((int)lower_text[i]) && isupper((int) lower_text[i]))       lower_text[i] =
tolower(lower_text[i]);      if (i >= NAMEDATALEN)
 
@@ -682,7 +682,7 @@       /* skip the ";" and trailing whitespace. Note that yytext contains          at least one
non-spacecharacter plus the ";" */
 
-       for ( i = strlen(yytext)-2; i > 0 && isspace(yytext[i]); i-- ) {}
+       for ( i = strlen(yytext)-2; i > 0 && isspace((int) yytext[i]); i-- ) {}       yytext[i+1] = '\0';       for (
defptr= defines; defptr != NULL &&
 
@@ -754,7 +754,7 @@     /* skip the ";" and trailing whitespace. Note that yytext contains        at least one
non-spacecharacter plus the ";" */
 
-     for ( i = strlen(yytext)-2; i > 0 && isspace(yytext[i]); i-- ) {}
+     for ( i = strlen(yytext)-2; i > 0 && isspace((int) yytext[i]); i-- ) {}     yytext[i+1] = '\0';     yyin = NULL;

Peter, I suppose what you were trying to clean up is a "char used as
array subscript" kind of warning?  These changes wouldn't help Oleg's
problem, in fact the first change in this file would have broken code
that was previously not broken for him.

I think that the correct coding convention is to be careful to call the
<ctype.h> macros only with values that are either declared as or casted
to "unsigned char".  I would like to think that your compiler will not
complain about         if (isascii((unsigned char)lower_text[i]) ...
If it does we'd have to write something as ugly as         if (isascii((int)(unsigned char)lower_text[i]) ...
which I can see no value in from a portability standpoint.
        regards, tom lane


Re: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?

From
Oleg Bartunov
Date:
INteresting, 
that I tried to find out what cause the problem just compiling
backend/utils/adt/ with -funsigned-char option but this won't help.
I thought (as in earlier time)  locale-aware code are in this 
directory. The problem already exists in 6.5 release,
so I'm not sure  recent Peter's changes could cause the problem 
Regards,    Oleg

On Sat, 16 Sep 2000, Tom Lane wrote:

> Date: Sat, 16 Sep 2000 13:21:20 -0400
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: Oleg Bartunov <oleg@sai.msu.su>, Peter Eisentraut <peter_e@gmx.net>
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ? 
> 
> Oleg Bartunov <oleg@sai.msu.su> writes:
> >>>> It's clear that we must use 'unsigned char' instead of 'char'
> >>>> and corrected version runs ok on both systems. That's why I suspect
> >>>> that gcc 2.95.2 has different default under FreeBSD which could
> >>>> cause problem with LC_CTYPE in 7.0.2 
> >> 
> >> I think Peter recently went around and inserted explicit casts to
> >> fix this problem.  Please do see if it's fixed in CVS.
> 
> > Hmm, current cvs has the same problem :-(
> 
> Now that I look at it, what Peter was doing was just trying to eliminate
> compiler warnings on some platform or other, and he made changes like
> these (this example is interfaces/ecpg/preproc/pgc.l):
> 
> @@ -491,7 +491,7 @@
>         /* this should leave the last byte set to '\0' */
>         strncpy(lower_text, yytext, NAMEDATALEN-1);
>         for(i = 0; lower_text[i]; i++)
> -           if (isascii((unsigned char)lower_text[i]) && isupper(lower_text[i]))
> +           if (isascii((int)lower_text[i]) && isupper((int) lower_text[i]))
>         lower_text[i] = tolower(lower_text[i]);
>  
>         if (i >= NAMEDATALEN)
> @@ -682,7 +682,7 @@
>  
>         /* skip the ";" and trailing whitespace. Note that yytext contains
>            at least one non-space character plus the ";" */
> -       for ( i = strlen(yytext)-2; i > 0 && isspace(yytext[i]); i-- ) {}
> +       for ( i = strlen(yytext)-2; i > 0 && isspace((int) yytext[i]); i-- ) {}
>         yytext[i+1] = '\0';
>  
>         for ( defptr = defines; defptr != NULL &&
> @@ -754,7 +754,7 @@
>  
>       /* skip the ";" and trailing whitespace. Note that yytext contains
>          at least one non-space character plus the ";" */
> -     for ( i = strlen(yytext)-2; i > 0 && isspace(yytext[i]); i-- ) {}
> +     for ( i = strlen(yytext)-2; i > 0 && isspace((int) yytext[i]); i-- ) {}
>       yytext[i+1] = '\0';
>  
>       yyin = NULL;
> 
> Peter, I suppose what you were trying to clean up is a "char used as
> array subscript" kind of warning?  These changes wouldn't help Oleg's
> problem, in fact the first change in this file would have broken code
> that was previously not broken for him.
> 
> I think that the correct coding convention is to be careful to call the
> <ctype.h> macros only with values that are either declared as or casted
> to "unsigned char".  I would like to think that your compiler will not
> complain about
>           if (isascii((unsigned char)lower_text[i]) ...
> If it does we'd have to write something as ugly as
>           if (isascii((int)(unsigned char)lower_text[i]) ...
> which I can see no value in from a portability standpoint.
> 
>             regards, tom lane
> 

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?

From
Tatsuo Ishii
Date:
> INteresting, 
> that I tried to find out what cause the problem just compiling
> backend/utils/adt/ with -funsigned-char option but this won't help.
> I thought (as in earlier time)  locale-aware code are in this 
> directory. The problem already exists in 6.5 release,
> so I'm not sure  recent Peter's changes could cause the problem 

You might want to compile backend/regex also with -funsigned-char
option.
--
Tatsuo Ishii


Re: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?

From
Peter Eisentraut
Date:
Tom Lane writes:

> -           if (isascii((unsigned char)lower_text[i]) && isupper(lower_text[i]))
> +           if (isascii((int)lower_text[i]) && isupper((int) lower_text[i]))

> Peter, I suppose what you were trying to clean up is a "char used as
> array subscript" kind of warning?

Yep.

> I would like to think that your compiler will not complain about
>           if (isascii((unsigned char)lower_text[i]) ...
> If it does we'd have to write something as ugly as
>           if (isascii((int)(unsigned char)lower_text[i]) ...
> which I can see no value in from a portability standpoint.

I think that the problem might rather be that lower_text (and various
other arrays) are not declared as unsigned char in the first place. That
would also explain why -funsigned-chars fixes it. Because calling
toupper() etc. with a signed char argument is in violation of the spec.

(Hmm, template/aix contains this: CFLAGS='-qchars=signed ...'. That can't
be good.)


-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> I think that the problem might rather be that lower_text (and various
> other arrays) are not declared as unsigned char in the first place. That
> would also explain why -funsigned-chars fixes it. Because calling
> toupper() etc. with a signed char argument is in violation of the spec.

Well, we could fix it either by propagating use of "unsigned char" all
over the place, or by casting the arguments given to ctype macros.
The former would be a lot more invasive because it would propagate to
routines that don't actually call any ctype macros (since they'd have
to conform to prototypes, struct definitions, etc).  So I'm inclined to
go with the latter.  A quick search-and-replace scan ought to do it.

Also, on machines where the ctype macros actually are implemented as
array lookups, gcc -Wall should warn about any calls we miss, so as
long as someone is paying attention on such a platform, we don't have
to worry about the problem sneaking back in.

> (Hmm, template/aix contains this: CFLAGS='-qchars=signed ...'. That can't
> be good.)

Probably Andreas put that in --- maybe he still remembers why.  But it
shouldn't matter.  We need to be able to run on platforms where char is
signed and there's no handy "-funsigned-chars" compiler option.
        regards, tom lane


Re: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?

From
Oleg Bartunov
Date:
Well,

with the  help of Tatsuo I found the problem is in backend/regex/regcomp.c
I'll look for more details and probably could make a fix.
quick question: is there in sources locale-aware strncmp function
or I need to write myself ?
As for the compiler option I think we should'nt use any 
"-funsigned-chars" like options !
Regards,
    Oleg
On Sun, 17 Sep 2000, Tom Lane wrote:

> Date: Sun, 17 Sep 2000 13:48:37 -0400
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: Peter Eisentraut <peter_e@gmx.net>
> Cc: Oleg Bartunov <oleg@sai.msu.su>, pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ? 
> 
> Peter Eisentraut <peter_e@gmx.net> writes:
> > I think that the problem might rather be that lower_text (and various
> > other arrays) are not declared as unsigned char in the first place. That
> > would also explain why -funsigned-chars fixes it. Because calling
> > toupper() etc. with a signed char argument is in violation of the spec.
> 
> Well, we could fix it either by propagating use of "unsigned char" all
> over the place, or by casting the arguments given to ctype macros.
> The former would be a lot more invasive because it would propagate to
> routines that don't actually call any ctype macros (since they'd have
> to conform to prototypes, struct definitions, etc).  So I'm inclined to
> go with the latter.  A quick search-and-replace scan ought to do it.
> 
> Also, on machines where the ctype macros actually are implemented as
> array lookups, gcc -Wall should warn about any calls we miss, so as
> long as someone is paying attention on such a platform, we don't have
> to worry about the problem sneaking back in.
> 
> > (Hmm, template/aix contains this: CFLAGS='-qchars=signed ...'. That can't
> > be good.)
> 
> Probably Andreas put that in --- maybe he still remembers why.  But it
> shouldn't matter.  We need to be able to run on platforms where char is
> signed and there's no handy "-funsigned-chars" compiler option.
> 
>             regards, tom lane
> 

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83




Re: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?

From
Oleg Bartunov
Date:
On Sun, 17 Sep 2000, Tatsuo Ishii wrote:

> Date: Sun, 17 Sep 2000 11:05:42 +0900
> From: Tatsuo Ishii <t-ishii@sra.co.jp>
> To: oleg@sai.msu.su
> Cc: tgl@sss.pgh.pa.us, peter_e@gmx.net, pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ? 
> 
> > INteresting, 
> > that I tried to find out what cause the problem just compiling
> > backend/utils/adt/ with -funsigned-char option but this won't help.
> > I thought (as in earlier time)  locale-aware code are in this 
> > directory. The problem already exists in 6.5 release,
> > so I'm not sure  recent Peter's changes could cause the problem 
> 
> You might want to compile backend/regex also with -funsigned-char
> option.

Thanks, 

backend/regex/regcomp.c cause the problem. I compiled only this file
with -funsigned-char option and the problem gone away !
Also, I know that in case of --enable-multibyte I dont' have any problem,
so in principle it's enough to look into 
#ifdef MULTIBYTE sections in backend/regex/regcomp.c
and made according
#ifdef USE_LOCALE  sections

Tatsuo, am I right and what critical sections in backend/regex/regcomp.c ?
Regards,
    Oleg




> --
> Tatsuo Ishii
> 

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?

From
Tatsuo Ishii
Date:
> backend/regex/regcomp.c cause the problem. I compiled only this file
> with -funsigned-char option and the problem gone away !
> Also, I know that in case of --enable-multibyte I dont' have any problem,
> so in principle it's enough to look into 
> #ifdef MULTIBYTE sections in backend/regex/regcomp.c
> and made according
> #ifdef USE_LOCALE  sections
> 
> Tatsuo, am I right and what critical sections in backend/regex/regcomp.c ?

Besides the toupper etc. vs. signed char issues, there is an upper
limit of char values defined in include/regex/regex2.h. For none MB
installtions, it is defined as:

#define OUT          (CHAR_MAX+1)    /* a non-character value */

where CHAR_MAX gives 255 for "char = unsigned char" platforms. This is
good. However it gives 127 for "char = signed char" platforms. So if
you have some none ascii letters greater than 128 on "char = unsigned
char" platforms, you will lose.

Changing above to:

#define OUT          (UCHAR_MAX+1)    /* a non-character value */

might help...
--
Tatsuo Ishii


Re: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?

From
Peter Eisentraut
Date:
Tom Lane writes:

> Well, we could fix it either by propagating use of "unsigned char" all
> over the place, or by casting the arguments given to ctype macros.
> The former would be a lot more invasive because it would propagate to
> routines that don't actually call any ctype macros (since they'd have
> to conform to prototypes, struct definitions, etc).

I'm not married to either solution, I just opine that it is cleaner to use
"signed char" and "unsigned char" explicitly when you depend on the
signed-ness. Otherwise you might just end up moving the problem elsewhere,
namely those structs and prototypes, etc.

> > (Hmm, template/aix contains this: CFLAGS='-qchars=signed ...'. That can't
> > be good.)
> 
> Probably Andreas put that in --- maybe he still remembers why.  But it
> shouldn't matter.  We need to be able to run on platforms where char is
> signed and there's no handy "-funsigned-chars" compiler option.

What I meant was that

(a) according to Oleg's report, the source depends on char being unsigned
in some places, so those places break on AIX, and

(b) according to the above, the source apparently requires char to be
signed in some places, so it breaks when char is made unsigned.

*That* can't be good.


-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Oleg Bartunov <oleg@sai.msu.su> wrote a couple months ago:
>>>> It's clear that we must use 'unsigned char' instead of 'char'
>>>> and corrected version runs ok on both systems. That's why I suspect
>>>> that gcc 2.95.2 has different default under FreeBSD which could
>>>> cause problem with LC_CTYPE in 7.0.2 

> ok. will check this. I've recompile 7.0.2 on freebsd with -funsigned-char
> and the problem has gone away. This prove my suggestion. I also 
> checked 6.5 and it has the same probelm on FreeBSD. Also,
> this makes clear many complains about broken locale under FreeBSD
> I got from people. 
> Hmm, current cvs has the same problem :-(

Today I inserted (unsigned char) casts into all the <ctype.h> function
calls I could find.  This issue should be fixed as of current cvs.
Please try it again when you have time.
        regards, tom lane


Re: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?

From
Oleg Bartunov
Date:
On Sun, 3 Dec 2000, Tom Lane wrote:

> Date: Sun, 03 Dec 2000 18:13:47 -0500
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: Oleg Bartunov <oleg@sai.msu.su>
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ? 
> 
> Oleg Bartunov <oleg@sai.msu.su> wrote a couple months ago:
> >>>> It's clear that we must use 'unsigned char' instead of 'char'
> >>>> and corrected version runs ok on both systems. That's why I suspect
> >>>> that gcc 2.95.2 has different default under FreeBSD which could
> >>>> cause problem with LC_CTYPE in 7.0.2 
> 
> > ok. will check this. I've recompile 7.0.2 on freebsd with -funsigned-char
> > and the problem has gone away. This prove my suggestion. I also 
> > checked 6.5 and it has the same probelm on FreeBSD. Also,
> > this makes clear many complains about broken locale under FreeBSD
> > I got from people. 
> > Hmm, current cvs has the same problem :-(
> 
> Today I inserted (unsigned char) casts into all the <ctype.h> function
> calls I could find.  This issue should be fixed as of current cvs.
> Please try it again when you have time.
> 

Just tried on FreeBSD 3.4-STABLE, current cvs, gcc version 2.95.2 
19991024 (release), ru-RU.KOI8-R locale, postgresql configured with
--enable-locale, no gcc option like --unisgned-chars
Looks like your changes did right job !
regards,
    Oleg


>             regards, tom lane
> 

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83