Thread: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?
Hi, while I'm doing more accurate test I just want to ask if somebody test locale in 7.0.2 under FreeBSD ? the point is that I usually compile postgres with --enable-locale --enable-multibyte and never had a problem with locale. Today I decided to use only --enable-locale and found that LC_CTYPE support seems broken in 7.0.2 under FreeBSD 4.01. release. I used folowing select: select c_name from city where c_name ~* 'ал'; interesting that there are no problem under Linux ! I used the same compiler gcc version 2.95.2 19991024 (release) on both systems. One hypothesis is that gcc 2.95.2 under Linux treats 'char' as 'unsigned char' while on FreeBSD there is no such default. This could be demonstrated using test-ctype.c from src/test/locale directory. In current version there is ....... void describe_char(int c) { char cp = c, up = toupper(c), lo = tolower(c); ........... which works as expected on Linux and broken under FreeBSD (gcc 2.95.2) It's clear that we must use 'unsigned char' instead of 'char' and corrected version runs ok on both systems. That's why I suspect that gcc 2.95.2 has different default under FreeBSD which could cause problem with LC_CTYPE in 7.0.2 I didn't test current CVS under FreeBSD but probably will check it. Regards, Oleg PS. forget to mention that collation works fine _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes: > It's clear that we must use 'unsigned char' instead of 'char' > and corrected version runs ok on both systems. That's why I suspect > that gcc 2.95.2 has different default under FreeBSD which could > cause problem with LC_CTYPE in 7.0.2 > I didn't test current CVS under FreeBSD but probably will check it. I think Peter recently went around and inserted explicit casts to fix this problem. Please do see if it's fixed in CVS. regards, tom lane
Re: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?
From
Oleg Bartunov
Date:
On Sat, 16 Sep 2000, Tom Lane wrote: > Date: Sat, 16 Sep 2000 11:23:33 -0400 > From: Tom Lane <tgl@sss.pgh.pa.us> > To: Oleg Bartunov <oleg@sai.msu.su> > Cc: pgsql-hackers@postgresql.org, t-ishii@sra.co.jp > Subject: Re: [HACKERS] broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ? > > Oleg Bartunov <oleg@sai.msu.su> writes: > > It's clear that we must use 'unsigned char' instead of 'char' > > and corrected version runs ok on both systems. That's why I suspect > > that gcc 2.95.2 has different default under FreeBSD which could > > cause problem with LC_CTYPE in 7.0.2 > > I didn't test current CVS under FreeBSD but probably will check it. > > I think Peter recently went around and inserted explicit casts to > fix this problem. Please do see if it's fixed in CVS. ok. will check this. I've recompile 7.0.2 on freebsd with -funsigned-char and the problem has gone away. This prove my suggestion. I also checked 6.5 and it has the same probelm on FreeBSD. Also, this makes clear many complains about broken locale under FreeBSD I got from people. Hmm, current cvs has the same problem :-( Regards, Oleg > > regards, tom lane > _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes: >>>> It's clear that we must use 'unsigned char' instead of 'char' >>>> and corrected version runs ok on both systems. That's why I suspect >>>> that gcc 2.95.2 has different default under FreeBSD which could >>>> cause problem with LC_CTYPE in 7.0.2 >> >> I think Peter recently went around and inserted explicit casts to >> fix this problem. Please do see if it's fixed in CVS. > Hmm, current cvs has the same problem :-( Now that I look at it, what Peter was doing was just trying to eliminate compiler warnings on some platform or other, and he made changes like these (this example is interfaces/ecpg/preproc/pgc.l): @@ -491,7 +491,7 @@ /* this should leave the last byte set to '\0' */ strncpy(lower_text, yytext, NAMEDATALEN-1); for(i = 0; lower_text[i]; i++) - if (isascii((unsigned char)lower_text[i]) && isupper(lower_text[i])) + if (isascii((int)lower_text[i]) && isupper((int) lower_text[i])) lower_text[i] = tolower(lower_text[i]); if (i >= NAMEDATALEN) @@ -682,7 +682,7 @@ /* skip the ";" and trailing whitespace. Note that yytext contains at least one non-spacecharacter plus the ";" */ - for ( i = strlen(yytext)-2; i > 0 && isspace(yytext[i]); i-- ) {} + for ( i = strlen(yytext)-2; i > 0 && isspace((int) yytext[i]); i-- ) {} yytext[i+1] = '\0'; for ( defptr= defines; defptr != NULL && @@ -754,7 +754,7 @@ /* skip the ";" and trailing whitespace. Note that yytext contains at least one non-spacecharacter plus the ";" */ - for ( i = strlen(yytext)-2; i > 0 && isspace(yytext[i]); i-- ) {} + for ( i = strlen(yytext)-2; i > 0 && isspace((int) yytext[i]); i-- ) {} yytext[i+1] = '\0'; yyin = NULL; Peter, I suppose what you were trying to clean up is a "char used as array subscript" kind of warning? These changes wouldn't help Oleg's problem, in fact the first change in this file would have broken code that was previously not broken for him. I think that the correct coding convention is to be careful to call the <ctype.h> macros only with values that are either declared as or casted to "unsigned char". I would like to think that your compiler will not complain about if (isascii((unsigned char)lower_text[i]) ... If it does we'd have to write something as ugly as if (isascii((int)(unsigned char)lower_text[i]) ... which I can see no value in from a portability standpoint. regards, tom lane
Re: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?
From
Oleg Bartunov
Date:
INteresting, that I tried to find out what cause the problem just compiling backend/utils/adt/ with -funsigned-char option but this won't help. I thought (as in earlier time) locale-aware code are in this directory. The problem already exists in 6.5 release, so I'm not sure recent Peter's changes could cause the problem Regards, Oleg On Sat, 16 Sep 2000, Tom Lane wrote: > Date: Sat, 16 Sep 2000 13:21:20 -0400 > From: Tom Lane <tgl@sss.pgh.pa.us> > To: Oleg Bartunov <oleg@sai.msu.su>, Peter Eisentraut <peter_e@gmx.net> > Cc: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ? > > Oleg Bartunov <oleg@sai.msu.su> writes: > >>>> It's clear that we must use 'unsigned char' instead of 'char' > >>>> and corrected version runs ok on both systems. That's why I suspect > >>>> that gcc 2.95.2 has different default under FreeBSD which could > >>>> cause problem with LC_CTYPE in 7.0.2 > >> > >> I think Peter recently went around and inserted explicit casts to > >> fix this problem. Please do see if it's fixed in CVS. > > > Hmm, current cvs has the same problem :-( > > Now that I look at it, what Peter was doing was just trying to eliminate > compiler warnings on some platform or other, and he made changes like > these (this example is interfaces/ecpg/preproc/pgc.l): > > @@ -491,7 +491,7 @@ > /* this should leave the last byte set to '\0' */ > strncpy(lower_text, yytext, NAMEDATALEN-1); > for(i = 0; lower_text[i]; i++) > - if (isascii((unsigned char)lower_text[i]) && isupper(lower_text[i])) > + if (isascii((int)lower_text[i]) && isupper((int) lower_text[i])) > lower_text[i] = tolower(lower_text[i]); > > if (i >= NAMEDATALEN) > @@ -682,7 +682,7 @@ > > /* skip the ";" and trailing whitespace. Note that yytext contains > at least one non-space character plus the ";" */ > - for ( i = strlen(yytext)-2; i > 0 && isspace(yytext[i]); i-- ) {} > + for ( i = strlen(yytext)-2; i > 0 && isspace((int) yytext[i]); i-- ) {} > yytext[i+1] = '\0'; > > for ( defptr = defines; defptr != NULL && > @@ -754,7 +754,7 @@ > > /* skip the ";" and trailing whitespace. Note that yytext contains > at least one non-space character plus the ";" */ > - for ( i = strlen(yytext)-2; i > 0 && isspace(yytext[i]); i-- ) {} > + for ( i = strlen(yytext)-2; i > 0 && isspace((int) yytext[i]); i-- ) {} > yytext[i+1] = '\0'; > > yyin = NULL; > > Peter, I suppose what you were trying to clean up is a "char used as > array subscript" kind of warning? These changes wouldn't help Oleg's > problem, in fact the first change in this file would have broken code > that was previously not broken for him. > > I think that the correct coding convention is to be careful to call the > <ctype.h> macros only with values that are either declared as or casted > to "unsigned char". I would like to think that your compiler will not > complain about > if (isascii((unsigned char)lower_text[i]) ... > If it does we'd have to write something as ugly as > if (isascii((int)(unsigned char)lower_text[i]) ... > which I can see no value in from a portability standpoint. > > regards, tom lane > _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Re: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?
From
Tatsuo Ishii
Date:
> INteresting, > that I tried to find out what cause the problem just compiling > backend/utils/adt/ with -funsigned-char option but this won't help. > I thought (as in earlier time) locale-aware code are in this > directory. The problem already exists in 6.5 release, > so I'm not sure recent Peter's changes could cause the problem You might want to compile backend/regex also with -funsigned-char option. -- Tatsuo Ishii
Re: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?
From
Peter Eisentraut
Date:
Tom Lane writes: > - if (isascii((unsigned char)lower_text[i]) && isupper(lower_text[i])) > + if (isascii((int)lower_text[i]) && isupper((int) lower_text[i])) > Peter, I suppose what you were trying to clean up is a "char used as > array subscript" kind of warning? Yep. > I would like to think that your compiler will not complain about > if (isascii((unsigned char)lower_text[i]) ... > If it does we'd have to write something as ugly as > if (isascii((int)(unsigned char)lower_text[i]) ... > which I can see no value in from a portability standpoint. I think that the problem might rather be that lower_text (and various other arrays) are not declared as unsigned char in the first place. That would also explain why -funsigned-chars fixes it. Because calling toupper() etc. with a signed char argument is in violation of the spec. (Hmm, template/aix contains this: CFLAGS='-qchars=signed ...'. That can't be good.) -- Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
Peter Eisentraut <peter_e@gmx.net> writes: > I think that the problem might rather be that lower_text (and various > other arrays) are not declared as unsigned char in the first place. That > would also explain why -funsigned-chars fixes it. Because calling > toupper() etc. with a signed char argument is in violation of the spec. Well, we could fix it either by propagating use of "unsigned char" all over the place, or by casting the arguments given to ctype macros. The former would be a lot more invasive because it would propagate to routines that don't actually call any ctype macros (since they'd have to conform to prototypes, struct definitions, etc). So I'm inclined to go with the latter. A quick search-and-replace scan ought to do it. Also, on machines where the ctype macros actually are implemented as array lookups, gcc -Wall should warn about any calls we miss, so as long as someone is paying attention on such a platform, we don't have to worry about the problem sneaking back in. > (Hmm, template/aix contains this: CFLAGS='-qchars=signed ...'. That can't > be good.) Probably Andreas put that in --- maybe he still remembers why. But it shouldn't matter. We need to be able to run on platforms where char is signed and there's no handy "-funsigned-chars" compiler option. regards, tom lane
Re: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?
From
Oleg Bartunov
Date:
Well, with the help of Tatsuo I found the problem is in backend/regex/regcomp.c I'll look for more details and probably could make a fix. quick question: is there in sources locale-aware strncmp function or I need to write myself ? As for the compiler option I think we should'nt use any "-funsigned-chars" like options ! Regards, Oleg On Sun, 17 Sep 2000, Tom Lane wrote: > Date: Sun, 17 Sep 2000 13:48:37 -0400 > From: Tom Lane <tgl@sss.pgh.pa.us> > To: Peter Eisentraut <peter_e@gmx.net> > Cc: Oleg Bartunov <oleg@sai.msu.su>, pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ? > > Peter Eisentraut <peter_e@gmx.net> writes: > > I think that the problem might rather be that lower_text (and various > > other arrays) are not declared as unsigned char in the first place. That > > would also explain why -funsigned-chars fixes it. Because calling > > toupper() etc. with a signed char argument is in violation of the spec. > > Well, we could fix it either by propagating use of "unsigned char" all > over the place, or by casting the arguments given to ctype macros. > The former would be a lot more invasive because it would propagate to > routines that don't actually call any ctype macros (since they'd have > to conform to prototypes, struct definitions, etc). So I'm inclined to > go with the latter. A quick search-and-replace scan ought to do it. > > Also, on machines where the ctype macros actually are implemented as > array lookups, gcc -Wall should warn about any calls we miss, so as > long as someone is paying attention on such a platform, we don't have > to worry about the problem sneaking back in. > > > (Hmm, template/aix contains this: CFLAGS='-qchars=signed ...'. That can't > > be good.) > > Probably Andreas put that in --- maybe he still remembers why. But it > shouldn't matter. We need to be able to run on platforms where char is > signed and there's no handy "-funsigned-chars" compiler option. > > regards, tom lane > _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Re: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?
From
Oleg Bartunov
Date:
On Sun, 17 Sep 2000, Tatsuo Ishii wrote: > Date: Sun, 17 Sep 2000 11:05:42 +0900 > From: Tatsuo Ishii <t-ishii@sra.co.jp> > To: oleg@sai.msu.su > Cc: tgl@sss.pgh.pa.us, peter_e@gmx.net, pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ? > > > INteresting, > > that I tried to find out what cause the problem just compiling > > backend/utils/adt/ with -funsigned-char option but this won't help. > > I thought (as in earlier time) locale-aware code are in this > > directory. The problem already exists in 6.5 release, > > so I'm not sure recent Peter's changes could cause the problem > > You might want to compile backend/regex also with -funsigned-char > option. Thanks, backend/regex/regcomp.c cause the problem. I compiled only this file with -funsigned-char option and the problem gone away ! Also, I know that in case of --enable-multibyte I dont' have any problem, so in principle it's enough to look into #ifdef MULTIBYTE sections in backend/regex/regcomp.c and made according #ifdef USE_LOCALE sections Tatsuo, am I right and what critical sections in backend/regex/regcomp.c ? Regards, Oleg > -- > Tatsuo Ishii > _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Re: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?
From
Tatsuo Ishii
Date:
> backend/regex/regcomp.c cause the problem. I compiled only this file > with -funsigned-char option and the problem gone away ! > Also, I know that in case of --enable-multibyte I dont' have any problem, > so in principle it's enough to look into > #ifdef MULTIBYTE sections in backend/regex/regcomp.c > and made according > #ifdef USE_LOCALE sections > > Tatsuo, am I right and what critical sections in backend/regex/regcomp.c ? Besides the toupper etc. vs. signed char issues, there is an upper limit of char values defined in include/regex/regex2.h. For none MB installtions, it is defined as: #define OUT (CHAR_MAX+1) /* a non-character value */ where CHAR_MAX gives 255 for "char = unsigned char" platforms. This is good. However it gives 127 for "char = signed char" platforms. So if you have some none ascii letters greater than 128 on "char = unsigned char" platforms, you will lose. Changing above to: #define OUT (UCHAR_MAX+1) /* a non-character value */ might help... -- Tatsuo Ishii
Re: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?
From
Peter Eisentraut
Date:
Tom Lane writes: > Well, we could fix it either by propagating use of "unsigned char" all > over the place, or by casting the arguments given to ctype macros. > The former would be a lot more invasive because it would propagate to > routines that don't actually call any ctype macros (since they'd have > to conform to prototypes, struct definitions, etc). I'm not married to either solution, I just opine that it is cleaner to use "signed char" and "unsigned char" explicitly when you depend on the signed-ness. Otherwise you might just end up moving the problem elsewhere, namely those structs and prototypes, etc. > > (Hmm, template/aix contains this: CFLAGS='-qchars=signed ...'. That can't > > be good.) > > Probably Andreas put that in --- maybe he still remembers why. But it > shouldn't matter. We need to be able to run on platforms where char is > signed and there's no handy "-funsigned-chars" compiler option. What I meant was that (a) according to Oleg's report, the source depends on char being unsigned in some places, so those places break on AIX, and (b) according to the above, the source apparently requires char to be signed in some places, so it breaks when char is made unsigned. *That* can't be good. -- Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
Oleg Bartunov <oleg@sai.msu.su> wrote a couple months ago: >>>> It's clear that we must use 'unsigned char' instead of 'char' >>>> and corrected version runs ok on both systems. That's why I suspect >>>> that gcc 2.95.2 has different default under FreeBSD which could >>>> cause problem with LC_CTYPE in 7.0.2 > ok. will check this. I've recompile 7.0.2 on freebsd with -funsigned-char > and the problem has gone away. This prove my suggestion. I also > checked 6.5 and it has the same probelm on FreeBSD. Also, > this makes clear many complains about broken locale under FreeBSD > I got from people. > Hmm, current cvs has the same problem :-( Today I inserted (unsigned char) casts into all the <ctype.h> function calls I could find. This issue should be fixed as of current cvs. Please try it again when you have time. regards, tom lane
Re: broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ?
From
Oleg Bartunov
Date:
On Sun, 3 Dec 2000, Tom Lane wrote: > Date: Sun, 03 Dec 2000 18:13:47 -0500 > From: Tom Lane <tgl@sss.pgh.pa.us> > To: Oleg Bartunov <oleg@sai.msu.su> > Cc: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] broken locale in 7.0.2 without multibyte support (FreeBSD 4.1-RELEASE) ? > > Oleg Bartunov <oleg@sai.msu.su> wrote a couple months ago: > >>>> It's clear that we must use 'unsigned char' instead of 'char' > >>>> and corrected version runs ok on both systems. That's why I suspect > >>>> that gcc 2.95.2 has different default under FreeBSD which could > >>>> cause problem with LC_CTYPE in 7.0.2 > > > ok. will check this. I've recompile 7.0.2 on freebsd with -funsigned-char > > and the problem has gone away. This prove my suggestion. I also > > checked 6.5 and it has the same probelm on FreeBSD. Also, > > this makes clear many complains about broken locale under FreeBSD > > I got from people. > > Hmm, current cvs has the same problem :-( > > Today I inserted (unsigned char) casts into all the <ctype.h> function > calls I could find. This issue should be fixed as of current cvs. > Please try it again when you have time. > Just tried on FreeBSD 3.4-STABLE, current cvs, gcc version 2.95.2 19991024 (release), ru-RU.KOI8-R locale, postgresql configured with --enable-locale, no gcc option like --unisgned-chars Looks like your changes did right job ! regards, Oleg > regards, tom lane > _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83