Thread: Turkish downcasting in PL/pgSQL

Turkish downcasting in PL/pgSQL

From
ntufar
Date:
Your name               :
Your email address      :


System Configuration
---------------------
  Architecture (example: Intel Pentium)         : Intel Pentium

  Operating System (example: Linux 2.4.18)      : Debian unstable
                          Linux 2.6.6-1-k7

  PostgreSQL version (example: PostgreSQL-8.0):  PostgreSQL-8.0 CVS HEAD

  Compiler used (example:  gcc 2.95.2)          : gcc 3.3.4


Please enter a FULL description of your problem:
------------------------------------------------

Problems with the Turkish locale are widely known to developers.
Another one, this time in PL/pgSQL, has reared its ugly head.
Regression tests are failing at triggers, plpgsql, copy2
and rangefuncs. Examination of regression.diff showed that
the failures were due to unrecognised statements like
BEGIN, RAISE and IF in PL/pgSQL functions. Replacing
capital "I" with lower-case "i" (BEGiN, RAiSE, iF) completely
solves the problem.


If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------

Apparently the problem is caused by the following directive:

     %option case-insensitive

on line 76 in file src/pl/plpgsql/src/scan.l

flex (flex version 2.5.4) incorporates case-insensitivity in its
state tables because if I run the flex stage with LANG=C everything
works fine. A quick and dirty fix could be implemented by placing

     LANG=C
     export LANG

in file src/pl/plpgsql/src/Makefile before calling flex.

A long-term fix would be to implement a keyword lookup
function like ScanKeywordLookup() in
src/backend/parser/keywords.c.

I would gladly prepare a patch and send it for your consideration
tomorrow morning.

Best regards,
Nicolai Tufar
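
The long-term fix suggested above can be sketched roughly as follows.
This is a minimal illustration, not the actual ScanKeywordLookup()
code: the names sketch_keywords, ascii_tolower and
sketch_keyword_lookup, and the six-entry table, are hypothetical.
The idea is that folding only ASCII 'A'..'Z' by hand never consults
the locale, so "I" always maps to "i" regardless of LANG.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical keyword table; it must stay sorted for the binary search. */
static const char *const sketch_keywords[] = {
    "begin", "else", "end", "if", "raise", "then"
};

#define SKETCH_NUM_KEYWORDS \
    (sizeof(sketch_keywords) / sizeof(sketch_keywords[0]))
#define SKETCH_MAX_KEYWORD_LEN 32

/* Fold only ASCII 'A'..'Z'; tolower() and the locale are never used. */
static char
ascii_tolower(char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        return (char) (ch - 'A' + 'a');
    return ch;
}

/* Return the index of the keyword matching word, or -1 if there is none. */
static int
sketch_keyword_lookup(const char *word)
{
    char        folded[SKETCH_MAX_KEYWORD_LEN];
    size_t      len;
    size_t      i;
    int         lo;
    int         hi;

    if (word == NULL)
        return -1;
    len = strlen(word);
    if (len >= SKETCH_MAX_KEYWORD_LEN)
        return -1;              /* too long to be any keyword */

    for (i = 0; i < len; i++)
        folded[i] = ascii_tolower(word[i]);
    folded[len] = '\0';

    /* Plain binary search over the sorted table. */
    lo = 0;
    hi = (int) SKETCH_NUM_KEYWORDS - 1;
    while (lo <= hi)
    {
        int         mid = lo + (hi - lo) / 2;
        int         cmp = strcmp(sketch_keywords[mid], folded);

        if (cmp == 0)
            return mid;
        if (cmp < 0)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}
```

With a lookup of this kind the scanner would match identifiers
generically and decide afterwards whether they are keywords, so the
generated scanner tables would no longer depend on the locale in
effect when flex was run.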

Re: Turkish downcasting in PL/pgSQL

From
Tom Lane
Date:
ntufar <ntufar@pisem.net> writes:
> flex (flex version 2.5.4) incorporates case-insensitivity in its
> state tables because if I run the flex stage with LANG=C everything
> works fine.

Ick.  That is of course why it worked for me when I tested it :-(

> A quick and dirty fix could be implemented by placing
>      LANG=C
>      export LANG
> in file src/pl/plpgsql/src/Makefile before calling flex.

This is probably what we'd better do.  Otherwise we have
build-context-dependency in the system's behavior, which is bad.

Peter, any thoughts on this one way or the other?  At the moment
plpgsql's scan.l seems to be the only use of '%option case-insensitive'
but we have enough flex lexers laying about that I wouldn't be surprised
to have this same risk elsewhere.  Is it reasonable to try to force
LANG=C in some global fashion during the build?

            regards, tom lane

Re: Turkish downcasting in PL/pgSQL

From
Tom Lane
Date:
ntufar <ntufar@pisem.net> writes:
> I attached a diff of a fix that adds LANG=C; before the call to $(FLEX).
> It fixes the problem here, but I don't know if adding an environment
> variable assignment like this is appropriate. I am not too fluent in the
> PostgreSQL build environment and do not know where one could put the
> global definition you are talking about below.

Um, the attachment was unreadable :-( but I get the idea.

As for the global solution, I was wondering if it would work to put
"LANG=C" right inside the definition of $(FLEX).  That would ensure
the right behavior from all our flex builds without unnecessarily
messing up people's build environments otherwise.  I don't know however
whether this would parse properly.

            regards, tom lane

Re: Turkish downcasting in PL/pgSQL

From
ntufar
Date:
On Thursday, 12-08-2004, at around 22:27, Tom Lane wrote:
> ntufar <ntufar@pisem.net> writes:
> > I attached a diff of a fix that adds LANG=C; before the call to $(FLEX).
> > It fixes the problem here, but I don't know if adding an environment
> > variable assignment like this is appropriate. I am not too fluent in the
> > PostgreSQL build environment and do not know where one could put the
> > global definition you are talking about below.
>
> Um, the attachment was unreadable :-( but I get the idea.

Something to do with my mail provider, sorry. The change is
in file src/pl/plpgsql/src/Makefile:
    LANG=C;$(FLEX) $(FLEXFLAGS) -Pplpgsql_base_yy -o'$@' $<
instead of
    $(FLEX) $(FLEXFLAGS) -Pplpgsql_base_yy -o'$@' $<

>
> As for the global solution, I was wondering if it would work to put
> "LANG=C" right inside the definition of $(FLEX).  That would ensure
> the right behavior from all our flex builds without unnecessarily
> messing up people's build environments otherwise.  I don't know however
> whether this would parse properly.

The only thing that comes to mind is that it may break the Win32 port.
Can someone comment on this?

>
>             regards, tom lane

Regards,
Nicolai Tufar

Re: Turkish downcasting in PL/pgSQL

From
ntufar
Date:
Greetings,


On Thursday, 12-08-2004, at around 18:32, Tom Lane wrote:
> ntufar <ntufar@pisem.net> writes:
> > flex (flex version 2.5.4) incorporates case-insensitivity in its
> > state tables because if I run the flex stage with LANG=C everything
> > works fine.
>
> Ick.  That is of course why it worked for me when I tested it :-(
>
> > A quick and dirty fix could be implemented by placing
> >      LANG=C
> >      export LANG
> > in file src/pl/plpgsql/src/Makefile before calling flex.
>
> This is probably what we'd better do.  Otherwise we have
> build-context-dependency in the system's behavior, which is bad.
>
I attached a diff of a fix that adds LANG=C; before the call to $(FLEX).
It fixes the problem here, but I don't know if adding an environment
variable assignment like this is appropriate. I am not too fluent in the
PostgreSQL build environment and do not know where one could put the
global definition you are talking about below.

> Peter, any thoughts on this one way or the other?  At the moment
> plpgsql's scan.l seems to be the only use of '%option case-insensitive'
> but we have enough flex lexers laying about that I wouldn't be surprised
> to have this same risk elsewhere.  Is it reasonable to try to force
> LANG=C in some global fashion during the build?
>
>                       regards, tom lane
>
Best regards,
Nicolai Tufar


Re: Turkish downcasting in PL/pgSQL

From
Peter Eisentraut
Date:
Tom Lane wrote:
> Peter, any thoughts on this one way or the other?  At the moment
> plpgsql's scan.l seems to be the only use of '%option
> case-insensitive' but we have enough flex lexers laying about that I
> wouldn't be surprised to have this same risk elsewhere.  Is it
> reasonable to try to force LANG=C in some global fashion during the
> build?

You'd have to set LC_ALL=C to be really sure to override everything.
But I would stay away from doing that globally, because all the
translation work in gcc and make would go to waste.

I would also suggest that Nicolai report this issue to the flex
developers.  It's only bound to reappear everywhere case-insensitive
flex scanners are used.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
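
The point about LC_ALL above follows from the usual POSIX precedence
among the locale variables: LC_ALL wins over the category variable
(here LC_CTYPE), which in turn wins over LANG, and empty values count
as unset. A small sketch of that rule, with hypothetical helper names
and assuming "C" as the fallback when nothing is set (the real
fallback is implementation-defined):

```c
#include <stddef.h>

/* POSIX treats an empty locale variable the same as an unset one. */
static int
is_set(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/*
 * Return the locale name that would govern character classification,
 * given the values of LC_ALL, LC_CTYPE and LANG (NULL means unset).
 */
static const char *
effective_ctype_locale(const char *lc_all,
                       const char *lc_ctype,
                       const char *lang)
{
    if (is_set(lc_all))
        return lc_all;          /* overrides everything else */
    if (is_set(lc_ctype))
        return lc_ctype;        /* category-specific setting */
    if (is_set(lang))
        return lang;            /* general default */
    return "C";                 /* assumption: fallback is C/POSIX */
}
```

For example, effective_ctype_locale(NULL, "tr_TR", "C") yields
"tr_TR": setting LANG=C alone does not help a user who has LC_CTYPE
set, whereas LC_ALL=C does.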

Re: Turkish downcasting in PL/pgSQL

From
Peter Eisentraut
Date:
ntufar wrote:
> Apparently the problem is caused by the following directive:
>
>      %option case-insensitive
>
> on line 76 in file src/pl/plpgsql/src/scan.l
>
> flex (flex version 2.5.4) incorporates case-insensitivity in its
> state tables because if I run the flex stage with LANG=C everything
> works fine. A quick and dirty fix could be implemented by placing
>
>      LANG=C
>      export LANG
>
> in file src/pl/plpgsql/src/Makefile before calling flex.

I have tried running flex (2.5.4) with a number of different locales
including tr_TR, but the output file is always the same.  Can you show
us a diff of the generated files?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Turkish downcasting in PL/pgSQL

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> You'd have to set LC_ALL=C to be really sure to override everything.
> But I would stay away from doing that globally, because all the
> translation work in gcc and make would go to waste.

Agreed.  I was toying with changing the FLEX variable to contain
"LC_ALL=C flex" but I'm a bit worried about breaking the build on
some platforms (especially Windows).

> I would also suggest that Nicolai report this issue to the flex
> developers.  It's only bound to reappear everywhere case-insensitive
> flex scanners are used.

True.  Maybe we should just call it a flex bug and wait for them to
fix it.  It's not going to affect builds from tarballs anyway, only
people who build from CVS.

            regards, tom lane

Re: Turkish downcasting in PL/pgSQL

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> I have tried running flex (2.5.4) with a number of different locales
> including tr_TR, but the output file is always the same.  Can you show
> us a diff of the generated files?

Hmm ... a quick look at the flex sources shows that flex does rely on
the <ctype.h> routines for case-folding, so I have no doubt that
ntufar's report is accurate.  Maybe you used the wrong tr_TR locale?

(Just for the record, though, I can't see any change in the generated
pl_scan.c output in any of the tr_TR variants available on either HPUX
or OS X.  I don't have a full set of locales installed on my Linux
machine so I can't try it there.)

            regards, tom lane
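
The dependence on the <ctype.h> routines mentioned above can be
illustrated with a small sketch. The helper name fold_upper_i_in is
hypothetical, and which locale names exist varies between systems, so
the helper returns -1 when setlocale() rejects the name. In the C
locale tolower('I') is 'i'; in a single-byte Turkish locale the C
library may instead return the code for dotless i, which would be
consistent with the report that keywords containing "I" stop matching.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Switch LC_CTYPE to locale_name, fold an upper-case 'I' with
 * tolower(), and return the resulting character code.  Returns -1 if
 * the locale is not available.  LC_CTYPE is reset to "C" afterwards,
 * which is the locale every C program starts in.
 */
static int
fold_upper_i_in(const char *locale_name)
{
    int         folded;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;              /* locale not installed on this system */

    folded = tolower((unsigned char) 'I');

    setlocale(LC_CTYPE, "C");
    return folded;
}
```

Comparing fold_upper_i_in("C") with the result for the Turkish locale
installed on the affected machine (the exact locale name is
system-dependent) would show whether that locale folds 'I' to
something other than 'i'.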

Re: Turkish downcasting in PL/pgSQL

From
Devrim GUNDUZ
Date:
Hi,

On Thu, 12 Aug 2004, Tom Lane wrote:

> > flex (flex version 2.5.4) incorporates case-insensitivity in its
> > state tables because if I run the flex stage with LANG=C everything
> > works fine.
>
> Ick.  That is of course why it worked for me when I tested it :-(

Nicolai is on holiday now. I tested on my Fedora Core 2 and RHEL 3 ES
systems and all regression tests passed:

======================
 All 96 tests passed.
======================

I'm using the latest tr_TR locale of glibc, and flex-2.5.4a-29 (of RHEL)
and flex-2.5.4a-31 (of FC 2).

What am I missing?

Regards,
--
Devrim GUNDUZ
devrim~gunduz.org                devrim.gunduz~linux.org.tr
            http://www.tdmsoft.com
            http://www.gunduz.org

Re: Turkish downcasting in PL/pgSQL

From
Tom Lane
Date:
Devrim GUNDUZ <devrim@gunduz.org> writes:
>  All 96 tests passed.

> I'm using the latest tr_TR locale of glibc, and flex-2.5.4a-29 (of RHEL)
> and flex-2.5.4a-31 (of FC 2).

> What am I missing?

If you built from a tarball, then the flex run is already done for you.
Remove src/pl/plpgsql/src/pl_scan.c and rebuild to see if you see a
problem.

            regards, tom lane

Re: Turkish downcasting in PL/pgSQL

From
Devrim GUNDUZ
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

On Mon, 16 Aug 2004, Tom Lane wrote:

> > I'm using the latest tr_TR locale of glibc, and flex-2.5.4a-29 (of RHEL)
> > and flex-2.5.4a-31 (of FC 2).
>
> > What am I missing?
>
> If you built from a tarball, then the flex run is already done for you.
> Remove src/pl/plpgsql/src/pl_scan.c and rebuild to see if you see a
> problem.

I tried beta1 and latest CVS snapshot before sending the mail. All
produced the same result.

A few minutes ago I also tried on Debian unstable, as Nicolai reported.
But all the regression tests passed again, using the latest flex +
glibc... I can't reproduce the problem :( Or there is no bug :)

Regards,
- --
Devrim GUNDUZ
devrim~gunduz.org                devrim.gunduz~linux.org.tr
            http://www.tdmsoft.com
            http://www.gunduz.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQFBIMRTtl86P3SPfQ4RAhPVAJ9g+UZRyv6SROp9TOzzp2/UsD/W4gCfSKNI
3cdoGjAZ4WLLZHXWs0Wq5Lk=
=1TsT
-----END PGP SIGNATURE-----

Re: Turkish downcasting in PL/pgSQL

From
Tom Lane
Date:
Devrim GUNDUZ <devrim@gunduz.org> writes:
> A few minutes ago I also tried on Debian unstable, as Nicolai reported.
> But all the regression tests passed again, using the latest flex +
> glibc... I can't reproduce the problem :( Or there is no bug :)

Hmph.  Either Nicolai has a weird locale setting, or he made a mistake.

We'll have to put this on hold until he gets back, I guess.  Fortunately
there's still lots of time till release.

            regards, tom lane