Re: compiler warnings on the buildfarm - Mailing list pgsql-hackers

From Tom Lane
Subject Re: compiler warnings on the buildfarm
Date
Msg-id 8374.1184295380@sss.pgh.pa.us
In response to Re: compiler warnings on the buildfarm  (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
Responses Re: compiler warnings on the buildfarm  (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
List pgsql-hackers
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> animal: lionfish            warnings: 16
> scan.l:180: warning, the character range [<80>-<FF>] is ambiguous in a
> case-insensitive scanner
> scan.l:180: warning, the character range [<80>-<FF>] is ambiguous in a
> case-insensitive scanner
> scan.l:302: warning, the character range [<80>-<FF>] is ambiguous in a
> case-insensitive scanner

This is evidently complaining about plpgsql's scan.l, which specifies
    %option case-insensitive
and then defines
    ident_start        [A-Za-z\200-\377_]
which is the way we do it in the main grammar too.  But I've never
seen this message in any of the flex versions I've used with PG.
(Which flex version is installed on lionfish anyway?)

I find some relevant points in the flex manual:
http://flex.sourceforge.net/manual/Patterns.html
    Character classes are expanded immediately when seen in the flex
    input. This means the character classes are sensitive to the locale
    in which flex is executed, and the resulting scanner will not be
    sensitive to the runtime locale. This may or may not be desirable.

    Character classes with ranges, such as `[a-Z]', should be used with
    caution in a case-insensitive scanner if the range spans upper or
    lowercase characters. Flex does not know if you want to fold all
    upper and lowercase characters together, or if you want the literal
    numeric range specified (with no case folding). When in doubt, flex
    will assume that you meant the literal numeric range, and will issue
    a warning. The exception to this rule is a character range such as
    `[a-z]' or `[S-W]' where it is obvious that you want case-folding to
    occur.

What I suspect is happening is that lionfish is running the buildfarm
script in a non-C locale, in which flex finds that some high-bit-set
characters are case-folded by tolower() and accordingly issues this
complaint.  Now the statements that "it assumes you meant the literal
numeric range" and that the behavior is fully determined at compile time
(ie, no run-time invocations of tolower(), as indeed are not to be seen
in pl_scan.c) seem to mean that we'll get the behavior we want anyway.
But the warning is a bit nervous-making.
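
One way to test the locale theory on a given machine is to ask the C library directly whether any byte in \200-\377 has a case counterpart in the active locale. The following is only a sketch: the function name count_folded_high_bytes is made up for illustration, and it assumes that flex's ambiguity check ultimately depends on the same <ctype.h> tables, which is the suspicion above and not something confirmed from the flex source.

```c
#include <ctype.h>
#include <locale.h>

/*
 * Hypothetical helper, not part of flex or PostgreSQL.
 *
 * Selects the named LC_CTYPE locale ("" means "take it from the
 * environment") and counts the bytes in 0x80..0xFF that tolower() or
 * toupper() would change, i.e. the bytes that are cased letters there.
 * Returns -1 if the locale cannot be selected.
 *
 * Note that this changes the process-wide LC_CTYPE setting and does
 * not restore it.
 */
int
count_folded_high_bytes(const char *locale_name)
{
    int         c;
    int         count = 0;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    /* 0x80..0xFF are valid unsigned char values, so these calls are safe */
    for (c = 0x80; c <= 0xFF; c++)
    {
        if (tolower(c) != c || toupper(c) != c)
            count++;
    }

    return count;
}
```

Called with "C" this should report zero such bytes, since the C locale only treats A-Z and a-z as cased. Called with "" under the environment the buildfarm script runs in, a nonzero result would be consistent with the theory that a single-byte non-C locale is what triggers the warning.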

I wonder if it'd be a good idea to invoke flex with a command like
    LANG=C flex ...
to try to improve the odds that it sees C locale when it's figuring
out what "case insensitive" means.

Anyone want to look into it more closely?
        regards, tom lane

