Thread: AW: Re: [BUGS] Turkish locale bug

AW: Re: [BUGS] Turkish locale bug

From: Zeugswetter Andreas SB
Date:

> > Anyway, your proposal is just fine since we haven't decoupled these
> > things farther back in the server. But eventually we should hope to have
> > SQL_ASCII and other character sets enforced in context.
> 
> Now I'm confused.  Are you saying that we *should* treat identifier case
> under ASCII rules only?  That seems like a step backwards to me, but
> then I don't use any non-US locale myself...

I think we need to treat anything that is not quoted as US_ASCII; IIRC,
this is how Informix behaves. Users wanting locale-aware identifiers
would need to double-quote them, thus avoiding non-ASCII case conversions
altogether.

Andreas


Re: AW: Re: [BUGS] Turkish locale bug

From: Tom Lane
Date:
Zeugswetter Andreas SB  <ZeugswetterA@wien.spardat.at> writes:
>> Now I'm confused.  Are you saying that we *should* treat identifier case
>> under ASCII rules only?  That seems like a step backwards to me, but
>> then I don't use any non-US locale myself...

> I think we need to treat anything that is not quoted as US_ASCII; IIRC,
> this is how Informix behaves. Users wanting locale-aware identifiers
> would need to double-quote them, thus avoiding non-ASCII case conversions
> altogether.

I dug into the SQL99 spec, and it appears to have different rules
for identifier folding than for keyword recognition.  Section 5.2 syntax
rules 1-12 make it perfectly clear that they have an expansive idea of
what characters are allowed in identifiers (most of Unicode, it looks
like ;-)).  They also define the case-normalized form of an identifier
in terms of Unicode case translations (rule 21).  But they then say:

       28) For the purposes of identifying <key word>s, any <simple Latin
           lower case letter> contained in a candidate <key word> shall
           be effectively treated as the corresponding <simple Latin upper
           case letter>.

It appears to me that to implement the SQL99 rules correctly in a non-C
locale, we need to do case folding twice.  First, fold only 'A'..'Z'
and test to see whether we have a keyword.  If not, do the case folding
again using isupper()/tolower() to produce the normalized form of the
identifier.

This would solve Sezai's problem (in a Turkish locale, tolower('I') does
not return 'i', so keywords containing 'I' fail to match) without adding
a special case for Turkish, and it doesn't seem unreasonably slow.  Does
anyone object?
        regards, tom lane