Thread: Case Conversion Functions

Case Conversion Functions

From: Volkan YAZICI
Hi,

There are lots of places in the code that use either pg_tolower()
or plain tolower() without being aware of multibyte (MB) characters,
or that use their own implementations of pg_tolower(). (Actually,
AFAIK, the whole MB case conversion is broken in -rHEAD.)

For instance, consider the backend/utils/adt/{like.c, like_match.c}
files. Some lines of iwchareq() duplicate pg_tolower().

Another example:
  backend/parser/scansup.c
  152 else if (ch >= 0x80 && isupper(ch))
  153     ch = tolower(ch);

Is this intended behaviour, or are they waiting for somebody to
clean them up?


Regards.


Re: Case Conversion Functions

From: Tom Lane
Volkan YAZICI <yazicivo@ttnet.net.tr> writes:
> There are lots of places in the code that use either pg_tolower()
> or plain tolower() without being aware of multibyte (MB) characters,
> or that use their own implementations of pg_tolower(). (Actually,
> AFAIK, the whole MB case conversion is broken in -rHEAD.)

The upper/lower functions themselves work AFAIK, but I agree that stuff
like ILIKE is probably broken for MB encodings.  Regex character classes
need help too.
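A case-insensitive match that works for multibyte encodings has to decode whole characters before folding them. The sketch below illustrates that idea; it is not PostgreSQL's ILIKE code. It assumes already-validated UTF-8 input, and `utf8_decode`, `fold_cp` and `utf8_ieq` are hypothetical names. The fold table is deliberately tiny (ASCII plus the Latin-1 Supplement capitals U+00C0 to U+00DE, excluding U+00D7); a real implementation would need full Unicode case-mapping data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 character (1 to 4 bytes) at s into *cp and return
 * its byte length. Assumes well-formed input; nothing is validated. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {
        *cp = ((uint32_t) (s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *cp = ((uint32_t) (s[0] & 0x0F) << 12)
            | ((uint32_t) (s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    *cp = ((uint32_t) (s[0] & 0x07) << 18) | ((uint32_t) (s[1] & 0x3F) << 12)
        | ((uint32_t) (s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    return 4;
}

/* Hypothetical minimal fold: ASCII A-Z and Latin-1 capitals
 * U+00C0..U+00DE (except U+00D7, the multiplication sign). */
static uint32_t fold_cp(uint32_t c)
{
    if (c >= 'A' && c <= 'Z')
        return c + ('a' - 'A');
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7)
        return c + 0x20;
    return c;
}

/* Case-insensitive equality of two NUL-terminated UTF-8 strings,
 * compared one decoded character at a time instead of byte by byte. */
static int utf8_ieq(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    const unsigned char *p = (const unsigned char *) a;
    const unsigned char *q = (const unsigned char *) b;

    while (*p != '\0' && *q != '\0') {
        uint32_t c1, c2;
        p += utf8_decode(p, &c1);
        q += utf8_decode(q, &c2);
        if (fold_cp(c1) != fold_cp(c2))
            return 0;
    }
    /* Equal only if both strings ran out at the same time. */
    return *p == '\0' && *q == '\0';
}
```

For instance, `utf8_ieq("\xC3\x84" "BC", "\xC3\xA4" "bc")` ("ÄBC" vs "äbc") returns 1, whereas a byte-wise tolower() comparison in the "C" locale would report a mismatch at the second byte (0x84 vs 0xA4).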

> Another example:
>   backend/parser/scansup.c
>   152 else if (ch >= 0x80 && isupper(ch))
>   153     ch = tolower(ch);

Fooling with that is fairly risky --- we've been burnt before by
locale-dependent case folding of SQL identifiers.  In particular
it'd be really bad if the folding could change on-the-fly at runtime.
        regards, tom lane
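One way to avoid the runtime locale dependence described above is to fold identifiers with a fixed ASCII-only rule that ignores LC_CTYPE entirely. The sketch below is hypothetical (`fold_identifier_ascii` is not a PostgreSQL function) and is only meant to show the trade-off: the result is deterministic and bytes with the high bit set are never altered, so multibyte characters cannot be corrupted, but non-ASCII letters are simply not folded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: lower-case an identifier in place using only
 * the ASCII range, so the result never depends on the current locale.
 * Bytes >= 0x80 (e.g. parts of UTF-8 sequences) are left unchanged. */
static void fold_identifier_ascii(char *ident, size_t len)
{
    assert(ident != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char ch = (unsigned char) ident[i];
        if (ch >= 'A' && ch <= 'Z')
            ident[i] = (char) (ch + ('a' - 'A'));
    }
}
```

For the UTF-8 identifier "MyÉTable", the result is "myÉtable": the ASCII letters are lowered and the two bytes of "É" (0xC3 0x89) are copied as-is.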