Thread: Re: [GENERAL] pg_dump error - LOCALIZATION PROBLEM

Re: [GENERAL] pg_dump error - LOCALIZATION PROBLEM

From: Peter Eisentraut
Date:
Tom Lane writes:

> I think our problems are worse than that: once the identifier has been
> through a locale-dependent case conversion we really have a problem
> matching it to an ASCII string.  The only real solution may be to
> require *all* keywords to be matched in the lexer, and forbid strcmp()
> matching in later phases entirely.

There are several classes of strcasecmp() misuse:

1. Using strcasecmp() on strings that are guaranteed to be lower case,
because the parser has assigned to the variable one of a finite set of
literal strings.  See CREATE SEQUENCE, commands/sequence.c for example.

2. Using strcasecmp() on strings that were parsed as keywords.  See CREATE
OPERATOR, CREATE AGGREGATE, CREATE TYPE, commands/define.c.

3. Using strcasecmp() on the values of GUC variables.

4. Using strcasecmp() for parsing configuration files or other things with
separate syntax rules.  See libpq/hba.c for reading the recode table.

For #1, strcasecmp() is just a waste; a plain strcmp() does the job.

For #2, we should export parts of ScanKeywordLookup as a generic function,
perhaps "normalize_identifier", and then we can replace
   strcasecmp(var, "expected_value")

with
   strcmp(normalize_identifier(var), "expected_value")
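For concreteness, here is a minimal sketch of what normalize_identifier() might look like. It assumes the function does the same ASCII-only downcasing that ScanKeywordLookup does before it searches the keyword table. The name comes from the proposal above; the caller-supplied buffer and the NULL-on-overflow convention are assumptions made for this sketch, not anything in the backend.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical normalize_identifier(): copy ident into buf, folding only
 * the ASCII letters A-Z to a-z.  All other bytes, including high-bit bytes
 * from Latin-N or multibyte encodings, are copied unchanged, so the result
 * never depends on the current LC_CTYPE setting.
 * Returns buf, or NULL if ident does not fit in bufsize bytes.
 */
static const char *
normalize_identifier(const char *ident, char *buf, size_t bufsize)
{
    size_t      len = strlen(ident);

    assert(buf != NULL && bufsize > 0);
    if (len >= bufsize)
        return NULL;

    for (size_t i = 0; i < len; i++)
    {
        char        ch = ident[i];

        if (ch >= 'A' && ch <= 'Z')
            ch += 'a' - 'A';
        buf[i] = ch;
    }
    buf[len] = '\0';
    return buf;
}
```

A caller would check for NULL and then compare the result with strcmp() against the lower-case literal. Passing the buffer in, rather than returning a static one, keeps a second call from overwriting the first result.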

For #3, it's not quite clear, because the string value could have been
created by an identifier or a string constant, so it's either #2 or #4.

For #4, we need some ASCII-only strcasecmp version.

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



Re: [GENERAL] pg_dump error - LOCALIZATION PROBLEM

From: Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> 2. Using strcasecmp() on strings that were parsed as keywords.  See CREATE
> OPERATOR, CREATE AGGREGATE, CREATE TYPE, commands/define.c.

But the real point is that they were parsed as identifiers, *not*
keywords, and therefore have already been through a locale-dependent
case conversion.  (Look at what happens in scan.l after
ScanKeywordLookup fails.)  Unless we can undo or short-circuit that,
it won't help to apply a correct ASCII-only comparison.
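To make the failure concrete: under a Turkish locale, tolower('I') yields the dotless ı rather than i, so a definition parameter such as INPUT, if it is scanned as an ordinary identifier, arrives as "ınput", and no ASCII-only comparison against "input" can match it. The sketch simulates that mapping with a hypothetical turkish_latin5_tolower() (0xFD is the dotless i in ISO-8859-9) instead of calling setlocale(), so it does not depend on which locales are installed. downcase_identifier_locale() is only a stand-in for the identifier downcasing in scan.l.

```c
#include <assert.h>

/*
 * Stand-in for tolower() under a Turkish ISO-8859-9 locale (an assumption,
 * so the example does not depend on installed locales): ASCII 'I' becomes
 * 0xFD, LATIN SMALL LETTER DOTLESS I, instead of 'i'.
 */
static unsigned char
turkish_latin5_tolower(unsigned char ch)
{
    if (ch == 'I')
        return 0xFD;
    if (ch >= 'A' && ch <= 'Z')
        return ch + ('a' - 'A');
    return ch;
}

/* Downcase an identifier in place, as a locale-aware lexer would. */
static void
downcase_identifier_locale(char *ident)
{
    for (unsigned char *p = (unsigned char *) ident; *p != '\0'; p++)
        *p = turkish_latin5_tolower(*p);
}

/* Correct ASCII-only case-insensitive comparison (folds A-Z only). */
static int
ascii_strcasecmp(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *) s1;
    const unsigned char *p2 = (const unsigned char *) s2;

    for (;;)
    {
        unsigned char c1 = *p1++;
        unsigned char c2 = *p2++;

        if (c1 >= 'A' && c1 <= 'Z')
            c1 += 'a' - 'A';
        if (c2 >= 'A' && c2 <= 'Z')
            c2 += 'a' - 'A';
        if (c1 != c2)
            return (int) c1 - (int) c2;
        if (c1 == '\0')
            return 0;
    }
}
```

The raw string "INPUT" still compares equal to "input", but once downcase_identifier_locale() has turned it into "\xFDnput" the information is gone, which is why the fix has to happen before (or instead of) that conversion.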

Possibly we should change the parser's Ident node type to carry both the
raw string and the downcased-as-identifier string.  The latter would
serve the existing needs, the former could be used for keyword matching.
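A rough sketch of that idea, using invented names (IdentPair, make_ident(), ident_matches_keyword()) rather than the parser's real Ident node. The node keeps the raw spelling next to the downcased one; keyword matching folds only ASCII A-Z in the raw text, while everything else keeps using the downcased form. turkish_downcase() is a hypothetical stand-in for a locale-dependent lexer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical identifier node carrying both spellings (names invented). */
typedef struct IdentPair
{
    char       *raw;            /* exactly as written in the query text */
    char       *downcased;      /* after the lexer's (locale-aware) folding */
} IdentPair;

/* Example folding, Turkish-style: 'I' -> 0xFD (dotless i in ISO-8859-9). */
static void
turkish_downcase(char *s)
{
    for (unsigned char *p = (unsigned char *) s; *p != '\0'; p++)
    {
        if (*p == 'I')
            *p = 0xFD;
        else if (*p >= 'A' && *p <= 'Z')
            *p += 'a' - 'A';
    }
}

/* Duplicate a string with malloc(); returns NULL on allocation failure. */
static char *
copy_string(const char *s)
{
    size_t      len = strlen(s) + 1;
    char       *out = malloc(len);

    if (out != NULL)
        memcpy(out, s, len);
    return out;
}

/* Build the node; downcase_fn folds its argument in place. Returns 1 on success. */
static int
make_ident(IdentPair *id, const char *text, void (*downcase_fn) (char *))
{
    assert(id != NULL && text != NULL);
    id->raw = copy_string(text);
    id->downcased = copy_string(text);
    if (id->raw == NULL || id->downcased == NULL)
    {
        free(id->raw);
        free(id->downcased);
        return 0;
    }
    downcase_fn(id->downcased);
    return 1;
}

/* Keyword test on the raw spelling; keyword must be lower-case ASCII. */
static int
ident_matches_keyword(const IdentPair *id, const char *keyword)
{
    const unsigned char *p = (const unsigned char *) id->raw;
    const unsigned char *k = (const unsigned char *) keyword;

    for (; *p != '\0' && *k != '\0'; p++, k++)
    {
        unsigned char c = *p;

        if (c >= 'A' && c <= 'Z')
            c += 'a' - 'A';
        if (c != *k)
            return 0;
    }
    return *p == '\0' && *k == '\0';
}
```

With the Turkish-style folding, downcased holds "\xFDnput" while ident_matches_keyword(&id, "input") still succeeds, because it never looks at the locale-folded copy.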

> For #2, we should export parts of ScanKeywordLookup as a generic function,
> perhaps "normalize_identifier", ...
> For #4, we need some ASCII-only strcasecmp version.

I think these are the same thing.
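One way to see why they coincide: when the expected value is already lower-case ASCII, as the literal keyword strings in these comparisons are, normalizing with ASCII-only folding and then calling strcmp() gives the same answer as an ASCII-only strcasecmp(), since both are built from the same per-byte fold. The sketch uses hypothetical names and assumes the candidate keywords fit in a 64-byte buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The one primitive both proposals need: fold ASCII A-Z only. */
static unsigned char
ascii_fold(unsigned char ch)
{
    return (ch >= 'A' && ch <= 'Z') ? (unsigned char) (ch + ('a' - 'A')) : ch;
}

/* Proposal #4: ASCII-only case-insensitive comparison. */
static int
ascii_strcasecmp(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *) s1;
    const unsigned char *p2 = (const unsigned char *) s2;
    unsigned char c1,
                c2;

    do
    {
        c1 = ascii_fold(*p1++);
        c2 = ascii_fold(*p2++);
    } while (c1 == c2 && c1 != '\0');
    return (int) c1 - (int) c2;
}

/* Proposal #2: normalize var, then strcmp() against the expected literal. */
static int
matches_via_normalize(const char *var, const char *expected)
{
    char        buf[64];
    size_t      len = strlen(var);

    if (len >= sizeof buf)
        return 0;               /* assumed too long to be any keyword here */
    for (size_t i = 0; i <= len; i++)
        buf[i] = (char) ascii_fold((unsigned char) var[i]);
    return strcmp(buf, expected) == 0;
}
```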
        regards, tom lane