PostgreSQL 7.1 and bugs with locale support - Mailing list pgsql-bugs

From pgsql-bugs@postgresql.org
Subject PostgreSQL 7.1 and bugs with locale support
Date
Msg-id 200104042323.f34NN7i14830@hub.org
List pgsql-bugs
Rob Gaszewski (graszew@poland.com) reports a bug with a severity of 2
The lower the number the more severe it is.

Short Description
PostgreSQL 7.1 and bugs with locale support

Long Description
I've discovered bugs in locale support in PostgreSQL (encoding set to UNICODE, locale set to pl_PL).

I've compiled PostgreSQL 7.1RC2 with --enable-multibyte=UNICODE
--enable-unicode-conversion --enable-locale

locale settings:
LANG=pl_PL  LC_ALL=pl_PL  LC_CTYPE=pl_PL  LC_COLLATE=pl_PL  LC_MONETARY=pl_PL

I have Debian GNU/Linux 2.2 "Potato" - Intel Celeron - kernel 2.2.19
PostgreSQL compiled with gcc 2.95.2  - glibc 2.1


When I try SELECT UPPER('some_text_with_polish_national_chars'); or
SELECT LOWER('some_text_with_polish_national_chars'); I get wrong results.
But when I try upper() and lower() functions with other chars (a...z A...Z)
everything works OK.
Detailed results below.



Tests done with Polish national characters
    |----------------------------------------------------|
    | char |  Hex   ||           UPPER(char)             |
    |      |        ||-----------------------------------|
 No |      |        ||  result  | should be | conclusion |
----|------|--------||----------|-----------|------------|
   1|   ą  | 0xc485 ||   0xc485 |   0xc484  |    WRONG   |
   2|   ć  | 0xc487 ||   0xc487 |   0xc486  |    WRONG   |
   3|   ę  | 0xc499 ||   0xc499 |   0xc498  |    WRONG   |
   4|   ł  | 0xc582 ||   0xc582 |   0xc581  |    WRONG   |
   5|   ń  | 0xc584 ||   0xc584 |   0xc583  |    WRONG   |
   6|   ó  | 0xc3b3 ||   0xc3a3 |   0xc393  |    WRONG   |
   7|   ś  | 0xc59b ||   0xc59b |   0xc59a  |    WRONG   |
   8|   ź  | 0xc5ba ||   0xc5aa |   0xc5b9  |    WRONG   |
   9|   ż  | 0xc5bc ||   0xc5ac |   0xc5bb  |    WRONG   |
    |      |        ||          |           |            |
  10|   Ą  | 0xc484 ||   0xc484 |   0xc484  |     OK     |
  11|   Ć  | 0xc486 ||   0xc486 |   0xc486  |     OK     |
  12|   Ę  | 0xc498 ||   0xc498 |   0xc498  |     OK     |
  13|   Ł  | 0xc581 ||   0xc581 |   0xc581  |     OK     |
  14|   Ń  | 0xc583 ||   0xc583 |   0xc583  |     OK     |
  15|   Ó  | 0xc393 ||   0xc393 |   0xc393  |     OK     |
  16|   Ś  | 0xc59a ||   0xc59a |   0xc59a  |     OK     |
  17|   Ź  | 0xc5b9 ||   0xc5b9 |   0xc5b9  |     OK     |
  18|   Ż  | 0xc5bb ||   0xc5bb |   0xc5bb  |     OK     |
----------------------------------------------------------


    |----------------------------------------------------|
    | char |  Hex   ||           LOWER(char)             |
    |      |        ||-----------------------------------|
 No |      |        ||  result  | should be | conclusion |
----|------|--------||----------|-----------|------------|
   1|   ą  | 0xc485 ||   0xe485 |   0xc485  |    WRONG   |
   2|   ć  | 0xc487 ||   0xe487 |   0xc487  |    WRONG   |
   3|   ę  | 0xc499 ||   0xe499 |   0xc499  |    WRONG   |
   4|   ł  | 0xc582 ||   0xe582 |   0xc582  |    WRONG   |
   5|   ń  | 0xc584 ||   0xe584 |   0xc584  |    WRONG   |
   6|   ó  | 0xc3b3 ||   0xe3b3 |   0xc3b3  |    WRONG   |
   7|   ś  | 0xc59b ||   0xe59b |   0xc59b  |    WRONG   |
   8|   ź  | 0xc5ba ||   0xe5ba |   0xc5ba  |    WRONG   |
   9|   ż  | 0xc5bc ||   0xe5bc |   0xc5bc  |    WRONG   |
    |      |        ||          |           |            |
  10|   Ą  | 0xc484 ||   0xe484 |   0xc485  |    WRONG   |
  11|   Ć  | 0xc486 ||   0xe486 |   0xc487  |    WRONG   |
  12|   Ę  | 0xc498 ||   0xe498 |   0xc499  |    WRONG   |
  13|   Ł  | 0xc581 ||   0xe581 |   0xc582  |    WRONG   |
  14|   Ń  | 0xc583 ||   0xe583 |   0xc584  |    WRONG   |
  15|   Ó  | 0xc393 ||   0xe393 |   0xc3b3  |    WRONG   |
  16|   Ś  | 0xc59a ||   0xe59a |   0xc59b  |    WRONG   |
  17|   Ź  | 0xc5b9 ||   0xe5b9 |   0xc5ba  |    WRONG   |
  18|   Ż  | 0xc5bb ||   0xe5bb |   0xc5bc  |    WRONG   |
----------------------------------------------------------
Letters 1 to 9 are lowercase; letters 10 to 18 are the corresponding capitals.
For example, letter 12 is the capital form of letter 3.
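
The "should be" columns in both tables can be cross-checked independently of PostgreSQL. The following is only a sketch in Python 3 (not part of the original report); it relies on Python's built-in Unicode case mapping and UTF-8 codec, and the names LOWER, UPPER, utf8_hex and expected_rows are made up for this illustration.

```python
# Polish lowercase letters in the order of table rows 1-9,
# and their capitals in the order of rows 10-18.
LOWER = "ąćęłńóśźż"
UPPER = "ĄĆĘŁŃÓŚŹŻ"


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of one character as 0x-prefixed hex."""
    return "0x" + ch.encode("utf-8").hex()


def expected_rows():
    """Yield (char, hex, expected UPPER hex, expected LOWER hex) for all 18 rows."""
    for ch in LOWER + UPPER:
        yield ch, utf8_hex(ch), utf8_hex(ch.upper()), utf8_hex(ch.lower())


if __name__ == "__main__":
    # Print one line per table row: number, char, hex, expected upper, expected lower.
    for no, row in enumerate(expected_rows(), start=1):
        print(no, *row)
```

Each printed line corresponds to one row number in the tables, so the third and fourth values can be compared against the "should be" columns for UPPER and LOWER respectively.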



I've also discovered that rows are sorted (ORDER BY) improperly.
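
The report gives no example of the mis-sorted rows, so the following is purely an illustration of the difference between plain byte order and Polish alphabetical order. It is a Python 3 sketch under loud assumptions: the word list is invented, polish_key is a hypothetical helper, and a single-level alphabet lookup is a simplification of what a real pl_PL collation does. Whether the faulty ORDER BY actually produced byte order is not stated in the report.

```python
# The 35 letters used here are the 32 of the Polish alphabet plus q, v, x.
POLISH_ALPHABET = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż"


def polish_key(word: str):
    """Simplified sort key: position of each letter in the Polish alphabet."""
    return [POLISH_ALPHABET.index(ch) for ch in word.lower()]


# Invented sample data, not taken from the report.
words = ["zebra", "ćma", "cena", "łoś", "lato"]

# Order obtained by comparing the raw UTF-8 bytes.
byte_order = sorted(words, key=lambda w: w.encode("utf-8"))

# Order a Polish reader would expect.
polish_order = sorted(words, key=polish_key)

if __name__ == "__main__":
    print("byte order:  ", byte_order)
    print("Polish order:", polish_order)
```

In byte order every word starting with a multibyte letter lands after "zebra", because the UTF-8 lead bytes (0xc4, 0xc5) compare higher than any ASCII letter.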

The "automatic encoding translation between backend and frontend" also works
improperly. For example, after setting the client encoding with \encoding LATIN2
and running the test
SELECT upper('acelnoszx'); (these are Polish national chars, not the ASCII ones),
I keep getting the message:

utf_to_latin: could not convert UTF-8 (0xc3a3) ignored
(repeated 3x for different chars).

The letters are not converted to uppercase, either.
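
The sequence 0xc3a3 in the message is the same value that appears in the "result" column of the UPPER table for row 6. A possible reading, sketched below in Python 3, is that the already-corrupted UPPER results are what fails to convert to LATIN2. The sketch assumes that the test string consisted of the nine lowercase letters from rows 1-9 (the literal in the report has lost its diacritics) and that Python's iso8859_2 codec is a fair stand-in for LATIN2; latin2_failures is a hypothetical helper name.

```python
# "result" column of the UPPER table, rows 1-9, exactly as reported.
UPPER_RESULTS = ["c485", "c487", "c499", "c582", "c584",
                 "c3a3", "c59b", "c5aa", "c5ac"]


def latin2_failures(hex_values):
    """Return the hex values whose UTF-8 character has no ISO-8859-2 encoding."""
    failures = []
    for hx in hex_values:
        ch = bytes.fromhex(hx).decode("utf-8")
        try:
            ch.encode("iso8859_2")
        except UnicodeEncodeError:
            failures.append(hx)
    return failures


if __name__ == "__main__":
    for hx in latin2_failures(UPPER_RESULTS):
        ch = bytes.fromhex(hx).decode("utf-8")
        print("0x" + hx, "U+%04X" % ord(ch), "not representable in ISO-8859-2")
```

Under these assumptions exactly three of the nine values fail (0xc3a3, 0xc5aa and 0xc5ac), which would be consistent with the message being repeated three times for different characters.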



When I run all the tests with PostgreSQL compiled only with --enable-locale, everything works correctly.

Unfortunately, unicode support is a must because of the i18n issues with Tcl 8.x.


Greetings,
Robert

------------------
Robert Gaszewski
graszew@poland.com

Sample Code


No file was uploaded with this report
