tough locale bug - Mailing list pgsql-hackers

From Goran Thyni
Subject tough locale bug
Msg-id 36B0DAE0.C2F9A42E@kirra.net
In response to Re: [HACKERS] Postgres Speed or lack thereof  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I have found a bug and tried to trace the source but now I need some
pointers.

ENV: Pgsql 6.4.2 (and current) compiled with locale (using Swedish
locale)

BUG:
A small table: CREATE TABLE x(txt text);

goran=> SELECT * FROM x;
txt   
------
abc   
åäö   
Grön
Gunnar
GUNNAR
Göran 
göran 
GÖRAN 
(8 rows)

but:

goran=> select * from x WHERE txt LIKE 'G%';
txt   
------
Grön
Gunnar
GUNNAR
(3 rows)

The same goes for "select * from x WHERE txt ~ '^G'", which the backend
treats the same way as the LIKE statement above.
Otherwise regex works correctly.

To sum up:
A case-sensitive regex anchored with '^' misses rows where the first
char following the match is a non-ASCII char.

I have tracked it down to the following. Three functions are called to
test the expression:
1. textregexeq OK for 5 instances
2. text_ge('G','G') also OK
3. text_lt('G','G\0xFF') this is not correct!

Case 3 does not work with strcoll() in varstr_cmp().
If I change strcoll() to strncmp() it works as expected,
but that probably breaks sorting etc. big time.
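
To make the difference concrete, here is a minimal sketch of the range
test from steps 2 and 3, assuming Latin-1 (ISO-8859-1) encoded strings.
The names cmp_bytes, cmp_locale and in_prefix_range are made up for
illustration and are not the backend's code. The point is only that the
upper bound 'G' followed by byte 0xFF is reliable under a byte-wise
comparison, while under strcoll() the result depends on how the active
locale happens to order that byte.

```c
#include <assert.h>
#include <string.h>

/* Byte-wise comparison: strcmp() compares bytes as unsigned char,
 * so 0xFF is greater than any other byte value. */
int cmp_bytes(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b);
}

/* Locale-aware comparison: uses whatever LC_COLLATE is currently in
 * effect. How a byte such as 0xFF sorts here is up to the locale. */
int cmp_locale(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcoll(a, b);
}

/* Hypothetical helper mirroring steps 2 and 3: returns 1 when
 * lo <= s < hi under the given comparison function, otherwise 0. */
int in_prefix_range(const char *s, const char *lo, const char *hi,
                    int (*cmp)(const char *, const char *))
{
    return cmp(s, lo) >= 0 && cmp(s, hi) < 0;
}
```

With cmp_bytes, in_prefix_range("G\xF6ran", "G", "G\xFF", cmp_bytes)
accepts the Latin-1 string "Göran". To try the locale-aware path, one
would first call setlocale() from <locale.h> with a Swedish locale and
then pass cmp_locale instead; which locale names are installed varies
between systems, so the sketch makes no claim about that result.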

Any suggestions how to proceed?
TIA,
-- 
---------------------------------------------
Göran Thyni, JMS Bildbasen, Kiruna
This is Penguin Country. On a quiet night you can hear Windows NT
reboot!

