Thread: BUG: ILIKE with single-byte encoding

BUG: ILIKE with single-byte encoding

From
Rolf Jentsch
Date:
Hello,

With PostgreSQL 8.3.0 the following bug has been introduced with the ILIKE or
~~* operator:

In a database with a single-byte encoding such as LATIN1 the expression

SELECT 'aü' ILIKE '%ü';
returns false.

This error occurs for every pattern in which a % is followed by a character
with a decimal value between 128 and 255.

I was able to track down the error to the file
src/backend/utils/adt/like_match.c

For the single-byte case there are some places where a (signed) char value is
compared to the return value of tolower(), which is an int. The 'ü' in Latin1
is -4 as a signed char and 252 as the int returned by tolower(), which are
obviously not equal.
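
To illustrate the mismatch described above, here is a minimal C sketch. The
function names matches_unpatched and matches_patched are hypothetical and are
not taken from like_match.c. The sketch uses signed char explicitly so the
result does not depend on whether plain char is signed on a given platform,
and it assumes the default "C" locale, in which tolower() leaves the byte 0xFC
unchanged.

```c
#include <ctype.h>

/*
 * Hypothetical helper: compares a stored byte with the lower-cased form of
 * ch the way the unpatched macro would, i.e. a signed char against the int
 * returned by tolower().
 */
int matches_unpatched(signed char stored, unsigned char ch)
{
    /* stored is promoted to int (-4 for 0xFC); tolower() returns 0..255 */
    return stored == tolower(ch);
}

/*
 * Hypothetical helper: same comparison, but the tolower() result is
 * narrowed back to a char type first, as the appended patch does.
 */
int matches_patched(signed char stored, unsigned char ch)
{
    /* 252 narrowed to signed char is -4 on two's-complement platforms */
    return stored == (signed char) tolower(ch);
}
```

Under those assumptions, matches_unpatched((signed char) 0xFC, 0xFC) yields 0
while matches_patched((signed char) 0xFC, 0xFC) yields 1. Plain ASCII input
such as matches_unpatched('a', 'A') is not affected.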

It could be fixed with the appended patch.

cu
Rolf Jentsch
Entwicklung Mitglieder-Systeme Dezentral

ElectronicPartner GmbH
Mündelheimer Weg 40
40472 Düsseldorf
phone: +49-(0)211-4156-0
fax:   +49-(0)211-4156-6865
eMail: rjentsch@electronicpartner.de

Sitz der Gesellschaft Düsseldorf
Amtsgericht - Registergericht Düsseldorf - HRB 4078
Geschäftsführer: Oliver Haubrich,
Dr. Sven-Olaf Krauß, Karl Trautman



--- src/backend/utils/adt/like_match.c       2008-02-28 18:19:30.000000000 +0100
+++ src/backend/utils/adt/like_match.c        2008-02-28 18:19:43.000000000 +0100
@@ -71,7 +71,7 @@
  */

 #ifdef MATCH_LOWER
-#define TCHAR(t) tolower((t))
+#define TCHAR(t) ((char)tolower((t)))
 #else
 #define TCHAR(t) (t)
 #endif

Re: BUG: ILIKE with single-byte encoding

From
Tom Lane
Date:
Rolf Jentsch <RJentsch@electronicpartner.de> writes:
> With PostgreSQL 8.3.0 the following bug has been introduced with the ILIKE or
> ~~* operator:
> In a database with single-byte encoding as LATIN1 the expression
> SELECT 'aü' ILIKE '%ü';
> returns false.

> For the single-byte case there are some places where a (signed) char
> value is compared to the return value of tolower() which is an int.

Patch applied, thanks!  It turns out there was a second bug on the very
same line: some machines have problems if the argument of tolower()
isn't explicitly cast to unsigned char ...
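
The second problem can be sketched as follows. The C standard only defines
tolower() for arguments that are representable as unsigned char (or equal to
EOF), so a negative char value has to be cast before the call. TCHAR_SAFE and
lower_equal are hypothetical names used for illustration and may not match the
code that was actually committed.

```c
#include <ctype.h>

/*
 * Hypothetical macro: the inner cast keeps the argument of tolower() in
 * the range 0..255, the outer cast narrows the int result back to char
 * so that it compares equal to other char values.
 */
#define TCHAR_SAFE(t) ((char) tolower((unsigned char) (t)))

/* Hypothetical helper: case-insensitive comparison of two single bytes. */
int lower_equal(char a, char b)
{
    return TCHAR_SAFE(a) == TCHAR_SAFE(b);
}
```

With both casts in place, lower_equal('A', 'a') and
lower_equal((char) 0xFC, (char) 0xFC) are both true whether plain char is
signed or unsigned.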

            regards, tom lane