Thread: BUG #2070: Encoding dependent error in comparison operators

From: "Jan Jockusch"

The following bug has been logged online:

Bug reference:      2070
Logged by:          Jan Jockusch
Email address:      jan@jockusch.de
PostgreSQL version: 8.1.0
Operating system:   Linux
Description:        Encoding dependent error in comparison operators
Details:

With terminal encoding Latin-1, client encoding Latin-1
and database encoding LATIN1, I do:

\l
     Name      |  Owner   | Encoding
---------------+----------+-----------
 encoding_test | postgres | LATIN1
...
encoding_test=# select 'ä' = 'ö';
 ?column?
----------
 t
(1 row)

And although the two values are quite clearly
different, the operator finds them equal.

I hope you see the different umlauts in the query
(also latin-1 encoded).

The comparison operator works correctly for 7-bit ASCII
values, and it treats characters below 128 as different
from those above 128. However, it treats all characters
above 128 as equal to one another.

The bug also affects strings that are identical except
for a different umlaut at the same position, e.g.
'Größe' = 'Grüße'. This comparison also evaluates to
true in Latin-1 scenarios.

The bug does not occur in clean UTF-8 scenarios.

I think this is a serious bug which produces surprising
and very hard-to-find problems. If I can be of any
assistance in diagnosing or fixing it, please contact me.

Re: BUG #2070: Encoding dependent error in comparison operators

From: Tom Lane

"Jan Jockusch" <jan@jockusch.de> writes:
> With terminal encoding Latin-1, client encoding Latin-1
> and database encoding LATIN1, I do:

... and what database locale?  This sort of misbehavior is a common
symptom of having a database encoding that's not what the locale
expects.

            regards, tom lane
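
As a rough illustration of the kind of mismatch described above, the
sketch below looks at the raw bytes involved. The thread does not state
the database locale, so the UTF-8 locale here is purely an assumption
for the sake of the example, and the helper name is_valid_in is
hypothetical. It only shows that the LATIN1 bytes for the two umlauts
differ, yet neither is a well-formed UTF-8 sequence; how a locale-aware
comparison then treats such bytes depends on the platform and is not
shown here.

```python
# Illustrative only: the thread does not say which locale the database uses.
# Assumption: the data is stored as LATIN1 while the locale expects UTF-8.


def is_valid_in(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes cleanly in `encoding` (hypothetical helper)."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# LATIN1 stores each umlaut as a single byte above 127.
a_umlaut = "\u00e4".encode("latin-1")  # 'ä' becomes the single byte 0xE4
o_umlaut = "\u00f6".encode("latin-1")  # 'ö' becomes the single byte 0xF6

# The stored bytes differ, so a plain byte-wise comparison tells them apart.
bytes_differ = a_umlaut != o_umlaut

# Neither byte is a complete UTF-8 sequence on its own, so code that
# expects UTF-8 input cannot interpret them as characters.
valid_as_utf8 = [is_valid_in(b, "utf-8") for b in (a_umlaut, o_umlaut)]

# The same bytes are perfectly valid when read as LATIN1.
valid_as_latin1 = [is_valid_in(b, "latin-1") for b in (a_umlaut, o_umlaut)]

print(a_umlaut.hex(), o_umlaut.hex(), bytes_differ)
print(valid_as_utf8, valid_as_latin1)
```

Pure 7-bit ASCII bytes are valid in both encodings, which would be
consistent with the report that comparisons of ASCII values behave
correctly.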